Data is growing at a massive pace across the globe, and handling it requires sophisticated algorithms, software, and powerful compute capable of real-time artificial intelligence (AI) inferencing. The tremendous growth in data and the need for better insights have led to the rapid adoption of AI. A common approach to implementing AI algorithms in the enterprise is machine learning and its subset, deep learning. Both use huge quantities of data to train AI models, which are then deployed across various use cases, including image classification and recognition, object detection, and recommender systems.
When training a model, compute time can run to hours, or even days or weeks, depending on the amount of data involved, the software stack being used, and the algorithm being trained. Time to insight from inference is equally critical: models need to arrive at a decision quickly. A solution with both high throughput and low latency allows the inference engine to ingest and process data faster, reducing response time and enabling real-time processing for unique and interactive experiences across a number of use cases, from endpoint devices to data centers.
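To make the throughput and latency trade-off concrete, here is a minimal sketch in Python. It assumes a simple linear cost model in which each batch pays a fixed overhead plus a per-item cost. The function names and all numbers are hypothetical and chosen only for illustration; they are not measurements of any Intel FPGA or other device.

```python
def batch_latency_ms(batch_size: int, fixed_overhead_ms: float, per_item_ms: float) -> float:
    """Time to finish one batch under a simple linear cost model (an assumption)."""
    return fixed_overhead_ms + per_item_ms * batch_size


def throughput_per_second(batch_size: int, latency_ms: float) -> float:
    """Items completed per second when batches run back to back."""
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical costs, not measured on real hardware.
    for batch in (1, 8, 32):
        latency = batch_latency_ms(batch, fixed_overhead_ms=2.0, per_item_ms=0.5)
        rate = throughput_per_second(batch, latency)
        print(f"batch={batch:2d}  latency={latency:5.1f} ms  throughput={rate:7.1f} items/s")
```

Under this toy model, larger batches raise throughput, but every request in the batch waits longer for its result. That is why an accelerator that stays efficient at small batch sizes is attractive for real-time inference.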
Native Intel® FPGA capabilities for AI
Intel® Field Programmable Gate Arrays (Intel® FPGAs) are inherently designed for low latency, high throughput, power efficiency, and flexibility. Intel FPGAs make real-time inference possible by providing completely customizable hardware acceleration while retaining the flexibility to evolve with rapidly changing machine learning (ML) and deep learning (DL) models. The FPGA core architecture has many features which naturally fit with machine learning and deep learning applications:
- Highly parallel architecture: Facilitates efficient low-batch video stream processing and reduces latency
- Configurable distributed floating-point DSP blocks: Support for FP32, FP16, FP11, INTx, and bfloat accelerates computation by letting you tune compute performance. Whatever you choose, from lower-precision integers to high-precision floating-point numerics, you can continue to adjust along the performance/power curve.
- Tightly coupled high-bandwidth memory: More than 50 TBps of on-chip SRAM bandwidth with random access reduces latency and minimizes external memory access
- Programmable datapath: Reduces unnecessary data movement, improving latency and efficiency
- Adaptable and future-proof: Intel FPGAs provide customizable hardware acceleration that can be programmed and tuned again and again to achieve maximum performance.
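The precision choices in the list above can be illustrated in software. The following is a minimal sketch, in Python, of two common reduced-precision encodings: bfloat16, obtained here by keeping the top 16 bits of an FP32 value (simple truncation, assumed for brevity rather than round-to-nearest), and symmetric INT8 quantization with a caller-supplied scale. The function names are hypothetical, and the code shows only the numeric effect of lower precision, not how an FPGA DSP block implements it.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern by truncating the FP32 encoding of x."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def quantize_int8(values, scale):
    """Symmetric INT8 quantization: round(v / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


if __name__ == "__main__":
    x = 3.14159
    approx = from_bfloat16_bits(to_bfloat16_bits(x))
    print(f"FP32 input {x} becomes {approx} in bfloat16 (truncated)")
    # The scale is a hypothetical choice; real deployments calibrate it from data.
    print(quantize_int8([0.5, -1.0, 2.0], scale=1 / 64))
```

Fewer bits mean coarser values, which is the accuracy cost paid for the smaller, faster arithmetic that the performance/power curve refers to.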
Real-time Inferencing in Action
Large cloud service providers (CSPs) need to support diverse and complex workloads, including AI, high-performance computing (HPC), and analytics. Offering CPU-based instances as part of their product portfolio gives CSPs a flexible, cost-effective architecture. Many are now looking at additional options for accelerating AI inferencing workloads and are pairing the Intel® Xeon® processors in their existing infrastructure with Intel FPGAs. Generally, Intel FPGAs are deployed alongside Intel® Xeon® processors to make compute acceleration more efficient and easier to implement.
For instance, Microsoft turned to Intel® Arria® and Stratix FPGAs to power its Bing* search engine, which quickly reads and analyzes billions of documents across the entire web and provides the best answer to a question in a fraction of a second. Unlike other architectures, Intel FPGAs don’t require extensive batch calculations, enabling Microsoft to use 8- and 9-bit floating-point data types to decrease model latency while also increasing model size.
The company also developed Project Brainwave, a deep learning acceleration platform for cloud customers. Deployed on Intel FPGAs, Project Brainwave gives customers access to dedicated hardware that accelerates real-time AI calculations at low latency and competitive cost. Microsoft makes changes on a weekly basis, rolling them out to thousands of FPGAs at once. Using this iterative approach, it developed and tested a custom 9-bit floating-point format (FP9) before settling on an 8-bit format (FP8) that doubles performance over standard INT8. To meet the needs of its data centers, Microsoft also optimized Brainwave for low latency, maintaining high efficiency even with small numbers of requests. These customizations demonstrate the advantages of using FPGAs for deep neural network implementations.
Finally, Baidu and Intel recently announced that Baidu is developing a heterogeneous computing platform based on Intel’s latest FPGA technology. Intel FPGAs will accelerate performance and energy efficiency, add flexibility to data center workloads, and enable workload acceleration as a service on Baidu Cloud.
Intel FPGAs are versatile multi-function accelerators that allow maximum programming flexibility and easier reconfiguration. The architecture is inherently parallel, offering high performance along with fine-tuning of throughput, execution speed, and energy efficiency.
When combined with the OpenVINO™ toolkit, Intel FPGAs become an even better option for deploying and accelerating real-time inferencing solutions. Learn more about Intel AI solutions, including Intel FPGAs, at the AIDC 2018 homepage. Visit the OpenVINO™ toolkit to learn how to bring your pre-trained models from Caffe* or TensorFlow* to Intel FPGAs and get your AI inference application to market faster. And to learn more about FPGA hardware and AI, visit the Intel FPGA web page.
Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel, the Intel logo, Xeon, Stratix, and Arria are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation