FPGA vs. GPU for Deep Learning

FPGAs are an excellent choice for deep learning applications that require low latency and flexibility

Artificial intelligence (AI) is evolving rapidly, with new neural network models, techniques, and use cases emerging regularly. While there is no single architecture that works best for all machine and deep learning applications, FPGAs can offer distinct advantages over GPUs and other types of hardware in certain use cases.

What Is an FPGA?

Field programmable gate arrays (FPGAs) are integrated circuits with a programmable hardware fabric. Unlike the circuitry in graphics processing units (GPUs) or application-specific integrated circuits (ASICs), the circuitry inside an FPGA chip is not hard etched; it can be reprogrammed as needed. This capability makes FPGAs an excellent alternative to ASICs, which require a long development time and a significant investment to design and fabricate.

The tech industry adopted FPGAs for machine learning and deep learning relatively recently. In 2010, Microsoft Research demonstrated one of the first use cases of AI on FPGAs as part of its efforts to accelerate web searches.1 FPGAs offered a combination of speed, programmability, and flexibility, delivering performance without the cost and complexity of developing custom application-specific integrated circuits (ASICs). Five years later, Microsoft’s Bing search engine was using FPGAs in production, proving their value for deep learning applications. By using FPGAs to accelerate search ranking, Bing realized a 50 percent increase in throughput.1

Why Choose an FPGA for Deep Learning?

Early AI workloads, like image recognition, relied heavily on parallelism. Because GPUs were designed to render video and graphics, which are themselves highly parallel tasks, they became a popular choice for machine learning and deep learning. GPUs excel at parallel processing, performing a very large number of arithmetic operations at once. In other words, they deliver strong acceleration in cases where the same workload must be performed many times in rapid succession.

However, running AI on GPUs has its limits. A GPU does not deliver as much performance as an ASIC, a chip purpose-built for a given deep learning workload.

FPGAs offer hardware customization with integrated AI and can be programmed to deliver behavior similar to a GPU or an ASIC. The reprogrammable, reconfigurable nature of an FPGA lends itself well to a rapidly evolving AI landscape, allowing designers to test algorithms quickly and get to market fast. FPGAs offer several advantages for deep learning applications and other AI workloads:

Great performance with high throughput and low latency: FPGAs can inherently provide low latency as well as deterministic latency for real-time applications like video streaming, transcription, and action recognition by directly ingesting video into the FPGA, bypassing a CPU. Designers can build a neural network from the ground up and structure the FPGA to best suit the model.
Excellent value and cost: FPGAs can be reprogrammed for different functionalities and data types, making them one of the most cost-effective hardware options available. Furthermore, FPGAs can be used for more than just AI. By integrating additional capabilities onto the same chip, designers can save on cost and board space. FPGAs have long product life cycles, so hardware designs based on FPGAs can have a long product life, measured in years or decades. This characteristic makes them ideal for use in industrial, defense, medical, and automotive markets.
Low power consumption: With FPGAs, designers can fine-tune the hardware to the application, helping meet power efficiency requirements. FPGAs can also accommodate multiple functions, delivering more energy efficiency from the chip. It’s possible to use a portion of an FPGA for a function, rather than the entire chip, allowing the FPGA to host multiple functions in parallel. 

AI and Deep Learning Applications on FPGAs 

FPGAs can offer performance advantages over GPUs when the application demands low latency and low batch sizes—for example, with speech recognition and other natural language processing workloads. Due to their programmable I/O interface and highly flexible fabric, FPGAs are also well suited to the following tasks:

Overcoming I/O bottlenecks. FPGAs are often used where data must traverse many different networks at low latency. They are useful for eliminating memory buffering and overcoming I/O bottlenecks, one of the most limiting factors in AI system performance. By accelerating data ingestion, FPGAs can speed the entire AI workflow.
Integrating AI into workloads. Using FPGAs, designers can add AI capabilities, like deep packet inspection or financial fraud detection, to existing workloads.
Enabling sensor fusion. FPGAs excel when handling data input from multiple sensors, such as cameras, LIDAR, and audio sensors. This ability can be extremely valuable when designing autonomous vehicles, robotics, and industrial equipment.
Providing acceleration for high performance computing (HPC) clusters. FPGAs can help facilitate the convergence of AI and HPC by serving as programmable accelerators for inference.2
Adding extra capabilities beyond AI. FPGAs make it possible to add security, I/O, networking, or pre-/postprocessing capabilities without requiring an extra chip.
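
The point above about low latency and low batch sizes can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only: the arrival interval, per-batch overhead, and per-sample compute time are made-up numbers, not measurements of any FPGA or GPU, and the helper names `batch_latency_ms` and `peak_throughput_per_s` are hypothetical. It assumes requests arrive at a fixed interval and that an accelerator waits for a full batch before it starts computing.

```python
def batch_latency_ms(batch_size, arrival_interval_ms, overhead_ms, per_sample_ms):
    """Worst-case latency seen by the first request in a batch.

    The first request waits for the rest of the batch to arrive,
    then waits for the whole batch to be computed.
    """
    wait_ms = (batch_size - 1) * arrival_interval_ms
    compute_ms = overhead_ms + batch_size * per_sample_ms
    return wait_ms + compute_ms


def peak_throughput_per_s(batch_size, overhead_ms, per_sample_ms):
    """Samples per second if batches are processed back to back."""
    compute_ms = overhead_ms + batch_size * per_sample_ms
    return 1000.0 * batch_size / compute_ms


if __name__ == "__main__":
    # Assumed, illustrative parameters (not vendor data).
    ARRIVAL_MS = 1.0      # one request arrives every millisecond
    OVERHEAD_MS = 2.0     # fixed cost to launch one batch
    PER_SAMPLE_MS = 0.25  # incremental compute cost per sample

    for batch in (1, 8, 64):
        latency = batch_latency_ms(batch, ARRIVAL_MS, OVERHEAD_MS, PER_SAMPLE_MS)
        throughput = peak_throughput_per_s(batch, OVERHEAD_MS, PER_SAMPLE_MS)
        print(f"batch={batch:3d}  latency={latency:6.2f} ms  "
              f"throughput={throughput:8.1f} samples/s")
```

Under these assumed numbers, a batch of 1 has a worst-case latency of 2.25 ms, while a batch of 64 has a worst-case latency of 81 ms in exchange for eight times the peak throughput. Hardware that stays efficient at small batch sizes avoids paying that waiting time, which is why batch size matters for real-time workloads.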

Intel® FPGA Software and Hardware

One of the few hurdles to overcome when using FPGAs is that the hardware typically requires specialized programming expertise. Intel is reducing the amount of expertise needed with a software-based programming model. This higher-level FPGA programming model allows a data scientist or model developer to create a neural network using a common AI framework—such as TensorFlow or Caffe—and deploy it on an FPGA without knowing the details of the FPGA architecture. Intel has developed several tools that make programming FPGAs much easier:

Intel® Distribution of OpenVINO™ toolkit gives computer vision developers a single tool to accelerate models across several hardware platforms, including FPGAs.
Intel® FPGA Deep Learning Acceleration Suite provides tools and optimized architectures to accelerate inference with Intel® FPGAs. It interfaces with the OpenVINO™ toolkit, offering scalability to support custom networks.
Intel® FPGA SDK for OpenCL™ software technology accelerates development by targeting both Intel® CPUs and Intel® FPGAs. Developers can leverage the unique capabilities of Intel® FPGAs to deliver acceleration with power efficiency and low latency.

Intel® FPGA deep learning technology solutions span a range of product families and software tools to help reduce development time and cost. The following hardware products are of particular value for deep learning use cases:

Intel® Stratix® 10 NX FPGA is Intel’s first AI-optimized FPGA. It embeds a new type of AI-optimized block, the AI Tensor Block, tuned for common matrix-matrix or vector-matrix multiplications.
Intel® Agilex™ FPGAs and SoCs deliver up to 40 percent higher performance3 or up to 40 percent lower power3 for applications in data center, networking, and edge compute.
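
The matrix-vector multiplications mentioned above for the AI Tensor Block are the core arithmetic of neural network inference. The text here says only that the block is tuned for matrix-matrix and vector-matrix multiplications; the 8-bit integer format, the single shared scale per tensor, and the function names below are assumptions chosen for illustration, and the sketch does not model the actual hardware block. It shows the general idea of reduced-precision inference: quantize to small integers, multiply and accumulate in integer arithmetic, then rescale.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers using one shared scale (assumed scheme)."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_matvec(matrix_q, vector_q):
    """Integer matrix-vector product.

    Python integers do not overflow, so they stand in for the wide
    accumulator a hardware multiply-accumulate unit would use.
    """
    return [sum(w * x for w, x in zip(row, vector_q)) for row in matrix_q]


def dequantize(accumulators, weight_scale, input_scale):
    """Convert integer accumulators back to real-valued outputs."""
    return [a * weight_scale * input_scale for a in accumulators]


if __name__ == "__main__":
    # Illustrative values only.
    weights = [[0.5, -0.25],
               [1.0, 0.75]]
    inputs = [1.0, 2.0]
    WEIGHT_SCALE = 0.25
    INPUT_SCALE = 0.5

    weights_q = [quantize(row, WEIGHT_SCALE) for row in weights]
    inputs_q = quantize(inputs, INPUT_SCALE)

    acc = int8_matvec(weights_q, inputs_q)
    outputs = dequantize(acc, WEIGHT_SCALE, INPUT_SCALE)

    # Floating-point reference for comparison.
    reference = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    print("quantized result:", outputs)
    print("float reference: ", reference)
```

The values in this example were picked so that they quantize without rounding error, which means the integer path reproduces the floating-point reference exactly. With real model weights, quantization introduces a small error that has to be weighed against the savings in logic and power.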

Intel Portfolio for AI

As AI adoption grows, the range of applications and environments in which it runs—from endpoint devices, to edge servers, to data centers—will become incredibly diverse. No single architecture, chip, or form factor will be qualified to meet the requirements of all AI applications. Infrastructure architects must have access to their choice of architecture.

Intel offers four types of silicon enabling the proliferation of AI: FPGAs, GPUs, and ASICs for acceleration, and CPUs for general-purpose computing. Each architecture serves unique needs, so infrastructure architects can choose the exact architecture they need to support any AI application. With a breadth of compute types, optimized for power and performance, they’ll always get the right tools for the job at hand.

More Resources on Artificial Intelligence

Explore the latest technologies for deploying AI, including computer vision, machine learning, and deep learning, across a range of hardware types.

Intel® FPGAs for AI

Intel® FPGAs help enable fast-to-market, scalable, and customizable solutions.

Intel® FPGA Technology Solutions for AI

Read about Intel® FPGA hardware, the Intel® Distribution of OpenVINO™ toolkit, and the Intel® FPGA Deep Learning Acceleration Suite.

Intel® AI Technologies

Explore the Intel® hardware and software tools that help you seamlessly build and deploy AI applications at scale.

Product and Performance Information

3 This comparison is based on Intel® Agilex™ FPGA and SoC family vs. Intel® Stratix® 10 FPGA using simulation results and is subject to change. This document contains information on products, services, and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications, and road maps.