FPGA vs. GPU for Deep Learning

FPGAs are an excellent choice for deep learning applications that require low latency and flexibility.

FPGA Deep Learning Benefits:

  • FPGAs offer incredible flexibility and cost efficiency with circuitry that can be reprogrammed for different functionalities.

  • Compared with GPUs, FPGAs can deliver superior performance in deep learning applications where low latency is critical.

  • FPGAs can be fine-tuned to balance power efficiency with performance requirements.



What Is an FPGA?

Field programmable gate arrays (FPGAs) are integrated circuits with a programmable hardware fabric. Unlike graphics processing units (GPUs) or application-specific integrated circuits (ASICs), the circuitry inside an FPGA chip is not hard etched; it can be reprogrammed as needed. This capability makes FPGAs an excellent alternative to ASICs, which require a long development time and a significant investment to design and fabricate.

The tech industry adopted FPGAs for machine learning and deep learning relatively recently. In 2010, Microsoft Research demonstrated one of the first use cases of AI on FPGAs as part of its efforts to accelerate web searches.¹ FPGAs offered a combination of speed, programmability, and flexibility, delivering performance without the cost and complexity of developing custom application-specific integrated circuits (ASICs). Five years later, Microsoft’s Bing search engine was using FPGAs in production, proving their value for deep learning applications. By using FPGAs to accelerate search ranking, Bing realized a 50 percent increase in throughput.¹

Why Choose an FPGA for Deep Learning?

Early AI workloads, like image recognition, relied heavily on parallelism. Because GPUs were designed to render video and graphics, work that is itself highly parallel, they became a popular choice for machine learning and deep learning. GPUs excel at parallel processing, performing a very large number of arithmetic operations at once. In other words, they can deliver incredible acceleration in cases where the same operation must be performed many times in rapid succession.
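As a rough illustration of why such workloads parallelize well, the Python sketch below computes one fully connected layer as a set of independent dot products. The function names and the tiny weight matrix are made up for this example, and the thread pool only stands in for parallel hardware; a real GPU would spread the same independent products across thousands of hardware threads.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, vec):
    """Multiply-accumulate one weight row against the input vector."""
    return sum(w * x for w, x in zip(row, vec))


def dense_layer_serial(weights, vec):
    """Compute each output value one after another."""
    return [dot(row, vec) for row in weights]


def dense_layer_parallel(weights, vec, workers=4):
    """Compute the same outputs as independent tasks.

    No output depends on any other, so the rows can be handed to
    separate workers. pool.map returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, vec), weights))


# Hypothetical 3x4 weight matrix and 4-element input vector.
WEIGHTS = [
    [1, 0, 2, -1],
    [0, 3, 1, 1],
    [2, -2, 0, 4],
]
INPUT = [1, 2, 3, 4]

print(dense_layer_serial(WEIGHTS, INPUT))    # [3, 13, 14]
print(dense_layer_parallel(WEIGHTS, INPUT))  # [3, 13, 14]
```

Both functions return the same values; the only difference is that the second one makes the independence of the rows explicit, which is the property parallel hardware exploits.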

However, running AI on GPUs has its limits. GPUs don’t deliver as much performance as an ASIC, a chip purpose built for a given deep learning workload.

FPGAs offer hardware customization with integrated AI and can be programmed to deliver behavior similar to a GPU or an ASIC. The reprogrammable, reconfigurable nature of an FPGA lends itself well to a rapidly evolving AI landscape, allowing designers to test algorithms quickly and get to market fast. FPGAs offer several advantages for deep learning applications and other AI workloads:

  • Great performance with high throughput and low latency: FPGAs can ingest data such as video directly, bypassing the CPU, which gives real-time applications like video streaming, transcription, and action recognition latency that is both low and deterministic. Designers can build a neural network from the ground up and structure the FPGA to best suit the model.
  • Excellent value and cost: FPGAs can be reprogrammed for different functionalities and data types, making them one of the most cost-effective hardware options available. FPGAs can also be used for more than just AI: by integrating additional capabilities onto the same chip, designers can save on cost and board space. FPGAs also have long product life cycles, so hardware designs based on them can stay in service for years or decades. This characteristic makes them ideal for use in industrial, defense, medical, and automotive markets.
  • Low power consumption: With FPGAs, designers can fine-tune the hardware to the application, helping meet power efficiency requirements. Because a function can occupy only a portion of the FPGA rather than the entire chip, one device can host multiple functions in parallel, delivering more energy efficiency from the chip.

AI and Deep Learning Applications on FPGAs 

FPGAs can offer performance advantages over GPUs when the application demands low latency and small batch sizes, for example, in speech recognition and other natural language processing workloads. Thanks to their programmable I/O interface and highly flexible fabric, FPGAs are also well suited to the following tasks:

  • Overcoming I/O bottlenecks. FPGAs are often used where data must traverse many different networks at low latency. They are especially useful for eliminating memory buffering and overcoming I/O bottlenecks, one of the most limiting factors in AI system performance. By accelerating data ingestion, FPGAs can speed up the entire AI workflow.
  • Integrating AI into workloads. Using FPGAs, designers can add AI capabilities, like deep packet inspection or financial fraud detection, to existing workloads.
  • Enabling sensor fusion. FPGAs excel at handling data input from multiple sensors, such as cameras, LIDAR, and audio sensors. This ability can be extremely valuable when designing autonomous vehicles, robotics, and industrial equipment.
  • Providing acceleration for high performance computing (HPC) clusters. FPGAs can help facilitate the convergence of AI and HPC by serving as programmable accelerators for inference.²
  • Adding extra capabilities beyond AI. FPGAs make it possible to add security, I/O, networking, or pre-/postprocessing capabilities without requiring an extra chip.
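To make the point about low latency and small batch sizes from the start of this section concrete, here is a toy Python model of request batching. It is a sketch under stated assumptions, not a measurement of any device: the function `batch_stats`, the fixed arrival interval, and all of the timing numbers are hypothetical.

```python
def batch_stats(batch_size, arrival_interval_ms, compute_ms_per_batch):
    """Return (worst_case_latency_ms, peak_throughput_per_s).

    Toy model: requests arrive at a fixed interval, the accelerator
    waits until a full batch has been collected, then processes the
    whole batch in compute_ms_per_batch milliseconds.
    """
    # The first request in a batch waits for the remaining ones to arrive.
    fill_wait_ms = (batch_size - 1) * arrival_interval_ms
    worst_case_latency_ms = fill_wait_ms + compute_ms_per_batch
    # Requests completed per second while the device is busy computing.
    peak_throughput_per_s = batch_size * 1000 / compute_ms_per_batch
    return worst_case_latency_ms, peak_throughput_per_s


# Hypothetical numbers: one request every 10 ms.
SMALL = batch_stats(batch_size=1, arrival_interval_ms=10, compute_ms_per_batch=2)
LARGE = batch_stats(batch_size=32, arrival_interval_ms=10, compute_ms_per_batch=8)

print(SMALL)  # (2, 500.0)
print(LARGE)  # (318, 4000.0)
```

With these made-up numbers, the batch of 32 raises peak throughput eightfold but pushes the worst-case latency from 2 ms to 318 ms, because early requests sit idle while the batch fills. Hardware that stays efficient at a batch size of 1 avoids that wait, which is the situation the low-latency use cases above describe.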

Intel® FPGA Software and Hardware

One of the few hurdles to overcome when using FPGAs is that the hardware typically requires specialized programming expertise. Intel is reducing the amount of expertise needed with a software-based programming model. This higher-level FPGA programming model allows a data scientist or model developer to create a neural network using a common AI framework—such as TensorFlow or Caffe—and deploy it on an FPGA without knowing the details of the FPGA architecture. Intel has developed several tools that make programming FPGAs much easier:

  • Intel® Distribution of OpenVINO™ toolkit gives computer vision developers a single tool to accelerate models across several hardware platforms, including FPGAs.
  • Intel® FPGA Deep Learning Acceleration Suite provides tools and optimized architectures to accelerate inference with Intel® FPGAs. It interfaces with the OpenVINO™ toolkit, offering scalability to support custom networks.
  • Intel® FPGA SDK for OpenCL™ software technology accelerates development by targeting both Intel® CPUs and Intel® FPGAs. Developers can leverage the unique capabilities of Intel® FPGAs to deliver acceleration with power efficiency and low latency.

Intel® FPGA deep learning technology solutions span a range of product families and software tools to help reduce development time and cost. The following hardware products are of particular value for deep learning use cases:

  • Intel® Stratix® 10 NX FPGA is Intel’s first AI-optimized FPGA. It embeds a new type of AI-optimized block, the AI Tensor Block, tuned for common matrix-matrix or vector-matrix multiplications.
  • Intel® Agilex™ FPGAs and SoCs deliver up to 40 percent higher performance³ or up to 40 percent lower power³ for applications in data center, networking, and edge compute.

Intel Portfolio for AI

As AI adoption grows, the range of applications and environments in which it runs, from endpoint devices to edge servers to data centers, will become incredibly diverse. No single architecture, chip, or form factor can meet the requirements of all AI applications, so infrastructure architects need a choice of architectures.

Intel offers four types of silicon enabling the proliferation of AI: FPGAs, GPUs, and ASICs for acceleration, and CPUs for general-purpose computing. Each architecture serves unique needs, so infrastructure architects can choose the exact architecture they need to support any AI application. With a breadth of compute types, optimized for power and performance, they’ll always get the right tools for the job at hand.