Use HBM2-Enabled FPGAs for 2D FFT Acceleration


FPGAs combine HBM2 memory with reconfigurable pipeline logic, which lets them efficiently handle memory access patterns that other parallel architectures might struggle with. A 2D FFT, which requires a matrix transpose, demonstrates this. The transpose is overlapped with computation using minimal buffering, which makes it almost free from a throughput perspective. A heavily optimized FPGA FFT can benefit significantly from reduced precision, but for easier comparison with other technologies, this example uses floating point.
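To show where the transpose enters a 2D FFT, here is a minimal CPU reference sketch of the row-column decomposition in plain C++. It is not the BittWare kernel and contains no SYCL or HBM2 code. The names `fft1d`, `transpose`, and `fft2d` are hypothetical, and the sketch assumes a square, single-precision complex matrix whose side length is a power of two. Unlike the FPGA design described above, it runs the transposes as separate, explicit passes instead of overlapping them with computation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<float>;

// In-place iterative radix-2 FFT over n contiguous elements.
// Assumption: n is a power of two.
void fft1d(Complex* data, std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }
    // Butterfly stages of doubling length.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const double angle = -2.0 * pi * double(k) / double(len);
                const Complex w(float(std::cos(angle)), float(std::sin(angle)));
                const Complex u = data[start + k];
                const Complex v = data[start + k + len / 2] * w;
                data[start + k] = u + v;
                data[start + k + len / 2] = u - v;
            }
        }
    }
}

// In-place transpose of an n x n row-major matrix.
void transpose(std::vector<Complex>& m, std::size_t n) {
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = r + 1; c < n; ++c)
            std::swap(m[r * n + c], m[c * n + r]);
}

// 2D FFT: row FFTs, transpose, row FFTs again (these are the original
// columns), then transpose back to the original orientation.
void fft2d(std::vector<Complex>& m, std::size_t n) {
    assert(m.size() == n * n);
    for (std::size_t r = 0; r < n; ++r) fft1d(&m[r * n], n);
    transpose(m, n);
    for (std::size_t r = 0; r < n; ++r) fft1d(&m[r * n], n);
    transpose(m, n);
}
```

Calling `fft2d(m, n)` on a row-major `std::vector<Complex>` of size `n * n` transforms it in place. The transpose turns the column FFTs into contiguous row accesses, and it is this strided data movement that the FPGA design hides behind computation.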

BittWare* created a 2D FFT kernel for FPGAs using OpenCL™ technology from Intel. The code was rewritten for the Intel® Stratix® 10 FPGA 520N-MX card to take advantage of the Intel oneAPI programming model, specifically its SYCL* programming language.

For a batch-one implementation with two independent 2D FFT kernels in the same device, the peak HBM2 performance on an Intel® Stratix® 10 FPGA is 291 gigabytes per second. With pipelining or batching, a peak bandwidth of 337 gigabytes per second is possible.

The key benefit of using high-level tools is the significant reduction in development time.



Richard Chamberlain started his career at MBDA UK before joining Nallatech in 2001. For the last 20 years, he has pioneered the use of FPGAs for HPC and is a trusted industry expert in the field of heterogeneous acceleration. Richard currently works as a principal systems engineer in the applications team at BittWare, a part of Molex.