This design example provides a high-performance implementation of matrix multiplication, a fundamental operation, and demonstrates optimizations that can be expressed in Open Computing Language (OpenCL™) to improve performance. At the algorithmic level, the kernel uses loop tiling to exploit the data reuse inherent in the computation: tiles of the input matrices are buffered in local memory so that each loaded value is reused across several multiply-accumulate operations.
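The loop-tiling idea can be illustrated on the host side. The following is a minimal C++ sketch, not the kernel shipped in the example package: the function name `matmul_tiled`, the constant `kBlockSize`, and the restriction to square row-major matrices are all assumptions made for illustration. The small `a_tile` and `b_tile` arrays stand in for the local memory buffers that an OpenCL kernel would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge length; the example package picks its own value per board.
constexpr std::size_t kBlockSize = 2;

// Computes C = A * B for square, row-major n x n matrices using loop tiling.
// Assumes n is a multiple of kBlockSize.
std::vector<float> matmul_tiled(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(n % kBlockSize == 0);
    std::vector<float> c(n * n, 0.0f);

    for (std::size_t bi = 0; bi < n; bi += kBlockSize) {
        for (std::size_t bj = 0; bj < n; bj += kBlockSize) {
            for (std::size_t bk = 0; bk < n; bk += kBlockSize) {
                // Stage one tile of A and one tile of B in small buffers
                // (the analogue of local memory in an OpenCL kernel).
                float a_tile[kBlockSize][kBlockSize];
                float b_tile[kBlockSize][kBlockSize];
                for (std::size_t i = 0; i < kBlockSize; ++i) {
                    for (std::size_t k = 0; k < kBlockSize; ++k) {
                        a_tile[i][k] = a[(bi + i) * n + (bk + k)];
                        b_tile[i][k] = b[(bk + i) * n + (bj + k)];
                    }
                }
                // Each staged value is reused kBlockSize times here.
                for (std::size_t i = 0; i < kBlockSize; ++i) {
                    for (std::size_t j = 0; j < kBlockSize; ++j) {
                        float sum = 0.0f;
                        for (std::size_t k = 0; k < kBlockSize; ++k) {
                            sum += a_tile[i][k] * b_tile[k][j];
                        }
                        c[(bi + i) * n + (bj + j)] += sum;
                    }
                }
            }
        }
    }
    return c;
}
```

With a tile edge of `kBlockSize`, every value copied into a tile buffer takes part in `kBlockSize` multiply-accumulate operations before the next tile is loaded, which is the reuse that tiling is meant to capture.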
This example also demonstrates how loop unrolling and SIMD-style compiler optimizations increase the performance of the kernel. The parameters for each precompiled device binary in the example package have been chosen to maximize performance on that particular board. The example package includes additional details on how to parameterize the kernel to target different performance and resource requirements.
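As a rough host-side analogy for a parameterized unroll factor, the sketch below exposes the factor as a compile-time template parameter. This is an assumption-laden illustration in C++, not the kernel code: `dot_unrolled` and `Unroll` are hypothetical names, and in the OpenCL kernel the same effect would come from compiler directives rather than from a template.

```cpp
#include <cassert>
#include <cstddef>

// Dot product whose inner loop has a fixed, compile-time trip count (Unroll),
// so a compiler can fully unroll it. Unroll is a hypothetical tuning
// parameter: larger values expose more parallel work at the cost of more
// resources. Assumes n is a multiple of Unroll.
template <std::size_t Unroll>
float dot_unrolled(const float* x, const float* y, std::size_t n) {
    static_assert(Unroll > 0, "Unroll must be positive");
    assert(n % Unroll == 0);
    float sum = 0.0f;
    for (std::size_t k = 0; k < n; k += Unroll) {
        // Fixed trip count: a candidate for full unrolling.
        for (std::size_t u = 0; u < Unroll; ++u) {
            sum += x[k + u] * y[k + u];
        }
    }
    return sum;
}
```

Calling `dot_unrolled<1>`, `dot_unrolled<4>`, or `dot_unrolled<8>` on the same data returns the same value; only the amount of work exposed per outer iteration changes, which is the performance-versus-resource trade-off that a tuning parameter of this kind controls.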
The host application also automatically takes advantage of multiple OpenCL devices by distributing the computation across them, which provides additional parallelism.
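One plausible way to distribute a matrix multiplication across devices is to give each device a contiguous band of output rows. The C++ sketch below shows only that partitioning step, under the assumption of a row-wise split; the text above does not state how the host application actually divides the work, and `RowRange` and `split_rows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A contiguous band of output rows assigned to one device (hypothetical type).
struct RowRange {
    std::size_t first_row;
    std::size_t num_rows;
};

// Splits total_rows as evenly as possible across num_devices.
// The first (total_rows % num_devices) devices each receive one extra row.
std::vector<RowRange> split_rows(std::size_t total_rows, std::size_t num_devices) {
    assert(num_devices > 0);
    std::vector<RowRange> ranges;
    ranges.reserve(num_devices);
    const std::size_t base = total_rows / num_devices;
    const std::size_t extra = total_rows % num_devices;
    std::size_t first = 0;
    for (std::size_t d = 0; d < num_devices; ++d) {
        const std::size_t count = base + (d < extra ? 1 : 0);
        ranges.push_back(RowRange{first, count});
        first += count;
    }
    return ranges;
}
```

Under this assumed scheme, each device would receive its band of rows from the first input matrix together with the whole second input matrix, and would produce the matching band of the result, so the devices could run independently.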
Peak Matrix Multiplication Performance
- Optimized implementation of fundamental operation
- Local memory buffering
- Compiler optimizations (loop unrolling, num_simd_work_items attribute)
- Floating-point optimizations
- Multiple device execution
The design example provides source code for the OpenCL device kernel (.cl) as well as the host application. To compile the host application, the Linux package includes a Makefile and the Windows package includes a Microsoft Visual Studio 2010 project.
The following downloads are provided for this example:
The use of this design is governed by, and subject to, the terms and conditions of the hardware reference design license agreement.
Software and Hardware Requirements
This design example requires the following tools:
- Intel FPGA Software v17.1 or later
- Intel FPGA SDK for OpenCL™ v17.1 or later
- On Linux: GNU Make and gcc
- On Windows: Microsoft Visual Studio 2010
To download the Intel design tools, visit the OpenCL download page. The requirements for the underlying operating system are the same as those of the Intel FPGA SDK for OpenCL.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
* Product is based on a published Khronos Specification, and has passed the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.