Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide
ID
683176
Date
9/24/2018
Public
1. Introduction to Standard Edition Best Practices Guide
2. Reviewing Your Kernel's report.html File
3. OpenCL Kernel Design Best Practices
4. Profiling Your Kernel to Identify Performance Bottlenecks
5. Strategies for Improving Single Work-Item Kernel Performance
6. Strategies for Improving NDRange Kernel Data Processing Efficiency
7. Strategies for Improving Memory Access Efficiency
8. Strategies for Optimizing FPGA Area Usage
A. Additional Information
2.1. High Level Design Report Layout
2.2. Reviewing the Report Summary
2.3. Reviewing Loop Information
2.4. Reviewing Area Information
2.5. Verifying Information on Memory Replication and Stalls
2.6. Optimizing an OpenCL Design Example Based on Information in the HTML Report
2.7. HTML Report: Area Report Messages
2.8. HTML Report: Kernel Design Concepts
3.1. Transferring Data Via Channels or OpenCL Pipes
3.2. Unrolling Loops
3.3. Optimizing Floating-Point Operations
3.4. Allocating Aligned Memory
3.5. Aligning a Struct with or without Padding
3.6. Maintaining Similar Structures for Vector Type Elements
3.7. Avoiding Pointer Aliasing
3.8. Avoid Expensive Functions
3.9. Avoiding Work-Item ID-Dependent Backward Branching
4.3.4.1. High Stall Percentage
4.3.4.2. Low Occupancy Percentage
4.3.4.3. Low Bandwidth Efficiency
4.3.4.4. High Stall and High Occupancy Percentages
4.3.4.5. No Stalls, Low Occupancy Percentage, and Low Bandwidth Efficiency
4.3.4.6. No Stalls, High Occupancy Percentage, and Low Bandwidth Efficiency
4.3.4.7. Stalling Channels
4.3.4.8. High Stall and Low Occupancy Percentages
7.1. General Guidelines on Optimizing Memory Accesses
7.2. Optimize Global Memory Accesses
7.3. Performing Kernel Computations Using Constant, Local or Private Memory
7.4. Improving Kernel Performance by Banking the Local Memory
7.5. Optimizing Accesses to Local Memory by Controlling the Memory Replication Factor
7.6. Minimizing the Memory Dependencies for Loop Pipelining
1. Introduction to Standard Edition Best Practices Guide
Updated for: |
---|
Intel® Quartus® Prime Design Suite 18.1 |
The Standard Edition Best Practices Guide provides guidance on leveraging the functionalities of the FPGA Software Development Kit (SDK) for OpenCL™ 1 Standard Edition to optimize your OpenCL 2 applications for FPGA products.
This document assumes that you are familiar with OpenCL concepts and application programming interfaces (APIs), as described in the OpenCL Specification version 1.0 by the Khronos Group™. It also assumes that you have experience in creating OpenCL applications.
To achieve the highest performance of your OpenCL™ application for FPGAs, familiarize yourself with details of the underlying hardware. In addition, understand the compiler optimizations that convert and map your OpenCL application to FPGAs.
For more information on the OpenCL Specification version 1.0, refer to the OpenCL Reference Pages on the Khronos Group website. For detailed information on the OpenCL APIs and programming language, refer to the OpenCL Specification version 1.0.
Section Content
FPGA Overview
Pipelines
Single Work-Item Kernel versus NDRange Kernel
Multi-Threaded Host Application
1 The is based on a published Khronos Specification, and has passed the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.
2 OpenCL and the OpenCL logo are trademarks of Apple Inc. and used by permission of the Khronos Group™.