Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

ID 683176
Date 9/24/2018

4. Profiling Your Kernel to Identify Performance Bottlenecks

The Profiler generates data that helps you assess OpenCL™ kernel performance. It instruments the kernel pipeline with performance counters. These counters collect kernel performance data, which you can review in the Profiler GUI.

Consider the following OpenCL kernel program:

__kernel void add (__global int * a,
                   __global int * b,
                   __global int * c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid]+b[gid];
}
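
To make the instrumented operations easier to see, the following is a minimal host-side C sketch of what each work-item in the add kernel does. It is not part of the SDK: the function name add_reference and the parameter global_size are hypothetical, and the loop index gid merely stands in for the value that get_global_id(0) returns in the kernel. The sketch splits the single kernel statement into its two global memory loads and one global memory store, which are the kinds of operations the Profiler reports on.

```c
#include <stddef.h>

/* Hypothetical host-side reference for the add kernel (illustration only).
 * Each loop iteration plays the role of one work-item; gid stands in for
 * the value returned by get_global_id(0) in the kernel. */
void add_reference(const int *a, const int *b, int *c, size_t global_size)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        int lhs = a[gid];    /* first global memory load  */
        int rhs = b[gid];    /* second global memory load */
        c[gid] = lhs + rhs;  /* global memory store       */
    }
}
```

Calling add_reference(a, b, c, n) on three n-element arrays fills c with the element-wise sums, which can serve as a reference when checking the kernel's output.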

As shown in the figure below, the Profiler instruments and connects performance counters in a daisy chain throughout the pipeline generated for the kernel program. The host then reads the data collected by these counters. For example, in PCI Express® (PCIe®)-based systems, the host reads the data via the PCIe control register access (CRA) or control and status register (CSR) port.

Figure 62. Performance Counters Instrumentation

Work-item execution stalls might occur at various stages of a kernel pipeline. Applications that perform many memory accesses (load and store operations) might stall frequently while they wait for memory transfers to complete. The Profiler helps identify the load and store operations or channel accesses that cause the majority of stalls within a kernel pipeline.
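
As a rough illustration of how stall data can point to a bottleneck, the C sketch below ranks memory accesses by the share of cycles each one spent stalling the pipeline. Everything in it is an assumption made for illustration: the struct access_stats, its fields, and the functions stall_percent and worst_access are hypothetical and do not reflect the Profiler's actual counter layout, report format, or formulas.

```c
#include <stddef.h>

/* Hypothetical per-access summary; NOT the Profiler's real data layout. */
struct access_stats {
    const char *name;                 /* label, e.g. "load a[gid]"           */
    unsigned long long stall_cycles;  /* cycles this access stalled the pipe */
    unsigned long long total_cycles;  /* cycles over which it was observed   */
};

/* Percentage of observed cycles spent stalled; 0 when there is no data. */
double stall_percent(const struct access_stats *s)
{
    if (s->total_cycles == 0) {
        return 0.0;
    }
    return 100.0 * (double)s->stall_cycles / (double)s->total_cycles;
}

/* Index of the access with the highest stall percentage, or -1 if n == 0. */
int worst_access(const struct access_stats *stats, size_t n)
{
    int worst = -1;
    double worst_pct = -1.0;
    for (size_t i = 0; i < n; ++i) {
        double pct = stall_percent(&stats[i]);
        if (pct > worst_pct) {
            worst_pct = pct;
            worst = (int)i;
        }
    }
    return worst;
}
```

For the add kernel, one entry per access (the loads of a[gid] and b[gid] and the store to c[gid]) would be enough; the entry that worst_access selects is the first candidate to examine when optimizing.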

For usage information on the Profiler, refer to the Profiling Your OpenCL Kernel section of the Intel® FPGA SDK for OpenCL™ Standard Edition Programming Guide.