Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 9/26/2022

6.3. Triggering Collection of Profiling Data During Kernel Execution

The Intel® FPGA dynamic profiler for OpenCL™ can be used to collect performance information from the hardware when the design is executed. For instructions about how to add the profiler to your hardware design and how to view the collected data, refer to Chapter 5 of the Intel® FPGA SDK for OpenCL™ Best Practices Guide.

In cases where kernel execution finishes after the host application completes and temporal profiling is disabled, you can query the FPGA explicitly to collect profile data during kernel execution.

Note: This section assumes that temporal profiling is not enabled. For more information about temporal profiling, refer to Temporal Performance Collection in the Intel® FPGA SDK for OpenCL™ Best Practices Guide.
Tip: For oneAPI SYCL-specific instructions, refer to the Intel® FPGA Dynamic Profiler for DPC++ topic in the FPGA Optimization Guide for Intel® oneAPI Toolkits.
When you profile your OpenCL™ kernel during compilation, a profile.mon file is generated automatically. The profile data is then written to profile.mon after kernel execution completes on the FPGA. However, if kernel execution completes after the host application terminates, no profiling information for that kernel invocation is available in the profile.mon file. In this case, you can modify your host code to acquire profiling information during kernel execution.
Important: Collecting profile data during kernel execution may add overhead to kernel executions by increasing the latency in your kernel.
To query the FPGA for profile data while the kernel is running, use the following host library call:
extern CL_API_ENTRY cl_int CL_API_CALL
clGetProfileInfoIntelFPGA(cl_event);

where cl_event is the kernel event. The kernel event you pass to this host library call must be the same one you pass to the clEnqueueNDRangeKernel call.

  • If kernel execution completes before the invocation of clGetProfileInfoIntelFPGA, the function returns an event error message.
  • Host programs that use clGetProfileInfoIntelFPGA and clGetProfileDataDeviceIntelFPGA function calls must include the CL/cl_ext_intelfpga.h header file.
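The first bullet implies that a host should check whether the kernel is still running before it queries the profiler. The following is a minimal sketch of that decision in C. The helper classify_profile_query and the profile_action enum are hypothetical names, not part of the SDK. The status constants mirror the standard OpenCL values from CL/cl.h and are repeated only so that the sketch builds without the OpenCL headers. Treating CL_RUNNING as the only window for a query is an assumption based on the description above.

```c
/* Standard OpenCL command execution status values. CL/cl.h defines the same
 * constants; they are repeated here only so the sketch builds on its own. */
#ifndef CL_COMPLETE
#define CL_COMPLETE  0x0
#define CL_RUNNING   0x1
#define CL_SUBMITTED 0x2
#define CL_QUEUED    0x3
#endif

/* Hypothetical result type, not part of the SDK. */
typedef enum {
    PROFILE_QUERY_NOW,   /* kernel is executing: a query can collect data  */
    PROFILE_WAIT,        /* kernel has not started yet: check again later  */
    PROFILE_TOO_LATE,    /* kernel finished: a query would return an error */
    PROFILE_EVENT_ERROR  /* negative status: the command ended abnormally  */
} profile_action;

/* Maps an event execution status to the action the host should take.
 * Assumption: only CL_RUNNING is treated as a valid window for a query. */
profile_action classify_profile_query(int exec_status)
{
    if (exec_status < 0)            return PROFILE_EVENT_ERROR;
    if (exec_status == CL_COMPLETE) return PROFILE_TOO_LATE;
    if (exec_status == CL_RUNNING)  return PROFILE_QUERY_NOW;
    return PROFILE_WAIT;            /* CL_SUBMITTED or CL_QUEUED */
}
```

In a real host program, exec_status would come from clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &status, NULL). The kernel can still finish between that check and the call to clGetProfileInfoIntelFPGA, so the return value of the profiler call should be checked as well.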
Invoking the clGetProfileInfoIntelFPGA function during kernel execution disables the profile counters momentarily so that the Intel® FPGA dynamic profiler for OpenCL™ can collect data from the FPGA. As a result, you lose some profiling information during this interruption. If you call this function at very short intervals, the profile data might not accurately reflect the actual performance behavior of the kernel.
Consider the following example host code:
int main()
{   ...
    clEnqueueNDRangeKernel(queue, kernel, ..., NULL);
    ...
    clEnqueueNDRangeKernel(queue, kernel, ..., NULL);
    ...
}

This host application runs on the assumption that a kernel launches twice and then completes. In the profile.mon file, there are two sets of profile data, one for each kernel invocation. To collect profile data while the kernel is running, modify the host code in the following manner:

int main()
{   ...
    clEnqueueNDRangeKernel(queue, kernel, ..., &event);

    //Get the profile data before the kernel completes
    clGetProfileInfoIntelFPGA(event);

    //Wait until the kernel completes
    clFinish(queue);
    ...
    clEnqueueNDRangeKernel(queue, kernel, ..., NULL);
    ...
}

The call to clGetProfileInfoIntelFPGA adds a new entry in the profile.mon file.
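
For a long-running kernel, a host may want several samples without querying so often that the profile counters are paused much of the time. The following is a minimal sketch in C of one way to space the queries. Every name in it (profile_hooks, poll_profile, and the sim_* stand-ins) is hypothetical and not part of the SDK. In a real host, the collect_profile hook would wrap clGetProfileInfoIntelFPGA(event) and the kernel_running hook would check the event status. The simulated hooks exist only so that the loop can be exercised without hardware.

```c
#include <stddef.h>

/* Hypothetical hooks: the host supplies thin wrappers around the real calls. */
typedef struct {
    int  (*kernel_running)(void *ctx);         /* nonzero while the kernel runs */
    int  (*collect_profile)(void *ctx);        /* 0 on success, nonzero on error */
    void (*sleep_ms)(void *ctx, unsigned ms);  /* waits roughly ms milliseconds  */
    void *ctx;                                 /* passed back to every hook      */
} profile_hooks;

/* Collects at most max_samples profile snapshots, waiting interval_ms after
 * each one. Stops when the kernel is no longer running or a query fails.
 * Returns the number of successful queries, or -1 on invalid arguments. */
int poll_profile(const profile_hooks *h, unsigned interval_ms, int max_samples)
{
    int samples = 0;

    if (h == NULL || h->kernel_running == NULL || h->collect_profile == NULL ||
        h->sleep_ms == NULL || interval_ms == 0 || max_samples < 0)
        return -1;

    while (samples < max_samples && h->kernel_running(h->ctx)) {
        /* A failure here may simply mean the kernel finished after the check. */
        if (h->collect_profile(h->ctx) != 0)
            break;
        samples++;
        h->sleep_ms(h->ctx, interval_ms);
    }
    return samples;
}

/* Simulated stand-ins for trying the loop without an FPGA. */
typedef struct {
    unsigned now_ms;     /* simulated clock                          */
    unsigned finish_ms;  /* time at which the fake kernel completes  */
    int      collected;  /* number of successful fake queries        */
} sim_state;

static int sim_running(void *ctx)
{
    const sim_state *s = (const sim_state *)ctx;
    return s->now_ms < s->finish_ms;
}

static int sim_collect(void *ctx)
{
    sim_state *s = (sim_state *)ctx;
    if (s->now_ms >= s->finish_ms)
        return -1;       /* mimics the error returned after completion */
    s->collected++;
    return 0;
}

static void sim_sleep(void *ctx, unsigned ms)
{
    ((sim_state *)ctx)->now_ms += ms;  /* advance the fake clock, no real wait */
}
```

With a simulated kernel that finishes at 50 ms and a 20 ms interval, the loop queries at 0, 20, and 40 ms and then stops. A suitable interval on real hardware depends on the design, so the values here are only illustrative.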