Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019
Public

6.3. Collecting Profile Data During Kernel Execution

In cases where kernel execution finishes after the host application completes, you can query the FPGA explicitly to collect profile data during kernel execution. The default behavior of automatic readback of profile data upon the completion of kernel execution is sufficient for most applications.
When you profile your OpenCL™ kernel during compilation, a profile.mon file is generated automatically. The profile data is then written to profile.mon after kernel execution completes on the FPGA. However, if kernel execution completes after the host application terminates, no profiling information for that kernel invocation will be available in the profile.mon file. In this case, you can modify your host code to acquire profiling information during kernel execution.
Important: Collecting profile data during kernel execution can add significant overhead to kernel executions by increasing the latency in your kernel.
To query the FPGA to collect profile data while the kernel is running, use the following host library call:
extern CL_API_ENTRY cl_int CL_API_CALL clGetProfileInfoIntelFPGA(cl_event);

where cl_event is the kernel event. The kernel event you pass to this host library call must be the same one you pass to the clEnqueueNDRangeKernel call.

Important: If kernel execution completes before the invocation of clGetProfileInfoIntelFPGA, the function returns an event error message.
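Because the call can fail when the kernel has already finished, it may help to check its return value instead of ignoring it. The following is a minimal sketch, not SDK code: the helper try_profile_snapshot is a hypothetical name, and the typedefs, the mock_event struct, and the mock clGetProfileInfoIntelFPGA body are stand-ins that exist only so the example compiles without the SDK. The error value -1 is a placeholder; the actual error code returned by the SDK is not stated in this section.

```c
#include <stdio.h>

/* Hypothetical stand-ins so this sketch compiles without the SDK.
 * Real host code would include the OpenCL headers shipped with the SDK
 * and link against the real clGetProfileInfoIntelFPGA instead. */
typedef int cl_int;
typedef struct mock_event { int kernel_running; } *cl_event;
#define CL_SUCCESS 0

static cl_int clGetProfileInfoIntelFPGA(cl_event event)
{
    /* Mock behavior: fail once the "kernel" is no longer running.
     * The value -1 is a placeholder, not the SDK's actual error code. */
    return event->kernel_running ? CL_SUCCESS : -1;
}

/* Returns 1 if a mid-execution profile snapshot was requested successfully,
 * 0 if the call failed (for example, because the kernel already finished). */
static int try_profile_snapshot(cl_event kernel_event)
{
    cl_int status = clGetProfileInfoIntelFPGA(kernel_event);
    if (status != CL_SUCCESS) {
        fprintf(stderr, "profile query failed (status %d); "
                "the kernel may have completed already\n", (int)status);
        return 0;
    }
    return 1;
}
```

In real host code, a failed query of this kind is usually not fatal: if the kernel completed before the host application exits, the default automatic readback described earlier still applies.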
CAUTION:
Invoking the clGetProfileInfoIntelFPGA function during kernel execution disables the profile counters momentarily so that the Intel® FPGA dynamic profiler for OpenCL™ can collect data from the FPGA. As a result, you will lose some profiling information during this interruption. If you call this function at very short intervals, the profile data might not accurately reflect the actual performance behavior of the kernel.
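One way to avoid querying at very short intervals is to gate each call behind a minimum time gap. The sketch below shows only the gating logic, under stated assumptions: profile_throttle and should_query_profile are hypothetical names, the caller supplies the current time in seconds from a clock of its choice, and a suitable minimum interval depends on the kernel and is not specified by the SDK documentation.

```c
/* Hypothetical helper: limits how often the host queries the profiler. */
typedef struct {
    double min_interval_s;  /* smallest allowed gap between queries */
    double last_query_s;    /* time of the previous query; negative if none */
} profile_throttle;

/* Returns 1 if enough time has passed to issue another profile query,
 * and records now_s as the time of that query. Returns 0 otherwise. */
static int should_query_profile(profile_throttle *t, double now_s)
{
    if (t->last_query_s >= 0.0 &&
        now_s - t->last_query_s < t->min_interval_s) {
        return 0;  /* too soon; skip this query */
    }
    t->last_query_s = now_s;
    return 1;
}
```

The host would call clGetProfileInfoIntelFPGA(event) only when should_query_profile returns 1, so the profile counters are interrupted at a bounded rate.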
Consider the following example host code:
int main()
{
    ...
    clEnqueueNDRangeKernel(queue, kernel, ..., NULL);
    ...
    clEnqueueNDRangeKernel(queue, kernel, ..., NULL);
    ...
}

This host application assumes that the kernel launches twice and then completes. The profile.mon file will contain two sets of profile data, one for each kernel invocation. To collect profile data while the kernel is running, modify the host code as follows:

int main()
{
    ...
    // Pass an event so that this invocation can be queried while it runs
    clEnqueueNDRangeKernel(queue, kernel, ..., &event);

    // Get the profile data before the kernel completes
    clGetProfileInfoIntelFPGA(event);

    // Wait until the kernel completes
    clFinish(queue);

    ...
    clEnqueueNDRangeKernel(queue, kernel, ..., NULL);
    ...
}
 

The call to clGetProfileInfoIntelFPGA adds a new entry in the profile.mon file. The Intel® FPGA dynamic profiler for OpenCL™ GUI then parses this entry in the report.

For more information on the Intel® FPGA dynamic profiler for OpenCL™, refer to the following sections:
  • Profile Your Kernel to Identify Performance Bottlenecks in the Intel® FPGA SDK for OpenCL™ Best Practices Guide
  • Profiling Your OpenCL Kernel