Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 9/26/2022

6.3.1. Profiling Autorun Kernels

Note: All information in this section is written with the assumption that temporal profiling is not enabled. For more information about temporal profiling, refer to Temporal Performance Collection in the Intel® FPGA SDK for OpenCL™ Best Practices Guide.

Unlike enqueued kernels, which automatically generate profiler data on completion (if the compiler profile flag is set), autorun kernels never complete. Hence, you must explicitly indicate when to profile autorun kernels by calling the clGetProfileDataDeviceIntelFPGA host library call. Each call collects a snapshot of the autorun profile data at that moment. All profiler data is output to a profile.mon file.

The prototype of the clGetProfileDataDeviceIntelFPGA host library call is as follows:

cl_int clGetProfileDataDeviceIntelFPGA (cl_device_id device_id,
                                        cl_program program,
                                        cl_bool read_enqueue_kernels,
                                        cl_bool read_auto_enqueued,
                                        cl_bool clear_counters_after_readback,
                                        size_t param_value_size,
                                        void *param_value,
                                        size_t *param_value_size_ret,
                                        cl_int *errcode_ret);


  • The read_enqueue_kernels parameter profiles enqueued kernels. In this release, this parameter has no effect.
  • The read_auto_enqueued parameter profiles autorun kernels.
  • The following parameters are placeholders for future releases:
    • clear_counters_after_readback
    • param_value_size
    • param_value
    • param_value_size_ret
    • errcode_ret

Note: Only autorun kernels are supported by this host library call. You can pass TRUE for the read_enqueue_kernels parameter, but the value is ignored. This does not mean that enqueued kernels are not profiled. If the compiler profile flag is set to include enqueued kernels, their profile data is captured normally at the end of execution. The only difference is that the clGetProfileDataDeviceIntelFPGA host library call does not profile enqueued kernels in addition to the profiling already done automatically for them.
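
The parameter descriptions above can be sketched as a small wrapper that requests a snapshot of autorun kernels only. This is a minimal sketch, not the SDK's own example: the typedefs, constants, and the stub body of clGetProfileDataDeviceIntelFPGA are stand-ins so that the code compiles without the SDK (a real host program would take them from the SDK's OpenCL headers and link against the host library), snapshot_autorun_profile is a hypothetical helper name, and passing 0 or NULL for the placeholder parameters is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in definitions (assumption) so this sketch compiles without the SDK.
 * In a real host program they come from the SDK's OpenCL headers. */
typedef int cl_int;
typedef unsigned int cl_bool;
typedef struct stub_device *cl_device_id;
typedef struct stub_program *cl_program;
#define CL_SUCCESS 0
#define CL_FALSE 0
#define CL_TRUE 1

/* Bookkeeping for the stub below; not part of the real host library. */
static cl_bool last_read_enqueue;
static cl_bool last_read_auto;
static int snapshot_count;

/* Stub with the prototype shown in this section. It only records the flags
 * it receives; the real call is provided by the host library. */
cl_int clGetProfileDataDeviceIntelFPGA(cl_device_id device_id,
                                       cl_program program,
                                       cl_bool read_enqueue_kernels,
                                       cl_bool read_auto_enqueued,
                                       cl_bool clear_counters_after_readback,
                                       size_t param_value_size,
                                       void *param_value,
                                       size_t *param_value_size_ret,
                                       cl_int *errcode_ret)
{
    (void)device_id;
    (void)program;
    (void)clear_counters_after_readback;
    (void)param_value_size;
    (void)param_value;
    (void)param_value_size_ret;
    (void)errcode_ret;
    last_read_enqueue = read_enqueue_kernels;
    last_read_auto = read_auto_enqueued;
    snapshot_count++;
    return CL_SUCCESS;
}

/* Hypothetical helper: take one snapshot of the autorun profile data. */
cl_int snapshot_autorun_profile(cl_device_id device, cl_program program)
{
    return clGetProfileDataDeviceIntelFPGA(
        device, program,
        CL_FALSE, /* read_enqueue_kernels: no effect in this release */
        CL_TRUE,  /* read_auto_enqueued: profile autorun kernels */
        CL_FALSE, /* clear_counters_after_readback: placeholder */
        0,        /* param_value_size: placeholder */
        NULL,     /* param_value: placeholder */
        NULL,     /* param_value_size_ret: placeholder */
        NULL);    /* errcode_ret: placeholder */
}
```

A host program might call snapshot_autorun_profile(device, program) at each point of interest while the autorun kernels are running, since every call captures only the state at that moment.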

The clGetProfileDataDeviceIntelFPGA host library call returns CL_SUCCESS on success. Otherwise, it returns one of the following errors:

  • CL_INVALID_DEVICE if the device is not a valid device.
  • CL_INVALID_PROGRAM if the program is not a valid program.

The clGetProfileDataDeviceIntelFPGA host library call does not trigger a programming operation of the provided program on the provided device. If the program is not already programmed to the device at the time of the host library call, the call returns the CL_INVALID_PROGRAM error.
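
The return values listed above can be checked with a small helper such as the following. This is a sketch under stated assumptions: profile_status_message and report_profile_status are hypothetical names, the cl_int typedef is a stand-in for the SDK's OpenCL headers, and the numeric values are those assigned to these error codes by the standard OpenCL headers.

```c
#include <stdio.h>

/* Stand-ins (assumption) for definitions normally taken from the OpenCL
 * headers; the values match the standard OpenCL error codes. */
typedef int cl_int;
#define CL_SUCCESS 0
#define CL_INVALID_DEVICE (-33)
#define CL_INVALID_PROGRAM (-44)

/* Hypothetical helper: map a return value of the profiling call to text. */
const char *profile_status_message(cl_int status)
{
    switch (status) {
    case CL_SUCCESS:
        return "snapshot collected";
    case CL_INVALID_DEVICE:
        return "device is not a valid device";
    case CL_INVALID_PROGRAM:
        /* Also returned when the program is not yet programmed to the
         * device, because the call does not trigger programming. */
        return "program is invalid or not programmed to the device";
    default:
        return "unexpected status";
    }
}

/* Hypothetical helper: returns 1 on success; otherwise logs and returns 0. */
int report_profile_status(cl_int status)
{
    if (status == CL_SUCCESS) {
        return 1;
    }
    fprintf(stderr, "autorun profiling failed: %s (%d)\n",
            profile_status_message(status), status);
    return 0;
}
```

Because the call never programs the device itself, a CL_INVALID_PROGRAM result is a cue to confirm that the program has already been programmed to the device before requesting another snapshot.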

Table 9.   clGetProfileDataDeviceIntelFPGA Host Library Call Parameter Combinations

  • Profile only enqueued kernels
    Note: Automatically outputs profile information once the execution is completed.
  • Profile only autorun kernels: True
  • Profile both enqueued and autorun kernels: True