Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019

10.1. Instrumenting the Kernel Pipeline with Performance Counters (-profile)

To instrument the OpenCL kernel pipeline with performance counters, include the -profile=(all|autorun|enqueued) option of the aoc command when you compile your kernel.
Attention: Instrumenting the Verilog code with performance counters increases hardware resource utilization (that is, increases FPGA area usage) and typically decreases performance.
  • To instrument the Verilog code in the <your_kernel_filename>.aocx file with performance counters, invoke the aoc -profile=(all|autorun|enqueued) <your_kernel_filename>.cl command, where:
    • The all argument instruments all kernels in the <your_kernel_filename>.cl file with performance counters. This is the default if you provide no argument.
    • The autorun argument instruments only the autorun kernels with performance counters.
    • The enqueued argument instruments only the non-autorun kernels with performance counters.
    • When you profile kernels from more than one .aocx file, do not reuse a kernel name across those files. If two kernels share a name, the profile data for those kernels will be incorrect.
    • Regardless of the input to the clGetProfileDataDeviceIntelFPGA host library call, the Intel® FPGA dynamic profiler for OpenCL™ profiles only the kernel types that you indicate during compilation.
    Profiling autorun kernels adds some hardware overhead for the counters. For large designs, this overhead can reduce the design's operating frequency (fmax). If the Intel® FPGA dynamic profiler for OpenCL™ profiles every kernel, the overhead can also produce a design that does not fit on the chip.
  • Run your host application from a local disk to execute the <your_kernel_filename>.aocx file on your FPGA. During kernel execution, the performance counters throughout the kernel pipeline collect profile information. The host saves the information in a profile.mon monitor description file in your current working directory.
    Because network disk accesses are slow, running the host application from a networked directory might introduce delays between kernel launches while the runtime stores profile output data to disk. These delays might increase the overall execution time of the host application.
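
The two steps above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not part of the SDK. The function names profile_cmd and run_from_local_disk, the file name my_kernel.cl, and the use of mktemp under /tmp as a "local disk" location are all hypothetical. Only the -profile=(all|autorun|enqueued) values and the profile.mon file name come from the text above. The first function only prints the aoc command line, so the script runs without the SDK installed.

```shell
#!/bin/sh
# Sketch only. Function names, file names, and the scratch location are
# hypothetical; the -profile values and profile.mon come from the text above.

# Step 1: print the aoc command line for a given profile mode.
profile_cmd() {
    mode="$1"
    kernel="$2"
    case "$mode" in
        all|autorun|enqueued)
            printf 'aoc -profile=%s %s\n' "$mode" "$kernel"
            ;;
        *)
            printf 'unsupported profile mode: %s\n' "$mode" >&2
            return 1
            ;;
    esac
}

# Step 2: run a host command from a scratch directory assumed to be on a
# local disk, then print the path of the profile.mon file it left there.
run_from_local_disk() {
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/aocl_profile.XXXXXX") || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$workdir" && "$@" ) || return 1
    if [ -f "$workdir/profile.mon" ]; then
        printf '%s\n' "$workdir/profile.mon"
    else
        printf 'no profile.mon found in %s\n' "$workdir" >&2
        return 1
    fi
}

# Print the compile command for each supported mode.
profile_cmd all my_kernel.cl
profile_cmd autorun my_kernel.cl
profile_cmd enqueued my_kernel.cl
```

To run a real host application this way, pass its absolute path, for example run_from_local_disk /path/to/host_app, because the helper changes the working directory before launching the command. How the host application locates the .aocx file is application-specific and is not covered by this sketch. Whether /tmp is actually a local disk depends on the system, so set TMPDIR to a known local path if in doubt.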