Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 3/31/2023
Public


Invoke the Profiler Runtime Wrapper to Obtain Profiling Data

After compiling your SYCL* program using the Intel® oneAPI DPC++/C++ Compiler, you can profile your FPGA design using the Profiler Runtime Wrapper. The Profiler Runtime Wrapper calls your executable and collects profile information at a given sample rate. It saves the performance counter data in a profile.mon monitor description file, then post-processes that file into a readable profile.json file. Use profile.json rather than profile.mon for further data processing; both files are available after host execution completes.

To invoke the Profiler Runtime Wrapper, execute the following command:

aocl profile [options] /path/to/executable [executable options]

where:

  • [options] are any additional flags you want to pass to the wrapper. Refer to aocl profile --help for a list of options and their uses.
  • /path/to/executable is the path to the executable generated by the compiler.
  • [executable options] are any options or arguments that need to be passed along to the executable.

CAUTION:

Because of slow network disk accesses, running the host application from a networked directory might introduce delays between kernel executions. These delays might increase the overall execution time of the host application. In addition, they might introduce delays during kernel executions while the runtime stores profile output data to disk.
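As a minimal sketch of the invocation syntax described in this section, the following shell snippet assembles the aocl profile command line from its three parts and prints it for review. The executable name ./my_design.fpga and the HOST_EXE and HOST_ARGS variables are hypothetical placeholders, not names defined by the toolkit.

```shell
#!/bin/sh
# Sketch only: HOST_EXE and HOST_ARGS are hypothetical placeholders.
HOST_EXE="${HOST_EXE:-./my_design.fpga}"
HOST_ARGS="${HOST_ARGS:-}"

# Compose: aocl profile [options] /path/to/executable [executable options]
# Any arguments given to this function are treated as wrapper options.
profile_cmd() {
  cmd="aocl profile"
  if [ "$#" -gt 0 ]; then cmd="$cmd $*"; fi
  cmd="$cmd $HOST_EXE"
  if [ -n "$HOST_ARGS" ]; then cmd="$cmd $HOST_ARGS"; fi
  echo "$cmd"
}

# Print the command so it can be reviewed before it is run.
profile_cmd
```

Given the caution above, it is probably worth running the printed command from a directory on a local disk rather than from a networked one.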

Split the Execution and Data Post-Processing

By default, the Profiler Runtime Wrapper automatically post-processes your profile.mon monitor file to produce a readable profile.json file. If post-processing takes longer than expected, you can run the execution and post-processing steps separately by using the --no-json and --no-run <path to profile.mon file> Profiler Runtime Wrapper options:

  • The --no-json flag runs your executable and produces a profile.mon monitor file without post-processing it.
  • The --no-run <path to profile.mon file> flag does not invoke your executable and instead only runs the post-processing step on the supplied profile.mon file.
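The two-step flow might look like the sketch below. The executable name, the location of profile.mon in the current directory, and the run helper are assumptions made for illustration. This section also does not state whether the executable path must still be supplied together with --no-run, so check the wrapper's help output before relying on the exact form of the second command.

```shell
#!/bin/sh
# Sketch only: HOST_EXE and MON_FILE are hypothetical placeholders.
HOST_EXE="${HOST_EXE:-./my_design.fpga}"
MON_FILE="${MON_FILE:-./profile.mon}"

# Print a command, then run it only if aocl is on PATH.
run() {
  echo "+ $*"
  if command -v aocl >/dev/null 2>&1; then "$@"; fi
}

# Step 1: run the executable and keep the raw profile.mon (no post-processing).
collect() {
  run aocl profile --no-json "$HOST_EXE"
}

# Step 2: post-process an existing profile.mon without re-running the executable.
# Assumption: --no-run is followed only by the path to the monitor file.
postprocess() {
  run aocl profile --no-run "$MON_FILE"
}

# Post-process only if collection succeeded.
collect && postprocess
```

Separating the steps this way lets the potentially slow post-processing run later, or on a different machine that has access to the profile.mon file.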

Temporal Performance Collection

During the run of your host application, the Profiler collects performance counter data at a sampling period of n cycles. Each time n cycles have elapsed, the Profiler reads the performance counters and writes the data to the profile.mon monitor file.
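To get a feel for what a given n means in wall-clock terms, the arithmetic can be sketched as below. The 300 MHz kernel clock, the 10000-cycle period, and the 10-second run are made-up inputs, and the helper names are hypothetical. The sample count is only an upper bound if n is treated as the minimum spacing between samples, as the -period description in this section states.

```shell
#!/bin/sh
# Sketch only: the clock frequency, period, and run time are made-up inputs.

# Time between samples in microseconds: cycles / frequency_in_MHz.
sample_interval_us() {
  awk -v n="$1" -v f="$2" 'BEGIN { printf "%.2f\n", n / f }'
}

# Upper bound on samples in a run: run_seconds * frequency_in_Hz / cycles.
max_samples() {
  awk -v t="$1" -v n="$2" -v f="$3" 'BEGIN { printf "%d\n", t * f * 1000000 / n }'
}

# 10000-cycle period at 300 MHz, over a 10-second run.
sample_interval_us 10000 300
max_samples 10 10000 300
```

Estimates like these can help when choosing a period that keeps the profile.mon and profile.json files at a manageable size.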

  • You can control how often the Profiler counters are sampled by setting the Profiler Runtime Wrapper's -period flag. The specified period is the minimum number of kernel pipeline clock cycles between profiling samples. If you do not set a period, the Profiler samples as often as possible.
    CAUTION:

    For particularly large or long-running designs, the amount of data generated by the default temporal period might result in very large profile.mon and profile.json files. To reduce this file size, increase the sampling period or turn off temporal profiling.

  • To turn off temporal profiling and instead collect performance data only once a kernel has finished executing, you can set the Profiler Runtime Wrapper's -no-temporal flag.
    NOTE:

    If you collect the performance data only at the end of execution, the data represents an average over the kernel's entire execution.
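The two temporal settings could be exercised as in the following sketch. The executable name, the 10000-cycle value, and the run helper are hypothetical, and passing the period as a separate argument after -period is an assumption about the flag's syntax that you should confirm against the wrapper's help output.

```shell
#!/bin/sh
# Sketch only: HOST_EXE and PERIOD_CYCLES are hypothetical placeholders.
HOST_EXE="${HOST_EXE:-./my_design.fpga}"
PERIOD_CYCLES="${PERIOD_CYCLES:-10000}"

# Print a command, then run it only if aocl is on PATH.
run() {
  echo "+ $*"
  if command -v aocl >/dev/null 2>&1; then "$@"; fi
}

# Sample at most once every PERIOD_CYCLES kernel clock cycles.
# Assumption: the value follows -period as its own argument.
profile_with_period() {
  run aocl profile -period "$PERIOD_CYCLES" "$HOST_EXE"
}

# Collect counter data only once each kernel has finished executing.
profile_without_temporal() {
  run aocl profile -no-temporal "$HOST_EXE"
}

profile_with_period
```

A larger period or the -no-temporal variant trades timeline detail for smaller output files, which matters most for large or long-running designs.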