Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022
Public

5.6. Performance Data Types

The Intel® FPGA dynamic profiler for OpenCL™ provides several types of performance data that you can view using the Intel® VTune Profiler.

The following table describes these information types:

Table 18.  Types of Performance Data
Column: Attributes
Description: Memory or channel attribute information, such as memory type (local or global), corresponding memory system (DDR or quad data rate (QDR)), and read or write access.
Access Type: All memory and channel accesses

Column: Stall%
Description: Percentage of time that the memory or channel access causes pipeline stalls. It measures the ability of the memory or channel access to fulfill an access request.
Access Type: All memory and channel accesses

Column: Occupancy%
Description: Percentage of the overall profiled time frame during which a valid work-item executes the memory or channel instruction.
Access Type: All memory and channel accesses

Column: Bandwidth
Description: Average memory bandwidth that the memory access uses, and its overall efficiency. For each global memory access, FPGA resources are assigned to acquire data from the global memory system. However, the amount of data a kernel program uses might be less than the acquired data. The overall efficiency is the percentage of total bytes, acquired from the global memory system, that the kernel program uses.
Access Type: Global memory accesses

Column: Channel Depth (see footnote 3)
Description: Occupancy of the channel FIFO (in bytes) when the channel is not idling. This is measured in the following ways:
  • Average Channel Depth measures the average occupancy of the channel in the measured sample time-slice.
  • Maximum Channel Depth measures the fill level of the channel, indicating the maximum occupancy of the channel in the sample time-slice.
Access Type: All channel accesses

Column: Idle (see footnote 3)
Description: Percentage of the overall profiled time frame during which no valid work-item is executing or stalling the memory or channel instruction.
Access Type: All memory and channel accesses
Note: If your kernel undergoes memory optimization that consolidates hardware resources and implements multiple memory operations, statistical data might not be available for each memory operation. One set of statistical data maps to the point of consolidation in hardware.
3 Intel® VTune Profiler will show this information in a future release.
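
To make the definitions in Table 18 concrete, the following Python sketch computes the percentage-style metrics and the channel depth figures from raw counts. It is illustrative only: the counter names (total_cycles, stall_cycles, active_cycles, idle_cycles, bytes_acquired, bytes_used) and the helper functions are hypothetical and do not correspond to any profiler API or output format. The sketch also assumes that the profiled time frame is measured in cycles and that the channel depth samples were taken while the channel was not idling.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


@dataclass
class AccessCounters:
    """Hypothetical raw counters for one memory or channel access."""
    total_cycles: int        # length of the profiled time frame
    stall_cycles: int        # cycles in which the access stalled the pipeline
    active_cycles: int       # cycles in which a valid work-item executed the access
    idle_cycles: int         # cycles with no valid work-item executing or stalling
    bytes_acquired: int = 0  # bytes acquired from the global memory system
    bytes_used: int = 0      # bytes the kernel program actually used


def summarize(c: AccessCounters) -> Dict[str, float]:
    """Derive the percentage columns of Table 18 from the raw counters."""
    return {
        "stall_pct": percent(c.stall_cycles, c.total_cycles),
        "occupancy_pct": percent(c.active_cycles, c.total_cycles),
        "idle_pct": percent(c.idle_cycles, c.total_cycles),
        # Overall efficiency: share of acquired bytes that the kernel uses.
        "efficiency_pct": percent(c.bytes_used, c.bytes_acquired),
    }


def channel_depth(samples: Sequence[int]) -> Tuple[float, int]:
    """Return (average, maximum) FIFO occupancy for one sample time-slice."""
    if not samples:
        return 0.0, 0
    return sum(samples) / len(samples), max(samples)


if __name__ == "__main__":
    counters = AccessCounters(
        total_cycles=1000, stall_cycles=250, active_cycles=600,
        idle_cycles=150, bytes_acquired=4096, bytes_used=1024,
    )
    print(summarize(counters))
    print(channel_depth([4, 8, 12]))
```

With these made-up inputs, the access stalls for 25% of the time frame, is occupied for 60%, is idle for 15%, and uses 25% of the bytes it acquired. The three channel samples give an average depth of 8.0 and a maximum depth of 12. A low efficiency figure like this one suggests that the access acquires much more data than the kernel consumes.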