Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 3/31/2023
Public

Use Intel® VTune™ Profiler

To view performance data, you can upload your profile.json file to the CPU/FPGA Interaction View in the Intel® VTune™ Profiler. For more information about how to upload the file and open the correct views, refer to CPU/FPGA Interaction Analysis (Preview) in the Intel® VTune™ Profiler User Guide.

The CPU/FPGA Interaction View in the Intel® VTune™ Profiler presents performance data for your design in several graphical representations. You can view the following:

  • Summarized or average data about your SYCL* kernels.
  • A graphical representation of the overall kernel program execution process, including both host-side and device-side events.
  • Detailed statistics about memory and pipe accesses in both a source view format and timeline format.

The following tables describe types of performance data and information available in the CPU/FPGA Interaction View in the Intel® VTune™ Profiler:

Types of Performance Data

| Column | Description | Access Type |
| --- | --- | --- |
| Attributes | Memory or pipe attributes information such as memory type (local or global), corresponding memory system (DDR or quad data rate (QDR)), and read or write access. | All memory and pipe accesses |
| Stall% | Percentage of time the memory or pipe access is causing pipeline stalls. It is a measure of the ability of the memory or pipe access to fulfill an access request. | All memory and pipe accesses |
| Occupancy% | Percentage of the overall profiled period when a valid work-item executes the memory or pipe instruction. | All memory and pipe accesses |
| Bandwidth | Average memory bandwidth that the memory access uses and its overall efficiency. For each global memory access, FPGA resources are assigned to acquire data from the global memory system. However, the amount of data a kernel program uses might be less than the acquired data. The overall efficiency is the percentage of total bytes acquired from the global memory system that the kernel program uses. | Global memory accesses |
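To make the percentage columns concrete, the following is a minimal sketch of how such ratios could be computed from raw counters. The `AccessCounters` type, its field names, and the sample numbers are hypothetical and are not part of the profiler or of the `profile.json` format. The sketch also assumes that Stall% and Occupancy% share the profiled period as their denominator, which the descriptions above suggest but do not state outright.

```python
from dataclasses import dataclass


@dataclass
class AccessCounters:
    """Hypothetical raw counters for one memory or pipe access site."""
    profiled_cycles: int  # length of the profiled period, in clock cycles
    stall_cycles: int     # cycles in which the access stalled the pipeline
    active_cycles: int    # cycles in which a valid work-item executed the access
    bytes_acquired: int   # bytes fetched from the global memory system
    bytes_used: int       # bytes the kernel program actually used


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def summarize(c: AccessCounters) -> dict:
    """Derive the three percentage metrics under the assumptions stated above."""
    return {
        # Assumed: share of the profiled period spent stalling on this access.
        "stall_pct": percent(c.stall_cycles, c.profiled_cycles),
        # Share of the profiled period in which a valid work-item ran the access.
        "occupancy_pct": percent(c.active_cycles, c.profiled_cycles),
        # Share of acquired bytes that the kernel program used.
        "efficiency_pct": percent(c.bytes_used, c.bytes_acquired),
    }


# Illustrative numbers only, not measured data.
example = AccessCounters(
    profiled_cycles=1000,
    stall_cycles=250,
    active_cycles=600,
    bytes_acquired=4096,
    bytes_used=1024,
)
summary = summarize(example)
print(summary)
```

In this illustrative case the kernel uses only a quarter of the bytes it acquires, which is the kind of gap between acquired and used data that the Bandwidth efficiency figure is meant to expose.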