Programming heterogeneous platforms requires a deep understanding of the system architecture at every level, so that the application design can exploit the best decomposition of data and work between CPUs and accelerators such as GPUs. In many cases, however, applications are being converted from a conventional CPU programming language (like C++) or from accelerator-friendly but still low-level code (like OpenCL™ code). The first problem is to determine which parts of the application would benefit from being offloaded to a GPU. The second is to estimate how much performance a particular GPU device might add. Each platform has its own limitations that affect the performance of offloaded computing tasks, for example: data transfer tax, task initialization overhead, memory latency, and bandwidth limitations. To account for these constraints, software developers need tools that collect the right information and produce recommendations for making the best design and optimization decisions.
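To make the trade-off above concrete, here is a minimal back-of-the-envelope sketch in Python. It is not how Intel VTune Profiler computes anything. It is only a simple additive model, and the class name, field names, and sample numbers are all hypothetical. It assumes that one offloaded call costs the sum of launch overhead, data transfer time, and kernel time, with no overlap between them.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    """Hypothetical per-invocation cost model for one offloaded region."""

    cpu_time_s: float          # time the region takes on the CPU
    gpu_kernel_time_s: float   # assumed kernel execution time on the GPU
    bytes_transferred: float   # host <-> device traffic for the region
    link_bandwidth_bps: float  # assumed sustained transfer rate, bytes/second
    launch_overhead_s: float   # assumed task initialization cost

    def transfer_time_s(self) -> float:
        # The "data transfer tax": bytes moved divided by sustained bandwidth.
        return self.bytes_transferred / self.link_bandwidth_bps

    def offload_time_s(self) -> float:
        # Assumption: launch, transfer, and compute do not overlap.
        return (self.launch_overhead_s
                + self.transfer_time_s()
                + self.gpu_kernel_time_s)

    def speedup(self) -> float:
        # Net speedup of the offloaded region relative to the CPU version.
        return self.cpu_time_s / self.offload_time_s()

    def worth_offloading(self) -> bool:
        return self.speedup() > 1.0


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    est = OffloadEstimate(
        cpu_time_s=1.0,
        gpu_kernel_time_s=0.1,
        bytes_transferred=4e9,
        link_bandwidth_bps=10e9,
        launch_overhead_s=0.001,
    )
    print(f"transfer: {est.transfer_time_s():.3f} s")
    print(f"net speedup: {est.speedup():.2f}x")
```

With these sample inputs the kernel alone is ten times faster than the CPU version, yet the estimated net speedup is only about 2x because the transfer time dominates. This is the kind of effect a profiling tool has to measure rather than guess.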

This presentation introduces two new GPU performance analysis types in Intel® VTune™ Profiler and a methodology, supported by these analyses, for profiling the performance of heterogeneous applications. Intel VTune Profiler is an established tool for performance characterization on CPUs. It also includes GPU offload analysis and GPU hot spot analysis for applications written with most offloading models, including OpenCL code, SYCL* (Data Parallel C++), and OpenMP* Offload.
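As a rough illustration of how the two analyses might be started from a script, the sketch below builds command lines for the `vtune` command-line tool without running them. It assumes the analysis type names `gpu-offload` and `gpu-hotspots` and the `-collect` and `-result-dir` options, which should be checked against the documentation for the installed version. The helper name, the application path `./my_app`, and the result directory names are hypothetical.

```python
import shlex

# Analysis type names assumed to be accepted by `vtune -collect`.
GPU_ANALYSES = ("gpu-offload", "gpu-hotspots")


def vtune_command(analysis, app_argv, result_dir=None):
    """Build (but do not run) a vtune collection command as an argv list."""
    if analysis not in GPU_ANALYSES:
        raise ValueError(f"unknown GPU analysis type: {analysis!r}")
    cmd = ["vtune", "-collect", analysis]
    if result_dir is not None:
        cmd += ["-result-dir", result_dir]
    # Everything after "--" is the profiled application and its arguments.
    return cmd + ["--", *app_argv]


if __name__ == "__main__":
    # "./my_app" and the result directory names are placeholders.
    for analysis, out in (("gpu-offload", "r_offload"),
                          ("gpu-hotspots", "r_hotspots")):
        print(shlex.join(vtune_command(analysis, ["./my_app"], out)))
```

A natural order is to run the offload analysis first to see whether the application is bound by the CPU, the GPU, or the transfers between them, and then run the hot spot analysis on the kernels that turn out to matter.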

Vladimir Tsymbal

Senior technical consulting engineer, Intel Corporation

Vladimir specializes in teaching customers how to use various Intel® Software Development Tools to develop, tune, and optimize their parallel applications on Intel® architecture. In particular, his focus is on the Intel® Parallel Studio XE product suite and the analysis tools it contains, including Intel VTune Profiler (which he helped develop), Intel® Advisor, and Intel® Inspector.

Prior to joining Intel in 2005, Vladimir worked as a research assistant, and developed hardware graphics accelerators and software and hardware systems for medical diagnostics. He holds a PhD in mathematics and computer science from Taganrog State University of Radio Engineering, Russia.



Intel® VTune™ Profiler

Find and optimize performance bottlenecks across CPU, GPU, and FPGA systems. Part of the Intel® oneAPI Base Toolkit.
