Developers who offload applications from CPUs to GPUs often struggle to find the best methods for analyzing and optimizing offload performance.

In Part 2 of this webinar series, technical consulting engineer Kevin O’Leary focuses on tuning software for optimal performance once hardware is available. He uses Intel® VTune™ Profiler, a performance analyzer that takes the guesswork out of cross-architecture improvements. (Part 1 of this series focuses on designing software for efficient offload even before hardware is available.)

Using a sample application written in Data Parallel C++ (DPC++), Kevin demonstrates how Intel VTune Profiler can:

  • Profile DPC++ and OpenMP* offload code running on host and GPU processors
  • Collect the right data and turn it into rich, interpretable analysis
  • Identify the hot spots in your compute kernels and determine which are the key areas for optimization
  • Show how the GPU resources are being used and locate hardware bottlenecks

Get the Software

  • Get Intel VTune Profiler as part of the Intel® oneAPI Base Toolkit—a foundational set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
  • Get the stand-alone version of Intel VTune Profiler.


  • Sign up for an Intel® Developer Cloud account—a free development sandbox with access to the latest Intel® hardware and oneAPI software.
  • Explore oneAPI, including developer opportunities and benefits.
  • Subscribe to Code Together, an interview series that explores the challenges at the forefront of cross-architecture development. Each biweekly episode features industry VIPs who are blazing new trails through today's data-centric world. Available wherever you get your podcasts.


