Developers who deploy applications across CPUs and GPUs are often challenged to find the best methods for analyzing and optimizing offload performance.
In Part 2 of this webinar series, technical consulting engineer Kevin O’Leary focuses on tuning software for optimal performance once hardware is available. He uses Intel® VTune™ Profiler, a performance analyzer that takes the guesswork out of cross-architecture improvements. (Part 1 of this series focuses on designing software for efficient offload even before hardware is available.)
Using a sample application written in Data Parallel C++ (DPC++), Kevin demonstrates how Intel VTune Profiler can:
Profile DPC++ and OpenMP* offload code, including code running on both host and GPU processors
Collect the right data and turn it into rich, interpretable analysis
Identify the hot spots in your compute kernels and pinpoint which ones are key areas for optimization
Show how GPU resources are being used and locate hardware bottlenecks
Get the Software
Get Intel VTune Profiler as part of the Intel® oneAPI Base Toolkit—a foundational set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
Sign up for an Intel® Developer Cloud account—a free development sandbox with access to the latest Intel hardware and oneAPI software.
Explore oneAPI including developer opportunities and benefits.
Subscribe to Code Together—an interview series that explores the challenges at the forefront of cross-architecture development. Each biweekly episode features industry VIPs who are blazing new trails through today's data-centric world. Available wherever you get your podcasts.