High-Performance GPU Acceleration—Part 2: Offload Performance
Developers who deploy applications across CPUs and GPUs often struggle to find the best methods for analyzing and optimizing offload performance.
In Part 2 of this webinar series, technical consulting engineer Kevin O’Leary focuses on tuning software for optimal performance once hardware is available. He uses Intel® VTune™ Profiler, a performance analyzer that takes the guesswork out of cross-architecture improvements. (Part 1 of this series focuses on designing software for efficient offload even before hardware is available.)
Using a sample application written in Data Parallel C++ (DPC++), Kevin demonstrates how Intel VTune Profiler can:
- Profile DPC++ and OpenMP* offload code running on both host and GPU processors
- Collect the right data and turn it into rich, interpretable analysis
- Identify the hot spots in your compute kernels and show which ones are the key areas for optimization
- Show how the GPU resources are being used and locate hardware bottlenecks
Get the Software
- Get Intel VTune Profiler as part of the Intel® oneAPI Base Toolkit—a foundational set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
- Get the stand-alone version of Intel VTune Profiler.
Resources
- Sign up for an Intel® DevCloud for oneAPI account—a free development sandbox with access to the latest Intel® hardware and oneAPI software.
- Explore oneAPI, including developer opportunities and benefits.
- Subscribe to Code Together, an interview series that explores the challenges at the forefront of cross-architecture development. Each biweekly episode features industry VIPs who are blazing new trails through today's data-centric world. Available wherever you get your podcasts.