Intel® VTune™ Profiler
Find and Fix Performance Bottlenecks Quickly and Realize All the Value of Your Hardware
Performance Analysis for Applications & Systems
Intel® VTune™ Profiler optimizes application performance, system performance, and system configuration for HPC, cloud, IoT, media, storage, and more.
- CPU, GPU, and FPGA: Tune the entire application’s performance, not just the accelerated portion.
- Multilingual: Profile SYCL*, C, C++, C#, Fortran, OpenCL™ code, Python*, Google Go* programming language, Java*, .NET, Assembly, or any combination of languages.
- System or Application: Get coarse-grained system data for an extended period or detailed results mapped to source code.
- Power: Optimize performance while avoiding power- and thermal-related throttling.
Download as Part of the Toolkit
Intel VTune Profiler is included in the Intel® oneAPI Base Toolkit, which is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
Download the Stand-Alone Version
A stand-alone download of Intel VTune Profiler is available. You can download binaries from Intel or choose your preferred repository.
Develop in the Free Intel® Cloud
Get what you need to build and optimize your oneAPI projects for free. With an Intel® DevCloud account, you get 120 days of access to the latest Intel® hardware (CPUs, GPUs, and FPGAs) and Intel® oneAPI tools and frameworks. No software downloads. No configuration steps. No installations.
Features
Algorithm Optimization
- Locate hot spots: the most time-consuming parts of your code.
- Visualize hot code paths and the time spent in each function and its callees with the Flame Graph.
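A flame graph is built from sampled call stacks. Each box stands for one call path, its width is proportional to the number of samples that contain that path (inclusive time), and the leaf frame of each sample is where the CPU was actually executing (self time). The Python sketch below illustrates that aggregation on hand-written, hypothetical stacks. It is a conceptual model only, not VTune's implementation, and the helper names `fold_stacks` and `inclusive_and_self` are illustrative.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse sampled call stacks (root first) into 'a;b;c' -> sample count."""
    return Counter(";".join(stack) for stack in samples)

def inclusive_and_self(samples):
    """Inclusive: samples in which a function appears anywhere on the stack.
    Self: samples in which the function is the leaf (was executing)."""
    inclusive = Counter()
    self_time = Counter()
    for stack in samples:
        for fn in set(stack):  # count a recursive function once per sample
            inclusive[fn] += 1
        self_time[stack[-1]] += 1
    return inclusive, self_time

# Hypothetical samples, root frame first.
samples = [
    ["main", "solve", "matmul"],
    ["main", "solve", "matmul"],
    ["main", "solve", "reduce"],
    ["main", "load_input"],
]
folded = fold_stacks(samples)
inclusive, self_time = inclusive_and_self(samples)
print(folded["main;solve;matmul"])            # 2
print(inclusive["solve"], self_time["solve"])  # 3 0
```

Here `solve` is wide in the graph (3 of 4 samples) yet has no self time, which points the search toward its callee `matmul`.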
Microarchitecture and Memory Bottlenecks
- Identify the most significant hardware issues that affect the performance of your application with microarchitecture exploration analysis.
- Pinpoint memory-access-related issues such as cache misses and high-bandwidth problems.
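Cache misses often come from the access pattern rather than the amount of data. The toy Python model below, a direct-mapped cache with assumed sizes (64 slots of 64 bytes), illustrates why a column-order walk over a row-major matrix misses far more often than a row-order walk over the same data. It is only a teaching model: a profiler measures real misses on real hardware, and nothing here reflects how VTune collects its data.

```python
def count_misses(addresses, num_lines=64, line_size=64):
    """Count misses in a toy direct-mapped cache for a trace of byte addresses."""
    tags = [None] * num_lines  # tag currently held by each cache slot
    misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block (cache line) containing the byte
        index = block % num_lines   # the single slot this block can occupy
        tag = block // num_lines    # distinguishes blocks that share a slot
        if tags[index] != tag:
            misses += 1
            tags[index] = tag       # evict whatever was there
    return misses

ELEM = 8                 # assumed 8-byte elements (e.g., doubles)
ROWS = COLS = 64         # hypothetical 64 x 64 matrix stored row-major (32 KiB)

# Row-order walk: consecutive addresses, so each 64-byte line is reused 8 times.
row_order = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
# Column-order walk over the same data: consecutive accesses are 512 bytes apart.
col_order = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

print(count_misses(row_order))  # 512
print(count_misses(col_order))  # 4096
```

Both walks touch identical data, but in the column-order walk 64 rows compete for only 8 cache slots, so every access misses.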
Accelerators and XPUs
- Optimize GPU offload schema and data transfers for SYCL, OpenCL code, Microsoft DirectX*, or OpenMP* offload code. Identify the most time-consuming GPU kernels for further optimization.
- Analyze GPU-bound code for performance bottlenecks caused by microarchitectural constraints or inefficient kernel algorithms.
- Explore CPU and FPGA interactions, and FPGA use.
Parallelism
- Examine how efficiently the code is threaded. Identify threading issues that impact performance.
- Evaluate compute-intensive or throughput HPC applications for efficient CPU use, vectorization, and memory use.
Related cookbook recipes: Method for OpenMP Code Analysis; Schedule Overhead in Intel® oneAPI Threading Building Blocks Applications.
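One common way to quantify threading efficiency is to compare how many logical CPUs were kept busy on average with how many were available. The Python function below computes that ratio from hypothetical per-thread busy times. It is a simplified stand-in: VTune's own CPU utilization metrics may be defined differently, for example in how they treat spin-wait time.

```python
def effective_cpu_utilization(busy_seconds_per_thread, elapsed_seconds, logical_cpus):
    """Average number of logical CPUs kept busy, as a fraction of those available.

    Simplified model: ignores spin-wait time and oversubscription effects.
    """
    avg_busy_cpus = sum(busy_seconds_per_thread) / elapsed_seconds
    return min(avg_busy_cpus / logical_cpus, 1.0)

# Hypothetical run: 8 worker threads on an 8-CPU machine, 10 s of wall-clock time.
# Three threads do most of the work while five sit mostly idle.
busy = [9.5, 9.4, 9.6, 2.0, 2.1, 1.9, 2.0, 2.0]
util = effective_cpu_utilization(busy, elapsed_seconds=10.0, logical_cpus=8)
print(f"{util:.0%}")  # 48%
```

A value this far below 100% with uneven per-thread busy times suggests load imbalance rather than too few threads.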
Platform and I/O
- Locate performance bottlenecks in I/O-intensive applications. Explore how effectively the hardware processes I/O traffic generated by external PCIe* devices or integrated accelerators.
- See a holistic view of system behavior for long-running workloads with Platform Profiler.
- Get a fine-grained overview for short-running workloads with System Overview.
Multi-Node
- Characterize performance aspects of large-scale message passing interface (MPI) and OpenMP workloads.
- Identify scalability issues and get recommendations for in-depth analysis.
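Scalability problems usually show up as parallel efficiency dropping as the process count grows. The Python sketch below computes strong-scaling speedup and efficiency from hypothetical timings. The 70% threshold used to flag a run is an arbitrary illustration, not a VTune rule.

```python
def scaling_report(timings):
    """timings: {process_count: wall_seconds}.

    Returns {p: (speedup, efficiency)} relative to the smallest measured
    process count (strong scaling: same problem size at every count).
    """
    base_p = min(timings)
    base_t = timings[base_p]
    report = {}
    for p, t in sorted(timings.items()):
        speedup = base_t / t
        efficiency = speedup / (p / base_p)  # 1.0 means perfect scaling
        report[p] = (speedup, efficiency)
    return report

# Hypothetical strong-scaling measurements (ranks -> seconds).
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 20.0}
report = scaling_report(timings)
for p, (s, e) in report.items():
    flag = "  <- investigate" if e < 0.7 else ""  # arbitrary threshold
    print(f"{p:>3} ranks: speedup {s:4.2f}, efficiency {e:4.0%}{flag}")
```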
What's New in 2022
Expanded Accelerator Profiling
- Identify occupancy issues that prevent GPU code from efficiently using available execution unit (EU) threads by using the GPU Compute/Media hot spots analysis.
- Analyze data transfers between host and device with the GPU Offload Analysis to pinpoint inefficient code paths.
- Profile DirectX applications on the CPU host to find gaps between API calls and the causes of those inefficiencies.
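EU-thread occupancy can be pictured as the share of hardware thread slots that have work resident over time. The Python sketch below averages that share over hypothetical per-sample counts of active threads. The device shape (96 EUs with 7 threads each) and the sample values are assumptions for illustration, and VTune's occupancy metric is collected from hardware and may be defined differently.

```python
def eu_thread_occupancy(active_threads_per_sample, num_eus, threads_per_eu):
    """Average fraction of hardware EU thread slots occupied across samples."""
    capacity = num_eus * threads_per_eu  # total thread slots on the device
    total_active = sum(active_threads_per_sample)
    return total_active / (len(active_threads_per_sample) * capacity)

# Hypothetical device: 96 EUs x 7 threads = 672 slots.
# Two samples show the kernel only partly filling the GPU (e.g., a small tail launch).
samples = [672, 672, 336, 168, 672]
occ = eu_thread_occupancy(samples, num_eus=96, threads_per_eu=7)
print(f"{occ:.0%}")  # 75%
```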
Hardware Support
- Get support for the 3rd generation Intel® Xeon® Scalable processors (formerly code named Ice Lake server) and 12th generation Intel® Core™ processor family (formerly code named Alder Lake).
New Profiles
- Visualize hot code paths and identify time spent in each function and its callees using the Flame Graph in Hot Spots analysis.
- Diagnose issues that may cause your system to overheat or consume excessive power using the CPU Throttling analysis.
Better Workflow
- Gain insight into system configuration, performance, and behavior with Platform Profiler, now fully integrated into Intel VTune Profiler.
- Quickly characterize application performance and get pointers to deeper analysis with Performance Snapshot.
Better Data
- Expanded I/O Analysis: Quickly visualize I/O hardware use with a platform diagram. Discover the root cause of low throughput in Intel Data Direct I/O Technology. Optimize latency using extended utilization metrics in I/O analysis. Get added support for MPI applications.
- Application Performance Snapshot: Analyze workloads at scale to identify outliers and where they occurred. Get PCIe bandwidth information in command-line reports.
For a more complete and up-to-date list, see the release notes.
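Finding outliers among thousands of MPI ranks means separating a few unusual ranks from normal run-to-run noise. The Python sketch below uses a generic statistical technique, the modified z-score based on the median absolute deviation (MAD) with the conventional 3.5 cutoff; it is not the method Application Performance Snapshot uses, and the per-rank wait times are hypothetical.

```python
from statistics import median

def outlier_ranks(values, threshold=3.5):
    """Return indices (ranks) whose value is far from the median.

    Uses the modified z-score 0.6745 * |x - median| / MAD, which is robust
    to the very outliers it is trying to find.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most ranks are identical; anything different stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical per-rank MPI wait time in seconds.
mpi_wait = [1.1, 1.0, 1.2, 0.9, 1.1, 6.3, 1.0, 1.2]
print(outlier_ranks(mpi_wait))  # [5]
```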
Get Started
Download
Intel VTune Profiler is a part of the Intel® oneAPI Base Toolkit.
Try It Out
Follow the Get Started Guide and use an introductory code sample to see how Intel VTune Profiler works.
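Analyses can also be launched from the command line. The Python helper below only assembles argument lists for the `vtune` launcher using its `-collect`, `-result-dir`, and `-report` options. It assumes `vtune` is on `PATH` once the oneAPI environment is set up; the application `./my_app`, its arguments, and the result directory name `r001hs` are hypothetical placeholders.

```python
import shlex
import subprocess

def vtune_collect_command(analysis, result_dir, app, *app_args):
    """Build e.g. ['vtune', '-collect', 'hotspots', '-result-dir', 'r001hs', '--', './app']."""
    # Everything after '--' is the application to profile and its own arguments.
    return ["vtune", "-collect", analysis, "-result-dir", result_dir, "--", app, *app_args]

def vtune_report_command(report, result_dir):
    """Build a command that prints a text report for an existing result."""
    return ["vtune", "-report", report, "-result-dir", result_dir]

def run(cmd):
    # Assumes the vtune launcher is installed and on PATH; raises if it fails.
    return subprocess.run(cmd, check=True)

collect = vtune_collect_command("hotspots", "r001hs", "./my_app", "--size", "1024")
summary = vtune_report_command("summary", "r001hs")
print(shlex.join(collect))  # vtune -collect hotspots -result-dir r001hs -- ./my_app --size 1024
print(shlex.join(summary))  # vtune -report summary -result-dir r001hs
```

Calling `run(collect)` and then `run(summary)` would profile the application and print the summary, provided VTune is installed.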
Learn Analysis Techniques
Browse the cookbook to see if there are recipes for analysis of your performance bottlenecks.
Documentation & Code Samples
Code Samples
Learn how to access oneAPI code samples in a tool command line or IDE.
What Customers Are Saying
“We recommend using Intel MPI for best performance, and tools such as VTune Profiler and Advisor to help better understand performance optimizations and how to best migrate your workloads to the cloud.”
— Ilias Katsardis, HPC solution lead, Google Cloud*
"Intel’s VTune Profiler [helped us] to analyze code performance and further enhance it to run optimally on our products."
— Won-Chul Bang, PhD, vice president and head of product strategy, Samsung Medison
"The Application Performance Snapshot feature of Intel® VTune™ Profiler helped us analyze HemeLB running at 96K MPI ranks on SuperMUC-NG of the Leibniz Supercomputing Centre. It was straightforward and effective in its operation and analysis output."
— Dr. Jon McCullough, University College London
“We are always looking for new methods to accelerate workloads in our data center. Our teams used Intel® VTune™ Profiler’s flame graph feature and found it intuitive to use and practical for interpreting performance data. This tool [part of the Intel® oneAPI Base Toolkit] has become essential to optimizing code and workflows, and its ability to work across Intel CPUs and GPUs adds to our productivity and performance optimization efforts."
— Dr. Markus Rampp, head of HPC Applications Division and deputy director, Max Planck Computing & Data Facility
"We rely super heavily on Intel VTune Profiler and some of the other Intel products that are our primary way to understand performance at very large scale."
— Dan Stanzione, executive director, Texas Advanced Computing Center (TACC)
Specifications
Processor:
- 3rd generation Intel® Xeon® processor family v3 (or later)
- 4th generation (or later) Intel® Core™ processor
GPUs:
- Gen9 of Intel® HD Graphics or Intel® Iris® Xe MAX graphics or newer
- Xe Architecture
FPGAs:
- Intel® Arria® 10 FPGA and Intel® Stratix® FPGA
Languages:
- SYCL*
- C and C++
- C#
- Fortran
- OpenCL code
- Google Go programming language
- Java
- Python
- .NET
Development environments:
- Windows: Microsoft Visual Studio*
- Linux: Eclipse*
- Virtual machine support: Kernel-based virtual machine (KVM), Hyper-V*, VMware*
- Container support: Docker*, Singularity*, LXC, Apache Mesos*
- Interface: Desktop or web GUI, command line
For more information, see the system requirements.
Host operating systems:
- Windows
- Linux
- macOS*
Target operating systems:
- Windows
- Linux
- FreeBSD*
- Android*
- Wind River Linux*
- Yocto Project*
Compilers:
- Intel® compilers
- Microsoft* compilers
- GNU Compiler Collection (GCC)*
Threading analysis:
- OpenMP
- Intel® oneAPI Threading Building Blocks
- Native threads
Distributed environments:
- MPI (MPICH-based, Open MPI)
Get Help
Your success is our success. Access these support resources when you need assistance.
Related Tools
Intel® Advisor: This design and analysis tool helps you achieve high application performance through efficient threading, vectorization, memory use, and GPU offload on current and future Intel hardware. It supports C, C++, Fortran, DPC++, OpenMP, and Python.
- Offload Advisor: Get your code ready for efficient GPU offload even before you have the hardware
- Automated Roofline Analysis: See performance headroom against hardware limitations and get insights for an effective optimization roadmap
- Vectorization Advisor: Enable more vector parallelism and get guidance to improve its efficiency
- Threading Advisor: Model, tune, and test threading design options
- Flow Graph Analyzer: Create, visualize, and analyze task and dependency computation graphs
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.