What's New in
- GPU Accelerators
- Support for Unified Shared Memory extension of OpenCL™ API: When you use the GPU Offload analysis type to profile OpenCL™ applications, you can now profile the CPU-side stacks for GPU computing tasks and identify bottlenecks related to Unified Shared Memory (USM) for the OpenCL™ API.
- Platform Support
- Support for Legacy Processors: VTune Profiler now supports the following generations of processors in client and server platforms:
- Server CPUs: Intel® Xeon® processor v3 and newer families.
- Client CPUs: Intel® Xeon® 4th generation processors and newer families.
Starting with this release, VTune Profiler does not support processors older than the versions listed above. To analyze performance on older processors, use an older version of VTune Profiler.
- HPC Performance Characterization Analysis
- Better Hardware Observability: This release adds the Platform Diagram to the Summary tab of the HPC Performance Characterization analysis result. The Platform Diagram reveals system topology and utilization metrics for physical cores, DRAM, and Intel® Ultra Path Interconnect (Intel® UPI) links. It is available for server platforms based on Intel® microarchitecture code named Skylake and newer.
- VTune Profiler Server
- New Command-Line Options for Convenience: The vtune-backend binary that launches VTune Profiler Server features new command-line options to make setup in certain environments more convenient. You can now specify a base URL that VTune Profiler Server will use as the basis for URL generation. Additionally, new options suppress automatic help tours on startup and provide or decline consent to collect usage information directly from the command line. These new options can be especially useful if you are running VTune Profiler Server inside a container.
- More Information on Windows*
- Support for Debug Information for Inline Functions: VTune Profiler can now read debugging information for inline functions from PDB symbol files on Windows* OS and display names and source code for inline functions in your workload.
- Hardware Support
Families of Intel® Xe graphics products starting with Intel® Arc™ Alchemist (formerly DG2) and newer generations feature GPU architecture terminology that shifts from legacy terms. For more information on the terminology changes and to understand their mapping with legacy content, see GPU Architecture Terminology for Intel® Xe Graphics.
- Support for First Generation of Intel® Arc™ High-performance Discrete GPUs: This release of Intel® VTune™ Profiler supports the first generation of Intel® Arc™ high-performance discrete GPUs, code named Alchemist and previously known as DG2. The support includes:
- Explicit support for DPC++, DirectX, Intel® Media SDK, OpenCL™, and OpenMP offload software technologies.
- Support for multi-GPU systems. You can now profile all Intel GPU devices, including integrated and discrete GPUs.
- Support for GPU Offload and GPU Hotspots analyses, including source level in-kernel profiling.
- Input and Output Analysis
- Intel® VT-d Observability: Intel® Virtualization Technology for Directed I/O (Intel® VT-d) observability is introduced in the Input and Output analysis for server platforms based on 3rd Gen Intel® Xeon® Scalable processors (code named Ice Lake), the Intel Atom® P5900 Processor Family (code named Snow Ridge), and newer. New performance metrics reveal the efficiency of hardware-driven DMA address remapping and the penalties for sub-optimal Intel VT-d utilization.
- Managed Code Targets
- .NET 6 Support: This release introduces support for analyzing .NET 6 targets using User-Mode Sampling. You can analyze .NET 6 workloads in Launch Application and Attach to Process modes on both Windows* and Linux* hosts.
- Application Performance Snapshot
- Histograms in Metric Tooltips: The metric tooltips in Application Performance Snapshot HTML reports now include histograms that visualize the distribution of metric values observed during analysis.
- Operating System Support
- New Host Operating Systems: This release introduces support for these OS hosts:
- Microsoft Windows* 11
- Ubuntu* 21.10
- Algorithm Group
- Flame Graph View in Hotspots Analysis: This version of VTune Profiler introduces support for Flame Graphs in the Hotspots analysis type. The Hotspots by CPU Utilization viewpoint has been enhanced with a Flame Graph window that displays a graphical view of hot code paths. Use flame graphs to analyze the time spent on each function and its callee functions.
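To make the idea behind a flame graph concrete, the following is a minimal sketch of how sampled call stacks can be folded into per-path times. It is not how VTune Profiler builds its Flame Graph window; the function name `fold_stacks` and the sample data are hypothetical and serve only to illustrate "time spent on each function and its callee functions".

```python
from collections import defaultdict


def fold_stacks(samples):
    """Fold (stack, seconds) samples into inclusive and self time per call path.

    `samples` is an iterable of (stack, seconds) pairs, where `stack` is a
    tuple of function names ordered from the root caller to the leaf.
    This input format is an assumption made for this sketch.
    """
    inclusive = defaultdict(float)  # time in a frame plus all of its callees
    self_time = defaultdict(float)  # time spent in the leaf frame itself
    for stack, seconds in samples:
        # Every prefix of the stack is a frame that was "on CPU" for this sample.
        for depth in range(1, len(stack) + 1):
            inclusive[stack[:depth]] += seconds
        self_time[stack] += seconds
    return dict(inclusive), dict(self_time)


if __name__ == "__main__":
    # Hypothetical samples: three call paths rooted at main().
    demo = [
        (("main", "parse"), 2.0),
        (("main", "compute", "kernel"), 5.0),
        (("main", "compute"), 1.0),
    ]
    inclusive, self_time = fold_stacks(demo)
    for path, seconds in sorted(inclusive.items()):
        print(" -> ".join(path), seconds)
```

In a flame graph, the width of each frame corresponds to the inclusive time of its call path, so wide frames near the top of a stack point to hot code paths.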
- Input and Output
- Platform Diagram: This release introduces the Platform Diagram, a new starting point for the Input and Output analysis. It reveals system topology and high-level utilization metrics for hardware resources including PCIe devices, Intel® Ultra Path Interconnect, and memory, so you can examine the utilization of your hardware at a glance. This feature is enabled for 1st and 2nd Generation Intel® Xeon® Scalable Processors in up to four-socket configurations, excluding the Intel® Xeon® Platinum 9200 series processors code named Cascade Lake AP. This feature is also supported on Intel Atom® Processors P Series code named Snow Ridge.
- Intel® Data Direct I/O Technology: Intel® Data Direct I/O (Intel DDIO) utilization efficiency metrics are extended with average inbound PCIe read/write latency and a core/IO contention indicator.
- Linux Perf* Capabilities: You can now perform Linux perf-based data collection without root access on 1st and 2nd Generation Intel Xeon® Scalable Processors on Linux kernel versions 5.10 and newer.
- Platform Analyses
- VTune Profiler - Platform Profiler as Analysis Type: VTune Profiler - Platform Profiler has been completely integrated into VTune Profiler as an analysis type. Platform Profiler is now fully available as an analysis from the GUI or command line of VTune Profiler. For more information, see Platform Profiler Analysis.
- CPU Throttling Data in System Overview Analysis: The System Overview analysis now displays information about factors that can cause the CPU to throttle. Use this information to examine if your system is overheated or consumes significant power, both of which could result in frequency drops that affect system performance.
- Microarchitecture Analyses
- Platform Diagram in Memory Usage View: The platform diagram shows:
- System topology
- Utilization metrics for DRAM
- Intel® UPI links
- Physical cores
The platform diagram is available for:
- All client platforms
- Server platforms based on Intel® microarchitecture code named Skylake, with up to four sockets.
- Analysis Targets
- .NET 5 Workloads: This release introduces support for running the Hotspots analysis on .NET 5 targets in Launch Application mode when using Hardware Event-Based Sampling.
- Extended Support for .NET 5 Workloads: You can now analyze .NET 5 workloads in the Attach to Process mode when you use Hardware Event-Based Sampling.
- FreeBSD* OS
- Input and Output Analysis on FreeBSD: You can now run the Input and Output analysis on remote FreeBSD targets. Analysis scope is limited to platform-level metrics.
- Code Annotation: The Instrumentation and Tracing Technology API (ITT API) is now fully supported on FreeBSD OS. The appropriate header and library files are provided as part of the FreeBSD target package. You can use ITT API to annotate your code and collect arbitrary statistics with little to no overhead.
- Support for Unified Shared Memory Workloads: Starting with the 2021.8 release, you can profile OpenCL, SYCL, and DPC++ applications that use Unified Shared Memory (USM) workloads. For OpenCL applications, this release also supports explicit data transfer of the buffer as Unified Shared Memory.
- GPU Accelerators
- Source-level analysis for DPC++ and OpenMP applications running on GPU over Level Zero: The following modes in GPU Compute/Media Hotspots analysis are now available when profiling Level Zero applications: Support also includes full-scale analysis of the kernel source per code line, including Source/Assembly mapping.
- Advanced Data Transfer Information in GPU Offload Analysis: The following additions to the Graphics window better clarify the data transfer between CPU host and GPU device when you run GPU profiling analyses:
- Allocation time information displays as part of total time by device operation.
- The Data Transferred table has been renamed to the Transfer Size table. Columns under Transfer Size feature new names for data transferred between host and device.
- Highlights and tool tips for workloads with sub-optimal offload schemes direct your attention to improve offload schema where necessary.
- Improved Tooltips for Occupancy Metrics in GPU Analysis: The GPU Compute/Media Hotspots analysis has been enhanced to detect factors that limit peak achievable occupancy for the hottest computing tasks that make the EU array idle when waiting for the scheduler. Improved tooltips for occupancy metrics now provide information about peak occupancy and bounding reasons for the existing computing task launch configuration.
- GPU Analysis Coverage for Self-Check: The self-check functionality in VTune Profiler now covers GPU analyses as well. Run the vtune-self-checker.sh script on Windows and Linux systems to check the GPU Compute/Media Hotspots analysis in source analysis and characterization modes when you run DPC++ applications on an Intel GPU. You must install the Intel® oneAPI Base Toolkit for this purpose.
- CPU Context for GPU Execution in GPU Offload Analysis: The GPU Offload analysis now presents a richer set of information about execution on the GPU by including context from the CPU. This includes stack information on:
- Data transfer from host to device
- Data transfer from device to host
- Analysis of Multiple GPUs: When you have multiple GPUs connected to your system, you can now analyze all of the GPUs collectively with the GPU Offload and GPU Compute/Media Hotspots analyses. Previously, you could analyze only a single GPU at a time after VTune Profiler identified all the GPUs connected to the system. When you run these analyses on all connected GPUs, see analysis information about each GPU in the Summary window. The Full compute set in Characterization mode is not available in multi-adapter and multi-tile analysis.
- Hottest CPU Tasks in GPU Offload Analysis: The Summary view in the GPU Offload analysis now includes the Hottest Host Tasks table, which displays the most active tasks running on the CPU. Use this table to examine the overhead on the host. Click on a performance-critical task to see more information in the Graphics window, where results are grouped by host Task Type.
- Support for Affinity Mask: If you use the ZE_AFFINITY_MASK variable to bind your workload to a single tile, VTune Profiler can then attribute kernels to the correct tile and also display relevant metrics per kernel.
- Host-GPU Bandwidth Information in GPU Offload Analysis: Previously, you checked the Analyze memory bandwidth option in the GPU Offload analysis to see data required for this computation. Starting with this release of VTune Profiler, you can use the Analyze host-GPU bandwidth option instead. Depending on your hardware configuration, this selection displays DRAM bandwidth, PCIe bandwidth, or both sets of data on the timeline.
- PCIe Bandwidth Information in Custom and Command Line Runs of GPU Offload Analysis: Use new options to collect information about PCIe bandwidth (between the host and GPU sides) when you run custom and command line runs of the GPU Offload analysis:
- In the UI, check the Analyze host-GPU PCIe bandwidth option for custom analysis.
- Improvements to Peak Occupancy Metric: The GPU Peak Occupancy metric for a computing task now flags the factors that limit peak occupancy in order of priority. Start tuning your application by addressing the most restricting factor. VTune Profiler customizes recommendations for potential improvements based on the launch parameters of the compute kernel (work size, SLM, and barriers usage).
- Enhancements to GPU Offload Summary: The Summary window of the GPU Offload analysis contains these enhancements for an improved user experience:
- Locate hotspots in your function when the GPU is not busy. See the new Top hotspots when GPU was idle table in the GPU Time, % of Elapsed Time (formerly GPU Utilization) section.
- The Hottest Computing Functions section now includes occupancy information.
- Support to Trace DirectX* API on CPU Host: This release of VTune Profiler introduces support to profile DirectX applications on the CPU host. These versions of the DirectX API can be traced:
- Direct3D 11
- Direct3D 12
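For the Support for Affinity Mask item in this section, the following is a minimal sketch of preparing a tile-bound profiling run from a script. It assumes the `device.tile` value format for `ZE_AFFINITY_MASK` and the `vtune -collect gpu-offload -- <app>` command-line form; the helper names and the application path `./my_app` are hypothetical. The sketch only builds the environment and the argument list and does not launch anything.

```python
import os


def tile_bound_env(device, tile, base_env=None):
    """Return a copy of the environment with ZE_AFFINITY_MASK set to one tile.

    Assumes the "<device>.<tile>" mask format; adjust if your driver
    documentation specifies something different.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ZE_AFFINITY_MASK"] = f"{device}.{tile}"
    return env


def gpu_offload_command(app_args):
    """Build an argument list for a GPU Offload collection (not executed here)."""
    return ["vtune", "-collect", "gpu-offload", "--", *app_args]


if __name__ == "__main__":
    env = tile_bound_env(device=0, tile=1)
    cmd = gpu_offload_command(["./my_app"])  # hypothetical application path
    print("ZE_AFFINITY_MASK =", env["ZE_AFFINITY_MASK"])
    print("command:", " ".join(cmd))
    # To actually run the collection, pass both to subprocess.run(cmd, env=env).
```

Keeping the environment and the command as plain data makes it easy to log exactly which tile a given result corresponds to.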
- Hardware Support
- Support for Intel® Atom® Processors: This release supports the Intel Atom® Processor P Series code named Snow Ridge, including Hotspots, Microarchitecture Exploration, Memory Access, and Input and Output analyses.
- Support for 3rd Gen Intel® Xeon® Scalable Processor Architecture: This release supports the 3rd Gen Intel® Xeon® Scalable processor architecture (code named Ice Lake Server).
- IDE Support
- Support for Microsoft Visual Studio* 2022: This release introduces support for the integration of VTune Profiler into Microsoft Visual Studio 2022.
- Application Performance Snapshot
- Metric tooltips in HTML reports: Metric tooltips in APS HTML reports now present a more holistic view of metrics and their properties. The new tooltips present a compact yet comprehensive overview of a metric, which helps you to better understand the importance of metrics in performance analysis. This change includes a visual bar that indicates where the metric value stands in terms of current performance and tuning potential.
- PCIe bandwidth info in CLI reports: APS command line reports now include PCIe bandwidth metrics. This data is only available on server platforms when using the Sampling Driver.
- New reports and filters: APS now features the following new types of reports and filters:
- Node topology report: view relations between ranks, nodes, and PCIe devices.
- Metrics report: get a configurable table that displays any collected metric for each rank, node, or device.
- Ability to filter data by node.
- Outlier Detection: This release introduces a mechanism for the detection of outliers, that is, individual metric values contributing to an average metric that differ significantly from the overall distribution or break a certain threshold. Outliers can cause imbalance and distort average metric values. You can now see outliers in both HTML and CLI reports, with attribution to the specific rank or node where an outlier occurred.
- Metric Tooltip Enhancements: Metric tooltips now visualize ranges of average metrics, with their minimum, maximum, and average contributing values.
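The outlier and range ideas described above can be illustrated with a minimal sketch. This is not the detection rule used by Application Performance Snapshot, which these notes do not specify; the z-score rule, the optional hard limit, and the names `summarize_metric`, `z_threshold`, and `hard_limit` are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def summarize_metric(values_by_rank, z_threshold=2.0, hard_limit=None):
    """Summarize per-rank metric values and flag outliers.

    A value is flagged if it lies more than `z_threshold` population standard
    deviations from the average, or if it exceeds `hard_limit` (when given).
    Both rules are illustrative assumptions, not the APS algorithm.
    """
    values = list(values_by_rank.values())
    average = mean(values)
    spread = pstdev(values)
    outliers = {}
    for rank, value in values_by_rank.items():
        far_from_average = spread > 0 and abs(value - average) / spread > z_threshold
        over_limit = hard_limit is not None and value > hard_limit
        if far_from_average or over_limit:
            outliers[rank] = value  # keep attribution to the rank
    return {
        "min": min(values),
        "max": max(values),
        "avg": average,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical metric values (percent) for six ranks.
    metric_by_rank = {0: 10.0, 1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0, 5: 40.0}
    print(summarize_metric(metric_by_rank))
```

The example shows why outliers matter: a single rank with a much higher value pulls the average away from what most ranks actually experience, which is why reporting the minimum, maximum, and the responsible rank alongside the average is useful.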
- User Interface
- Main Vertical Toolbar: This release introduces a new main vertical toolbar. All controls previously located in the main horizontal toolbar are now located on this toolbar. The vertical toolbar is designed to enhance your experience with clear, bright controls.
- Enhanced Project Navigator User Experience: The Project Navigator pane now features menu options to open a new or existing project to better facilitate your VTune Profiler experience.
- Improvements to Vectorization Information: The Vectorization sections of Performance Snapshot and HPC Performance Characterization analyses have been enriched to provide a clearer picture of the state of vectorization in your application. Quickly see if your code is not vectorized at all, if your code does not use the latest vector instruction set extension, or if your code has too many scalar instructions. This version of VTune Profiler also features improved recommendations to resolve vectorization issues.
- Rich Metric Tooltips in Multiple Analyses: This release introduces rich metric tooltips in Performance Snapshot, Hotspots, HPC Performance Characterization, and Microarchitecture Exploration analyses. The new tooltips aim to make metrics more intuitive by providing visualizations for thresholds, desired direction (more/less is better), and tuning potential. Hover over a metric to get this tooltip.
- Detection of Compilation with Low Optimization Level in Hotspots Analysis: When debug information is available, VTune Profiler now detects and flags modules that may have been compiled using non-optimal compiler optimization flags in the Top Hotspots section of the Hotspots analysis result. This can help you detect underutilization of compiler optimization capabilities and correct the build system setup.
- Platform Diagram Extended with Persistent Memory Block: This data is available on server platforms based on Intel microarchitectures code named Cascade Lake and Ice Lake.
- Changes to Viewpoint Selection: The viewpoint selection was adjusted for each analysis type. The viewpoint selection is now disabled for certain analysis types, and features only a managed set of the most helpful viewpoints for other analysis types. You can re-enable the display of all applicable viewpoints in the Options pane.
- Code Annotations
- Debug Formats
- Support for DWARF5 Debug Format: VTune Profiler now supports version 5 of the DWARF debug format. You can now use debug information in DWARF 5 format to resolve function names and source locations for binaries.
- Command Line Analysis
- Perf Tool Parameters for All Analysis Types: You can now use the target-system command to get parameters on the command line for the native perf tool for all CPU hardware-based analysis types, including custom analyses. Use the get-perf-cmd argument for this purpose. You can collect the perf trace on a target with the Linux Perf tool and then import the trace to the VTune Profiler UI.
- Information on Hybrid CPU Analysis: The VTune Profiler User Guide features a new topic that explains how to profile applications that run on hybrid platforms.
- Guidance resource on GPU-profiling features in Intel® VTune™ Profiler: A new article captures learning pathways to profile GPUs and illustrates techniques to Optimize Applications for Intel® GPUs with Intel® VTune™ Profiler. Use this article to understand the Intel® VTune™ Profiler workflow to profile and optimize GPUs. The article also points to several key resources, including procedural topics, cookbook recipes, and webinars that explain GPU compute profiling and graphics profiling with Intel software analyzer products.
- New CLI Cheat Sheet for quick reference: Added a new downloadable document, the VTune Profiler CLI Cheat Sheet. You can use this print-friendly PDF for quick reference on the VTune Profiler command-line interface.
- New Recipes in VTune Profiler Cookbook: The VTune Profiler Performance Analysis Cookbook features these new recipes: