HPC Performance Characterization Analysis
How It Works
Configure and Run Analysis
- Click the Configure Analysis button (standalone GUI or Visual Studio IDE) on the Intel® VTune™ Profiler toolbar. The Configure Analysis window opens.
- From the HOW pane, click the Browse button and select HPC Performance Characterization.
- Configure the following options:
  - CPU sampling interval, ms (field): Specify an interval (in milliseconds) between CPU samples. Possible values: 0.01-1000. The default value is 1.
  - Collect stacks (check box): Enable advanced collection of call stacks and thread context switches. The option is disabled by default.
  - Analyze memory bandwidth (check box): Collect the data required to compute memory bandwidth. The option is enabled by default.
  - Evaluate max DRAM bandwidth (check box): Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds. The option is enabled by default.
  - Analyze OpenMP regions (check box): Instrument and analyze OpenMP regions to detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. The option is enabled by default.
  - Details (button): Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify or enable additional settings for the analysis, create a custom configuration by copying an existing predefined configuration. VTune Profiler creates an editable copy of this analysis type configuration.
- Click the Start button to run the analysis.
- Effective Physical Core Utilization: Explore the parallel efficiency of the application by examining how well its code execution utilizes the physical cores. Look for scalability problems related to serial time versus parallel time, tuning potential for OpenMP regions, and MPI imbalance.
- Vectorization: Determine whether floating-point loops are bandwidth bound or vectorized. For bandwidth-bound loops or functions, run the Memory Access analysis to reduce bandwidth consumption. For vectorization optimization opportunities, use Intel Advisor to run a vectorization analysis.
- Intel® Omni-Path Fabric Usage: Identify performance bottlenecks caused by reaching the interconnect limits.
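
The options described under Configure and Run Analysis can also be supplied on the command line when running the collection outside the GUI. The following is a minimal Python sketch, not an official recipe: it assembles a `vtune -collect hpc-performance` command from the same settings and defaults listed above. The helper name `build_hpc_command`, the result directory name, and the application path are hypothetical. The knob names are assumptions based on the VTune Profiler command-line interface and should be confirmed for your installed version with `vtune -help collect hpc-performance`.

```python
import shlex


def build_hpc_command(app, result_dir="r000hpc", sampling_interval_ms=1.0,
                      collect_stacks=False, memory_bandwidth=True,
                      max_dram_bandwidth=True, openmp_regions=True):
    """Return an argument list for an HPC Performance Characterization run.

    Defaults mirror the GUI defaults described in this section.
    `app` is the target command as a list, for example ["./my_app", "arg"].
    """
    # The GUI field accepts values from 0.01 to 1000 milliseconds.
    if not 0.01 <= sampling_interval_ms <= 1000:
        raise ValueError("CPU sampling interval must be between 0.01 and 1000 ms")

    def flag(value):
        return "true" if value else "false"

    # Assumed knob names; verify against your VTune Profiler version.
    knobs = {
        "sampling-interval": f"{sampling_interval_ms:g}",
        "enable-stack-collection": flag(collect_stacks),
        "collect-memory-bandwidth": flag(memory_bandwidth),
        "dram-bandwidth-limits": flag(max_dram_bandwidth),
        "analyze-openmp": flag(openmp_regions),
    }

    cmd = ["vtune", "-collect", "hpc-performance"]
    for name, value in knobs.items():
        cmd += ["-knob", f"{name}={value}"]
    cmd += ["-result-dir", result_dir, "--", *app]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_hpc_command(["./my_app"], collect_stacks=True)))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without VTune Profiler installed. To launch the collection, the returned list could be passed to `subprocess.run`.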