Microarchitecture Exploration Analysis for Hardware Issues
How It Works
- Pipeline slots containing useful work that issued and retired (Retired)
- Pipeline slots containing useful work that issued and cancelled (Bad speculation)
- Pipeline slots that could not be filled with useful work due to problems in the front-end (Front-end Bound)
- Pipeline slots that could not be filled with useful work due to a backup in the back-end (Back-end Bound)
- Intel Microarchitecture Code Name Sandy Bridge: This microarchitecture is already partially based on the top-down method, and the VTune Profiler provides a hierarchical analysis of the hardware metrics based on the following categories: Filled Pipeline Slots and Unfilled Pipeline Slots (Stalls).
- Intel Microarchitectures Code Name Nehalem and Westmere: During Microarchitecture Exploration analysis on these microarchitectures, the VTune Profiler collects metrics that help identify such hardware-level performance problems as:
  - Front-end stalls and their causes
  - Stalls at execution and retirement, particularly those caused by high-latency loads, wasted work due to branch misprediction, or long-latency instructions
- For a detailed tuning methodology behind the Microarchitecture Exploration analysis and some of the complexities associated with this analysis, see
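As a rough illustration of the four pipeline-slot categories listed under How It Works, the following Python sketch turns raw slot counts into fractions of the total and reports the largest category. The `SlotCounts` class, the function names, and the sample numbers are hypothetical and exist only for this example; the VTune Profiler derives its metrics from hardware events, not from this code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SlotCounts:
    """Hypothetical container for pipeline-slot counts in each category."""
    retired: int
    bad_speculation: int
    front_end_bound: int
    back_end_bound: int

    def total(self) -> int:
        # Every pipeline slot falls into exactly one of the four categories.
        return (self.retired + self.bad_speculation
                + self.front_end_bound + self.back_end_bound)


def slot_breakdown(counts: SlotCounts) -> Dict[str, float]:
    """Return each category as a fraction of all pipeline slots."""
    total = counts.total()
    if total <= 0:
        raise ValueError("total slot count must be positive")
    return {
        "Retired": counts.retired / total,
        "Bad speculation": counts.bad_speculation / total,
        "Front-end Bound": counts.front_end_bound / total,
        "Back-end Bound": counts.back_end_bound / total,
    }


def dominant_category(counts: SlotCounts) -> str:
    """Name the category that accounts for the most pipeline slots."""
    breakdown = slot_breakdown(counts)
    return max(breakdown, key=breakdown.get)


# Made-up sample: half of the slots retired useful work.
sample = SlotCounts(retired=50, bad_speculation=10,
                    front_end_bound=15, back_end_bound=25)
```

With the sample above, `slot_breakdown(sample)` assigns half of the slots to Retired, and the largest non-retired share (Back-end Bound) would be the first place to look for a bottleneck.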
Configure and Run Analysis
- Click the Configure Analysis button (standalone GUI or Visual Studio IDE) on the Intel® VTune™ Profiler toolbar. The Configure Analysis window opens.
- From the HOW pane, click the Browse button and select Microarchitecture Exploration.
- Configure the following options:
  - CPU sampling interval, ms (spin box): Specify an interval (in milliseconds) between CPU samples. Possible values: 1-1000. The default value is 1 ms.
  - Extend granularity for the top-level metrics (selection area): You may limit the data collection by selecting particular top-level metrics. In this case, the VTune Profiler extends the level of granularity and collects additional sub-metrics only for the selected top-level metrics. For example, if you select the Memory Bound top-level metric, the VTune Profiler collects additional data and provides Memory Bound sub-metrics (such as DRAM Bound, Store Bound, and so on), which helps narrow down the analysis to particular microarchitecture levels. Limiting the amount of data collected simultaneously may also improve profiling accuracy due to less multiplexing. This may be particularly helpful for short-running applications or applications with short phases.
  - Analyze memory bandwidth (check box): Collect the data required to compute memory bandwidth. The option is disabled by default.
  - Evaluate max DRAM bandwidth (check box): Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds. The option is enabled by default.
  - Collection mode (drop-down menu): Choose the Detailed sampling-based collection mode (default) to view a data breakdown per function and other hotspots. Use the Summary counting-based mode for an overview of the whole profiling run. This mode has a lower collection overhead and faster post-processing time.
  - Details (button): Expand/collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. The VTune Profiler creates an editable copy of this analysis type configuration.
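The same options can in principle be passed on the command line instead of through the GUI. The Python sketch below assembles such a command as an argument list, using the defaults described above. The helper `build_uarch_command` and the result directory name are hypothetical. The knob names (`sampling-interval`, `collect-memory-bandwidth`, `dram-bandwidth-limits`, `pmu-collection-mode`) are assumptions about the `vtune` command-line interface and should be checked against `vtune -help collect uarch-exploration` for your version. The range check assumes limits of 1 and 1000 ms for the sampling interval.

```python
from typing import List, Sequence


def build_uarch_command(app: Sequence[str],
                        result_dir: str = "r000ue",
                        sampling_interval_ms: int = 1,
                        memory_bandwidth: bool = False,
                        dram_bandwidth_limits: bool = True,
                        collection_mode: str = "detailed") -> List[str]:
    """Build a vtune argument list for a Microarchitecture Exploration run.

    Knob names are assumed, not confirmed by this document; verify them
    with the help output of your VTune Profiler installation.
    """
    if not 1 <= sampling_interval_ms <= 1000:
        raise ValueError("sampling interval must be between 1 and 1000 ms")
    if collection_mode not in ("detailed", "summary"):
        raise ValueError("collection mode must be 'detailed' or 'summary'")

    def flag(value: bool) -> str:
        # Render Python booleans in the lowercase form used for knob values.
        return "true" if value else "false"

    return [
        "vtune", "-collect", "uarch-exploration",
        "-knob", f"sampling-interval={sampling_interval_ms}",
        "-knob", f"collect-memory-bandwidth={flag(memory_bandwidth)}",
        "-knob", f"dram-bandwidth-limits={flag(dram_bandwidth_limits)}",
        "-knob", f"pmu-collection-mode={collection_mode}",
        "-result-dir", result_dir,
        "--", *app,
    ]


# Example: Summary mode with memory bandwidth data for a hypothetical binary.
command = build_uarch_command(["./my_app"], memory_bandwidth=True,
                              collection_mode="summary")
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where `vtune` is on the `PATH`. Building the command as a list rather than a single string avoids shell quoting problems when the application path or its arguments contain spaces.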