- Threading analysis combines and replaces the Concurrency and Locks and Waits analysis types available in previous versions of Intel® VTune™ Profiler.
- Intel® VTune™ Profiler is the new name for Intel® VTune™ Amplifier.
- Thread count: a quick glance at the application thread count can give clues to threading inefficiencies, such as a fixed number of threads that might prevent the application from scaling to a larger number of cores or lead to thread oversubscription
- Wait time (trace-based or context switch-based): analyze threads waiting on synchronization objects or I/O
- Spin and overhead time: estimate threading runtime overhead or the impact of spin waits (busy or active waits)
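The difference between wait time and spin time in the metrics above can be illustrated with a minimal Python sketch. It is not part of VTune Profiler; the helper name `measure_wait` and the 0.2 s hold time are hypothetical choices for illustration. A waiter thread blocks on a held lock (an inactive wait that uses almost no CPU) or polls it in a loop (a spin wait that burns CPU while making no progress). In CPython the global interpreter lock affects the absolute numbers, so treat the values as indicative only.

```python
import threading
import time


def measure_wait(spin: bool, hold_seconds: float = 0.2):
    """Return (wall_seconds, cpu_seconds) that a waiter thread spends
    acquiring a lock held by the main thread for about hold_seconds."""
    lock = threading.Lock()
    result = {}

    def waiter():
        wall_start = time.perf_counter()
        cpu_start = time.thread_time()  # CPU time of this thread only
        if spin:
            # Spin (busy) wait: keep polling the lock without blocking.
            while not lock.acquire(blocking=False):
                pass
        else:
            # Inactive wait: the thread sleeps until the lock is released.
            lock.acquire()
        result["wall"] = time.perf_counter() - wall_start
        result["cpu"] = time.thread_time() - cpu_start
        lock.release()

    lock.acquire()                      # main thread holds the lock first
    thread = threading.Thread(target=waiter)
    thread.start()
    time.sleep(hold_seconds)            # simulate work done under the lock
    lock.release()
    thread.join()
    return result["wall"], result["cpu"]


if __name__ == "__main__":
    for label, spin in (("blocking wait", False), ("spin wait", True)):
        wall, cpu = measure_wait(spin)
        print(f"{label}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

Both waiters see a similar elapsed wait, but only the spinning waiter accumulates CPU time while waiting, which is the kind of cost that the spin and overhead time metric is meant to expose.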
- User-Mode Sampling and Tracing, which can recognize synchronization objects and collect thread wait time by objects using tracing. This is helpful in understanding thread interaction semantics and making optimization changes based on that data. There are two groups of synchronization objects supported by Intel VTune Profiler: objects usually used for synchronization between threads (such as Mutex or Semaphore) and objects associated with waits on I/O operations (such as Stream).
- Hardware Event-Based Sampling and Context Switches, which collects thread inactive wait time based on context switch information. Although there is no thread object definition in this case, you can still find the problematic synchronization functions by using the wait time attributed to call stacks, and the collection overhead is lower than in the User-Mode Sampling and Tracing mode. The analysis based on context switches also shows thread preemption time, which is useful in measuring the impact of thread oversubscription on a system.
How It Works: User-Mode Sampling and Tracing
How It Works: Hardware Event-Based Sampling and Context Switches
- Inactive Sync Wait Time is caused by a request for synchronization
- Preemption Wait Time is caused by preemption
Configure and Run Analysis
- Click the (standalone GUI)/(Visual Studio IDE) Configure Analysis button on the Intel® VTune™ Profiler toolbar. The Configure Analysis window opens.
- From the HOW pane, click the Browse button and select Threading.
- Configure the collection options.
  - User-Mode Sampling and Tracing mode: Select to enable the user-mode sampling and tracing collection for synchronization object analysis. This collection mode uses a fixed sampling interval of 10 ms. If you need to change the interval, click the Copy button and create a custom analysis configuration. For intervals less than 10 ms, use the Hardware Event-Based Sampling and Context Switches mode.
  - Hardware Event-Based Sampling and Context Switches mode: Select to enable hardware event-based sampling and context switches collection. You can configure the CPU sampling interval, ms option to specify an interval (in milliseconds) between CPU samples. Possible values for the hardware event-based sampling mode are 0.01-1000. 1 ms is used by default.
  - When changing collection options, pay attention to the Overhead diagram on the right. It dynamically changes to reflect the collection overhead incurred by the selected options.
  - Details button: Expand/collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. VTune Profiler creates an editable copy of this analysis type configuration.
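The same options can also be passed on the command line. The following minimal Python sketch assembles a `vtune` invocation without running it. The helper `build_threading_command` is hypothetical, and the knob names `sampling-and-waits` (values `sw` and `hw`) and `sampling-interval` are assumptions based on the VTune Profiler command-line interface; confirm them for your installed version with `vtune -help collect threading`. The interval check mirrors the 0.01-1000 ms range stated above, and the interval is accepted only in the hardware mode because the user-mode interval is fixed.

```python
def build_threading_command(target, mode="sw", interval_ms=None):
    """Build an argument list for a Threading analysis run.

    target      -- list with the application and its arguments
    mode        -- "sw" (User-Mode Sampling and Tracing) or
                   "hw" (Hardware Event-Based Sampling and Context Switches)
    interval_ms -- CPU sampling interval in ms, hardware mode only
    """
    if mode not in ("sw", "hw"):
        raise ValueError("mode must be 'sw' or 'hw'")
    # Knob names are assumptions; verify with `vtune -help collect threading`.
    cmd = ["vtune", "-collect", "threading",
           "-knob", f"sampling-and-waits={mode}"]
    if interval_ms is not None:
        if mode != "hw":
            # The user-mode collection uses a fixed 10 ms interval.
            raise ValueError("interval can only be set in 'hw' mode")
        if not 0.01 <= interval_ms <= 1000:
            raise ValueError("interval must be within 0.01-1000 ms")
        cmd += ["-knob", f"sampling-interval={interval_ms:g}"]
    # "--" separates the profiler options from the target command line.
    return cmd + ["--"] + list(target)


if __name__ == "__main__":
    # "./my_app" is a placeholder target.
    print(" ".join(build_threading_command(["./my_app"], mode="hw",
                                           interval_ms=0.5)))
```

Returning a list rather than a single string lets the result be handed to `subprocess.run` directly, without shell quoting issues.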
- Summary window displays statistics on the overall application execution, identifying CPU time and processor utilization.
- Start with the Summary window of the result to explore the Effective CPU Utilization of your application and identify reasons for underutilization connected with synchronization, parallel work arrangement overhead, or incorrect thread count. Click links associated with flagged issues to be taken to more detailed information. For example, clicking a sync object name in the Top Waiting Objects table takes you to that object in the Bottom-up window.
- Analyze how threads interact with synchronization objects, using wait and signal stacks and transitions on the timeline. Explore CPU time spent in threading runtimes to classify inefficiencies in their use.
- Modify your code to remove CPU utilization bottlenecks and improve the parallelism of your application. Concentrate your tuning on objects with long Wait time where the system is poorly utilized (red bars) during the wait. Consider adding parallelism, rebalancing, or reducing contention. Ideal utilization (green bars) occurs when the number of running threads equals the number of available logical cores.
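The relationship between running threads and logical cores described above can be sketched in a few lines of Python. This is a deliberate simplification for illustration, not the thresholds VTune Profiler uses to color its utilization bars; the function name `classify_utilization` and the category labels are hypothetical.

```python
import os


def classify_utilization(running_threads, logical_cores=None):
    """Classify a running-thread count against the logical core count.

    Simplified rule: fewer running threads than cores leaves cores idle,
    an equal count is ideal, and a larger count oversubscribes the system.
    """
    if logical_cores is None:
        # os.cpu_count() can return None, so fall back to one core.
        logical_cores = os.cpu_count() or 1
    if running_threads < 0 or logical_cores < 1:
        raise ValueError("invalid thread or core count")
    if running_threads == 0:
        return "idle"
    if running_threads < logical_cores:
        return "underutilized"
    if running_threads == logical_cores:
        return "ideal"
    return "oversubscribed"


if __name__ == "__main__":
    for threads in (0, 2, 8, 16):
        print(threads, classify_utilization(threads, logical_cores=8))
```

A practical consequence is to size worker pools from the detected core count rather than from a fixed constant, so the application neither leaves cores idle on larger machines nor oversubscribes smaller ones.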