Measuring Performance Impact of NUMA in Multi-Processor Systems
- Application: The sample application used in this recipe is not available for download.
- Tool: VTune Profiler version 2021.5.0 or newer
Run Platform Profiler Analysis
- On the Welcome screen of VTune Profiler, click Configure Analysis.
- In the Analysis Tree, select the Platform Profiler analysis type in the Platform Analyses group.
- In the WHAT pane, select Profile System. If necessary, set limits on size and time for data collection in the Advanced section.
- Click the Start button to run the analysis. At any time before the collection completes, you can click the Stop button to terminate data collection and view results.
- Platform diagram
- Interactive time line
- Performance data charts
Identify NUMA Issues
- Start your analysis of NUMA issues by selecting Overview in the Select View pulldown menu. In the system configuration overview, you can see that a NUMA system typically has multiple processor sockets, each of which has its own memory controllers. The platform diagram in this example shows a two-socket system. For the sample application used here, performance is mostly bound by memory access, and a high number of memory accesses target remote memory.
- See the Non-Uniform Memory Access Analysis graph to compare local vs. remote memory accesses over time. A high percentage of remote accesses indicates a NUMA-related performance issue.
- Observe the Throughput Metrics section. In this sample application, there are frequent spikes in cross-socket (UPI) traffic. These spikes correspond to remote memory accesses.
- Switch to the Memory view to see additional information about memory accesses for each processor socket. In this sample application, both sockets initiate remote memory accesses. Latencies in memory accesses spike when the remote memory is accessed. These spikes indicate an opportunity for performance improvement.
- Switch to the UPI view to see the cross-socket traffic transmitted by each socket.
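
The local vs. remote comparison described above comes down to a simple ratio per socket and time interval. The following is a minimal sketch of that arithmetic, not part of VTune Profiler. The `AccessSample` type, the function names, the sample counts, and the 30% threshold are all hypothetical and chosen only for illustration. It assumes you have copied local and remote access counts out of the charts by hand.

```python
from dataclasses import dataclass


@dataclass
class AccessSample:
    """Memory accesses seen on one socket during one time interval (hypothetical type)."""
    socket: int
    local: int   # accesses served by memory attached to the same socket
    remote: int  # accesses served by memory attached to another socket


def remote_percent(sample: AccessSample) -> float:
    """Return the share of accesses that went to remote memory, in percent."""
    total = sample.local + sample.remote
    if total == 0:
        return 0.0  # no traffic in this interval, so nothing to report
    return 100.0 * sample.remote / total


def flag_numa_hotspots(samples, threshold_percent=30.0):
    """Return samples whose remote share exceeds the threshold.

    The default threshold is an illustrative assumption, not a VTune guideline.
    """
    return [s for s in samples if remote_percent(s) > threshold_percent]


if __name__ == "__main__":
    # Made-up counts for a two-socket system.
    samples = [
        AccessSample(socket=0, local=900, remote=100),
        AccessSample(socket=1, local=250, remote=750),
    ]
    for s in flag_numa_hotspots(samples):
        print(f"socket {s.socket}: {remote_percent(s):.1f}% remote accesses")
```

Intervals flagged this way are the ones to line up against the UPI traffic and memory latency spikes in the Memory and UPI views.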