Profiling a .NET* Core Application
This recipe uses the .NET Core dynamic-code profiling support in Intel® VTune™ Profiler to locate performance hotspots in managed code and improve the application turnaround.
Ingredients
This section lists the hardware and software tools used for the performance analysis scenario.
- Application: a sample C# application that adds all the elements of an integer List. The application is used as a demo and is not available for download.
- Tools:
- Intel® VTune™ Profiler 2018
- Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.
- Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are flexible. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.
- Get the latest version of Intel® VTune™ Profiler:
- From the Intel® VTune™ Profiler product page.
- Download the latest standalone package from the Intel® oneAPI standalone components page.
- Operating system: Microsoft* Windows* 10
- CPU: Intel microarchitecture code name Skylake
Prepare Your Application for Analysis
- Open a new command window so that the .NET environment variables take effect, and make sure that .NET Core 2.0 is installed successfully: dotnet --version
- Create a new listadd directory for the application and switch to it: mkdir C:\listadd, then cd C:\listadd
- Enter dotnet new console to create a new skeleton project.
- Replace the contents of Program.cs in the listadd folder with C# code that adds the elements of an integer List:
    using System;
    using System.Linq;
    using System.Collections.Generic;

    namespace listadd
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Starting calculation...");
                List<int> numbers = Enumerable.Range(1, 10000).ToList();
                for (int i = 0; i < 100000; i++)
                {
                    ListAdd(numbers);
                }
                Console.WriteLine("Calculation complete");
            }

            // Sums all elements of the list with a foreach loop
            static int ListAdd(List<int> candidateList)
            {
                int result = 0;
                foreach (int item in candidateList)
                {
                    result += item;
                }
                return result;
            }
        }
    }
- Build the application in the Release configuration, which creates listadd.dll in the C:\listadd\bin\Release\netcoreapp2.0 folder: dotnet build -c Release
- Run the sample application: dotnet C:\listadd\bin\Release\netcoreapp2.0\listadd.dll
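Before profiling, it can help to confirm that ListAdd returns the expected value, so that a later rewrite of the loop can be checked for identical results. The sketch below is an illustrative addition, not part of the original recipe: the class name ListAddCheck and the closed-form check n(n + 1)/2 are assumptions chosen for this example. For the list 1..10000 used in Program.cs, the expected sum is 10000 × 10001 / 2 = 50005000, which fits in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class (not part of the recipe): checks that the
// foreach-based ListAdd from Program.cs returns the closed-form sum n(n + 1)/2.
public static class ListAddCheck
{
    // Same logic as ListAdd in the recipe's Program.cs.
    public static int ListAdd(List<int> candidateList)
    {
        int result = 0;
        foreach (int item in candidateList)
        {
            result += item;
        }
        return result;
    }

    // Sum of 1..n, computed in long to avoid intermediate overflow.
    public static long ExpectedSum(int n) => (long)n * (n + 1) / 2;

    public static void Main()
    {
        const int n = 10000;
        List<int> numbers = Enumerable.Range(1, n).ToList();
        int actual = ListAdd(numbers);
        long expected = ExpectedSum(n);
        Console.WriteLine($"ListAdd = {actual}, expected = {expected}, match = {actual == expected}");
    }
}
```

Running it prints ListAdd = 50005000, expected = 50005000, match = True. The same check can be reused against the optimized ListAdd later in this recipe.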
Run Advanced Hotspots Analysis
- Launch VTune Profiler with administrator privileges.
- Click the New Project button on the toolbar and specify a name for the new project, for example: dotnet.
- In the Analysis Target window, select the local host and the Launch Application target type from the left pane.
- On the Launch Application pane, specify the application to analyze:
- Application: C:\Program Files\dotnet\dotnet.exe
- Application parameters: C:\listadd\bin\Release\netcoreapp2.0\listadd.dll
The location of dotnet.exe depends on your environment and can be identified with the command: where dotnet
- Click the Choose Analysis button on the right and select the Advanced Hotspots analysis from the left pane.
Advanced Hotspots analysis was integrated into the generic Hotspots analysis starting with Intel VTune Amplifier 2019 and is available there via the Hardware Event-Based Sampling collection mode.
- ClickStartto run the analysis.
Identify Hotspots in the Managed Code
When the collected analysis result opens, switch to the Bottom-up tab and set the data grouping level to Process/Module/Function/Thread/Call Stack.

Expanding dotnet.exe > listadd.dll reveals the managed function listadd::Program::ListAdd, which took the most CPU time.

Double-click this hotspot function to open the source view. To view the source and disassembly code side by side, click the Assembly toggle button on the toolbar.

Use the per-source-line and per-assembly-instruction statistics to identify the most time-consuming code (line 24, the foreach statement, in this example) and focus your optimization work there.
Optimize the Code by Replacing foreach with a for Loop
VTune Profiler highlights the following code line as performance-critical:
foreach (int item in candidateList)
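To see why this line can be costly, it helps to know roughly what the C# compiler generates for foreach over a List<int>: it calls GetEnumerator() to obtain a List<int>.Enumerator struct, calls MoveNext() and reads Current on every iteration, and disposes the enumerator in a finally block. In the .NET Core implementation, MoveNext() also checks that the list was not modified during enumeration. The sketch below writes that expansion out by hand; it is a simplified approximation of the compiler output, not its exact form, and the class name ForeachExpansion is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration: a hand-written approximation of what
// `foreach (int item in candidateList)` expands to for a List<int>.
public static class ForeachExpansion
{
    public static int ListAddExpanded(List<int> candidateList)
    {
        int result = 0;
        // List<int>.GetEnumerator() returns a struct enumerator.
        List<int>.Enumerator e = candidateList.GetEnumerator();
        try
        {
            // One MoveNext() call and one Current read per element.
            while (e.MoveNext())
            {
                int item = e.Current;
                result += item;
            }
        }
        finally
        {
            // The enumerator is disposed when the loop exits.
            e.Dispose();
        }
        return result;
    }

    public static void Main()
    {
        List<int> numbers = Enumerable.Range(1, 10000).ToList();
        Console.WriteLine(ListAddExpanded(numbers)); // prints 50005000
    }
}
```

The per-element MoveNext()/Current calls are the work that an index-based for loop avoids.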
For optimization, consider replacing the foreach statement with an index-based for loop. Replace the contents of Program.cs with this C# code:
    using System;
    using System.Linq;
    using System.Collections.Generic;

    namespace listadd
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Starting calculation...");
                List<int> numbers = Enumerable.Range(1, 10000).ToList();
                for (int i = 0; i < 100000; i++)
                {
                    ListAdd(numbers);
                }
                Console.WriteLine("Calculation complete");
            }

            // Sums all elements of the list with an index-based for loop
            // instead of foreach (no enumerator MoveNext/Current calls)
            static int ListAdd(List<int> candidateList)
            {
                int result = 0;
                for (int i = 0; i < candidateList.Count; i++)
                {
                    result += candidateList[i];
                }
                return result;
            }
        }
    }
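As a quick sanity check that complements re-running VTune Profiler, you can time both loop forms in one process with System.Diagnostics.Stopwatch. This is a rough sketch, not part of the original recipe: the class and method names are hypothetical, wall-clock timings from a simple loop like this are noisy and depend on the .NET runtime and JIT version, and they measure elapsed time rather than the CPU time that VTune Profiler reports. A warm-up call is made first so that JIT compilation is not included in the measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing harness comparing the foreach and for versions of ListAdd.
public static class LoopTiming
{
    public static int ListAddForeach(List<int> candidateList)
    {
        int result = 0;
        foreach (int item in candidateList)
        {
            result += item;
        }
        return result;
    }

    public static int ListAddFor(List<int> candidateList)
    {
        int result = 0;
        for (int i = 0; i < candidateList.Count; i++)
        {
            result += candidateList[i];
        }
        return result;
    }

    // Runs `sum` over `numbers` `iterations` times and returns elapsed milliseconds.
    // The accumulated checksum is printed so the results are actually used.
    public static long Time(Func<List<int>, int> sum, List<int> numbers, int iterations)
    {
        sum(numbers); // warm-up: trigger JIT compilation before timing
        long checksum = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            checksum += sum(numbers);
        }
        sw.Stop();
        Console.WriteLine($"checksum = {checksum}");
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        List<int> numbers = Enumerable.Range(1, 10000).ToList();
        const int iterations = 100000;
        Console.WriteLine($"foreach: {Time(ListAddForeach, numbers, iterations)} ms");
        Console.WriteLine($"for:     {Time(ListAddFor, numbers, iterations)} ms");
    }
}
```

Both loops call the summing method through the same delegate, so the call overhead is the same for each; only the loop form differs. Treat any difference as indicative and use the VTune Profiler results for the actual comparison.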
Verify the Optimization
To verify the optimization, rebuild the updated code with dotnet build -c Release and re-run the Advanced Hotspots analysis.
Before the optimization, the sample application took 2.636 seconds of CPU time.

After the optimization, the application took 0.945 seconds of CPU time, a 64% reduction compared to the original.
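For reference, the 64% figure follows from the two measured CPU times (2.636 s before, 0.945 s after); the same numbers correspond to roughly a 2.8× speedup:

```latex
\text{reduction} = \frac{2.636\,\mathrm{s} - 0.945\,\mathrm{s}}{2.636\,\mathrm{s}}
                 = \frac{1.691}{2.636} \approx 0.64 = 64\%,
\qquad
\text{speedup} = \frac{2.636\,\mathrm{s}}{0.945\,\mathrm{s}} \approx 2.79
```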

To discuss this recipe, visit the developer forum.