Example 1: Event-based System-wide Sampling Collection
The command line below collects system-wide Hotspots analysis information without call stacks. This command automatically pulls the modules required for viewing results from the device and caches them in a directory on the host.
This happens only on the first collection; all subsequent collections reuse modules from the cache.
host>./vtune -target-system=ssh:email@example.com -collect hotspots -knob sampling-mode=hw -duration 10
For system-wide collection, many modules running on the system during collection are copied from the target to the host, which may take a while. However, this happens only once, since VTune Profiler
caches target system modules on the host for faster access on the next collection. If you do not want the command to take the modules from the device, you can specify a local directory that is searched for modules first, for example:
host>./vtune -target-system=ssh:firstname.lastname@example.org -collect hotspots -knob sampling-mode=hw -duration 10 -search-dir /search/path
In the case above, /search/path can be either a directory where modules are located or a pointer to the root file system of the target device. For example, when the collector searches for a file from the target device, it first looks for the file under /search/path using each of these two interpretations, and only after that does it attempt to copy the file from the target device.
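The lookup order described above can be sketched as a small shell function. This is only an illustration, not the collector's actual implementation: the function name find_local_module is hypothetical, and the order in which the two interpretations of the search directory are tried is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: illustrates how a search directory could be consulted
# before falling back to copying a module from the target device.
# Usage: find_local_module SEARCH_DIR TARGET_PATH
find_local_module() {
    search_dir=$1
    target_path=$2

    # Interpretation 1 (assumed to be tried first): the search directory
    # directly contains the module file.
    candidate="$search_dir/$(basename "$target_path")"
    if [ -f "$candidate" ]; then
        echo "$candidate"
        return 0
    fi

    # Interpretation 2: the search directory mirrors the target's root
    # file system, so the full target path is appended to it.
    candidate="$search_dir$target_path"
    if [ -f "$candidate" ]; then
        echo "$candidate"
        return 0
    fi

    # Nothing found locally; the caller would copy the file from the target.
    return 1
}
```

A non-zero return status corresponds to the case where the module has to be copied from the target device.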
Example 2: Event-based Sampling Collection
This example shows how to attach the analysis to a running application by its PID.
host>./vtune -target-system=ssh:email@example.com -collect hotspots -knob sampling-mode=hw -target-pid 333
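If the PID is not known in advance, it can be looked up on the target before attaching. The following dry-run sketch only prints the command line instead of running it. The function name build_attach_cmd, the user@target address, and the my_app process name are placeholders, and the commented-out lookup assumes that pgrep is available on the target and that SSH access works non-interactively.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the vtune command line that attaches
# to a process on the target by PID. It does not run vtune itself.
# Usage: build_attach_cmd USER@HOST PID
build_attach_cmd() {
    target=$1
    pid=$2

    # Reject empty or non-numeric PIDs before building the command.
    case $pid in
        ''|*[!0-9]*)
            echo "error: PID must be numeric" >&2
            return 1
            ;;
    esac

    echo "./vtune -target-system=ssh:$target -collect hotspots -knob sampling-mode=hw -target-pid $pid"
}

# Example lookup (not run here; assumes pgrep exists on the target):
# pid=$(ssh user@target pgrep -o my_app) && build_attach_cmd user@target "$pid"
```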
Example 3: Advanced Event-based Sampling Collection
You can collect any event supported by the Performance Monitoring Unit (PMU). Additionally, you can collect multiple events at a time.
The following example identifies potential latency or responsiveness issues:
host>./vtune -target-system=ssh:firstname.lastname@example.org -duration 10 -collect-with runsa -knob event-config="CPU_CLK_UNHALTED.REF:sa=20000"
This command line takes samples at ~2x the rate of a context switch, which results in a performance hit of approximately 20%.
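The sa modifier sets the sample-after value: one sample is taken each time the event has occurred that many times. As a rough sanity check, the expected sampling rate per core is the event rate divided by the sample-after value. The sketch below uses a hypothetical helper, samples_per_sec, and an assumed rate of 2,000,000,000 events per second. That figure is illustrative only and does not come from this guide, so substitute the actual event rate for your platform.

```shell
#!/bin/sh
# Estimate samples per second per core for a given sample-after (sa) value.
# events_per_sec is an assumed input, not a value reported by vtune.
# Usage: samples_per_sec EVENTS_PER_SEC SA
samples_per_sec() {
    events_per_sec=$1
    sa=$2
    # Integer division: one sample per 'sa' occurrences of the event.
    echo $(( events_per_sec / sa ))
}

# Assumed 2,000,000,000 events per second with sa=20000.
samples_per_sec 2000000000 20000   # prints 100000
```

Raising the sample-after value lowers the sampling rate and the collection overhead, at the cost of coarser data.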