Input and Output Analysis
Use the Input and Output analysis of Intel® VTune™ Profiler to locate performance bottlenecks in I/O-intensive applications at both hardware and software levels.
The Input and Output analysis of Intel® VTune™ Profiler helps you determine:
- Platform I/O consumption by external PCIe devices and integrated accelerators:
  - I/O bandwidth consumption, including Intel® Data Direct I/O Technology (Intel® DDIO) and Memory-Mapped I/O traffic.
  - Memory bandwidth consumption.
- Software data plane utilization.
The Input and Output analysis features two main types of performance metrics:
- Platform-level metrics: application-agnostic, hardware event-based metrics.
- OS- and API-specific metrics: performance metrics for software data planes (DPDK and SPDK) and the Linux* kernel I/O stack.
Linux* and FreeBSD* targets are supported.
The full set of Input and Output analysis metrics is available on Intel® Xeon® processors only.

Configure and Run Analysis
On FreeBSD systems, the graphical user interface of VTune Profiler is not supported. You can still configure and run the analysis from a Linux* or Windows* system using remote SSH capabilities, or collect the result locally from the CLI. For more information on available options, see FreeBSD Targets.
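For example, a remote collection on a FreeBSD system from a Linux or Windows host might look like the following. This is a sketch only: the host name is a placeholder, and the -target-system option assumes passwordless SSH access to the target is already configured (see FreeBSD Targets for the exact setup).
vtune -collect io -target-system=ssh:user@freebsd-host -duration 30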
- Launch VTune Profiler and, optionally, create a new project.
- Click the Configure Analysis button.
- In the WHERE pane, select the target system to profile.
- In the HOW pane, select Input and Output.
- In the WHAT pane, specify your analysis target (application, process, or system).
- Depending on your target application and analysis purpose, choose any of the configuration options described in the sections below.
- Click Start to run the analysis. VTune Profiler collects the data, generates a result, and opens it with the data displayed according to your configuration.
To run the Input and Output analysis from the command line, enter:
vtune -collect io [-knob <value>] -- <target> [target_options]
For details, see the io command line reference.
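For example, a 30-second system-wide collection followed by a summary report might look like this (a minimal sketch; the result directory name and the duration are arbitrary):
vtune -collect io -duration 30 -result-dir io_system_wide
vtune -report summary -result-dir io_system_wide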
Platform-Level Metrics
To collect hardware event-based metrics, either load the Intel sampling driver or configure driverless hardware event collection (Linux targets only).
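If you opt for driverless collection, a typical preparation step on Linux is to relax the perf event restrictions, for example as follows. This is a sketch that assumes root access; the exact values your system requires may differ, and VTune Profiler reports the settings it needs when collection starts.
sudo sh -c 'echo 0 > /proc/sys/kernel/perf_event_paranoid'
sudo sh -c 'echo 0 > /proc/sys/kernel/kptr_restrict'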
| IO Analysis Configuration Check Box | Features | Prerequisites/Applicability |
|---|---|---|
| Analyze PCIe traffic | Calculate inbound I/O (Intel® Data Direct I/O) and outbound I/O (Memory-Mapped I/O) bandwidth. | Available on server platforms based on Intel® microarchitecture code named Sandy Bridge EP and newer. The granularity of I/O bandwidth metrics depends on the CPU model, the collector used, and user privileges. |
| | Calculate L3 hits and misses of inbound I/O requests (Intel® DDIO hits/misses). | Available on server platforms based on Intel® microarchitecture code named Haswell and newer. The granularity of inbound I/O request L3 hit/miss metrics depends on the CPU model, the collector used, and user privileges; metrics can be reported per device or per set of devices (1). |
| | Calculate average latency of inbound I/O reads and writes, as well as CPU/IO conflicts. | Available on server platforms based on Intel® microarchitecture code named Skylake and newer. The granularity of latency and CPU/IO conflict metrics depends on the CPU model, the collector used, and user privileges; metrics can be reported per device or per set of devices (1). Average inbound I/O read latency is not available in driverless collection on Skylake, Cascade Lake, and Cooper Lake servers. |
| Locate MMIO accesses | Locate code that induces outbound I/O traffic by accessing device memory through the MMIO address space. | Available on server platforms based on Intel® microarchitecture code named Skylake and newer. |
| Analyze Intel® VT-d | Calculate performance metrics for Intel® Virtualization Technology for Directed I/O (Intel® VT-d). | Available on server platforms based on Intel® microarchitecture code named Ice Lake and newer. The granularity of Intel VT-d metrics depends on the collector used and user privileges; metrics can be reported per device or per set of devices (1). |
| Analyze memory and cross-socket bandwidth | Calculate DRAM, Persistent Memory, and Intel® Ultra Path Interconnect (Intel® UPI) or Intel® QuickPath Interconnect (Intel® QPI) bandwidth. | DRAM bandwidth data is always collected; persistent memory bandwidth and Intel® UPI / Intel® QPI cross-socket bandwidth data is collected only when applicable to the system. |
| Evaluate max DRAM bandwidth | Evaluate the maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the Platform Diagram and timeline and to calculate thresholds. | Not available on FreeBSD systems. |
(1) Commonly, a set combines all devices sharing the same 16 PCIe lanes.
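The check boxes above correspond to knobs of the io analysis type on the command line. The sketch below is illustrative only: the knob names shown (mmio, iommu) are assumptions, so first list the knobs your VTune Profiler version actually supports.
vtune -help collect io
vtune -collect io -knob mmio=true -knob iommu=true -- <target>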
OS- and API-Level Metrics
| IO Analysis Configuration Check Box | Prerequisites/Applicability |
|---|---|
| DPDK | Make sure DPDK is built with VTune Profiler support enabled. When profiling DPDK as an FD.io VPP plugin, modify the DPDK_MESON_ARGS variable in build/external/packages/dpdk.mk with the same flags as described in the Profiling with VTune section. Not available for FreeBSD targets. Not available in system-wide mode. |
| SPDK | Make sure SPDK is built using the --with-vtune advanced build option. When profiling in Attach to Process mode, set up the environment variables before launching the application. Not available in Profile System mode. |
| Kernel I/O | To collect these metrics, VTune Profiler enables FTrace* collection, which requires access to debugfs. On some systems, this requires that you reconfigure permissions with the prepare_debugfs.sh script located in the bin directory, or run the collection with root privileges. Not available for FreeBSD targets. |
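Two of the prerequisites above can be sketched as commands. The paths are placeholders, and the exact form of the SPDK build option and the script location should be verified against the SPDK documentation and your VTune Profiler installation.
# SPDK: enable VTune Profiler support at build time (the path argument is an assumption)
./configure --with-vtune=<vtune-install-dir>
# Kernel I/O: grant access to debugfs for FTrace collection (run once with root privileges)
sudo <install-dir>/bin/prepare_debugfs.sh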