User Guide

Input and Output Analysis

Use the Input and Output analysis of Intel® VTune™ Profiler to locate performance bottlenecks in I/O-intensive applications at both hardware and software levels.
The Input and Output analysis of Intel® VTune™ Profiler helps to determine:
The Input and Output analysis features two main types of performance metrics:
  • Platform-level metrics: application-agnostic hardware event-based metrics.
  • OS- and API-specific metrics: performance metrics for software data planes (DPDK and SPDK) and the Linux* kernel I/O stack.
Linux* and FreeBSD* targets are supported.
The full set of Input and Output analysis metrics is available on Intel® Xeon® processors only.

Configure and Run Analysis

On FreeBSD systems, the graphical user interface of VTune Profiler is not supported. You can still configure and run the analysis from a Linux* or Windows* system using remote SSH capabilities, or collect the result locally from the CLI. For more information on available options, see FreeBSD Targets.
  1. Launch VTune Profiler and, optionally, create a new project.
  2. Click the Configure Analysis button.
  3. In the WHERE pane, select the target system to profile.
  4. In the HOW pane, select Input and Output.
  5. In the WHAT pane, specify your analysis target (application, process, or system).
  6. Depending on your target application and the purpose of the analysis, choose any of the configuration options described in the sections below.
  7. Click Start to run the analysis.
    VTune Profiler collects the data, generates a result, and opens the result, displaying data according to the configuration.
To run the Input and Output analysis from the command line, enter:
vtune -collect io [-knob <value>] -- <target> [target_options]
For details, see the io command line reference.
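As a minimal sketch of the syntax above, the helper below prints (without running) a command line with no knobs set. The function name io_cmd and the target ./my_io_app with its options are hypothetical placeholders, not part of VTune Profiler.

```shell
#!/bin/sh
# Sketch: print (do not run) an Input and Output analysis command line.
# io_cmd is a hypothetical helper; ./my_io_app and its options are placeholders.
io_cmd() {
    printf 'vtune -collect io --'
    # Append the target and each of its options, separated by single spaces.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

io_cmd ./my_io_app --input data.bin
# prints: vtune -collect io -- ./my_io_app --input data.bin
```

Once the printed line looks right for your target, run it directly in the shell.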

Platform-Level Metrics

To collect hardware event-based metrics, either load the Intel sampling driver or configure driverless hardware event collection (Linux targets only).
The following entries describe each IO Analysis Configuration check box, its features, and its prerequisites/applicability:
Analyze PCIe traffic
Calculate inbound I/O (Intel® Data Direct I/O) and outbound I/O (Memory-Mapped I/O) bandwidth.
Available on server platforms based on Intel® microarchitecture code named Sandy Bridge EP and newer.
The granularity of I/O bandwidth metrics depends on CPU model, collector used, and user privileges:
  • Code names: Sandy Bridge, Ivy Bridge, Haswell, Broadwell.
    • Granularity: CPU socket (package) in all cases.
  • Code names: Skylake, Cascade Lake, Cooper Lake.
    • Granularity:
      • With sampling driver: I/O device (external PCIe or integrated accelerator).
      • Driverless with root: I/O device (external PCIe or integrated accelerator).
      • Driverless without root: CPU socket on kernels before v5.10; I/O device on kernels v5.10 and newer.
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: I/O device (external PCIe or integrated accelerator).
      • Driverless with root: I/O device (external PCIe or integrated accelerator).
      • Driverless without root: CPU socket on kernels before v5.14; I/O device on kernels v5.14 and newer.
Calculate L3 hits and misses of inbound I/O requests (Intel® DDIO hits/misses).
Available on server platforms based on Intel® microarchitecture code named Haswell and newer.
The granularity of inbound I/O request L3 hit/miss metrics depends on CPU model, collector used and user privileges:
  • Code names: Haswell, Broadwell.
    • Granularity: CPU socket (package) in all cases.
  • Code names: Skylake, Cascade Lake, Cooper Lake.
    • Granularity:
      • With sampling driver: set of I/O devices [1].
      • Driverless with root: set of I/O devices [1].
      • Driverless without root: CPU socket (package).
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: set of I/O devices [1].
      • Driverless with root: set of I/O devices [1].
      • Driverless without root: CPU socket (package).
[1] Commonly, a set combines all devices sharing the same 16 PCIe lanes.
Calculate average latency of inbound I/O reads and writes, as well as CPU/IO conflicts.
Available on server platforms based on Intel® microarchitecture code named Skylake and newer.
The granularity of latency and CPU/IO conflicts metrics depends on CPU model, collector used and user privileges:
  • Code names: Skylake, Cascade Lake, Cooper Lake.
    • Granularity:
      • With sampling driver: set of I/O devices [1].
      • Driverless with root: set of I/O devices [1][2].
      • Driverless without root: CPU socket (package) [2].
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: set of I/O devices [1].
      • Driverless with root: set of I/O devices [1].
      • Driverless without root: CPU socket (package).
[1] Commonly, a set combines all devices sharing the same 16 PCIe lanes.
[2] Average inbound I/O read latency is not available in driverless collection on Skylake, Cascade Lake, and Cooper Lake servers.
Locate MMIO accesses
Locate code that induces outbound I/O traffic by accessing device memory through the MMIO address space.
Available on server platforms based on Intel® microarchitecture code named Skylake and newer.
  • This option is not available in Profile System mode.
  • This option is available on Linux systems only.
Analyze Intel® VT-d
Calculate performance metrics for Intel® Virtualization Technology for Directed I/O (Intel VT-d).
Available on server platforms based on Intel® microarchitecture code named Ice Lake and newer.
The Intel VT-d metrics granularity depends on collector used and user privileges:
  • Code names: Snow Ridge, Ice Lake.
    • Granularity:
      • With sampling driver: set of I/O devices [1].
      • Driverless with root: set of I/O devices [1].
      • Driverless without root: CPU socket on kernels before v5.14; set of I/O devices [1] on kernels v5.14 and newer.
[1] Commonly, a set combines all devices sharing the same 16 PCIe lanes.
Analyze memory and cross-socket bandwidth
Calculate DRAM, Persistent Memory, and Intel® Ultra Path Interconnect (Intel® UPI) or Intel® QuickPath Interconnect (Intel® QPI) bandwidth.
While DRAM bandwidth data is always collected, persistent memory bandwidth and Intel® UPI / Intel® QPI cross-socket bandwidth data is only collected when applicable to the system.
Evaluate max DRAM bandwidth
Evaluate the maximum achievable local DRAM bandwidth before the collection starts.
This data is used to scale bandwidth metrics on the Platform Diagram and timeline and to calculate thresholds.
Not available on FreeBSD systems.
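The I/O bandwidth granularity rules listed under Analyze PCIe traffic can be sketched as a small lookup. This is an illustration transcribed from that list, not a VTune Profiler tool: the function names, the family labels (pre-skylake, skylake, icelake) that stand for the three code-name groups, and the collector labels are all hypothetical.

```shell
#!/bin/sh
# Sketch: PCIe bandwidth granularity lookup, transcribed from the list in this section.
# All function and argument names are hypothetical; this is not part of VTune Profiler.

# kernel_at_least VERSION MAJOR MINOR
# Succeeds if VERSION (for example "5.14.0-70-generic") is >= MAJOR.MINOR.
kernel_at_least() {
    ver=${1%%-*}                 # drop any "-suffix"
    major=${ver%%.*}             # text before the first dot
    rest=${ver#*.}               # text after the first dot
    minor=${rest%%.*}            # text before the next dot
    minor=${minor%%[!0-9]*}      # keep leading digits only
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# pcie_bw_granularity FAMILY COLLECTOR [KERNEL_VERSION]
#   FAMILY:    pre-skylake | skylake | icelake
#   COLLECTOR: driver | driverless-root | driverless-user
#   KERNEL_VERSION defaults to the running kernel (uname -r).
pcie_bw_granularity() {
    family=$1; collector=$2; kernel=${3:-$(uname -r)}
    case $family in
        pre-skylake) echo "CPU socket"; return ;;   # Sandy Bridge through Broadwell
        skylake)     min_minor=10 ;;                # Skylake, Cascade Lake, Cooper Lake
        icelake)     min_minor=14 ;;                # Snow Ridge, Ice Lake
        *)           echo "unknown"; return 1 ;;
    esac
    # Only driverless collection without root depends on the kernel version.
    if [ "$collector" != "driverless-user" ] || kernel_at_least "$kernel" 5 "$min_minor"; then
        echo "I/O device"
    else
        echo "CPU socket"
    fi
}

pcie_bw_granularity icelake driverless-user 5.10.0   # prints "CPU socket"
```

Omitting the third argument evaluates the rule against the kernel the script runs on.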

OS- and API-Level Metrics

The following entries describe each IO Analysis Configuration check box and its prerequisites/applicability:
DPDK
Make sure DPDK is built with VTune Profiler support enabled.
When profiling DPDK as an FD.io VPP plugin, modify the DPDK_MESON_ARGS variable in build/external/packages/dpdk.mk with the same flags as described in the Profiling with VTune section.
Not available for FreeBSD targets. Not available in system-wide mode.
SPDK
Make sure SPDK is built using the --with-vtune advanced build option.
When profiling in Attach to Process mode, make sure to set up the environment variables before launching the application.
Not available in Profile System mode.
Kernel I/O
To collect these metrics, VTune Profiler enables FTrace* collection, which requires access to debugfs. On some systems, this requires that you either reconfigure your permissions using the prepare_debugfs.sh script located in the bin directory or use root privileges.
Not available for FreeBSD targets.
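Before enabling the Kernel I/O option, it can help to check whether the current user can reach the FTrace directory under debugfs. The sketch below assumes the conventional debugfs mount point /sys/kernel/debug and its tracing subdirectory; ftrace_readable is a hypothetical helper, not part of VTune Profiler.

```shell
#!/bin/sh
# Sketch: check whether the current user can read the FTrace directory under debugfs.
# ftrace_readable is a hypothetical helper, not part of VTune Profiler.
# /sys/kernel/debug is the conventional debugfs mount point; pass another path if yours differs.
ftrace_readable() {
    debugfs_dir=${1:-/sys/kernel/debug}
    # The directory must be both readable and searchable by the current user.
    [ -r "$debugfs_dir/tracing" ] && [ -x "$debugfs_dir/tracing" ]
}

if ftrace_readable; then
    echo "debugfs tracing directory is accessible"
else
    echo "no access: adjust debugfs permissions or run as root"
fi
```

A failed check suggests that one of the two remedies described above (reconfiguring permissions or using root privileges) is needed on this system.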
