Intel® VTune™ Profiler Functionality on AWS* Instances

ID 675508
Updated 6/17/2022
Version Latest
Public

Introduction

Intel® VTune™ Profiler is a performance profiling tool that delivers software and hardware performance analysis through its graphical and command-line interfaces. It collects three general types of data:

  1. Software (user-mode hotspots and threading) - these collections are generally software-based and do not rely on the availability of hardware events
  2. Hardware (event-based hotspots and threading, microarchitectural analysis, and HPC characteristics) - these collections are hardware-based and require the availability of some hardware events
  3. Memory (memory access and bandwidth analysis) - this collection is hardware-based and requires the availability of events that occur outside of the CPU (uncore events)

Amazon Web Services* (AWS*) provides a large variety of instance types and sizes for users in its Elastic Compute Cloud* (EC2*) service. Some VTune Profiler collection types are unavailable on certain instances because the hypervisor does not provide the necessary hardware counters.

Instances Tested

Compute Optimized (C5, C6i)

VTune Profiler Functionality by Instance Type

Instance                 | VTune Profiler Collections Supported | Intel® Xeon® Scalable Processor Code Name
c5.4xlarge and smaller   | Software only                        | Skylake or Cascade Lake
c5.9xlarge               | Software, Hardware                   | Skylake or Cascade Lake
c5.12xlarge              | Software, Hardware                   | Cascade Lake
c5.18xlarge              | Software, Hardware                   | Skylake or Cascade Lake
c5.24xlarge              | Software, Hardware                   | Cascade Lake
c5.metal                 | All                                  | Cascade Lake
c6i.12xlarge and smaller | Software only                        | Ice Lake
c6i.16xlarge             | Software, Hardware                   | Ice Lake
c6i.24xlarge             | Software only                        | Ice Lake
c6i.32xlarge             | Software, Hardware                   | Ice Lake
c6i.metal                | All                                  | Ice Lake

General Purpose (M5, M6i)

VTune Profiler Functionality by Instance Type

Instance                 | VTune Profiler Collections Supported | Intel® Xeon® Scalable Processor Code Name
m5.8xlarge and smaller   | Software only                        | Skylake or Cascade Lake
m5.12xlarge              | Software, Hardware                   | Skylake or Cascade Lake
m5.16xlarge              | Software only                        | Skylake or Cascade Lake
m5.24xlarge              | Software, Hardware                   | Skylake or Cascade Lake
m5.metal                 | All                                  | Skylake or Cascade Lake
m6i.12xlarge and smaller | Software only                        | Ice Lake
m6i.16xlarge             | Software, Hardware                   | Ice Lake
m6i.24xlarge             | Software only                        | Ice Lake
m6i.32xlarge             | Software, Hardware                   | Ice Lake
m6i.metal                | All                                  | Ice Lake

Memory Optimized (R5, R6i)

VTune Profiler Functionality by Instance Type

Instance                 | VTune Profiler Collections Supported | Intel® Xeon® Scalable Processor Code Name
r5.8xlarge and smaller   | Software only                        | Skylake or Cascade Lake
r5.12xlarge              | Software, Hardware                   | Skylake or Cascade Lake
r5.16xlarge              | Software only                        | Skylake or Cascade Lake
r5.24xlarge              | Software, Hardware                   | Skylake or Cascade Lake
r5.metal                 | All                                  | Skylake or Cascade Lake
r6i.12xlarge and smaller | Software only                        | Ice Lake
r6i.16xlarge             | Software, Hardware                   | Ice Lake
r6i.24xlarge             | Software only                        | Ice Lake
r6i.32xlarge             | Software, Hardware                   | Ice Lake
r6i.metal                | All                                  | Ice Lake
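
For scripted checks, the three tables above can be encoded as a small lookup. The following is a minimal sketch, not part of VTune Profiler: the names `TABLE`, `size_units`, and `supported_collections` are hypothetical, and the data only mirrors the rows listed in this article. Sizes that fall outside the tested rows raise an error rather than guessing.

```python
import re

# Support levels, named as in the tables above.
SOFTWARE_ONLY = "Software only"
SOFTWARE_HARDWARE = "Software, Hardware"
ALL = "All"

def _family(software_only_up_to, listed):
    """Helper to describe one instance family (hypothetical structure)."""
    return {"software_only_up_to": software_only_up_to, "listed": listed}

# Sizes are expressed in multiples of "xlarge" (for example, 12 means 12xlarge).
_GEN5 = {12: SOFTWARE_HARDWARE, 16: SOFTWARE_ONLY, 24: SOFTWARE_HARDWARE}
_GEN6 = {16: SOFTWARE_HARDWARE, 24: SOFTWARE_ONLY, 32: SOFTWARE_HARDWARE}
_C5 = {9: SOFTWARE_HARDWARE, 12: SOFTWARE_HARDWARE,
       18: SOFTWARE_HARDWARE, 24: SOFTWARE_HARDWARE}

TABLE = {
    "c5": _family(4, _C5),
    "m5": _family(8, _GEN5),
    "r5": _family(8, _GEN5),
    "c6i": _family(12, _GEN6),
    "m6i": _family(12, _GEN6),
    "r6i": _family(12, _GEN6),
}

def size_units(size):
    """Convert a size suffix to multiples of 'xlarge' ('large' counts as 0.5)."""
    if size == "large":
        return 0.5
    match = re.fullmatch(r"(\d*)xlarge", size)
    if match is None:
        raise ValueError(f"unrecognized instance size: {size!r}")
    return float(match.group(1) or 1)

def supported_collections(instance_type):
    """Return the support level listed in the tables for an instance type."""
    family, _, size = instance_type.lower().partition(".")
    if family not in TABLE:
        raise KeyError(f"family {family!r} is not in the tested tables")
    entry = TABLE[family]
    if size == "metal":
        return ALL
    units = size_units(size)
    if units <= entry["software_only_up_to"]:
        return SOFTWARE_ONLY
    # Float keys such as 12.0 match the integer keys used in the tables.
    if units in entry["listed"]:
        return entry["listed"][units]
    raise KeyError(f"{instance_type!r} is not in the tested tables")

if __name__ == "__main__":
    for name in ("c5.4xlarge", "m5.16xlarge", "c6i.32xlarge", "r6i.metal"):
        print(name, "->", supported_collections(name))
```

Keeping the "and smaller" threshold separate from the explicitly listed sizes means the lookup never reports a result for a size the article did not test.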

Instance Description

The instances tested use Intel® Xeon® Scalable Processors (code-named Skylake, Cascade Lake, and Ice Lake) in various sizes and configurations. For more information, see: https://aws.amazon.com/ec2/instance-types/

Performance Monitoring Unit (PMU)

The PMU is on-chip hardware that monitors microarchitectural events such as cache misses, cache hits, and elapsed cycles. It also helps analyze how the operating system or an application performs on the processor. The events collected fall into two main types: hardware events, such as instructions, CPU cycles, and cache references, and software events, such as context switches and page faults.

VTune Profiler has two ways of collecting these events on Linux*:

  • Linux Perf* tool - an interface that provides access to the PMU and its features. Perf also provides modes such as event-based sampling (EBS), which records a sample each time a threshold number of events is reached. Perf is already installed on the default kernel.
  • VTune Profiler's sep driver - provided as part of the VTune Profiler package and installed if PMU access is detected. If VTune Profiler is unable to use the sep driver, it will collect using perf. The sep driver is only supported on metal instances at this time.
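
Before choosing an analysis type, it can help to see which of the two collectors is likely to be usable on an instance. The sketch below is an illustration under stated assumptions, not a VTune Profiler feature: it looks for a `perf` binary on the `PATH`, reads the standard Linux `/proc/sys/kernel/perf_event_paranoid` setting, and lists loaded kernel modules whose names start with `sep`. The `sep` prefix is an assumption about how the sampling driver module is named, and all function names are hypothetical.

```python
import shutil
from pathlib import Path

def read_perf_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the kernel's perf_event_paranoid level, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None

def loaded_sep_modules(path="/proc/modules"):
    """List loaded modules whose name starts with 'sep' (assumed driver naming)."""
    try:
        lines = Path(path).read_text().splitlines()
    except OSError:
        return []
    # Each /proc/modules line begins with the module name.
    return [line.split()[0] for line in lines if line.startswith("sep")]

def describe_collectors():
    """Summarize what this host exposes for perf-based and driver-based collection."""
    return {
        "perf_binary": shutil.which("perf"),
        "perf_event_paranoid": read_perf_paranoid(),
        "sep_modules": loaded_sep_modules(),
    }

if __name__ == "__main__":
    for key, value in describe_collectors().items():
        print(f"{key}: {value}")
```

An empty `sep_modules` list on a non-metal instance is consistent with the fallback to perf described above. This check does not confirm that hardware events are actually exposed by the hypervisor.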

Instances without Full PMU Support

VTune Profiler analysis types such as the Additional Insights on Hotspot Analysis, Microarchitecture Exploration, and HPC Performance Characterization require access to PMU events in order to provide hardware data such as instructions retired and number of cycles. The PMU events accessible on AWS* instances depend largely on the instance size. The instances tested run on Intel Xeon Scalable Processors with two sockets. Only instance sizes that use one or both complete sockets allow PMU access, presumably because partial use of a socket results in shared CPU resources. Of the larger instances tested, the m5.16xlarge and r5.16xlarge instances do not support PMU events because they consume one complete socket and a portion of the second, so the hardware analyses cannot run on them. The c6i.24xlarge, m6i.24xlarge, and r6i.24xlarge instances are likewise listed as software only in the tables above.
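
The complete-socket reasoning can be illustrated with simple arithmetic. The sketch below is a worked example only. It uses the published vCPU counts for three M5 sizes and assumes 48 vCPUs per socket on the underlying host. That figure is an inference from m5.24xlarge offering 96 vCPUs on a two-socket system and is not stated in this article. The function name is hypothetical.

```python
def uses_complete_sockets(vcpus, vcpus_per_socket):
    """True when the vCPU count fills one or more sockets with none left over."""
    if vcpus <= 0 or vcpus_per_socket <= 0:
        raise ValueError("vCPU counts must be positive")
    return vcpus % vcpus_per_socket == 0

# Assumption: 48 vCPUs per socket on the M5 hosts (not stated in the article).
M5_VCPUS_PER_SOCKET = 48

# vCPU counts taken from the public EC2 instance type listing.
M5_VCPUS = {"m5.12xlarge": 48, "m5.16xlarge": 64, "m5.24xlarge": 96}

if __name__ == "__main__":
    for name, vcpus in M5_VCPUS.items():
        sockets = vcpus / M5_VCPUS_PER_SOCKET
        whole = uses_complete_sockets(vcpus, M5_VCPUS_PER_SOCKET)
        print(f"{name}: {sockets:.2f} sockets, complete sockets: {whole}")
```

Under this assumption, m5.16xlarge occupies one full socket plus a third of the second, which matches its software-only entry in the table, while m5.12xlarge and m5.24xlarge occupy exactly one and two sockets.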

Intel VTune Profiler - Application Performance Snapshot

Application Performance Snapshot (APS) is a utility packaged with VTune Profiler for Linux*. It provides a quick view of MPI and OpenMP imbalances, memory access efficiency, floating-point unit (FPU) usage, and I/O and memory data in your application. After analyzing this data, it suggests ways to perform additional analysis with VTune Profiler.

APS has the same limitations as the VTune Profiler hardware analysis types. It can only run when PMU events are accessible.

Intel VTune Profiler - Platform Profiler

The Platform Profiler utility is also packaged with VTune Profiler. It profiles at the system level to help identify hardware configuration issues involving storage layout, memory and disk I/O, CPU frequency, cycles per instruction (CPI), power consumption, and more.

Platform Profiler is limited to use on metal instances only.

Metal versus Non-metal Instances

Some instance types have a metal offering that is the same size as the largest non-metal instance. For example, c5.24xlarge has the same number of vCPUs as c5.metal and appears to use the same hardware. The main difference is that the 24xlarge instance still runs under a hypervisor, which prevents full access to the PMU, including the uncore events used in memory access analysis. As a result, VTune Profiler remains limited on the largest non-metal instance but is fully functional on the metal equivalent.