Developer Guide

  • 2021.1
  • 11/03/2021
  • Public

Measurement Library

The measurement library is a set of C APIs that enable you to analyze the behavior of real-time applications. You can use the library to instrument your code and gather various latency statistics in CPU clock cycles, nanoseconds, or microseconds. In addition, you can react to deadline violations and store latency values in a shared memory ring buffer to be processed by an external application.
The library is intended specifically for analysis of isochronous cyclic workloads and parts of these workloads, such as data input, processing, and output. An isochronous cyclic workload is a code sequence in the real-time application that runs multiple times, where each iteration must complete within a defined period. This period, also known as the maximum tolerable latency or deadline, is the same for each iteration. Correspondingly, each part of the cyclic workload has its own deadline. The maximum latency measured across all iterations is known as the worst-case execution time (WCET). The library is not recommended for one-time measurements.
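To make the WCET definition concrete, the following minimal sketch computes the WCET of a series of per-iteration latencies and checks it against a deadline. The helpers wcet_of() and meets_deadline() are hypothetical and are not part of the measurement library; the sketch assumes that a latency equal to the deadline still counts as meeting it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the WCET is the largest latency observed
 * across all iterations of the cyclic workload. */
static uint64_t wcet_of(const uint64_t *latencies, size_t count)
{
    uint64_t wcet = 0;
    for (size_t i = 0; i < count; ++i) {
        if (latencies[i] > wcet) {
            wcet = latencies[i];
        }
    }
    return wcet;
}

/* Every iteration met the deadline exactly when the WCET does not exceed it
 * (assumption: latency == deadline is still on time). */
static bool meets_deadline(const uint64_t *latencies, size_t count, uint64_t deadline)
{
    return wcet_of(latencies, count) <= deadline;
}
```

For example, latencies of 800, 950, 1020, and 870 ns give a WCET of 1020 ns, so a 1000 ns deadline is violated at least once even though the other iterations finished in time.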
The following sections provide a brief overview of key concepts. More information is provided later in this guide.

Instrumenting Code

The library works in conjunction with the Instrumentation and Tracing Technology (ITT) APIs. The ITT APIs generate and control the collection of trace data while the application runs.
First, identify the tasks that you want to measure in your application. Then add __itt_task_begin and __itt_task_end calls to your code to mark the beginning and end of each task.
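```c
#include <ittnotify.h>

void cycle(void); /* the workload being measured */

/* Illustrative wrapper around the measured loop */
void run_measured_cycles(const char *cycle_name, int iterations)
{
    /* Initialize the ITT domain */
    __itt_domain *domain = __itt_domain_create("TCC");
    /* Initialize the ITT handle used to collect performance data */
    __itt_string_handle *cycle_handler = __itt_string_handle_create(cycle_name);

    for (int i = 0; i < iterations; ++i) {
        /* Start cycle measurement */
        __itt_task_begin(domain, __itt_null, __itt_null, cycle_handler);
        /* Run cycle */
        cycle();
        /* End cycle measurement */
        __itt_task_end(domain);
    }
}
```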
The begin function uses the CPU’s timestamp counter (TSC) to collect the start time of the measured block. The end function uses the TSC to collect the end time of the measured block, and calculates the difference between the start and end timestamps to get the latency of the measured block.
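The begin/end pattern can be illustrated with a small, self-contained sketch: "begin" stores a start timestamp, and "end" reads a second timestamp and returns the difference. This is not the library's implementation. The library reads the TSC, while this sketch substitutes the POSIX CLOCK_MONOTONIC clock so that it compiles on any POSIX system. The names sketch_timer, sketch_begin(), and sketch_end() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical timer state: the timestamp recorded by the "begin" call. */
struct sketch_timer {
    uint64_t start_ns;
};

/* Stand-in for reading the TSC: monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* "Begin": record the start time of the measured block. */
static void sketch_begin(struct sketch_timer *t)
{
    t->start_ns = now_ns();
}

/* "End": record the end time and return end - start as the latency. */
static uint64_t sketch_end(const struct sketch_timer *t)
{
    return now_ns() - t->start_ns;
}
```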
The begin and end functions together add runtime overhead of up to several hundred nanoseconds (see Overhead and Precision for details). To reduce the relative cost of these functions, you can run the measured sequence multiple times within a single measurement. This can be useful, for example, when your data processing code contains an inner loop with multiple iterations of the measured sequence. In this case, the measured latency is the sum of the latencies of all iterations of the measured sequence.
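The benefit of repeating the measured sequence is simple arithmetic: the begin/end overhead is paid once per measurement, so it shrinks relative to the measured work as the repeat count grows. The sketch below computes that ratio. relative_overhead() is a hypothetical helper, and the 500 ns sequence duration in the example is an assumed value. The 220 ns overhead figure is the Intel Atom® x6000E Series average reported in Overhead and Precision.

```c
/* Hypothetical helper: fixed begin/end overhead expressed as a fraction of
 * the work captured by one measurement of n repeated sequences. */
static double relative_overhead(double overhead_ns, double sequence_ns, unsigned n)
{
    return overhead_ns / (sequence_ns * (double)n);
}
```

With a 220 ns overhead and an assumed 500 ns sequence, measuring a single sequence gives a ratio of 0.44 (44%). Measuring 100 repetitions at once gives 0.0044 (0.44%). To recover the per-sequence figure, divide the measured latency by the repeat count.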
Each instrumented block of code is a measurement instance. Create and use a unique __itt_string_handle for each measurement instance. Your application can have multiple measurement instances, including nested instances. For example, if your application contains multiple consecutive stages, you can create a measurement instance for each stage and an additional measurement instance for the entire workload. This can help you isolate the biggest source of latency among the stages.
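Once each stage and the enclosing workload have their own measurement instance, finding the dominant stage is a short calculation. The following minimal sketch assumes that you have already obtained one latency value per stage plus one for the whole workload. largest_stage() and unattributed_latency() are hypothetical helpers, not library functions. The unattributed part includes uninstrumented code between stages as well as the overhead of the inner begin/end calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the stage with the highest latency.
 * stage_count must be greater than zero. */
static size_t largest_stage(const uint64_t *stage_latency, size_t stage_count)
{
    size_t worst = 0;
    for (size_t i = 1; i < stage_count; ++i) {
        if (stage_latency[i] > stage_latency[worst]) {
            worst = i;
        }
    }
    return worst;
}

/* Hypothetical helper: time in the enclosing (whole-workload) instance
 * that no stage instance accounts for. */
static uint64_t unattributed_latency(uint64_t total, const uint64_t *stage_latency,
                                     size_t stage_count)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < stage_count; ++i) {
        sum += stage_latency[i];
    }
    return total > sum ? total - sum : 0;
}
```

For stage latencies of 120, 540, and 90 ns inside an 800 ns workload, the second stage dominates, and 50 ns is not attributed to any stage.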

Collecting Latency Data

The ITT APIs are implemented in a static library, libittnotify64.a, which forwards the calls to a shared library called a collector. A collector performs data collection and processing.
Your application can access the following collectors:
  • Measurement library collector: This collector implements a subset of the functions defined in ITT APIs, plus additional functions to access collected data from the instrumented application. You can use this collector to access the measurement results from your application and store results in a shared memory buffer or in a file with a simple format.
  • VTune™ collector: Intel® VTune™ Profiler provides the VTune™ collector, which collects data for visualization in that tool. For example, you can analyze the relationship between tasks in your code relative to other CPU and GPU tasks.
You can use only one collector at a time.
The collector is selected and loaded at runtime based on the environment variable INTEL_LIBITTNOTIFY64.
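Because the collector is chosen from INTEL_LIBITTNOTIFY64 at runtime, a missing variable typically means that the ITT calls do nothing and no latency data is collected. The following is a minimal sketch of a start-up check that an application might run to detect this case early. collector_configured() is a hypothetical helper and is not part of either library.

```c
/* Expose POSIX declarations such as setenv(), useful when testing this check. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 if a collector path is configured, 0 otherwise. */
static int collector_configured(void)
{
    const char *path = getenv("INTEL_LIBITTNOTIFY64");
    if (path == NULL || path[0] == '\0') {
        fprintf(stderr, "INTEL_LIBITTNOTIFY64 is not set; no collector will be loaded\n");
        return 0;
    }
    printf("ITT collector: %s\n", path);
    return 1;
}
```

Call the check before the first ITT call. The check only confirms that the variable is set. It does not verify that the file it names is a loadable collector such as libtcc_collector.so.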
The measurement library uses a handle to a structure called tcc_measurement to store raw and processed latency data and the measurement state. The structure is created implicitly when the ITT APIs call measurement collector functions.
Each measurement structure contains a reference to a corresponding measurement buffer, which has zero size by default. When the buffer size is zero, only aggregated statistics are collected, without per-iteration data. When the buffer size is greater than zero, the collector library stores the collected per-iteration latency measurements in the buffer. You can configure the buffer size and other attributes through the environment variable TCC_MEASUREMENTS_BUFFERS.
You can also specify whether to use a shared memory ring buffer or a local buffer through the TCC_USE_SHARED_MEMORY environment variable. Shared memory allows other applications to access the measurements for data monitoring, storage, and analysis.
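The difference between a zero-size buffer and a non-zero buffer can be sketched as follows. Aggregated statistics are always updated, and per-iteration values are kept only when the buffer has capacity. Like a ring buffer, the buffer in this sketch wraps around and overwrites the oldest values. This is an illustrative model only. The structure sketch_measurement and the function sketch_record() are hypothetical and do not reflect the real tcc_measurement layout or the library's exact buffering behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical measurement record, not the real tcc_measurement layout. */
struct sketch_measurement {
    uint64_t count, sum, min, max; /* aggregated statistics */
    uint64_t *buffer;              /* per-iteration latencies, may be NULL */
    size_t capacity;               /* 0 means "aggregated statistics only" */
    size_t next;                   /* next ring slot to overwrite */
};

static void sketch_record(struct sketch_measurement *m, uint64_t latency)
{
    /* Aggregated statistics are collected regardless of buffer size. */
    if (m->count == 0 || latency < m->min) {
        m->min = latency;
    }
    if (latency > m->max) {
        m->max = latency;
    }
    m->sum += latency;
    m->count += 1;

    /* Per-iteration data only when a buffer exists; wrap like a ring. */
    if (m->capacity > 0) {
        m->buffer[m->next] = latency;
        m->next = (m->next + 1) % m->capacity;
    }
}
```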

Analyzing Latency Data from the Measurement Library Collector

You can access the measurement structure in your application by calling tcc_measurement_get() while the measurement library collector is loaded.
You can analyze latency data in the following ways:
  • Analyze measurements in your application:
    • Measure and print the minimum, maximum, and average latencies of a workload
    • Set a deadline and run a custom callback function every time an iteration exceeds the deadline
    • Convert measurement results to CPU clock cycles, microseconds, or nanoseconds
  • Analyze measurements offline:
    • Store the raw measurement results in a dump file for offline analysis by a separate application
    • Print measurements to the console or in JSON format
    • Visualize the data, for example, as histograms
  • Monitor measurements from a monitoring application:
    • Create a separate application to track measurements generated by the real-time application and act on them, such as printing various statistics and reacting to deadline violations

Analyzing Latency Data from the VTune™ Collector

When the VTune™ collector is enabled, you can visualize the latency data in Intel® VTune™ Profiler. Use of Intel® VTune™ Profiler is not required, but can offer rich data about your application. For example, you can see the sequence and duration of tasks in your application, along with CPU and GPU tasks, on a consolidated timeline.

Libraries

The following table shows which header files to use to access corresponding APIs.
| Library Name | Description | Header File |
|---|---|---|
| Instrumentation and Tracing Technology (ITT) APIs | A static library for instrumentation of code. ITT is supported by various software toolkits and Intel® VTune™ Profiler. | ittnotify.h |
| Measurement library collector | A dynamic library for runtime data collection. | |
| Measurement library | Shared and static libraries for accessing and analyzing the results. | tcc/measurement.h; tcc/measurement_helpers.h |

Example of Using Measurement Library

The following diagram shows the flow for an example scenario that uses the ITT APIs, the measurement library collector, and the measurement library static library. The same workflow is described in more detail in Analyze Measurements in Your Workload.
Starting on the left side, the diagram shows that the real-time application is instrumented with ITT APIs and linked against the ITT Notify static library (libittnotify64.a). At runtime, the static library reads the environment variable INTEL_LIBITTNOTIFY64 and loads the measurement library collector (libtcc_collector.so), a dynamic library. The measurement library collector initializes the structures for data collection and stores the latency measurements in them.
The right side of the diagram shows that the real-time application also uses measurement library functions to access the data structures. In this case, the application is linked against the measurement library static library (libtcc_static.a). The measurement library reads the environment and loads the measurement library collector (libtcc_collector.so). As a result, the application can access the data structures created in the measurement library collector. The libtcc.so shared library is linked by both the measurement library collector and the real-time application (through libtcc_static.a) to handle internal function calls.
The __itt_task_begin() and __itt_task_end() implementations in the measurement library collector are not thread-safe. Do not create and use measurement instances from multiple threads simultaneously when using the measurement library collector.
You can use the measurement library during development of your application and disable it for production deployment. Use the -DNO_TCC_MEASUREMENT -DINTEL_NO_ITTNOTIFY_API compilation options to compile your application without measurement library calls. Reasons for disabling the library in production include eliminating measurement overhead and avoiding the security risks of relying on environment variables.
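Application code that exists only to report or analyze measurements can be compiled out in the same way. The following is a minimal sketch, assuming you guard such code with the same NO_TCC_MEASUREMENT macro. REPORT_LATENCY and report_latency() are hypothetical names, not part of the library. Because the disabled macro discards its argument, any side effects in that argument also disappear in production builds.

```c
#include <stdio.h>

#ifdef NO_TCC_MEASUREMENT
/* Production build: the call and its argument vanish entirely. */
#define REPORT_LATENCY(ns) ((void)0)
#else
static int reports_made = 0;

/* Development build: hypothetical reporting hook. */
static void report_latency(unsigned long long ns)
{
    ++reports_made;
    printf("latency: %llu ns\n", ns);
}
#define REPORT_LATENCY(ns) report_latency(ns)
#endif
```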

Overhead and Precision

The APIs have low runtime overhead and high measurement precision, as the following results show.
11th Gen Intel® Core™ processors:
  • Each measurement adds no more than 102 ns overhead (52 ns average)
  • Accurately measures intervals of 15 ns or longer
Intel Atom® x6000E Series processors:
  • Each measurement adds no more than 608 ns overhead (220 ns average)
  • Accurately measures intervals of 60 ns or longer
Results may vary. Testing conducted November 12, 2020. Configuration:
  • 11th Gen Intel® Core™ processor:
    • Hardware: QVD5 (B2)
    • BSP: TGL_external_ER57 + Intel® TCC dependencies layer
    • BIOS: TGLIFUI1.R00.3455.A02.2011240812
    • Intel® TCC Mode enabled.
    • No software SRAM regions.
  • Intel Atom® x6000E Series processor:
    • Hardware: QV3J (B0 fuse rev.11) + 44698-201 customer reference board
    • BSP: EHL_external_Beta3 + Intel® TCC dependencies layer
    • BIOS: EHLSFWI1.R00.2463.A12.2012141439
    • Intel® TCC Mode enabled.
    • No software SRAM regions.
Methodology:
  1. Set real-time settings:
    • Scheduler: 99 FIFO
    • CPU 3
    • Interrupts disabled
  2. Determining the minimum interval: Run start and stop measurements with no code between them. Calculate the minimum, average, and maximum.
  3. Determining the overhead: Measure the start and stop calls. Calculate the minimum, average, and maximum.
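Part of this methodology can be reproduced with standard Linux and POSIX calls. The sketch below switches the calling process to SCHED_FIFO at a given priority (99 in the methodology). It then times back-to-back start/stop readings with nothing in between and reports the minimum, average, and maximum. The sketch makes several assumptions. CLOCK_MONOTONIC stands in for the TSC-based ITT calls that were actually measured. Pinning to CPU 3 (for example, with the GNU sched_setaffinity() extension) and disabling interrupts are not shown. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Step 1 (partial): SCHED_FIFO at the given priority. Needs root or
 * CAP_SYS_NICE. Returns 0 on success, -1 on error (errno is set). */
static int apply_fifo_priority(int priority)
{
    struct sched_param param = { .sched_priority = priority };
    return sched_setscheduler(0, SCHED_FIFO, &param);
}

struct interval_stats {
    uint64_t min_ns;
    uint64_t max_ns;
    double avg_ns;
};

static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Step 2 analogue: back-to-back start/stop readings with no code between
 * them, summarized as min/avg/max. runs must be greater than zero. */
static struct interval_stats measure_empty_interval(unsigned runs)
{
    struct interval_stats s = { UINT64_MAX, 0, 0.0 };
    double total = 0.0;
    for (unsigned i = 0; i < runs; ++i) {
        uint64_t start = mono_ns();
        uint64_t stop = mono_ns();
        uint64_t d = stop - start;
        if (d < s.min_ns) {
            s.min_ns = d;
        }
        if (d > s.max_ns) {
            s.max_ns = d;
        }
        total += (double)d;
    }
    s.avg_ns = total / runs;
    return s;
}
```

A typical run calls apply_fifo_priority(99) first, checks its return value, and then calls measure_empty_interval() with a large run count.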

Usage Model

To summarize, follow these steps to analyze your workload:
  1. Instrument your code using ITT APIs: See Instrument the Code. For examples of instrumenting the code, see Single Measurement Sample and Multiple Measurements Sample.
  2. Set up the collector using environment settings and run your instrumented application: See Control Data Collection.
  3. Analyze the results:

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.