Intel® Time Coordinated Computing Tools (Intel® TCC Tools)
Improve Performance of Latency-Sensitive Applications
Time As Performance
Intel® processors are multipurpose and can serve a wide range of use cases, including data analysis in the cloud, gaming PCs, traditional office laptops, and edge devices. Intel® Time Coordinated Computing (Intel® TCC) is a new set of features that augments the compute performance of Intel processors to address the stringent temporal requirements of real-time applications. Intel TCC reduces jitter and improves performance for latency-sensitive applications. It helps to maximize efficiency by consolidating time-critical and non-time-constrained applications onto a single board.
While Intel TCC features reside in the processor, their full potential is unlocked when the whole solution stack is optimized top to bottom. Intel offers a reference real-time software stack that abstracts these hardware features to accelerate hardware configuration ("tuning") and application development. This solution stack consists of:
- Real-time hardware processors optimized for real-time applications:
  - Intel® Xeon® D-2752TER processor
  - Intel® Xeon® D-1700T series processors
  - Intel® Xeon® W-11000E series processors
  - 12th generation Intel® Core™ processors
  - 11th generation Intel® Core™ processors
  - Intel Atom® x6000E series processors
- System software stack:
  - The board support package (BSP) foundation is a Yocto Project* distribution of Linux with PREEMPT_RT patch and other real-time optimizations.
  - UEFI reference BIOS with Intel® TCC Mode
  - Slim Bootloader boot option with Intel® TCC Tools
- Intel TCC Tools
Seamless Development of Real-Time Solutions
Attain Hard Real-Time Requirements
Projected real-time compute performance:
- 1 ms cycle time with out-of-box configuration†
- Sub-200 µs cycle time with the Intel TCC Mode setting in BIOS†
- Flexible real-time tuning for a range of cycle times with Intel TCC Tools
The various tools can be used separately or together depending on the unique needs of your real-time use case.
†Based on specific real-time system configurations and workloads. For complete information about performance and benchmark results, see Benchmarks.
Optimize System Performance
Data Streams Optimizer improves the latency of data transfers between compute subsystems, including cores, memory, and PCIe*.
Cache Configurator and Cache Allocation Library help applications achieve low latencies by using buffers in cache memory.
Understand Workload Deadlines
Real-Time Readiness Checker and Measurement Library help find the causes of deadline misses, such as system configuration settings and bottlenecks in your code.
Learn about Time Synchronization
Time-Aware GPIO and Ethernet Timestamps sample applications explain how to use hardware-assisted time synchronization available in supported Intel processors and enabled in the BSP.
What Problems Does Intel TCC Tools Solve?
The board support package and the Intel TCC Mode in BIOS may be sufficient for many use cases, with initial cycle times in the hundreds of microseconds. Other use cases may require more granular tuning techniques to achieve lower cycle times, balance system power consumption, or address unique demands. These techniques can optimize cache and I/O.
Intel TCC Tools facilitates these optimizations by offering C language APIs and tools. Previously, these techniques were not publicly available and required specialized hardware knowledge.
To use Intel TCC Tools, you may need to know, measure, or analyze new aspects of your real-time application, such as latency requirements for memory access or I/O paths. Take advantage of sample applications and documentation support to accelerate your learning and code development.
Figure 1. Steps to optimize for real-time improvements.
Real-Time Improvements across the Software Stack
Each software component helps improve the real-time performance by addressing latency and jitter caused by various sources.
The following steps, shown in Figure 1, describe the components to add to improve real-time performance across the software stack.
Step 1: Add the board support package.
- Satisfies cycle times (the amount of time allotted to complete a cyclic workload) in the low milliseconds range
- Addresses operating system latency via Yocto Project* distribution of Linux* with real-time kernel and optimized drivers
Step 2: Activate Intel TCC optimizations in the firmware to further enhance performance.
- Satisfies cycle times in the hundreds of microseconds range
- Addresses processor latency using an Intel TCC Mode setting that disables power management and enables Intel TCC features, system management interrupt (SMI) reductions, and other optimizations
Step 3: Add Intel TCC Tools to fine-tune performance.
- Satisfies cycle times below 100 microseconds or unique demands
- Addresses the need to balance real-time performance, power, and general compute by the tuning of cache or I/O
When used together or separately, these components offer flexibility in achieving a range of real-time requirements.
Figure 2. Illustration of how the board support package, Intel TCC Mode in the BIOS, and Intel TCC Tools work together as a process.
Use Intel TCC Tools Features
After you add the board support package and activate Intel TCC Mode in the BIOS (see Figure 1), follow these steps.
Step 3 (continued): Set advanced-level tuning in the processor and BIOS with Intel TCC Tools. Use the Real-Time Readiness Checker to quickly check BIOS and other system settings that may be affecting real-time performance.
Step 4: Run your real-time application to understand if your deadline is met.
Step 5: Measure and analyze the behavior of your application. Instrument your code with the Measurement Library to gather latency statistics and visualize them for your needs.
Step 6: Optimize your real-time configuration based on whether you find data access or data transfer deadline violations, or if other system requirements are not met.
Problems That Can Be Solved Using Intel TCC Tools
Data Access Latency
- Use Cache Configurator to choose preset configurations that reserve cache among multiple applications. You can also refine cache allocation among GPU, CPU, and other components.
- Use Cache Allocation Library to replace malloc in your code. This creates low-latency buffers in the cache.
Other System Requirements Not Met and Data Transfer Latency
- Use the Data Streams Optimizer to fine-tune the transfer of data between two processor subsystems that act as a source and destination, such as between PCIe devices and CPU cores. The iterative tuning performed by the Data Streams Optimizer identifies a configuration that balances real-time performance against system power consumption and the computational resources available for other tasks, so that you can meet your various system requirements.
Real-Time Configuration and Optimization
Accelerate and Automate System Tuning
Data Streams Optimizer
- Automates real-time platform configuration tuning using a command-line tool.
- Addresses specific workload latency between the CPU, memory, and PCIe end points by optimizing power consumption and compute performance.
- Focuses on tuning I/O and processor fabric to enhance the transfer of data between two processor subsystems. This tool identifies the various control points between the entities that could be tuned to meet the requirements, and instructs the BIOS to write specific values to registers for these control points. This enables real-time tuning without changing the application code.
To use the tool, you need to know how data flows through the compute module (that is, through which paths or streams), the size of the payload exchanged between end points, and the maximum tolerable latency for such data exchanges.
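The inputs listed above, and the trade-off the tool searches for, can be pictured with a small self-contained sketch. It is not the Data Streams Optimizer's algorithm or interface: the `stream_req` and `tuning_candidate` types, their field names, and `pick_candidate` are hypothetical. The sketch only illustrates choosing, from a set of already-measured candidate configurations, the lowest-power one whose worst-case latency still meets the requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one data stream (not a TCC Tools type). */
struct stream_req {
    const char *source;       /* where the data originates, e.g. a PCIe endpoint */
    const char *destination;  /* where the data is consumed, e.g. a CPU core */
    size_t payload_bytes;     /* size of each transfer */
    uint64_t max_latency_ns;  /* maximum tolerable latency for one transfer */
};

/* Hypothetical measurement result for one candidate configuration. */
struct tuning_candidate {
    uint64_t worst_latency_ns; /* worst latency observed for the stream */
    uint32_t power_mw;         /* platform power in this configuration */
};

/* Return the index of the lowest-power candidate that meets the latency
 * requirement, or -1 if no candidate meets it. */
int pick_candidate(const struct stream_req *req,
                   const struct tuning_candidate *cands, size_t count)
{
    assert(req != NULL && (cands != NULL || count == 0));
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (cands[i].worst_latency_ns > req->max_latency_ns)
            continue; /* too slow for this stream */
        if (best < 0 || cands[i].power_mw < cands[best].power_mw)
            best = (int)i;
    }
    return best;
}
```

A return value of -1 corresponds to the case where the stated latency requirement cannot be met by any configuration that was tried.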
Allocate Cache for Real-Time Applications
Cache Configurator
- Provides a command-line tool that discovers and manages cache memory resources, so you can add, modify, or delete buffers at varying levels in the cache and memory hierarchy
- Divides the remaining cache resources among various components (such as CPU, GPU, or I/O) without the need to learn the low-level details of the cache architecture
Example Output
The following buffer will be created:
BUFFER 1
LATENCY(ns): 100
CACHE LEVEL: 2
CPU CORE: 3
BUFFER SIZE(bytes): 262144
Allocate Buffers Effectively across Platforms
Cache Allocation Library
APIs contained in this library create buffers that meet specified latency requirements.
To use the library, you need to know the latency requirement and size of the dataset that your application processes, as well as the hot spots in your application’s code that are the most latency sensitive.
Benefits include:
- Malloc replacement for reliable low latency
- Target cache misses and other sources of memory access latency
- Simple, familiar API signature
- Abstracts the complexity of cache architecture
- No code changes needed to achieve the same latency on supported Intel processors
Example Function
To create a buffer, specify its size and maximum tolerable latency for access:
/* The example parameters specify a 64-byte buffer and 20-nanosecond latency requirement. */
void *mem = tcc_buffer_malloc(64, 20);
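One way to picture what "meeting a latency requirement" can mean is a lookup over memory tiers. The following is a minimal sketch under stated assumptions, not the Cache Allocation Library's implementation. The `mem_tier` type, the `choose_tier` function, and every latency and capacity value are hypothetical. The sketch picks the slowest tier that still satisfies the requested bound, so that the fastest cache stays available for buffers that need it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory tier; latencies and capacities are made-up values. */
struct mem_tier {
    const char *name;
    uint64_t worst_latency_ns; /* assumed worst-case access latency */
    size_t free_bytes;         /* capacity still available for buffers */
};

/* Tiers ordered from fastest to slowest. */
static struct mem_tier tiers[] = {
    { "L2 cache", 20, 256 * 1024 },
    { "L3 cache", 60, 1024 * 1024 },
    { "DRAM", 150, (size_t)1 << 30 },
};

/* Reserve `size` bytes in the slowest tier that meets the latency bound and
 * has room. Returns the tier index, or -1 if no tier qualifies. */
int choose_tier(size_t size, uint64_t max_latency_ns)
{
    assert(size > 0);
    size_t n = sizeof tiers / sizeof tiers[0];
    for (size_t i = n; i-- > 0; ) { /* walk from slowest to fastest */
        if (tiers[i].worst_latency_ns <= max_latency_ns &&
            tiers[i].free_bytes >= size) {
            tiers[i].free_bytes -= size;
            return (int)i;
        }
    }
    return -1;
}
```

With these made-up numbers, a 64-byte request with a 20 ns bound lands in the first tier, while a request with a looser bound is placed further out.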
Measurement and Analysis
Check System Readiness for Real-Time Workloads
Real-Time Readiness Checker
Use this diagnostic tool to check real-time BIOS and operating system configuration readiness.
- Verifies whether the system has a supported processor, BIOS, and operating system
- Checks for features that may affect real-time performance, such as Intel® Turbo Boost Technology, Enhanced Intel SpeedStep® technology, and processor power-saving states
- Reports CPU and GPU frequencies
- Operates at the UEFI BIOS or operating system level
Instrument Your Code to Analyze Performance
Measurement Library
Use this lightweight library for instrumenting user space applications to collect latency measurements.
- Measures worst-case execution time (WCET) and other latency statistics in processor clock cycles and time units
- Enables minimal runtime overhead and high measurement precision
  - Each measurement adds no more than 610 ns overhead†
  - Accurately measures intervals starting from 60 ns†
- Tracks deadline violations
- Stores latency values in a shared memory ring buffer to be processed by an external application
- Uses the Instrumentation and Tracing Technology API (ITT API) to support task visualization and system-wide analysis in tools such as Intel® VTune™ Profiler, which does low-level application performance analysis
- Take advantage of samples to get started, which demonstrate methods for measurement data analysis including latency histograms and deadline monitoring
†Based on specific configurations and workloads.
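The statistics listed above can be illustrated with a small collector. This is a sketch under stated assumptions, not the Measurement Library's data structures or API: `latency_stats`, `stats_init`, `stats_record`, `stats_average`, and the ring size are hypothetical. The sketch tracks the worst case observed, counts deadline violations, and keeps the most recent samples in a fixed-size ring buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8 /* hypothetical size, kept small for illustration */

/* Hypothetical collector for per-iteration latency samples. */
struct latency_stats {
    uint64_t deadline;   /* allowed latency per iteration */
    uint64_t count;      /* number of samples recorded */
    uint64_t min;        /* smallest sample seen */
    uint64_t max;        /* worst case observed so far */
    uint64_t sum;        /* running total, used for the average */
    uint64_t violations; /* samples that exceeded the deadline */
    uint64_t ring[RING_CAPACITY]; /* most recent samples */
};

void stats_init(struct latency_stats *s, uint64_t deadline)
{
    assert(s != NULL);
    *s = (struct latency_stats){ .deadline = deadline, .min = UINT64_MAX };
}

void stats_record(struct latency_stats *s, uint64_t latency)
{
    assert(s != NULL);
    s->ring[s->count % RING_CAPACITY] = latency; /* overwrites the oldest */
    s->count++;
    s->sum += latency;
    if (latency < s->min) s->min = latency;
    if (latency > s->max) s->max = latency;
    if (latency > s->deadline) s->violations++;
}

/* Average of all recorded samples, or 0 if none were recorded. */
uint64_t stats_average(const struct latency_stats *s)
{
    assert(s != NULL);
    return s->count ? s->sum / s->count : 0;
}
```

The samples can be in cycles or in time units, as long as the deadline uses the same unit.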
Example Functions
Use this function at the beginning of the code block you want to analyze:
/* Get the start time of the measured code block from the processor time stamp counter (TSC). The name is a pointer to __itt_string_handle to identify the measurement. */
__itt_task_begin(domain, __itt_null, __itt_null, name);
Use this function at the end of the code block:
/* Get the end time from the TSC and calculate the difference between the start and end times to derive the latency of one iteration. */
__itt_task_end(domain);
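Deriving a latency in time units from two TSC readings is a matter of simple arithmetic. The helper below is a hedged sketch, not part of the Measurement Library: `cycles_to_ns` and `interval_ns` are hypothetical names, and the TSC frequency must be supplied by the caller for the actual platform.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC cycle count to nanoseconds. tsc_hz is the TSC frequency in
 * hertz. Splitting the division into whole seconds and a remainder keeps the
 * intermediate product within 64 bits for tsc_hz up to about 18 GHz. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
    assert(tsc_hz > 0);
    const uint64_t ns_per_s = 1000000000ULL;
    uint64_t whole_s = cycles / tsc_hz;
    uint64_t rem = cycles % tsc_hz;
    return whole_s * ns_per_s + (rem * ns_per_s) / tsc_hz;
}

/* Latency of one iteration from its start and end TSC readings. */
uint64_t interval_ns(uint64_t start_cycles, uint64_t end_cycles, uint64_t tsc_hz)
{
    return cycles_to_ns(end_cycles - start_cycles, tsc_hz);
}
```

For example, with an assumed 3 GHz TSC, a difference of 3000 cycles corresponds to 1000 ns.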
Time Synchronization & Communication
Enable Time Synchronization in Network, I/O, and Compute
Time-Aware GPIO and Ethernet Timestamps Samples
The time-aware GPIO sample applications explain the basics of using hardware-assisted time synchronization on GPIO pins and its advantages over normal software-controlled GPIO.
The Ethernet timestamps sample application shows the accuracy of hardware-assisted cross-timestamping between the system and network controller clocks, which allows the application to extend precise time synchronization to other devices on the network beyond the compute node.
Example Output
Compare output period jitter of software-controlled GPIO versus time-aware GPIO. Software GPIO data is represented in blue. TGPIO data is represented in orange. Software GPIO causes higher jitter compared to TGPIO.
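Output period jitter can be quantified in several ways. The sketch below uses one common definition, peak-to-peak jitter, which is the difference between the longest and shortest interval between consecutive output edges. The function name `period_jitter_ns` is hypothetical, and the sample applications may compute or present jitter differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak-to-peak period jitter for a series of edge timestamps, given in
 * nanoseconds and in ascending order. Returns 0 when there are fewer than
 * three edges, because jitter needs at least two periods to compare. */
uint64_t period_jitter_ns(const uint64_t *edges_ns, size_t count)
{
    assert(edges_ns != NULL || count == 0);
    if (count < 3)
        return 0;
    uint64_t min_p = UINT64_MAX, max_p = 0;
    for (size_t i = 1; i < count; i++) {
        assert(edges_ns[i] >= edges_ns[i - 1]);
        uint64_t p = edges_ns[i] - edges_ns[i - 1]; /* one output period */
        if (p < min_p) min_p = p;
        if (p > max_p) max_p = p;
    }
    return max_p - min_p;
}
```

Running the same function over timestamps captured from a software-controlled pin and from a time-aware pin gives two numbers that can be compared directly.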
Real-Time Communication (RTC) Demonstration
Cache Allocation Library and Time-Sensitive Networking (TSN)
The RTC demonstration with example programs and scripts highlights the benefit of combining the Cache Allocation Library with TSN. These examples offer less jitter in data processing, more stable packet drops, and deterministic time-of-packet arrival for network packet transmissions.
The demonstration showcases two scenarios using the Cache Allocation Library and TSN.
- In the basic scenario, Board A runs a data processing workload, and then sends a network packet to Board B once every 500 µsec.
- In the single-input single-output (SISO) scenario, Board A runs a data processing workload, and then sends a network packet to Board B. Board B receives the network packet, runs a data processing workload, and sends a network packet back to Board A.
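A fixed send interval such as the 500 µsec in the basic scenario is commonly implemented by computing absolute release times from the previous release, rather than sleeping for a relative duration, so that scheduling delays do not accumulate as drift. The demonstration's own code is not shown in this brief, so the following is only a sketch of that general technique, and `next_release` is a hypothetical helper.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Return t advanced by period_ns, with tv_nsec normalized to the range
 * [0, NSEC_PER_SEC). Assumes t is already normalized on entry. */
struct timespec next_release(struct timespec t, long period_ns)
{
    assert(period_ns >= 0);
    t.tv_sec += period_ns / NSEC_PER_SEC;
    t.tv_nsec += period_ns % NSEC_PER_SEC;
    if (t.tv_nsec >= NSEC_PER_SEC) { /* carry into the seconds field */
        t.tv_nsec -= NSEC_PER_SEC;
        t.tv_sec += 1;
    }
    return t;
}
```

On Linux, the resulting value can be passed to `clock_nanosleep` with `CLOCK_MONOTONIC` and `TIMER_ABSTIME` to sleep until the next release time.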
Optimized versus Non-Optimized Mode Using the RTC Demonstration
In non-optimized mode, the RTC demonstration performs as follows:
- Does not use Intel TCC Mode in the BIOS, hence the platform is not optimized for real-time tasks
- Uses standard AF_PACKET sockets to send and receive data over the network, so other network traffic competes with Intel TCC packets
- Does not use virtual channels, so data sent over PCI (video, audio, and USB data) competes with the Intel TCC data
- Uses dynamic random-access memory (DRAM), and data processed on the same core evicts Intel TCC data from the cache, which increases the time to access the data
In optimized mode, the RTC demonstration performs as follows:
- Uses Intel TCC Mode set to ON in the BIOS so the platform is optimized to run real-time tasks
- Uses AF_XDP sockets, which are optimized for packet processing time and allow packets to reach memory faster
- Uses virtual channels to pass network data to memory and the processor, so it does not affect the Intel TCC data processing
- Uses the cache pseudo-locking feature to keep data in cache with a predictable access time
Operating System
Host System Specifications:
- Ubuntu* 20.04 LTS
Target System Specifications:
- Yocto Project* Linux - full support for target
- Windows® 10 - limited support; only the Data Streams Optimizer is supported on the target system. For more details, see Intel TCC Tools documentation.
- Ubuntu* and Debian* via integration of Intel TCC Tools with Intel® Edge Controls for Industrial (Intel® ECI), a related reference platform.
Analysis Profiling Tools
Intel VTune Profiler (optional)
Reference Platform Integration
Intel® Edge Controls for Industrial (Intel® ECI) is a prevalidated reference platform that integrates real-time compute, standards-based connectivity, safety, and IT-like management. Intel® ECI uses Debian- or Ubuntu-based images and includes support for Intel TCC Technology.
Target System Specifications
Intel® Xeon® D-2700 and D-1700 Series Processors
(Formerly code named Ice Lake D)
Industrial Processors Recommended for Real-Time Applications
- Intel® Xeon® D-2752TER Processor
- Intel® Xeon® D-1746TER Processor
- Intel® Xeon® D-1735TR Processor
- Intel® Xeon® D-1715TER Processor
- Intel® Xeon® D-1712TR Processor
Common Real-Time Use Cases
- Process Automation (factory automation)
- Industrial Automation
- Robotics Control
Currently Available Boards
Intel® Xeon® D-2700T and D-1700T Series processors reference validation platform.
To get this hardware, contact your Intel representative.
Intel® Xeon® W-11000E Series Processors
(Formerly code named Tiger Lake H)
Industrial Processors Recommended for Real-Time Applications
- Intel® Xeon® W-11865MRE Processor
- Intel® Xeon® W-11865MLE Processor
- Intel® Xeon® W-11555MRE Processor
- Intel® Xeon® W-11555MLE Processor
- Intel® Xeon® W-11155MRE Processor
- Intel® Xeon® W-11155MLE Processor
Common Real-Time Use Cases
- Process Automation (factory automation)
- Industrial Automation
- Robotics Control
Currently Available Boards
Intel® Xeon® W-11000E Series processors customer reference platform.
To get this hardware, contact your Intel representative.
12th Generation Intel® Core™ Processors
(Formerly code named Alder Lake S)
Industrial Processors Recommended for Real-Time Applications
- 12th Generation Intel® Core™ i9-12900E Processor
- 12th Generation Intel® Core™ i7-12700E Processor
- 12th Generation Intel® Core™ i5-12500E Processor
- 12th Generation Intel® Core™ i3-12100E Processor
Common Real-Time Use Cases
- Industrial PC
- Motion Control
- Robotics
- Vision
- Workload Consolidation
- Human Machine Interface (HMI)
- Intelligent Gateways
- Energy Substations
Currently Available Boards
12th generation Intel® Core™ processor reference validation platform paired with an R680E Platform Controller Hub.
To get this hardware, contact your Intel representative.
11th Generation Intel® Core™ Processors
(Formerly code named Tiger Lake UP3)
Industrial Processors Recommended for Real-Time Applications
- 11th Generation Intel® Core™ i7-1185GRE Processor
- 11th Generation Intel® Core™ i5-1145GRE Processor
- 11th Generation Intel® Core™ i3-1115GRE Processor
Common Real-Time Use Cases
- Industrial PC
- Motion Control
- Robotics
- Vision
- Workload Consolidation
- Human Machine Interface (HMI)
- Intelligent Gateways
- Energy Substations
Currently Available Boards
11th Generation Intel® Core™ processors DDR4 reference validation board.
To get this hardware, contact your Intel representative.
Intel Atom® x6000E Series Processors
(Formerly code named Elkhart Lake)
Industrial Processors Recommended for Real-Time Applications
- Intel Atom® x6212RE Processor
- Intel Atom® x6414RE Processor
- Intel Atom® x6425RE Processor
- Intel Atom® x6427FE Processor
- Intel Atom® x6200FE Processor
Common Real-Time Use Cases
- Industrial PC
- Motion Control
- Robotics
- Vision
- Workload Consolidation
- HMI
Currently Available Boards
Intel Atom® x6000E Series processors customer reference platform.
To get this hardware, contact your Intel representative.