Data Streams Optimizer for Real-Time Systems

Discover how this feature of Intel® Time Coordinated Computing Tools (Intel® TCC Tools) optimizes the time required to transfer data between two processor subsystems while taking power consumption into account.

Key Takeaways

  • The data streams optimizer enables developers to enhance data transfer between two processor subsystems.

  • The data streams optimizer increases or decreases power management levels to satisfy stringent or relaxed real-time requirements.

  • The data streams optimizer provides developers with tuning technologies that have never been publicly available before.

  • Less experienced developers can use the data streams optimizer to complete advanced tuning without additional training.

  • The data streams optimizer benefits applications whose data movement latency is at risk of exceeding the allowed threshold.

Fine-Tune and Balance Performance and Power for Real-Time Applications

Some of today’s use cases demand real-time applications with extremely strict timing requirements, including hard real-time requirements. In real-time computing, hardware and software systems must operate within defined rules and deadlines, and a real-time program must guarantee a response within its deadline. Meeting these requirements can call for fine-grained tuning.

To help developers achieve this level of tuning, Intel offers processors and enabling software optimized for real-time applications. For use cases that require deep, fine-grained tuning, Intel also offers Intel® TCC Tools, a set of features that augments compute performance to address the stringent temporal requirements of real-time applications and the need to balance power and performance in latency-sensitive applications. Until now, the capabilities offered by Intel® TCC Tools have not been publicly available.

Two features included in Intel® TCC Tools provide tuning capabilities that optimize the system for real-time applications: cache allocation (via the cache allocation library and/or cache configurator tool) and the data streams optimizer.

Learn more about cache allocation here.

Continue reading to discover more about the data streams optimizer.

Fine-Grained Tuning Using the Data Streams Optimizer

The data streams optimizer feature is designed specifically for extremely fine-grained tuning or balancing power and performance.

Data stream performance improvements from fine-grained tuning range from 2 to 20 microseconds for workloads with cycle times of less than 250 microseconds and a single-stream data movement between a PCIe endpoint and a processor core or memory.

Similar improvements may also be realized for workloads with higher cycle times and multiple data movements between endpoints. Improvements to this extent are only possible for applications with highly tuned software.

If you can’t measure the data stream latency between endpoints, the performance improvements might not be easily observed. If this is the case, it might be necessary to perform a deep inspection of your workloads and intraworkload latencies to determine if the data streams optimizer addressed your issues.
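One way to make such improvements observable is to record per-transfer latency samples and summarize them against the workload’s threshold. The following is a minimal Python sketch under assumed inputs: the function name `summarize_latency`, the sample values, and the 25-microsecond threshold are illustrative only and are not part of Intel® TCC Tools.

```python
from statistics import mean


def summarize_latency(samples_us, threshold_us):
    """Summarize latency samples (microseconds) against a latency threshold."""
    if not samples_us:
        raise ValueError("at least one latency sample is required")
    worst = max(samples_us)
    best = min(samples_us)
    return {
        "min_us": best,
        "mean_us": mean(samples_us),
        "worst_case_us": worst,
        # Jitter here is the spread between the best and worst observed case.
        "jitter_us": worst - best,
        # Number of transfers that exceeded the allowed threshold.
        "violations": sum(1 for s in samples_us if s > threshold_us),
    }


# Made-up sample values for illustration only; these are not measured data.
baseline = [10.0, 12.0, 11.0, 31.0, 10.0]
summary = summarize_latency(baseline, threshold_us=25.0)
print(summary)
```

Running the same summary on samples collected before and after tuning gives a like-for-like comparison of worst-case latency, which matters more for real-time workloads than the mean.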

Why Use Data Streams Optimizer?

Before tools like the data streams optimizer, granular tuning was extremely complex and time consuming. It required continuous testing of real-time workloads and experimentation with the tuning parameters that affect real-time performance. It also required additional engagement with experts inside Intel for help with hidden knobs and proprietary registers.

The data streams optimizer drastically simplifies tuning. It automates the process and applies tuning configurations that adjust visible and hidden tuning parameters in the form of a series of register writes. This enables developers who don’t have extensive experience or expertise to complete tuning without additional help or training.

Although it is controlled by software, the tuning process doesn’t alter any of the software components of the solution; it focuses solely on real-time hardware optimizations. For that reason, the data streams optimizer can be used with new applications as well as legacy (existing) applications, without changes to the source code.

Intel® TCC Tools Data Streams Optimizer: Improve Latency of Data Transfers Between Systems

The Intel® TCC Tools data streams optimizer is a powerful tool that helps ensure systems are optimally configured for specific real-time use cases. It’s a command-line tool that configures I/O and processor fabric settings to optimize the time it takes to transfer data between two processor subsystems.

Figure 1: Data Streams Optimizer Supported Data Streams

In the context of the data streams optimizer, fabric is the interconnect technology that carries on-chip communications between the different functional components of the processor. A data stream is a transfer of data between two processor subsystems. Processor subsystems can include memory, processor cores, and PCIe endpoints.

To optimize these transfers, the data streams optimizer automates the tuning process and applies tuning configurations that adjust visible and hidden tuning knobs in the form of a series of register writes. The tuning configurations also try to balance real-time performance and power management: the tool can increase system power consumption to meet the most stringent real-time requirements or decrease it where requirements are more relaxed.

Results of using the data streams optimizer are most visible on applications that are susceptible to latency exceeding the allowed threshold for data movement.

When to Use Data Streams Optimizer

Intel divides tuning into four categories, some of which don’t require the extremely granular tuning provided by the data streams optimizer:

  • System software tuning
  • Power management tuning
  • Intel® TCC Tools features tuning
  • Fabric tuning

Each tuning type impacts latency in different ways. Use cases with more relaxed requirements need less tuning to meet their latency requirements, while use cases with stricter requirements necessitate substantial effort in tuning.

Examine the diagram to learn how different tuning categories help decrease the worst-case latency.

Figure 2: Intel® TCC Tools Tuning Strategy

When tuning the platform, address higher-impact tuning first (system software tuning), then work down to lower-impact tuning (fabric tuning) until the requirements are satisfied. The types of tuning on the right of the diagram are more complex and impose higher penalties on concurrent workloads than those on the left.

Use the following chart to ensure you choose the correct Intel® TCC Tools feature for tuning.

Tuning Category | Intel® TCC Tools Feature
System software tuning | Board support package (BSP) with real-time optimizations
Power management tuning | Out-of-the-box tuning: BSP with real-time optimizations, Intel® TCC Mode in BIOS. Advanced tuning: Data streams optimizer
Intel® TCC Tools features tuning | Out-of-the-box tuning: BSP with real-time optimizations, Intel® TCC Mode in BIOS. Advanced tuning: Data streams optimizer
Fabric tuning | Data streams optimizer

Finding a Balance Between Real-Time Performance and Power Consumption

It’s always a challenge to meet workload-specific, real-time performance requirements without overprovisioning the best-effort capabilities and power consumption of a system.

The data streams optimizer addresses this problem with a three-level platform fine-tuning strategy that systematically reduces worst-case execution time through an iterative process of elimination. It eliminates the largest source of jitter, then validates whether that optimization was sufficient to meet the workload requirements. The data streams optimizer repeats the process until it either succeeds or fails. In this context, failure suggests that the requirements exceed the hard limits of the processor.
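The iterative process of elimination can be sketched as a simple loop. This is a toy model, not the tool’s implementation: it assumes that worst-case latency is a fixed base plus additive jitter contributions, and the function name `tune_until_met`, the source names, and the microsecond values are all hypothetical. The data streams optimizer validates against the workload itself, whereas this sketch only sums estimates.

```python
def tune_until_met(jitter_sources_us, base_latency_us, required_us):
    """Remove the largest remaining jitter source until the requirement is met.

    jitter_sources_us maps a hypothetical source name to its assumed
    worst-case jitter contribution in microseconds.
    Returns (met, worst_case_us, eliminated_sources).
    """
    remaining = dict(jitter_sources_us)
    eliminated = []
    while True:
        # Validate: does the current configuration meet the requirement?
        worst_case = base_latency_us + sum(remaining.values())
        if worst_case <= required_us:
            return True, worst_case, eliminated
        if not remaining:
            # Nothing left to eliminate: the requirement cannot be met.
            return False, worst_case, eliminated
        # Eliminate the largest remaining source of jitter, then re-validate.
        largest = max(remaining, key=remaining.get)
        eliminated.append(largest)
        del remaining[largest]


# Hypothetical jitter contributions, for illustration only.
sources = {"software": 40.0, "power_management": 15.0, "fabric": 5.0}

print(tune_until_met(sources, base_latency_us=8.0, required_us=20.0))
print(tune_until_met(sources, base_latency_us=8.0, required_us=5.0))
```

In the first call the loop stops as soon as the requirement is met, leaving the fabric contribution untouched. In the second call every source is eliminated and the base latency alone still exceeds the requirement, which corresponds to the failure case described above.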

Supported Data Streams

The data streams optimizer tunes the platform to meet specified requirements for the following data streams.

Get more information on data streams

Data Stream | Description
PCIe from Memory (Reads) | PCIe device reading data from a memory buffer.
PCIe to Memory (Writes) | PCIe device writing data to a memory buffer.
Core from PCIe (MMIO Reads) | Processor core reading data from a PCIe endpoint.
Core from PCIe (PCIe MSI) | Processor core responding to a message signaled interrupt (MSI) generated by a PCIe endpoint.
Core to PCIe (MMIO Writes) | Processor core writing data to a Memory-Mapped I/O (MMIO) space region on a PCIe device.

Below is an example of a requirements file for a core-from-PCIe data stream.

Figure 3: Example of Requirements File

Run a Data Streams Optimization Sample Demo

Follow these steps to experience the data streams optimizer and see firsthand the benefit of fine-grained tuning.

  1. Read about the data streams optimizer and developer workflow overview for an introduction to important concepts.
  2. Read the scenario for the sample demo.
  3. Begin the MMIO-read latency (MRL) setup by configuring the hardware and making sure the real-time kernel is running on the target.
  4. Run the MRL workload on the untuned system to get the baseline latency measurement.
  5. Start the preproduction phase and generate a tuning configuration. Walk through the data streams optimizer preproduction steps. The tool tunes the system and shows a performance improvement.
  6. Enter the production phase and apply the tuning configuration. Walk through the data streams optimizer production steps.

Figure 4: Data Streams Optimizer Preproduction Tuning in Progress

Figure 5: Example of a Tuning Configuration That Failed Workload Validation

Figure 6: Data Streams Optimizer Running a Different Tuning Configuration File

Figure 7: Example of a Tuning Configuration That Passed Workload Validation

Discover All the Intel® TCC Tools Features

Intel® TCC Tools includes another tuning capability that optimizes the system for real-time applications: cache allocation, available through the cache allocation library and the cache configurator. Cache allocation helps reduce hotspots (the most latency-sensitive areas of your real-time application’s code) by reducing access time to memory objects that incur a high number of cache misses.

Intel® TCC Tools also offers auxiliary capabilities so you can check the configuration of your real-time system, understand bottlenecks in your code, or learn about time synchronization techniques. Auxiliary capabilities include:

  • Measurement library: A set of C APIs that help analyze different aspects of your application’s performance and identify bottlenecks.
  • Real-time readiness checker: A command-line tool that checks the many attributes that may affect real-time performance, such as processor model, BIOS version, BIOS settings, and other dependencies.
  • Time synchronization sample applications:
    • Time-aware general-purpose input/output (GPIO) sample applications: Applications that explore the basics of using hardware-assisted time synchronization on GPIO pins and its advantages over normal software-controlled GPIO.
    • Ethernet timestamps sample application: An application that shows the accuracy of hardware-assisted cross-timestamping between the system and network controller clocks, which allows the application to extend precise time synchronization to other devices on the network beyond the compute node.
    • Real-time communication demo: A set of example programs and scripts that demonstrate the benefit of combining the cache allocation library with Time-Sensitive Networking (TSN).