Developer Guide

  • 2021.3
  • 11/18/2021
  • Public

Step 2: Run MRL on Untuned System

In this step, you will run the MRL workload validation script on the untuned system that you set up in Step 1: MRL Setup. As described in the scenario, Intel® TCC Mode is enabled in BIOS. This step will produce a baseline latency measurement of how the system performs with Intel® TCC Mode enabled, which you will later compare with the latency measurement gathered after tuning the system with the data streams optimizer.
The output examples shown here are for illustration only. Your output may vary.
To run MRL on an untuned system:
  1. From your host system, connect to the target board via SSH if you are not already connected. Replace <user> with the username and <target> with the IP address or hostname of the target board.
    ssh <user>@<target>
  2. In the SSH session, make sure that Intel® TCC Tools are installed on the target board:
    ls /usr/share/tcc_tools/ -la
    If the directory is empty, Intel® TCC Tools are not installed. Complete the installation instructions in the Get Started Guide for UEFI BIOS or Get Started Guide for Slim Bootloader.
  3. [Target board] Get the PCI device name:
    The MRL workload validation script supports integrated TSN Ethernet controllers and Intel® Ethernet Controller I225 devices. To check which device you have, run:
    lspci | grep -E 'Ethernet controller: Intel Corporation'
    Output example:
    aa:00.0 Ethernet controller: Intel Corporation Device 15f2 (rev 03)
    If you see
    in the output, the device is an Intel® Ethernet Controller I225. If you see
    in the output, the device is an integrated TSN Ethernet controller.
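The check in this step can also be scripted. The following is a minimal Python sketch, not part of Intel® TCC Tools; the names `intel_ethernet_devices` and `read_lspci` are hypothetical. It filters `lspci`-style text for the same vendor string as the `grep` command and returns the PCI address and description of each match. Deciding which controller type a description corresponds to is still up to you.

```python
import re
import subprocess

# Mirrors the grep pattern used in this step, anchored to the start of the
# line, with capture groups for the PCI address and the device description.
_PATTERN = re.compile(r"^(\S+) Ethernet controller: Intel Corporation (.*)$")


def intel_ethernet_devices(lspci_text):
    """Return (pci_address, description) pairs for Intel Ethernet controllers."""
    devices = []
    for line in lspci_text.splitlines():
        match = _PATTERN.match(line.strip())
        if match:
            devices.append((match.group(1), match.group(2)))
    return devices


def read_lspci():
    """Run lspci on the target board and return its text output."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Demonstration on the output example from this step (no lspci call needed).
sample = "aa:00.0 Ethernet controller: Intel Corporation Device 15f2 (rev 03)"
for address, description in intel_ethernet_devices(sample):
    print(address, description)
```

On the target board, `intel_ethernet_devices(read_lspci())` would apply the same filter to the live device list.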
  4. [Target board] Run the workload validation script using the following command. Replace <device_name> in the --device argument with the name of your device, and make sure to delete the angle brackets (< >) too.
    python3 /usr/share/tcc_tools/tools/demo/workloads/bin/ --latency_us 9 --device <device_name> --iterations 10000000 --core 3
  5. Leave the rest of the arguments as is. For reference, the following table describes the arguments:
    Argument                Description
    --latency_us 9          Required. Maximum latency requirement in microseconds (µs) to be verified.
    --iterations 10000000   Optional. Number of iterations.
    --core 3                Optional. Processor core on which the sample will run.
  6. Note that in the workload validation script command, --latency_us 9 is the maximum latency requirement for MRL. It is set to 9 microseconds, which aligns with the scenario. If the actual latency exceeds this requirement, the validation fails.
    The workload validation script is an example of how the sample can be used in the data streams optimizer workflow. To prepare a validation script for your workload, see Create a Workload Validation Script.
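If you want to launch the validation from your own script, for example to repeat the baseline run, the command line can be assembled programmatically. The sketch below is only an illustration: `build_validation_command` is a hypothetical helper, `script_path` must be supplied by you because the script file name is not spelled out in the command above, and the device value `TSN` is an assumption taken from the `device:` field of the example output. Only the four arguments documented in this procedure are used.

```python
import shlex


def build_validation_command(script_path, device_name,
                             latency_us=9, iterations=10000000, core=3):
    """Assemble the argument list for the workload validation script.

    script_path and device_name come from the caller; the defaults for the
    remaining arguments are the values used in this procedure.
    """
    if "<" in device_name or ">" in device_name:
        # Guard against leaving the placeholder angle brackets in place.
        raise ValueError("remove the angle brackets from the device name")
    return [
        "python3", script_path,
        "--latency_us", str(latency_us),
        "--device", device_name,
        "--iterations", str(iterations),
        "--core", str(core),
    ]


# "path/to/validation_script.py" is a placeholder, not a real file name,
# and "TSN" is an assumed device value.
command = build_validation_command("path/to/validation_script.py", "TSN")
print(shlex.join(command))
```

Returning a list rather than a single string lets you pass the result directly to `subprocess.run` without shell quoting concerns.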
  7. Confirm that you see output similar to the example below.
    Enabling userspace access to performance counters
    Removing stmmac_pci
    Memory regions: ['6001360000', '600136f000']
    Using memory region 0 with address 6001360000
    Start validation
    Running test ... Done.
    Test is complete!
    Results saved in
    data_mmio_read_latency_us.csv
    data_mmio_read_latency_ticks.csv
    data_avg_inst_count.csv
    Validation stopped
    Restoring stmmac_pci
    Validation is finished. Please wait for results processing.
    Validation information:
    device: TSN
    address: 6001360000
    core: 3
    iterations: 10000000
    processor: |p_ehl|
    Latency must be less than 9.0 us.
    Statistics:
                |Min    |Max    |Avg    |Median
    ----------------------------------------------------------------
    Microseconds|0.741  |9.997  |1.003  |0.983
    ================================================================
    Deadline    |Iterations  |Passed   |Failed
    ---------------------------------------------------
    9.0 us      |10000000    |9997088  |2912
    ===================================================
    Failed: at least one iteration failed.
    The first table displays:
    • Min: Minimum latency (execution time of the fastest iteration)
    • Max: Maximum latency (execution time of the slowest iteration)
    • Avg: Average latency
    • Median: Median latency
    The second table displays:
    • Deadline: Maximum tolerable latency for each iteration
    • Iterations: Total number of iterations
    • Passed: Number of iterations that met the deadline
    • Failed: Number of iterations that exceeded the deadline
    The script compares the maximum latency measurement to the deadline. In this example, the maximum latency measurement of 9.997 µs is higher than the requirement of 9 µs, so the validation fails. The numbers of passed and failed iterations show how often the deadline is missed, which helps you judge how critical the maximum latency measurement is for your deadline. The values may be different on your system.
    Although the Intel® TCC Mode settings in BIOS already disable several power management features to optimize MMIO read latency, the data streams optimizer can disable additional fabric power management to meet this latency requirement of 9 µs. With only Intel® TCC Mode enabled, the system has higher latency but lower power consumption than a system tuned with the data streams optimizer. In a real-world use case, you can perform additional analysis outside of Intel® TCC Tools to determine whether your other system requirements, such as power consumption, are met.
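To make the pass/fail reasoning concrete, the sketch below parses the two result tables from captured output and repeats the comparison described in this step. It is a minimal Python illustration that assumes the output uses the same row layout as the example above (a `Microseconds|...` row and a `<deadline> us |...` row); `parse_results` is a hypothetical helper, not part of Intel® TCC Tools.

```python
import re


def parse_results(output):
    """Extract latency statistics and deadline counts from script output."""
    stats = re.search(
        r"Microseconds\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)"
        r"\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)",
        output,
    )
    deadline = re.search(
        r"([\d.]+) us\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)", output
    )
    if stats is None or deadline is None:
        raise ValueError("result tables not found in output")
    return {
        "min_us": float(stats.group(1)),
        "max_us": float(stats.group(2)),
        "avg_us": float(stats.group(3)),
        "median_us": float(stats.group(4)),
        "deadline_us": float(deadline.group(1)),
        "iterations": int(deadline.group(2)),
        "passed": int(deadline.group(3)),
        "failed": int(deadline.group(4)),
    }


# Rows copied from the example output in this step.
example = (
    "Microseconds|0.741 |9.997 |1.003 |0.983\n"
    "9.0 us |10000000 |9997088 |2912\n"
)
results = parse_results(example)
# The validation fails when the slowest iteration exceeds the deadline.
# How the script treats a maximum exactly equal to the deadline is not
# covered here; this sketch counts only strictly greater values as a fail.
verdict = "fail" if results["max_us"] > results["deadline_us"] else "pass"
failed_share = results["failed"] / results["iterations"]
print(f"{verdict}: {failed_share:.4%} of iterations exceeded the deadline")
```

For the example output, 2912 of 10000000 iterations missed the deadline, which is roughly 0.03% of the run.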
  8. Make a note of the maximum latency measurement so you can compare it to the measurement that you will get after tuning the system.
  9. Exit the connection to the target board.
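
Instead of noting the maximum latency by hand, you could store it in a small file and compare it with the tuned measurement later. The following Python sketch is one possible way to do that; the helper names, the JSON key, and the file layout are all hypothetical and not part of Intel® TCC Tools.

```python
import json
from pathlib import Path


def save_baseline(max_latency_us, path):
    """Store the untuned maximum latency so it can be compared later."""
    Path(path).write_text(json.dumps({"untuned_max_us": max_latency_us}))


def compare_to_baseline(tuned_max_us, path):
    """Return the latency reduction in microseconds and as a fraction
    of the stored baseline."""
    baseline = json.loads(Path(path).read_text())["untuned_max_us"]
    reduction_us = baseline - tuned_max_us
    return reduction_us, reduction_us / baseline
```

After this step you would call `save_baseline` with the maximum from your own run (9.997 in the example output), and after tuning you would call `compare_to_baseline` with the new maximum and the same file path.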
