Intel® Hyperflex™ Architecture High-Performance Design Handbook

ID 683353
Date 10/04/2021
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

2.4.1.2. Time Domain Multiplexing

Time domain multiplexing increases circuit throughput by using multiple threads of computation. This technique is also known as C-slow retiming, or multithreading.

Time domain multiplexing replaces each register in a circuit with a set of C registers in series. Each extra copy of registers creates a new computation thread. One computation through the design requires C times as many clock cycles as the original circuit. However, the Compiler can retime the additional registers to improve the fMAX by a factor of C. For example, instead of instantiating two modules running at 400 MHz, you can instantiate one module running at 800 MHz.

The following set of diagrams shows the process of C-slow retiming, beginning with an initial circuit.

Figure 49. C-slow Retiming Starting Point

Edit the RTL design to replace every register, including registers in loops, with a set of C registers, comprising one register per independent thread of computation.

Figure 50. C-slow Retiming Intermediate PointThis example shows replacement of each register with two registers.

Compile the circuit at this point. When the Compiler optimizes the circuit, there is more flexibility to perform retiming with the additional registers.

Figure 51. C-Slow Retiming Ending Point

In addition to replacing every register with a set of registers, you must also multiplex the multiple input data streams into the block, and demultiplex the output streams out of the block. Use time domain multiplexing when a design includes multiple parallel threads, for which a loop limits each thread. The module you optimize must not be sensitive to latency.