AN 944: Thermal Modeling for Intel® Agilex™ FPGAs with the Intel® FPGA Power and Thermal Calculator

ID 683810
Date 3/29/2021
Public

7.2. Thermal Design Optimization

After you have captured your design in the Intel® FPGA Power and Thermal Calculator (PTC), it is good practice to evaluate whether any thermal optimization can ease the cooling requirements. You can ease them by reducing overall power consumption, or by reducing the design's maximum ΨJC value.

Power Reduction

There are two types of power consumed in an FPGA: static power and dynamic power.

  • Static power is the power that the configured device consumes when powered up but with no user clocks operating. Static power is mainly a function of die temperature. For Intel® Agilex™ devices, this excludes DC bias power of analog blocks, such as I/O and transceiver analog circuitry.

    Reducing junction temperatures can save power. For example, if a given design has a total static power of 14.6 watts when the maximum TJ is 95°C, and you decrease the maximum TJ, the static power also decreases, with no change to the operation of the device. However, reducing the maximum TJ requires additional cooling, such as a lower ambient temperature, increased airflow, or a larger heat sink. You should always consider methods of reducing static power consumption, especially when evaluating operating costs for large data centers or central offices.

  • Dynamic power is the additional power consumed due to signal activity or toggling. For example, if you reduce the number of half ALMs or flip-flops in the core die, or the clock frequency or toggle rate, the dynamic power goes down. Such changes may not always be possible, but you should consider them, especially for dies that seem to be the limiting factor in the cooling system.
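
As a rough illustration of the dynamic power levers above, the following sketch assumes a first-order model in which dynamic power scales linearly with clock frequency, toggle rate, and the number of active resources. The function name, its parameters, and the 20 W baseline are hypothetical and are not taken from the PTC. Use the PTC itself for real estimates.

```python
def scaled_dynamic_power(baseline_w, freq_ratio=1.0, toggle_ratio=1.0, resource_ratio=1.0):
    """Estimate dynamic power after a design change.

    Assumption (not from the PTC): dynamic power is proportional to clock
    frequency, toggle rate, and the count of active resources. Each ratio
    is new_value / old_value.
    """
    for ratio in (freq_ratio, toggle_ratio, resource_ratio):
        if ratio < 0:
            raise ValueError("ratios must be non-negative")
    return baseline_w * freq_ratio * toggle_ratio * resource_ratio


# Hypothetical baseline: 20 W of dynamic power in the core die.
baseline = 20.0

# Run the clock at 80% of the original frequency and cut the toggle rate to 90%.
estimate = scaled_dynamic_power(baseline, freq_ratio=0.8, toggle_ratio=0.9)
print(f"Estimated dynamic power: {estimate:.1f} W")
```

Because the model is multiplicative, modest reductions in several factors compound. Real designs can deviate from this linear approximation.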

Transceiver Channel Spreading

Intel® Agilex™ devices have transceiver dies with either 16 channels or 24 channels. If you can select the channels manually, you can reduce the power dissipation by physically spreading the channels across the tile. Simply put, fewer contiguous channels use less power, have lower local power density (power/area), and are easier to cool.

For example, the table below shows three E-Tiles, each with 14 active channels but with different channel placement. As shown, E-Tile 3 has the most widely spread placement and therefore the lowest ΨJC, which translates to operating 7.7°C cooler than E-Tile 1 and 6.8°C cooler than E-Tile 2 in a 100 watt device under the same cooling conditions.

Table 6.  Transceiver Channel Placement Impact on Thermal Behavior

XCVR Die ID | Starting Channel Location | Number of Channels | Op Mode | Data Rate | Thermal Power | ΨJC (°C/W) | Temperature Difference for a 100 watt device (°C)
E-Tile 1    | 5                         | 14                 | TX      | 28000     | 9.03          | 0.334      | 7.7
E-Tile 2    | 0                         | 14                 | TX      | 28000     | 9             | 0.325      | 6.8
E-Tile 3    | 0                         | 2                  | TX      | 28000     | 8.871         | 0.257      |
            | 3                         | 2                  | TX      | 28000     |               |            |
            | 6                         | 2                  | TX      | 28000     |               |            |
            | 9                         | 2                  | TX      | 28000     |               |            |
            | 12                        | 2                  | TX      | 28000     |               |            |
            | 15                        | 2                  | TX      | 28000     |               |            |
            | 18                        | 2                  | TX      | 28000     |               |            |
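
The temperature differences in Table 6 can be reproduced from the ΨJC values. The sketch below assumes, consistent with the table's figures, that the junction-to-case temperature rise equals ΨJC multiplied by the total device power. The dictionary and function names are illustrative only and are not part of the PTC.

```python
# ΨJC values in °C/W, copied from Table 6.
PSI_JC = {"E-Tile 1": 0.334, "E-Tile 2": 0.325, "E-Tile 3": 0.257}


def temperature_rise(psi_jc, device_power_w):
    """Junction-to-case temperature rise in °C (assumed to be ΨJC x total power)."""
    return psi_jc * device_power_w


def temperature_penalty(placement, reference="E-Tile 3", device_power_w=100.0):
    """How much hotter (°C) a placement runs than the reference placement."""
    return (temperature_rise(PSI_JC[placement], device_power_w)
            - temperature_rise(PSI_JC[reference], device_power_w))


for name in ("E-Tile 1", "E-Tile 2"):
    print(f"{name}: {temperature_penalty(name):.1f} °C hotter than E-Tile 3")
```

For a 100 watt device this yields 7.7°C for E-Tile 1 and 6.8°C for E-Tile 2, matching the last column of the table. Because the penalty scales with total device power, a higher-power device sees a proportionally larger benefit from spreading the channels.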

It may not always be possible to optimize channel placement, due to design constraints and requirements; however, you should consider doing so whenever possible.