1. Intel Stratix 10 Thermal Modeling and Management with the Early Power Estimator
1.1. List of Abbreviations
Table 1. Definition of terms
CFD: Computational Fluid Dynamics, a numerical analysis method for solving conjugate heat transfer problems.
CTM: Compact Thermal Model, a geometric model that is used as an input to a CFD tool.
EPE: Early Power Estimator, a tool that estimates the power consumption of the FPGA device.
FPGA: Field Programmable Gate Array
HBM: High Bandwidth Memory
IHS: Integrated Heat Spreader, the case of an Intel® Stratix® 10 FPGA.
MCM: Multi-Chip Module, an integrated circuit (IC) with more than one die.
SCM: Single Chip Module
TCASE: Integrated Heat Spreader or Case Temperature. The case temperature of a component is measured with an attached heat sink. This temperature is measured at the top geometric center of the package case/die.
TDP: Thermal Design Power, the power dissipated in a die that is used for thermal analysis purposes.
TA: Ambient Temperature, measured locally surrounding the FPGA. The ambient temperature should be measured just upstream of a passive heat sink or at the fan inlet for an active heat sink.
TCORE: Core Fabric Die Temperature
TJ-MAX: Maximum Junction Temperature, a maximum allowable absolute temperature rating of the device or a targeted
TTDP: Total Thermal Design Power, the power dissipated in the device that is used for thermal analysis purposes.
TIM: Thermal Interface Material
TSD: Temperature Sensing Diode
1.2. Introduction
The Intel® Stratix® 10 device has a Multi-Chip Module (MCM) structure. It can contain between two and nine dies. One die is always the main FPGA core fabric die; there can also be between one and six transceiver dies and up to two High Bandwidth Memory (HBM) dies. Because of the complex construction and the non-uniform power density in some of the dies, the thermal engineering of an Intel® Stratix® 10 device requires a specific process and familiarity with the following:
1.3. Intel Stratix 10 Early Power Estimator Tool (EPE)
The Early Power Estimator (EPE) is a tool that estimates the power
consumption of an FPGA device early in the design process. It allows you to enter and select
the relevant information for a specific FPGA design and obtain the power and the relevant
thermal design information for electrical and thermal design purposes. The data provided to
the EPE is divided into two categories, general and thermal. Both sets of inputs affect the overall
power dissipation of each die and the thermal characteristics of the package to be used for
system thermal modeling. The necessary inputs to the EPE are listed below.
FPGA package size
FPGA core fabric size and grade
Transceiver type, protocol, grade and placement per transceiver
Utilization of FPGA hardware blocks
Clock rates, toggle rates and frequencies
Ambient air temperature (TA) of the
Maximum allowable junction temperature (TJ-MAX) of any die in the FPGA, at the provided TA
The Intel® Stratix® 10 FPGA complex is contained in a BGA
package with a copper IHS and can contain up to three types of dies:
Core Fabric Die or the main FPGA die: This is the
die that contains the basic logic resources and it is provided in different sizes and
grades. Each package can only have a single core fabric die.
Transceiver Die: Transceiver dies are offered in
three types: L-Tile, H-Tile and E-Tile. Packages with E-Tile are always equipped with one
H-Tile. Each transceiver tile type supports certain protocols and transceiver speeds.
Depending on the package size, an
Intel® Stratix® 10 device can
support between one and six transceiver dies, and each die has 24 transceiver channels.
HBM Die: This die is provided in two
configurations, 4 high or 8 high, which refers to the number of memory dies stacked in each
HBM. Not all
Intel® Stratix® 10 packages include HBM,
and the ones that do can have either one or two HBMs.
1.5. Physical Package Structure
Figure 1. Physical Package Structure
This is a typical package structure relevant to thermal
analysis and as laid out in the compact thermal models. This package only
shows the core fabric die and transceiver dies.
Intel® Stratix® 10 FPGA thermal parameters do not include the traditional θJC and
θJB values because of the MCM construction. Therefore, you cannot
use the two-resistor model for the thermal modeling of the package.
Intel® offers a Compact Thermal Model (CTM), which
is discussed in the next section. You need a combination of the CTM and the following
thermal parameters for the thermal engineering of an
Intel® Stratix® 10 device.
Table 2. Thermal Design Parameters
TA: Ambient temperature, measured locally surrounding the FPGA.
Measure the ambient temperature just upstream of a passive heat sink or at the fan
inlet for an active heat sink. This value affects the junction temperature of the main
FPGA core fabric die and its power dissipation.
TJ-MAX: The maximum junction
temperature value that the design allows for the given TA.
For example, a design may allow the device maximum rated junction temperature at its
maximum TA, but for a lower ambient temperature, the
junction temperature requirements may be lower than the maximum rated value. These two
cases require two sets of thermal entries to the EPE tool to determine the design
TCORE: Core fabric die temperature. The EPE tool evaluates the thermal
design parameters over a range of TCORE values.
TDP: The EPE tool reports the power dissipation of each die
TTDP: Total Thermal Design Power is the total power dissipation of the
device. The EPE tool reports TTDP for each main FPGA core fabric die temperature.
ΨJC: The thermal resistance between
each of the dies in the package and the center of the package IHS. An MCM like the
Intel® Stratix® 10 device has as many ΨJC values as the number of dies in the package. For example, if an
Intel® Stratix® 10 device contains five dies, the EPE tool reports five
ΨJC values. However, the focus
of thermal design is always on the die with the maximum ΨJC,
and that is the value used for calculating the TJ-MAX.
The ΨJC value is calculated from the following equation:
ΨJC = (TJ - TCASE) / TTDP
ΨCA: The other thermal resistance
value reported by the EPE tool. It is the thermal resistance between the center of the
package IHS and the ambient temperature. ΨCA can be used as a
figure of merit in assessing the required cooling solution for a design. For
example, the lower the ΨCA value, the more aggressive the
cooling solution that is needed. The value of ΨCA is calculated
from the following equation:
ΨCA = (TCASE - TA) / TTDP
TCASE: Integrated heat spreader or case temperature is the temperature
at the top center of the IHS. If the cooling solution maintains a TCASE equal to the TCASE value reported by the
EPE tool, then the maximum junction temperature equals the TJ-MAX value entered in
the tool. A higher TCASE points to a TJ higher than TJ-MAX. Therefore, the
goal of the cooling design should be to keep the TCASE at
or below the value reported by the EPE tool.
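The two resistance definitions in Table 2 can be sketched in a few lines of Python. This is a minimal illustration and not part of the EPE tool; the function names and the sample temperatures and power are assumptions chosen only to exercise the formulas.

```python
def psi_jc(t_j, t_case, ttdp):
    """Die-to-case resistance in °C/W: (TJ - TCASE) / TTDP."""
    if ttdp <= 0:
        raise ValueError("TTDP must be positive")
    return (t_j - t_case) / ttdp


def psi_ca(t_case, t_a, ttdp):
    """Case-to-ambient resistance in °C/W: (TCASE - TA) / TTDP."""
    if ttdp <= 0:
        raise ValueError("TTDP must be positive")
    return (t_case - t_a) / ttdp


# Hypothetical values for illustration only (not taken from an EPE report).
example_psi_jc = psi_jc(t_j=95.0, t_case=85.0, ttdp=100.0)
example_psi_ca = psi_ca(t_case=85.0, t_a=35.0, ttdp=100.0)
print(f"psi_jc = {example_psi_jc:.3f} C/W, psi_ca = {example_psi_ca:.3f} C/W")
```

Both values are normalized by the total device power (TTDP), not by the power of an individual die, which is why one TTDP figure appears in both functions.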
Figure 3. Individual Die Thermal Resistance to the Top of IHS
Figure 4. Thermal Resistance
The diagram shows the thermal resistance from each die to the IHS top
surface and also to the air.
1.7. Intel Stratix 10 Compact Thermal Model (CTM)
Intel® Stratix® 10 FPGA thermal analysis requires the use of its CTMs
in a Computational Fluid Dynamics (CFD) tool. The results of the CFD analysis are
only valid for determining the core fabric power and IHS temperature. These values are
then used to determine the junction temperature of all the dies.
This methodology is used because the construction of the CTM does not capture
the details of transceiver channel placement; therefore, it cannot be used to
predict the correct junction temperature of a transceiver die. The transceiver
junction temperature is calculated using the total power dissipation, IHS
temperature, and thermal resistance of each die, which is covered in later sections.
Intel® Stratix® 10 CTMs are offered for the following CFD tools:
Icepak* from ANSYS
Flotherm* from Mentor Graphics
6SigmaET* from Future Facilities
Thermal Analysis* from SolidWorks
Please contact your support representative to obtain the CTM models.
1.8. Intel Stratix 10 Temperature Sensing Diodes (TSD)
Each die in an
Intel® Stratix® 10 FPGA device contains a
Temperature Sensing Diode (TSD).
Intel® provides a
Temperature Sensor IP core to obtain the temperature of each die. However, because of the flexibility of
Intel® Stratix® 10 devices, the location of hot spots on the transceiver
die may vary based on your application and may not coincide with the location of the
temperature sensor. Therefore, a temperature sensor may not report the actual temperature of
the hot spot. The EPE calculates the offset values for each transceiver die and reports them
in the worksheet. Adding these values to the temperatures reported by the appropriate TSDs
gives the maximum junction temperature of each die. The accuracy
of the TSD is
±5 °C. Therefore, you may need to adjust the TJ-MAX for some designs to ensure the threshold temperature is never crossed.
1.9. Intel Stratix 10 Thermal Design Process
The Intel® Stratix® 10 FPGA
thermal design process consists of the steps shown below:
Stratix® 10 Thermal
Supply Design Information to EPE
This is the first step in the thermal design process of an
Intel® Stratix® 10 device. It provides the tool with the
necessary data to estimate the power dissipation of each die. The inputs
include the FPGA design information as well as the thermal design
requirements of TA and TJ-MAX and the power margin selection.
Obtain Thermal Design Parameters
The EPE tool provides the thermal design parameters. The
power dissipation of each transceiver die is provided as a constant
value, but the main core die power dissipation is provided as a function
of its junction temperature and should be used accordingly in the CFD analysis.
Obtain the applicable CTM for the CFD analysis. Each
CTM is provided with the maximum number of dies possible in a package.
Unused dies can be ignored and left in the model without affecting the results.
Run CFD Analysis
Model the system in the CFD tool and apply all the
applicable power values to the corresponding dies. The
transceiver and HBM die temperatures cannot be predicted by the CFD and
are calculated manually.
Temperatures and ΨCA
Junction temperatures of all the dies and ΨCA of the cooling solution are calculated
using the following equations:
TJ = TCASE + TTDP × ΨJC
ΨCA = (TCASE - TA) / TTDP
You can verify the CFD modeling results
by comparing the above calculated ΨCA with
the value provided by the EPE tool for the corresponding TTDP. If the
two values are the same, then the calculated TJ = TJ-MAX.
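The manual calculation and the ΨCA cross-check in this step can be sketched as follows. This is an illustration under stated assumptions: the die names, resistances, temperatures, and the EPE ΨCA value are hypothetical, and the comparison rule (modeled ΨCA at or below the EPE value) is inferred from the statement that a lower ΨCA corresponds to a more aggressive cooling solution.

```python
def junction_temps(t_case, ttdp, psi_jc_by_die):
    """Return {die: TJ} using TJ = TCASE + TTDP * psi_JC for each die."""
    return {die: t_case + ttdp * psi for die, psi in psi_jc_by_die.items()}


def check_cooling(t_case, t_a, ttdp, psi_ca_epe):
    """Return (ok, psi_ca_cfd); ok is True when the modeled psi_CA
    does not exceed the value reported by the EPE."""
    psi_ca_cfd = (t_case - t_a) / ttdp
    return psi_ca_cfd <= psi_ca_epe, psi_ca_cfd


# Hypothetical inputs: TCASE from a CFD run, TTDP and psi_JC from the EPE.
temps = junction_temps(t_case=80.0, ttdp=100.0,
                       psi_jc_by_die={"core": 0.10, "xcvr_0": 0.12})
hottest = max(temps, key=temps.get)
ok, psi_ca_cfd = check_cooling(t_case=80.0, t_a=30.0, ttdp=100.0,
                               psi_ca_epe=0.52)
print(hottest, round(temps[hottest], 1), round(psi_ca_cfd, 3), ok)
```

The die with the largest ΨJC is always the hottest in this calculation, because every die uses the same TCASE and TTDP.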
1.10. Early Power Estimator (EPE)
Note: To speed up the EPE calculation,
Intel® suggests entering all the design
information in the EPE before activating the Thermal worksheet.
Use the EPE tool to estimate the power dissipation of the dies in an
Stratix® 10 FPGA. An Excel spreadsheet
provides the interface to the tool, and it contains multiple worksheets, each applicable
to a part of the design. The EPE tool calculates thermal design parameters that are
unique to each design. To activate the Thermal worksheet of the
EPE, the following parameters need to be modified in the Main
worksheet of the EPE:
Set the Power Characteristics
Set the Junction Temp Mode to
Detailed Thermal Model.
Figure 6. Main Worksheet of the EPE
This activates the Thermal worksheet of the EPE
tool, and as a result, any changes made to the EPE affect the values in this worksheet.
To obtain the correct thermal values for the analysis, you must enter all the necessary
design information and settings in the subsequent worksheets of the EPE.
Selecting the device, package, and transceiver in the
Main worksheet of EPE will enable selection of appropriate
transceiver and HBM die types and counts in XCVR
and HBM worksheets. In the XCVR worksheet you must specify placement of each
transceiver in the exact tile and channel location (0-23) to be used in the design. This
is necessary to obtain the correct power and thermal parameters. Similarly, in the
HBM worksheet you must select the correct HBM and channel
numbers (0-7) for your application.
For an example of transceiver placement, refer to Figure 7, which shows an
Intel® Stratix® 10 device with 4 H-Tiles configured to use 54
transceiver channels placed in specific channel locations.
Figure 7. Transceiver Channel Placement On The Dies
Figure 8. Transceiver Channel Placement in the EPE
After you have entered all the design data and activated the
Thermal worksheet, set the proper thermal variables in the
Thermal worksheet:
Apply Recommended Margin:
Intel® recommends that you turn on
the recommended margin to ensure sufficient cooling and account for approximations
in power modeling.
Ambient Temp, TA(°C): Temperature of the air or other coolant that
flows over the heat sink.
Max. Junction Temp, TJ-MAX(°C): Allowed maximum temperature of any die in the
package, regardless of its type. The Max TJ setting can
be set to any value that a design requires below the maximum rating of the device.
Figure 9. Thermal Settings
Once you have entered the thermal settings, the EPE updates the power
dissipation of all dies based on the required thermal solution. For example, if the
maximum allowed junction temperature is 95 °C, the EPE calculates a cooling solution that
satisfies this requirement. That is,
at least one die is operating at 95 °C, while other dies are operating at lower
temperatures due to their lower power consumption or power density.
Figure 10. Power Dissipations and Thermal Parameters
This figure shows the section of the
Thermal worksheet containing the thermal parameters of
The EPE also provides a solution table which consists of three rows and depicts three
sets of solutions. The middle row (Operating Point) is the same as the above solution,
and the other two rows represent solutions that are 5°C above and below the core
operating temperature resulting from the design. Using this table, you can create the
temperature dependent core die power curve which is used in the CFD modeling.
Figure 11. Solution Table
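A temperature-dependent core power curve can be built from the three solution-table rows by piecewise-linear interpolation. The sketch below assumes that approach; the row values are hypothetical placeholders, not numbers from an actual EPE Thermal worksheet, and the function name is illustrative.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be strictly ascending."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching points")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the tabulated range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            y0, y1 = ys[i], ys[i + 1]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical solution table: operating point at 90 °C plus rows 5 °C
# below and above it. Replace with the values from your own worksheet.
core_temp_c = [85.0, 90.0, 95.0]
core_power_w = [60.0, 62.0, 64.5]

power_at_92 = interpolate(92.0, core_temp_c, core_power_w)
print(f"core power at 92 C = {power_at_92:.2f} W")
```

The function refuses to extrapolate, since the table only covers 5 °C on either side of the operating point.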
1.11. Transceiver Channel Spreading
Reducing the thermal resistances of the package in each design improves the
efficiency of the cooling system. One way to achieve this is by spreading out the transceiver
channels or using an extra transceiver tile to reduce the power density of a transceiver die.
Targeted spreading can reduce ΨJC and increase ΨCA, thereby reducing the cooling requirement.
Intel® Stratix® 10 FPGA thermal design parameters are
unique for every project. Thermal design parameters are mainly determined by the power, local
power density, and power ratio of the dies. For this reason, any change to the design requires the design
information to be updated accordingly in the EPE so that all the thermal parameters are
1.13. Intel Stratix 10 Thermal Design Example
This section uses an example to demonstrate the necessary steps for the thermal analysis of an
Intel® Stratix® 10 device.
Design Statement: Design a forced convection cooling system for an
Intel® Stratix® 10 device as described in Table 3, meeting the
thermal requirements shown in Table 4. Transceiver
channel placement and HBM data are shown in Figure 12 and Figure 13. The
core functionality and other activities are set such that the core die reaches a
typical power for the
Intel® Stratix® 10 FPGA.
Table 3. FPGA Designation
Number of Transceiver
Number of HBM
Table 4. Thermal Requirements
Maximum Ambient Temperature, °C
Maximum Allowed Junction Temperature, °C
Figure 12. Transceiver Placement
Figure 13. HBM Selections
Main worksheet power values are associated with a
function and not necessarily dissipated in the die providing the function. So the
Thermal worksheet may show a different value for HBM than
the Main worksheet. For thermal analysis, always use the
power values in the Thermal worksheet.
Figure 14. EPE Main
After entering all the design data into the EPE and activating the Thermal worksheet,
the following two tables are updated with all the thermal design parameters. For
example, in this design the case temperature should be kept below 84 °C and the
maximum ΨJC of any die is 0.067 °C/W.
Figure 15. EPE Thermal
Figure 16. EPE Thermal
1.13.1. CFD Analysis Setup
The next step in the thermal analysis process is to create a CFD model of the
system using the required CTM as shown in row
In the CFD model,
the power dissipation of the transceivers and HBMs is set as constant values,
and the power dissipation of the core as a temperature-dependent
value from the first two columns of the solution table.
Figure 17. Die Power Assignment for the CFD Model
This figure shows the FPGA power dissipation assignment to be used in the CFD model.
The CFD setup for this example is shown below. The FPGA is set in a 120 x 35 mm duct
with an airflow of 21 CFM. The extruded aluminum heat sink dimensions are: 100 x 100
x 30 mm 40
Air temperature entering the duct is 35 °C.
The CFD analysis provides the
Intel® Stratix® 10 case and die temperatures.
The Intel® Stratix® 10 case temperature profile shown below indicates a maximum temperature of 83.6 °C,
which is less
than the 84 °C required by the EPE. This means that the maximum junction temperature
will also be less than the design limit of 95 °C.
Figure 19. Case Temperature Profile from CFD Analysis
The Intel® Stratix® 10 die temperature profile shown below is only valid for the core fabric die
temperature and not the transceiver die temperatures.
Figure 20. Die Temperature Profile from CFD Analysis
Calculate the transceiver and HBM die temperatures manually as follows:
Determine the maximum core fabric temperature calculated by CFD (90.26
°C from the "Die Temperature Profile from CFD Analysis" figure above). This value is
the core die operating temperature or FPGA Core Junction Temperature.
Using the "EPE Thermal Worksheet Solution Table" and the FPGA Core
Junction Temperature (90.26 °C), linearly interpolate Overall Total Power (TTDP) and ΨJC.
FPGA Core Junction Temperature (°C)
Calculate the junction temperature (TJ)
using the following equation:
TJ = TCASE + TTDP × ΨJC
where:
TCASE = case temperature of 83.6 °C from the "Case Temperature Profile from CFD Analysis" figure above
TTDP = 148 W from the interpolation above
ΨJC = for TJ_max, use the highest ΨJC value of any die, which is 0.069 °C/W for the HSSI_2_0 transceiver die from the interpolation above
TJ_max = 83.6 + 148 × 0.069 = 93.8 °C
Notice that the
calculated HSSI_2_0 junction temperature (TJ_max) is almost 7
°C higher than the temperature calculated by CFD for this die. This is because CFD uses
uniform power dissipation for the transceiver dies and, therefore, cannot calculate the
Other junction temperatures can be calculated in the
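The arithmetic for the hottest die in this example can be checked with a short script. The input values (83.6 °C, 148 W, 0.069 °C/W, and the 95 °C limit) come from the text above; the variable names are illustrative only.

```python
# Values quoted in this example.
t_case = 83.6         # °C, maximum case temperature from the CFD run
ttdp = 148.0          # W, interpolated total thermal design power
psi_jc_max = 0.069    # °C/W, highest psi_JC of any die (HSSI_2_0)
t_j_limit = 95.0      # °C, maximum allowed junction temperature

# TJ = TCASE + TTDP * psi_JC
t_j = t_case + ttdp * psi_jc_max
margin = t_j_limit - t_j

print(f"TJ = {t_j:.1f} C, margin to limit = {margin:.1f} C")
```

Repeating the same calculation with the ΨJC of each remaining die gives the other junction temperatures.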
In some designs, it might be possible to further reduce the transceiver
temperatures by spreading the channels to reduce the power density. For example, in
the above design the HSSI_2_0 transceiver has 8 high speed transceiver channels that
are laid out in half of the die. The effect of spreading these channels to all the
die area can be shown in the EPE by the following transceiver placement.
The new placement relaxes the cooling requirement from a ΨCA of 0.332 to 0.342 °C/W and now the core fabric die has the highest
ΨJC. Repeating the CFD analysis using the original
cooling solution with the new power dissipations results in the following IHS
Calculating the new junction temperatures with the updated power values and CFD results gives:
HSSI_2_0 die temperature: TJ = 84 + TTDP × ΨJC = 90.3 °C
This example demonstrates that the channel spreading could reduce the cooling
requirement or result in lower junction temperatures for the same cooling solution.
1.13.4. TSD Offset Assessment for the Example
As indicated previously the temperature sensors are not always in the exact position
of the hot spots on the transceivers and depending on the transceiver placement, the
EPE calculates the offset value which needs to be added to the field reading.
The transceiver TSDs in the first example should report the following values:
TSD_HSSI_2_0 = 85.5 °C
TSD_HSSI_0_0 = 85.5 °C
TSD_HSSI_2_1 = 81.2 °C
TSD_HSSI_0_1 = 80 °C
Adding the offset values to these numbers provides the actual temperatures shown below:
TJ_HSSI_2_0 = 85.5 + 8.3 = 93.8 °C
TJ_HSSI_0_0 = 85.5 + 5 = 90.5 °C
TJ_HSSI_2_1 = 81.2 + 7 = 88.2 °C
TJ_HSSI_0_1 = 80 + 10 = 90 °C
Note: The TSDs have an accuracy of ±5 °C;
therefore, the reported temperature can be off by 5 °C. To avoid exceeding the
operating temperature, Intel®
recommends building a 5 °C margin into the thermal design.
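The offset and accuracy handling can be sketched as a small calculation. The TSD readings and junction temperatures are those listed in this section; the HSSI_2_0 offset is taken as 8.3 °C so that the sum matches the stated 93.8 °C, which is an assumption. Adding the full 5 °C accuracy as a worst-case bound is likewise one illustrative interpretation of the note, not a documented procedure.

```python
TSD_ACCURACY_C = 5.0  # ± accuracy of each TSD, per the note above

# TSD readings (°C) listed for the first example.
tsd_reading = {"HSSI_2_0": 85.5, "HSSI_0_0": 85.5,
               "HSSI_2_1": 81.2, "HSSI_0_1": 80.0}

# Hot-spot offsets (°C). The 8.3 value is assumed so that the sum
# reproduces the 93.8 °C junction temperature stated in the text.
tsd_offset = {"HSSI_2_0": 8.3, "HSSI_0_0": 5.0,
              "HSSI_2_1": 7.0, "HSSI_0_1": 10.0}

# Estimated hot-spot temperature = TSD reading + EPE offset.
estimated_tj = {die: tsd_reading[die] + tsd_offset[die] for die in tsd_reading}

# Upper bound if the sensor reads low by its full accuracy.
worst_case_tj = {die: tj + TSD_ACCURACY_C for die, tj in estimated_tj.items()}

for die in tsd_reading:
    print(f"{die}: estimated {estimated_tj[die]:.1f} C, "
          f"worst case {worst_case_tj[die]:.1f} C")
```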
1.14. Document Revision History for AN 787: Intel Stratix 10 Thermal Modeling and Management with the Early Power Estimator
Modified document title to
Intel® Stratix® 10 Thermal Modeling and Management with the Early Power Estimator.
In Section 1.2, Introduction, changed an
Intel® Stratix® 10 device "can contain between two and seven dies" to "can contain between two and nine dies"
Updated to account for changes in the latest EPE with HBM and E-Tile updates
Added methodology to use the Thermal worksheet of the EPE tool