Computational Fluid Dynamics, a numerical analysis method for solving conjugate heat transfer problems.
Compact Thermal Model, a geometric model that is used as an input to a CFD tool.
Digital thermal sensor.
Field application engineer.
Early Power Estimator, a tool that estimates the power consumption of the FPGA device.
Field Programmable Gate Array.
High Bandwidth Memory.
Integrated Heat Spreader, the case of an Intel® Stratix® 10 FPGA.
Multi-Chip Module - an integrated circuit (IC) with more than one die.
Intel® FPGA Power and Thermal Calculator.
Single Chip Module.
Integrated Heat Spreader or Case Temperature. The case temperature of a component is measured with an attached heat sink. This temperature is measured at the top geometric center of the package case/die.
The total power dissipation of the device. This includes static power, with static power savings subtracted. The PTC reports this value in the Power Summary window.
Thermal Design Power, the power dissipated in a die that is used for thermal analysis purposes.
Ambient Temperature, measured locally surrounding the FPGA. The ambient temperature should be measured just upstream of a passive heat sink or at the fan inlet for an active heat sink.
Core Fabric Die Temperature.
Maximum Junction Temperature, a maximum allowable absolute temperature rating of the device or a targeted value.
Thermal Interface Material.
Temperature Sensor Diode.
Voltage identification code.
An Intel® Stratix® 10 device has a multi-chip package structure that contains between two and nine dies. One or two dies always comprise the main FPGA core fabric, and there can be from one to six transceiver dies and up to two High Bandwidth Memory (HBM) dies. Because of the complex construction and non-uniform power density of the dies, the thermal engineering of an Intel® Stratix® 10 device requires a specific process and familiarity with the following:
The Intel® Stratix® 10 FPGA comes in a ball grid array (BGA) package with a copper integrated heat spreader (IHS). It can contain up to three types of dies, as follows:
Core fabric die. This is the main FPGA die, which contains the basic logic resources and is available in various sizes and grades. All Intel® Stratix® 10 devices (except for the 1SG10MH_U1) have a single core fabric die.
Transceiver die. Transceiver dies are offered in four types: L-Tile, H-Tile, E-Tile, and P-Tile. Packages with E-Tile also have one H-Tile. Each transceiver tile type supports certain protocols and transceiver speeds. Depending on the package size, an Intel® Stratix® 10 device can support up to six transceiver dies. All transceiver dies have 24 transceiver channels, except for the P-Tile dies, which have 16 channels.
HBM die. The HBM die comes in two memory-die stack configurations: 4-high or 8-high. Not all Intel® Stratix® 10 packages have HBM; those that do can have either one or two HBM dies.
3.1. Intel Stratix 10 Physical Package Structure
Figure 1. Physical Package Structure
This is a typical package structure relevant to thermal analysis, as laid out in the compact thermal models. This package shows only the core fabric die and transceiver dies.
Intel® Stratix® 10 FPGA thermal parameters do not include the traditional θJC and θJB values, because of the device's MCM construction. Instead of two-resistor (2R) values, the Intel® FPGA Power and Thermal Calculator (PTC) provides ΨJC and ΨCA values, which are used with MCM packages and are highly design-dependent. Therefore, you cannot use a two-resistor model for the thermal modeling of the package.
Intel® offers a compact thermal model (CTM) which is discussed in the following topic. The table below lists the thermal design parameters used in this document.
Table 2. Thermal Design Parameters
TA. Ambient temperature, measured locally surrounding the FPGA. Measure the ambient temperature just upstream of a passive heat sink or at the fan inlet for an active heat sink. This value affects the junction temperature of the main FPGA core fabric die and its power dissipation.
TJ-MAX. The maximum rated junction temperature of a die, or a design goal. For example, a particular die could have a manufacturer's specified TJ-MAX of 100°C, but designers can specify a TJ-MAX of less than 100°C as part of their design requirement.
TJ. The junction temperature of a die calculated by the PTC for a given condition. The PTC does not report this value directly, but it can be calculated from the information provided.
Die power. The PTC reports the power dissipation of each die individually.
Total Thermal Power (TTP). The total power dissipation of the device. This includes static power, with static power savings subtracted. The PTC reports this value in the Power Summary window.
ΨJC. The thermal resistance between each of the dies in the package and the center of the package integrated heat spreader (IHS). A multi-chip module (MCM) such as the Intel® Stratix® 10 device has as many ΨJC values as the number of dies in the package. The PTC reports the maximum ΨJC value, which corresponds to the die with the highest temperature on the device. The ΨJC value is calculated by this equation:
ΨJC = (TJ - TCASE) / TTP
ΨJC values are not constant for a specific package and change as the FPGA resource usage changes.
Refer to the figure Individual Die Thermal Resistance to the Top of IHS, following this table, for an illustration of the individual die ΨJC values.
ΨCA. The thermal resistance between the center of the package IHS and the ambient temperature. You can enter this value into the PTC as a thermal constraint, or it can be reported by the PTC as part of the thermal solution. ΨCA can be used as a figure of merit in assessing the required cooling solution for a design. For example, the lower the ΨCA value, the more aggressive the cooling solution that is needed. The value of ΨCA is calculated by this equation:
ΨCA = (TCASE - TA) / TTP
The thermal solver in the PTC works on three points in addition to your design. You can specify two of those points, for example ambient temperature (TA) and maximum junction temperature (TJ-MAX), and the solver determines the third, for example the thermal resistance from case to ambient in degrees Celsius per watt (ΨCA).
Note: ΨCA values are not constant for a specific package and change as the FPGA resource usage changes. You must recalculate this value for each design.
TCASE. The temperature at the top center of the IHS. For a design to not exceed its TJ-MAX, the cooling solution must be able to maintain the TCASE temperature at or below the TCASE temperature calculated by the PTC.
Note: TCASE values are not constant for a specific package and change as the FPGA resource usage changes.
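As a minimal sketch, the two Ψ definitions in the table can be written as small Python helpers. The function names are illustrative and are not part of the PTC. The sample inputs are the TCASE, TA, and TTP values used for the 1SM21BE example in this document.

```python
def psi_jc(tj_c, tcase_c, ttp_w):
    """Thermal resistance from a die junction to the IHS center, in °C/W."""
    return (tj_c - tcase_c) / ttp_w


def psi_ca(tcase_c, ta_c, ttp_w):
    """Thermal resistance from the IHS center to ambient, in °C/W."""
    return (tcase_c - ta_c) / ttp_w


# Sample inputs from the 1SM21BE example: TCASE = 79 °C, TA = 50 °C, TTP = 105.453 W.
example_psi_ca = psi_ca(79.0, 50.0, 105.453)
# Hottest die at the 90 °C maximum TJ used in that example.
example_psi_jc = psi_jc(90.0, 79.0, 105.453)

print(f"PSI_CA = {example_psi_ca:.3f} C/W")
print(f"PSI_JC = {example_psi_jc:.3f} C/W")
```

Both results depend on how the displayed TCASE is rounded, so they should be read as approximations of the values that the PTC reports, not as replacements for them.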
Figure 3. Individual Die Thermal Resistance to the Top of IHS
Figure 4. Thermal Resistance
The diagram shows the thermal resistance from each die to the IHS top surface and also to the air.
4.1. Intel Stratix 10 Compact Thermal Model (CTM)
Intel® Stratix® 10 FPGA thermal analysis requires the use of the device's compact thermal model (CTM) in a computational fluid dynamics (CFD) tool. The results of the CFD analysis are valid only for determining the core fabric power and the integrated heat spreader (IHS) temperature. You can use these values to determine the junction temperature of all the dies.
The CFD methodology is appropriate because the CTM does not capture the details of transceiver channel placements; therefore, it cannot predict the correct junction temperature of a transceiver die. The transceiver junction temperature is calculated using the total power dissipation, IHS temperature, and thermal resistance of each die.
Stratix® 10 CTMs are offered in ECXML format, which is compatible with the following CFD tools:
Icepak* from ANSYS
6SigmaET* from Future Facilities
The CTM for some older devices is available only in .PDML and .TZR formats, which are compatible with Flotherm* and Icepak*, respectively. Contact your Intel® support representative to obtain CTM models.
If your company does not use any of the above CFD tools, Intel® can provide a STEP file of the CTM on request. The STEP file model is compatible with other thermal tools, such as:
Thermal Analysis* from SolidWorks
Thermal Analysis by Autodesk
4.2. Intel Stratix 10 Temperature Sensing
Each die in an Intel® Stratix® 10 FPGA contains at least one digital thermal sensor (DTS) and one temperature sensing diode (TSD). A DTS has no physical connection to device pins and must be read using temperature sensing software.
The TSDs connect to pins on the device, which you can read using external devices such as the Maxim Integrated MAX6581 or MAX31730, or the Texas Instruments TMP468. Both the DTS and TSD report the temperature of their physical location on the die, which may or may not be the hottest location on the die. For this reason, the PTC adds an offset value to each reading, to correct for the maximum temperature of the corresponding die. The offset values may vary; therefore, you must specify which sensors the PTC is to use. (Versions of the PTC earlier than version 20.4 show only the DTS offsets; therefore, if you require offset values for the TSDs, you must first port your design to the version 20.4 or later software.)
4.3. Thermal Sensor Accuracy
Both digital thermal sensors (DTSs) and thermal sensor diodes (TSDs) have a sensor accuracy of ±5°C. This margin of error means that a reported value of 100°C could actually be as high as 105°C, which might adversely affect the reliability and timing closure of the FPGA. Therefore, you should design for a cooling margin of at least 5°C, to ensure that the resulting cooling solution remains within the margin of error of the sensors, and does not exceed the desired maximum junction temperature.
When using TSDs you must calibrate the external device to your design circuitry. Reported temperatures from the external temperature sensors can be incorrect by 10°C or more, depending on which die temperature is measured. For further detail on the bonded sensors and how to read them, refer to AN 769: Intel FPGA Remote Temperature Sensing Diode Implementation Guide.
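The following sketch shows one way to combine a sensor reading, the PTC offset for that sensor, and the ±5°C sensor accuracy into a worst-case die temperature, and to derive a cooling design target from a junction temperature limit. The function names, the sample reading, and the sample offset are hypothetical; real offsets come from the PTC Thermal page for your design.

```python
SENSOR_ACCURACY_C = 5.0  # +/-5 °C accuracy stated for both DTS and TSD


def hot_spot_estimate(reading_c, ptc_offset_c):
    """Sensor reading plus the PTC offset for that die and sensor type."""
    return reading_c + ptc_offset_c


def worst_case_temperature(reading_c, ptc_offset_c, accuracy_c=SENSOR_ACCURACY_C):
    """Upper bound on the die hot spot once sensor accuracy is included."""
    return hot_spot_estimate(reading_c, ptc_offset_c) + accuracy_c


def cooling_design_target(tj_max_c, accuracy_c=SENSOR_ACCURACY_C):
    """Junction temperature to design the cooling for, leaving the sensor margin."""
    return tj_max_c - accuracy_c


# Hypothetical field reading (°C) and PTC offset (°C), for illustration only.
reading_c, offset_c = 82.0, 3.0
print(hot_spot_estimate(reading_c, offset_c))
print(worst_case_temperature(reading_c, offset_c))
print(cooling_design_target(100.0))
```

The sketch treats the accuracy as a simple additive bound. It does not model the additional error that an uncalibrated external TSD reader can introduce.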
5. Thermal Design Process for Intel Stratix 10 Devices
This topic describes the stages of the Intel® Stratix® 10 FPGA thermal design process.
Thermal Design Flow
Thermal Design Stages
Supply design information to the Intel® FPGA Power and Thermal Calculator (PTC). This step provides the necessary data to estimate the power dissipation of each die. The inputs include the FPGA design information as well as the thermal design requirements of TA and TJ-MAX and the power margin selection. At this point the design is still in its early stages; power predictions may be inaccurate and should not be taken as indicative of the final values for a functional design.
Obtain thermal design parameters. The PTC provides the thermal design parameters. The power dissipation of each transceiver die is provided as a constant value, but the main core die power dissipation is provided as a function of its junction temperature, and should be entered into the computational fluid dynamics (CFD) tool as a function of temperature for the most accurate results.
Obtain the compact thermal model (CTM). Contact your Intel representative to obtain the applicable CTM for the CFD analysis.
Run the CFD analysis. Model the system in the CFD tool and apply all the applicable power values to the corresponding dies. The CFD solution provides the core die total thermal power (TTP) and temperature, and the TCASE. The CFD analysis cannot predict the transceiver and HBM die temperatures; therefore, those must be calculated manually.
Compare the CFD results with the PTC results. If the CFD values for the TCASE or the TJ of the core die are equal to or less than those calculated by the PTC, then the cooling solution is sufficient. If the TCASE or the TJ of the core die is higher than the value calculated by the PTC, then additional cooling, or design changes such as transceiver placement optimization, may be needed. Use the data from the CFD analysis to calculate the TJ of each die or the effective ΨCA, using the following equations:
TJ = TCASE + TTP × ΨJC
ΨCA = (TCASE - TA) / TTP
In the above equations, TTP and TCASE are the CFD results. The maximum TJ occurs in the die with the highest ΨJC reported by the PTC. To calculate the maximum TJ, use that ΨJC in the first equation.
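The comparison stage can be sketched in a few lines of Python. The function names and all sample numbers below are hypothetical and serve only to show the flow: check the CFD case and core temperatures against the PTC values, then estimate the maximum TJ from the CFD TCASE and TTP and the highest ΨJC reported by the PTC.

```python
def compare_cfd_to_ptc(cfd_tcase_c, cfd_tj_core_c, ptc_tcase_c, ptc_tj_core_c):
    """Return (is_sufficient, tcase_margin, tj_core_margin).

    A positive margin means the CFD result is below the PTC value.
    """
    tcase_margin = ptc_tcase_c - cfd_tcase_c
    tj_core_margin = ptc_tj_core_c - cfd_tj_core_c
    is_sufficient = tcase_margin >= 0.0 and tj_core_margin >= 0.0
    return is_sufficient, tcase_margin, tj_core_margin


def max_junction_temperature(cfd_tcase_c, cfd_ttp_w, highest_psi_jc):
    """TJ of the hottest die: TCASE + TTP * (highest PSI_JC)."""
    return cfd_tcase_c + cfd_ttp_w * highest_psi_jc


# Hypothetical CFD results and PTC values, for illustration only.
ok, tcase_margin, tj_margin = compare_cfd_to_ptc(
    cfd_tcase_c=77.0, cfd_tj_core_c=82.5, ptc_tcase_c=79.0, ptc_tj_core_c=84.0
)
tj_max = max_junction_temperature(cfd_tcase_c=77.0, cfd_ttp_w=104.0, highest_psi_jc=0.10)

print(ok, tcase_margin, tj_margin)
print(round(tj_max, 1))
```

A failed check does not say which remedy to apply. It only indicates that additional cooling or a design change should be evaluated.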
6. Power and Thermal Calculator (PTC) for Intel Stratix 10 Devices
The Intel® FPGA Power and Thermal Calculator (PTC) can estimate the power consumption of an Intel® Stratix® 10 FPGA and generate the thermal parameters needed for a system thermal simulation.
The PTC is available in two versions:
A standalone version that you can use without having an actual RTL design. You provide all input to this version manually, or from an imported .ptc file. The standalone version is convenient at the early stages of projects when power and cooling requirements must be determined and no RTL design is available.
An embedded version that is part of the Quartus® Prime software. Input data for this version is generated when you compile an RTL design and run the Quartus® Prime Power Analyzer on it. This version can provide more accurate results, and should be used when possible.
The Quartus® Prime PTC can export a design file to the standalone PTC for ease of use and faster what-if analysis.
Like the Early Power Estimator (EPE), the PTC allows you to enter and select relevant information for your FPGA design and calculate the power and thermal design information for that design. The data provided to the PTC includes device information, FPGA logic design information, and thermal information. These inputs determine the overall power dissipation of each die and the thermal characteristics of the package to use for system thermal modeling. The following topics describe the necessary PTC inputs and show the data entered for the example design.
The following figure depicts the PTC Main page, where you can select your FPGA device and enter some power settings and thermal parameters. As you enter data into the PTC, the power summary information in the left-side table updates automatically. (If you want, you can disable automatic power updates on the top menu.)
Intel® FPGA Power and Thermal Calculator Main Page
The following figure shows the main page of the PTC updated with device information and all other fields at default values. The power summary at the left side predicts 6.447 watts of power dissipation. This is typical static power for this device when not instantiated, and with all dies at 25°C.
If we change the power characteristics to maximum or increase the junction temperature, the power consumption also increases. For more information on these relationships, refer to the Static Power and Typical Power topic, below.
Figure 8. PTC Main Page Updated with Device Information
Note the message in the message window in the above figure: Typical power calculations should not be used for regulator sizing and thermal analysis. This message occurs because the Power characteristics field is set to Typical; the PTC activates its thermal calculation only when the Power characteristics field is set to Maximum. To calculate all thermal parameters with maximum static power, set Power characteristics to Maximum.
6.2. Logic Design Information
Intel® FPGA Power and Thermal Calculator (PTC) has several pages for data entry.
Main Design Entry
The subject of this document is thermal analysis; therefore, it focuses primarily on the thermal-related settings. For broader and more detailed information, refer to the Intel® FPGA Power and Thermal Calculator User Guide.
The following table shows values for the example using a 1SM21BE Intel® Stratix® 10 device.
Table 4. Values for Example 1SM21BE Intel® Stratix® 10 Design
Values for this Design
1,000,000 Half ALM
Clock: 400 MHz
Toggle rate: 25%
True dual port
Data width: 8
RAM depth: 32 clock
Clock: 400 MHz
Toggle rate: 25%
Clock: 400 MHz
Total fan out: 50,000
Global enable: 100%
Local enable: 100%
PLL Type: ATX PLL
PLL block: 3
Output frequency: 6,000 MHz
In this example, 12 channels in each transceiver die are activated.
See the figure Transceiver Placements, below.
Both HBM dies (Bottom and Top) are fully active
Traffic pattern is set to pat 7
Total 16 channels
Figure 9. Transceiver Placements
Determining Thermal Parameters
After you have completed your design entry, you must activate the Thermal page of the PTC, as follows:
Set the Power characteristic field to Maximum.
Set the Calculation Mode to Find a cooling solution for maximum junction temperature, or any other option other than the default.
With the Thermal page active, you must choose appropriate values for Calculation Mode, Recommended margin, and TSD mode:
Calculation Mode. Choose one of the following:
Use a constant junction temperature.
This mode assumes that all junction temperatures are the same. This is usually not the case, so use this mode cautiously, for example where the calculated power is too high and the PTC does not produce any thermal parameters for the design.
Enter the junction temperature in this mode.
Find a cooling solution for a maximum junction temperature.
Enter maximum TJ suitable for the design.
Enter the ambient temperature.
Find a maximum junction temperature for a cooling solution.
Enter the ΨCA for your system's cooling solution.
Enter the ambient temperature.
Find an ambient temperature for a cooling solution.
Enter the maximum TJ suitable for the design.
Enter the ΨCA for the cooling solution.
Apply recommended margin. Choose one of the following:
Set this field to Yes if the power model status on the Main page is not final. Turning it on adds over 20% to the total power of the FPGA and makes the thermal design very conservative. Consult your FAE and your system and circuit designers to determine the necessity of this option.
If the power models are final, setting this field to Yes is not recommended. However, because logic usage grows during the design process, along with other uncertainties, it is good practice to provide some cooling margin early in the thermal design. Keep in mind that any cooling design has a finite capacity, and FPGA power dissipation cannot grow beyond it. Without spare cooling capacity, it can be very difficult to add features to the design later.
TSD mode. There are two sets of thermal sensors in the Intel® Stratix® 10 FPGA. Choose one from the drop-down menu on the Thermal page:
Pinned-out diodes (TSD). A set of sensors, one for each die, is bonded out and can be read only by an external device such as a Maxim MAX6581 or similar. Because the sensing is done by an external device, these temperatures can be read while the FPGA is not active. For further detail on the bonded sensors and how to read them, refer to AN 769: Intel FPGA Remote Temperature Sensing Diode Implementation Guide.
DTS with the temperature sensor IP. These sensors are readable only through the Intel® Stratix® 10 Temperature IP Sense Software, and only when the FPGA is on and instantiated. For more details on these sensors, see the Intel Stratix 10 Analog to Digital Converter User Guide.
These two sets of sensors do not coincide with each other, nor necessarily with the hot spot on the die for a given design. Therefore, the PTC provides two sets of offsets that must be added to the sensor values read in the field, so that the resulting temperatures are the maximum for the given die.
Note: The HBM temperature can be read only through the Intel® Stratix® 10 temperature IP sensor, and no offset value is required for it.
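The three solving calculation modes described above share one relationship: given two of TA, the maximum TJ, and ΨCA, the third follows. The sketch below illustrates that relationship under a strong simplifying assumption that the PTC does not make: TTP and the highest ΨJC are held constant, whereas in the PTC the power changes with temperature. The function name is hypothetical, and the sample inputs are taken from the 1SM21BE example in this document.

```python
def solve_thermal(ttp_w, psi_jc_max, ta_c=None, tj_max_c=None, psi_ca=None):
    """Given exactly two of TA, maximum TJ, and PSI_CA, return the third.

    Simplified model (assumption): TJ_max = TA + TTP * (PSI_CA + PSI_JC_max),
    with TTP and PSI_JC_max treated as constants.
    """
    missing = [value is None for value in (ta_c, tj_max_c, psi_ca)]
    if sum(missing) != 1:
        raise ValueError("Specify exactly two of ta_c, tj_max_c, psi_ca")
    if psi_ca is None:
        # Find a cooling solution for a maximum junction temperature.
        return (tj_max_c - ta_c) / ttp_w - psi_jc_max
    if tj_max_c is None:
        # Find a maximum junction temperature for a cooling solution.
        return ta_c + ttp_w * (psi_ca + psi_jc_max)
    # Find an ambient temperature for a cooling solution.
    return tj_max_c - ttp_w * (psi_ca + psi_jc_max)


# Example inputs: TTP = 105.453 W, highest PSI_JC = 0.103 °C/W, TA = 50 °C, max TJ = 90 °C.
required_psi_ca = solve_thermal(105.453, 0.103, ta_c=50.0, tj_max_c=90.0)
print(f"Required PSI_CA = {required_psi_ca:.3f} C/W")
```

Because the real solver accounts for the temperature dependence of static power, this sketch is useful only for sanity-checking a single operating point, not for exploring how power changes with the maximum TJ.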
6.3. Thermal Settings and Parameters
The maximum junction temperature rating of all Intel® Stratix® 10 dies is 100°C unless stated otherwise.
The 1SM21BE device, which we use as the example in this document, is an exception, with a maximum HBM die temperature rating of 95°C. Consequently, for this design we use the Maximum junction temperature mode and set it to 90°C, to provide the design with a margin of 5°C. However, if the analysis indicates that the HBM is not the die with the highest temperature, we could then increase the maximum TJ until the HBM or another die reaches its limit with the margin.
The ambient temperature is set to 50°C.
Many designs utilize both sets of sensors, so temperature offset data from both modes is needed. We also assume that our test project uses both sets of thermal sensors. The figure below shows the PTC Thermal page output and the thermal settings and parameters for the design.
Figure 10. Thermal Settings and Parameters
The die with the highest ΨJC is the die that has the highest temperature and reaches the maximum TJ for the cooling solution calculated by the PTC. In this case, the highest ΨJC of 0.103°C/W belongs to HSSI_2_1, an E-Tile transceiver tile. With the maximum TJ set to 90°C, the HBM die has a margin of at least 5°C. The HBM junction temperature is calculated as follows:
TJ_HBM = TCASE + TTP × ΨJC_HBM
TJ_HBM = 79 + 105.453 × (-0.007) = 78.26°C
As expected, the above calculations indicate that the HBM die does not have the highest temperature in this design and the maximum TJ can be increased to 95°C if necessary. Doing so also increases the total power to 108 W and ΨCA to 0.312°C/W from the current 0.276°C/W. The higher the ΨCA value, the easier it is to cool the system; however, the total thermal power (TTP) also increases.
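The same calculation can be repeated for every die to see how much headroom each one has against its own rating. The sketch below uses the TCASE, TTP, and ΨJC values quoted in this topic, together with the 100°C general rating and the 95°C HBM rating for the 1SM21BE device. The dictionary layout and the function name are illustrative only.

```python
TCASE_C = 79.0    # case temperature from the PTC Thermal page
TTP_W = 105.453   # total thermal power from the PTC

# die name -> (PSI_JC in °C/W, rated maximum junction temperature in °C)
DIES = {
    "HSSI_2_1 (E-Tile)": (0.103, 100.0),
    "HBM": (-0.007, 95.0),
}


def die_margins(tcase_c, ttp_w, dies):
    """Return {die: (junction temperature, margin to that die's rating)}."""
    result = {}
    for name, (psi_jc, rated_max_c) in dies.items():
        tj_c = tcase_c + ttp_w * psi_jc
        result[name] = (tj_c, rated_max_c - tj_c)
    return result


for name, (tj_c, margin_c) in die_margins(TCASE_C, TTP_W, DIES).items():
    print(f"{name}: TJ = {tj_c:.2f} C, margin = {margin_c:.2f} C")
```

The hottest die lands just below the 90°C maximum TJ setting, and the HBM die retains a larger margin against its 95°C rating, which is consistent with the conclusion above.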
The next step in the process is to contact your Intel Field Application Engineer (FAE) to obtain the compact thermal model (CTM) for the device in the format that you need.
6.4. Thermal Design Optimization
After you have captured your design in the Intel® FPGA Power and Thermal Calculator (PTC), it is good practice to evaluate whether any thermal optimization is possible to make the cooling easier.
Improved cooling can be achieved by reducing overall power consumption, or by reducing the design's ΨJC value.
There are two types of power consumed in an FPGA: static power and dynamic power.
Static power is the power that the configured device consumes when powered up but with no user clocks operating. Static power is dependent on device size, device grade, power characteristics, and junction temperature. For Intel® Stratix® 10 devices, this excludes DC bias power of analog blocks, such as I/O and transceiver analog circuitry.
Reducing junction temperatures can save power. For example, if a given design has a total static power of 38 W when the maximum TJ is 90°C, and you decrease the maximum TJ, the static power also decreases, with no change to the operation of the device. However, reducing the maximum TJ requires additional cooling measures, such as a reduction of the ambient temperature, increased airflow, or the use of a larger heat sink. You should always consider methods of reducing static power consumption, especially when evaluating operating costs for large data centers or central offices.
Dynamic power is the additional power consumed due to signal activity or toggling. For example, if you reduce the number of half ALMs or flip flops in the core die, or the clock frequency or toggle rate, the dynamic power goes down. Such action may not always be possible, but you should consider it, especially for dies that seem to be the limiting factor in the cooling system.
Transceiver Channel Spreading
Consider our example, where the Intel® Stratix® 10 device has 3 E-Tile transceivers, 2 of which differ only in channel placement. However, one of those two otherwise-identical E-Tiles has higher thermal resistance and power consumption. The table below depicts this situation.
The HSSI_2_1 tile has all of its 12 active channels physically adjacent to each other, which results in higher power density and makes it more difficult to cool. The higher thermal resistance results in a higher temperature, as calculated below:
ΔT = TTP × ΔΨJC = 105.5 × (0.103 - 0.065) ≈ 4°C
It might not be possible to optimize channel placement, depending on the design's constraints and requirements; nevertheless, such optimization should be considered.
Intel® Stratix® 10 FPGA thermal design parameters are unique for every project. Thermal design parameters are mainly determined by the power, local power density, and power ratio of dies. Any change to the design requires design information to be updated accordingly in the Intel® FPGA Power and Thermal Calculator (PTC). That is, a set of thermal parameters calculated early in the project may no longer be valid if the FPGA utilization is changed later.
After you have compiled the RTL design, you should use the Quartus® Prime Power Analyzer to update the thermal parameters. If necessary, you can export a .qptc file from the Power Analyzer and use it in the standalone PTC for ease of use.
6.6. Intel Stratix 10 Device with PCIe Thermal Design Example 1
This topic explores the computational fluid dynamics (CFD) analysis of a 1SM21BE Intel® Stratix® 10 device on a two-slot PCIe board.
For simplicity, this topic assumes that the FPGA is the only active device on the circuit board.
The first step in the thermal design process is to enter design parameters into the Intel® FPGA Power and Thermal Calculator (PTC) and obtain the thermal parameters. For details on this step, refer to Power and Thermal Calculator (PTC), earlier in this document.
The second step is to obtain the Compact Thermal Model (CTM) for the Intel® Stratix® 10 device. The figure below shows the device layout. We import the device into our CFD model and assign the powers to the dies as shown below.
Figure 11. Die Power Assignment for CFD Analysis
The figure below illustrates the board layout in the CFD model.
Figure 12. Die Power Assignment for CFD Analysis
Figure 13. CFD Model
6.7. Heat Sink
FPGAs appear in a wide variety of products and layout configurations, so it can be difficult to find off-the-shelf heat sinks that meet all physical and performance requirements. Consequently, most applications require custom heat sinks to maximize performance.
For the purpose of this analysis, we assume a heat sink that fits the board and has a vapor chamber base that maximizes thermal conductivity.
Intel does not recommend a specific TIM; however, as a general rule, TIMs with higher thermal conductivity perform better. The TIM size is a function of the package interface size to the heat sink, which in our example is about 52 × 44 mm. Generally, a TIM slightly smaller than the interface size is preferred, to ensure better contact; in this example, a TIM of 50 × 42 mm is recommended. The TIM thickness should not introduce too much resistance, while also ensuring that there is no air pocket between the TIM and the heat sink. In our example, we have chosen a TIM with 0.25 mm thickness and thermal conductivity of 5 W/m·K.
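To get a feel for how much resistance the chosen TIM introduces, the sketch below applies the one-dimensional conduction formula R = t / (k × A) to the example TIM dimensions. This is an idealized estimate that assumes uniform heat flow across the whole TIM area and ignores contact resistance and heat spreading. The function name is hypothetical, and applying the full 105 W example power to the TIM is an assumption made for illustration.

```python
def tim_resistance_c_per_w(thickness_mm, conductivity_w_per_mk, length_mm, width_mm):
    """1-D conduction resistance R = t / (k * A), in °C/W."""
    thickness_m = thickness_mm / 1000.0
    area_m2 = (length_mm / 1000.0) * (width_mm / 1000.0)
    return thickness_m / (conductivity_w_per_mk * area_m2)


# Example TIM: 0.25 mm thick, 5 W/m·K, 50 mm x 42 mm.
r_tim = tim_resistance_c_per_w(0.25, 5.0, 50.0, 42.0)

# Assumption: the full 105 W example power crosses the TIM uniformly.
delta_t_c = r_tim * 105.0

print(f"R_TIM = {r_tim:.4f} C/W, temperature drop = {delta_t_c:.1f} C")
```

In a real assembly, the effective value is usually higher than this ideal figure because of surface roughness, bond-line variation, and non-uniform die power, so the CFD model remains the reference.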
CTM Case Temperature Sensor
Not all CTMs have a built-in temperature sensor at the center of the integrated heat spreader (IHS). If the CTM for your project does not have a built-in temperature sensor, Intel recommends that you add one. In our example, we have added one. You should place the temperature sensor at the center of the IHS as shown in the figure below. The temperature sensor should not touch the heat sink.
Figure 14. IHS Temperature Sensor Placement
For the purpose of this analysis, the airflow at the board inlet boundary is set to 25 CFM at 50°C. The total board power is only the FPGA power at 105 W.
The following figures depict the board, the IHST surface temperature, and the die temperatures from the CFD analysis.
The maximum IHST and core die temperatures are 90.8°C and 95.3°C, respectively. These values are about 11-12°C above the limits of 79°C and 84°C from the thermal calculations in the PTC. This indicates that the current cooling solution does not meet the design requirement. If the current cooling solution is implemented, the maximum TJ will be above 90°C and the die powers will be more than what was used for this analysis. As stated before, the CFD analysis results are valid only for the core die and the IHS temperatures; the temperatures of the transceiver dies are not valid in the CFD analysis.
Figure 15. Board Surface Temperature
Figure 16. IHST Surface Temperature
Figure 17. Temperature of the Dies From CFD Analysis
There are several possible methods for improving the cooling; in practice, your thermal/mechanical engineer must make the most appropriate decision. In this example, we will direct air to the heat sink and the DDR memory using a guiding plate, as shown in the figure below.
Figure 18. Circuit Board with Airflow Directing Plate
Using the above solution, the FPGA core and IHST temperatures drop to 80°C and 75.1°C, respectively. These temperature reductions indicate that the cooling system is maintaining the temperatures below the limits of 84°C and 79°C, as recommended by the PTC. The total FPGA power dissipation is now less than what was used in the CFD model, because the lower temperatures reduce the static power consumption.
A second run of the PTC with the maximum TJ reduced to 86°C yields a core TJ of 80°C and total power of 103 W. This result is about 2.3 W less than the power used in the current CFD analysis. One method to capture this iteration properly is to enter the core power as a function of temperature. For this, run the PTC with several different maximum TJ settings to obtain the needed powers for the temperature range, and then build the core power curve as a function of its maximum junction temperature. In this case the following curve shows the core die power which we will use in the CFD analysis. For this purpose, we can ignore the static power reduction in the transceiver dies and still use constant power values.
Running the CFD analysis with the new core power setting yields a core die temperature of 78.2°C, which is about 2°C less than the previous result. The figure below shows the core power and system operating point. If the cooling capacity is reduced, the operating point shifts to the right and the FPGA total power consumption increases.
Figure 19. Core Die Power and System Operating Point
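The idea of an operating point, where the temperature-dependent core power curve meets the cooling capability of the system, can be illustrated with a short fixed-point iteration. In this sketch a single lumped junction-to-ambient resistance stands in for the CFD model, which is a deliberate simplification and not the method used in this example. The power curve points, the resistance values, and the non-core power are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical (core TJ in °C, core power in W) points, as if collected from several PTC runs.
CORE_POWER_CURVE = [(70.0, 58.0), (80.0, 61.0), (90.0, 65.0), (100.0, 70.0)]


def core_power(tj_c, curve=CORE_POWER_CURVE):
    """Piecewise-linear interpolation of core power, clamped at both ends."""
    temps = [t for t, _ in curve]
    if tj_c <= temps[0]:
        return curve[0][1]
    if tj_c >= temps[-1]:
        return curve[-1][1]
    i = bisect_right(temps, tj_c)
    (t0, p0), (t1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (tj_c - t0) / (t1 - t0)


def operating_point(ta_c, theta_ja, other_power_w, tol=1e-6, max_iter=200):
    """Iterate TJ = TA + theta_ja * P_total(TJ) until it stops changing.

    theta_ja is a lumped core-junction-to-ambient resistance (assumption).
    Returns (core TJ in °C, total power in W).
    """
    tj_c = ta_c
    for _ in range(max_iter):
        total_w = core_power(tj_c) + other_power_w
        new_tj_c = ta_c + theta_ja * total_w
        if abs(new_tj_c - tj_c) < tol:
            return new_tj_c, total_w
        tj_c = new_tj_c
    raise RuntimeError("operating point did not converge")


tj_good, p_good = operating_point(ta_c=50.0, theta_ja=0.30, other_power_w=40.0)
tj_weak, p_weak = operating_point(ta_c=50.0, theta_ja=0.35, other_power_w=40.0)
print(f"theta 0.30: TJ = {tj_good:.1f} C, power = {p_good:.1f} W")
print(f"theta 0.35: TJ = {tj_weak:.1f} C, power = {p_weak:.1f} W")
```

With the weaker cooling (higher resistance), both the junction temperature and the total power come out higher, which matches the shift of the operating point described above.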
6.8. Intel Stratix 10 Device with PCIe Thermal Design Example 2 (Alternate Method)
This example presents an alternative way of analyzing the thermal design, by determining the effective ΨCA of the individual FPGA in the system and its ambient temperature.
The Thermal page of the Intel® FPGA Power and Thermal Calculator (PTC) reports the required ΨCA for the FPGA. Consequently, if you can determine the effective ΨCA for the FPGA in question, you can readily determine whether the system has sufficient cooling. Alternatively, by entering the effective ΨCA and ambient temperature, you can determine the FPGA junction temperatures and other thermal parameters. This method is especially useful for large systems and for deployed systems in the field where the cooling design is fixed. For this method you can measure the effective ΨCA and effective ambient temperature experimentally, or determine them by CFD analysis. In either case, the result is valid only if other components of the system, especially upstream components, are powered up as they would be in the actual system and are not subject to change.
To perform the measurement or analysis, keep all the transceiver powers constant but vary the core die powers by ±25%, and determine the FPGA case temperature as a function of total thermal power (TTP). You can then use a simple linear regression to determine the ΨCASE-AMB,EFF and TAMB,EFF for the FPGA.
Figure 20. Calculating Effective Ambient Temperature and ΨCA
6.8.1. Analyzing 1U Sled with 3 FPGAs
This topic illustrates the analysis of a 1U sled with 3 FPGAs.
Figure 21. Example 2, 1U Sled Top View
In this 3-FPGA system, the first two FPGAs have static functions that do not change with time, and the third FPGA’s power image can change with various applications. To determine the maximum power image of the third FPGA, we must determine its effective ΨCA and ambient temperature.
The following figure shows the PTC Thermal page for this design, with core die power of 99 watts and total FPGA power of 152 watts.
Figure 22. Example 2, PTC Thermal Page
If we conduct the CFD simulation of this system three times, with core powers of 76 watts, 99 watts, and 124 watts, we get the following graph of FPGA total power versus case temperature.
Figure 23. Example 2, Linear Regression
The above graph shows that the effective ΨCA is 0.2°C/W and the effective ambient temperature is 41°C. Referring to the following figure, we can see that by entering these values into the PTC, we find that the maximum junction temperature for this case is 84°C. These results indicate that, depending on the maximum required TJ, there may still be some margin.
Figure 24. Example 2, PTC Thermal Results
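The regression step can be sketched with an ordinary least-squares line fit of case temperature against total power: the slope is the effective ΨCA and the intercept is the effective ambient temperature. The total power values below assume that the non-core power stays at 53 W (152 W total minus 99 W core) across the three runs. The case temperatures are illustrative values chosen to lie on the reported fit; they are not the CFD output of this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Core powers of 76 W, 99 W, and 124 W plus an assumed constant 53 W of non-core power.
ttp_w = [76.0 + 53.0, 99.0 + 53.0, 124.0 + 53.0]
# Illustrative case temperatures in °C (assumption, not measured or simulated data).
tcase_c = [66.8, 71.4, 76.4]

psi_ca_eff, ta_eff = fit_line(ttp_w, tcase_c)
print(f"Effective PSI_CA = {psi_ca_eff:.2f} C/W, effective TA = {ta_eff:.1f} C")
```

With real measurements the points will not fall exactly on a line, so it is worth checking the residuals before entering the fitted values into the PTC.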
If we assume that the maximum allowable TJ is 90°C, then by adding functionality to the core we can determine at what core power level the maximum TJ reaches 90°C. The figure below shows the results on the PTC Thermal page, where the total FPGA power is found to be 186 watts.
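Another way to approach this would be to enter your design with the effective ambient temperature of 41°C and the desired maximum TJ. The viable solutions would be those for which the PTC reports a required ΨCA ≥ 0.2°C/W, because the system provides an effective ΨCA of 0.2°C/W and a lower required value calls for more cooling than is available.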
Figure 25. Example 2, Maximum FPGA Power
7. Maximum Power and Typical Power
Part of the power dissipated in an FPGA is static power, which is the power that the configured device consumes when powered up but with no user clocks operating. Static power is dependent on device size, device grade, power characteristics, and junction temperature. The static power for individual devices within a given batch may not be identical at a given temperature.
For example, the Intel® Stratix® 10 1SM21BE device discussed in this document has a maximum static power of 48 watts if all its dies are at 100°C. It has a typical static power of 34 watts and a minimum static power of about 26 watts. So for this device, the total power dissipation may vary by as much as 22 watts.
In recognition of the static power variation between devices, Intel recommends the following steps:
Always assume the maximum power for thermal and power supply calculations. This ensures that the resulting values remain valid in cases where individual devices are dissipating the maximum power.
Assume typical power when performing power and cooling calculations for large installations. For example, if a datacenter or central office is planning a large installation containing 10,000 1SM21BE devices, the difference in power consumption between the maximum and typical cases could be significant—as much as 140 kilowatts, plus another 140 kilowatts in cooling, depending on the power usage effectiveness (PUE) of the datacenter. When planning the power and cooling operating expenses (OPEX) for large installations, calculations based on typical power may deliver a more accurate result than calculations assuming maximum power.
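The installation-level figures above follow from simple arithmetic, sketched below. The function name is hypothetical. A PUE of 2.0 is assumed here because it reproduces the "another 140 kilowatts" figure; treating all non-IT overhead as cooling is a simplification.

```python
def fleet_power_delta_kw(devices, max_static_w, typical_static_w, pue):
    """Return (IT power difference, facility overhead difference) in kW.

    PUE is total facility power divided by IT power, so the overhead
    attributable to an IT load is that load multiplied by (PUE - 1).
    """
    it_delta_kw = devices * (max_static_w - typical_static_w) / 1000.0
    overhead_delta_kw = it_delta_kw * (pue - 1.0)
    return it_delta_kw, overhead_delta_kw


# 10,000 1SM21BE devices, 48 W maximum and 34 W typical static power, assumed PUE of 2.0.
it_kw, overhead_kw = fleet_power_delta_kw(10_000, 48.0, 34.0, pue=2.0)
print(f"IT difference: {it_kw:.0f} kW, overhead difference: {overhead_kw:.0f} kW")
```

A facility with a lower PUE sees a proportionally smaller overhead term, which is why the text qualifies the cooling figure as dependent on the PUE of the datacenter.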
8. Document Revision History for AN 943: Thermal Modeling for Intel Stratix 10 FPGAs with the Intel FPGA Power and Thermal Calculator