Intel Quartus Prime Standard Edition User Guide: Power Analysis and Optimization
Version Information
Updated for: Intel® Quartus® Prime Design Suite 18.1
1. Power Analysis
The Intel® Quartus® Prime Design Suite provides the Early Power Estimator (EPE) spreadsheet and Power Analyzer for estimating the power consumption in your design.
Power estimation and analysis allow you to confirm, throughout the design process, that your design does not exceed thermal or power supply requirements:
- Thermal—Thermal power is the power that dissipates as heat from the FPGA. Devices use a heatsink or fan to act as a cooling solution. This cooling solution must be sufficient to dissipate the heat that the device generates. Additionally, the computed junction temperature must fall within normal device specifications.
- Power supply—Power supply is the power that the device needs to operate. Power supplies must provide adequate current to support device operation.
1.1. Power Analysis Tools
The Intel® Quartus® Prime Design Suite provides tools to analyze the power consumption of your FPGA design at different stages of the design process.
- Early Power Estimator (EPE) spreadsheet: estimates power consumption for power supply planning before compiling the design.
- Intel® Quartus® Prime Power Analyzer: estimates power consumption for a post-fit design, allowing you to establish guidelines for the power budget.
Characteristic | EPE | Intel® Quartus® Prime Power Analyzer |
---|---|---|
When to use | Any time. Note: For post-fit power analysis, you get better results with the Intel® Quartus® Prime Power Analyzer. | Post-fit |
Software requirements | Spreadsheet program | The Intel® Quartus® Prime software |
Accuracy | Medium | Medium to very high |
Data inputs | | |
Data outputs. Note: The EPE and Power Analyzer outputs vary by device family. | | |
Estimation of transceiver power for dynamic reconfiguration features | Includes an estimation of the incremental power consumption by these features. | Not included |
1.2. Running the Power Analyzer
To run the Power Analyzer:
- To specify device power characteristics, operating voltage, and temperature conditions for power analysis, click Assignments > Settings > Operating Settings and Conditions, as Device Operating Condition Settings for Power Analysis describes.
- To run full compilation of your design, click Processing > Start Compilation.
- Click Processing > Power Analyzer Tool.
- Specify the source of signal activity data, as Generating Signal Activity Data for Power Analysis describes.
- To generate a Signal Activity (.saf) file during analysis, turn on Write out signal activities used during power analysis, and specify the file name.
- To direct the Power Analyzer to generate an Early Power Estimation file, turn on Write out Early Power Estimation file, and specify the file name. The Early Power Estimation file summarizes the resource utilization and allows you to perform what-if analyses in EPE.
- Specify the Default toggle rates for unspecified signals, as Specifying the Default Toggle Rate describes.
- To specify temperature range and cooling options, click Cooling Solution and Temperature.
- Click Start.
Figure 2. Progress Bar in Intel® Quartus® Prime Power Analyzer
- When power analysis is complete, click Report to open the Power Analyzer reports that Viewing Power Analysis Reports describes.
1.3. Specifying Power Analyzer Input
The Power Analyzer accuracy is driven by design factors, operating conditions, and signal activity data that affect power consumption. The following figure shows how the Power Analyzer interprets these inputs and generates results in the Power Analysis report:
To obtain accurate I/O power estimates, the Power Analyzer requires full compilation of your design, in addition to specifying the following settings:
- The electrical standard on each I/O cell.
- The board trace model on each I/O standard in the design.
- Timing assignments for all the clocks in your design, or use a simulation-based flow to generate activity data.
1.3.1. Device Operating Condition Settings for Power Analysis
The Power Analyzer reads the following settings to determine the device operating conditions for power analysis:
Option | Settings |
---|---|
Device power characteristics | |
Voltage tab | Specifies the operating voltage conditions for each power rail in the device, and the supply voltages for power rails with selectable supply voltages. |
Temperature tab | Specifies the thermal operating conditions of the device. |
1.3.2. Specifying Signal Activity Data
The accuracy of the power estimation depends on how representative signal activity data is during power analysis. The Power Analyzer allows you to specify signal activity data from the following sources:
- .vcd files generated by supported third-party simulators
- User-entered node, entity, and clock assignments
- User-entered default toggle rate assignment
- Vectorless estimation (selected devices)
You can mix and match the signal activity data sources on a signal-by-signal basis. The following figure shows the priority scheme applied to each signal.
1.3.2.1. Using Simulation Signal Activity Data in Power Analysis
You can specify a Verilog Value Change Dump File (.vcd) generated by a supported simulator as the source of signal activity data for power analysis.
Third-party simulators can output a .vcd that contains signal activity and static probability information that inform the power analysis. The generated .vcd includes all of the routing resources and the exact logic array resource usage.
For third-party simulators, use the EDA Tool Settings to specify the Generate Value Change Dump (VCD) file script option in the Simulation page of the Settings dialog box. These scripts instruct the third-party simulators to generate a .vcd that encodes the simulated waveforms. The Intel® Quartus® Prime Power Analyzer reads this file directly to derive the toggle rate and static probability data for each signal.
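To make the two quantities concrete, the following minimal Python sketch derives a toggle rate and a static probability from a list of value changes for a single one-bit signal. It illustrates the definitions only and is not the Power Analyzer's .vcd reader; the function name `signal_activity` and the `(time_ns, value)` input layout are assumptions made for this example.

```python
def signal_activity(changes, end_time_ns):
    """Return (toggle_rate, static_probability) for one 1-bit signal.

    changes: time-sorted list of (time_ns, value) pairs, value is 0 or 1.
             The first entry gives the initial value (hypothetical layout).
    end_time_ns: time at which the observed window ends.
    """
    start_ns = changes[0][0]
    duration_ns = end_time_ns - start_ns
    transitions = 0
    high_time_ns = 0.0
    # Pair each change with the next one; the last value holds until end_time_ns.
    next_changes = changes[1:] + [(end_time_ns, changes[-1][1])]
    for (t0, v0), (t1, v1) in zip(changes, next_changes):
        if v0 == 1:
            high_time_ns += t1 - t0  # time spent at logic 1
        if v1 != v0:
            transitions += 1  # count every 0->1 and 1->0 change
    toggle_rate = transitions * 1e9 / duration_ns  # transitions per second
    static_probability = high_time_ns / duration_ns  # fraction of time at 1
    return toggle_rate, static_probability


# Three transitions in 100 ns, with the signal high for 80 ns in total.
rate, prob = signal_activity([(0, 0), (10, 1), (30, 0), (40, 1)], 100)
print(rate, prob)
```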
1.3.2.1.1. Generating Signal Activity Data for Power Analysis
Follow these steps to generate and use simulation signal activity data for power analysis:
- To run full compilation on your design, click Processing > Start Compilation.
- To specify settings for output of simulation files, click Assignments > Settings > EDA Tool Settings > Simulation. Select your simulator in Tool name and the Format for output netlist and Output directory.
- Turn on Map illegal HDL characters. This setting directs the EDA Netlist Writer to map illegal characters for VHDL or Verilog HDL, and results in more accurate data for power analysis.
Figure 6. EDA Tool Settings for Simulation
- In the Intel® Quartus® Prime software, click Processing > Power Analyzer Tool. The Power Analyzer tab appears.
- Under Input file, turn on Use input files to initialize toggle rates and static probabilities during power analysis, and then click Add Power Input Files. The Power Analyzer Settings page appears.
Figure 7. Specifying Power Analysis Input Files
- To specify a .vcd for power analysis, click Add and specify the File name, Entity, and Simulation period for the .vcd.
- To enable glitch filtering during power analysis with the .vcd you generate, turn on Perform glitch filtering on VCD files.
- To run the power analysis, click Start on the Power Analyzer tab. View the toggle rates in the power analysis results.
1.3.2.1.2. Generating Standard Delay Output for Power Analysis
- Click Assignments > Settings > EDA Tool Settings > Simulation. In Tool name, select ModelSim®. In Format for output netlist, select Verilog.
- Click More EDA Netlist Writer Settings. Set Enable SDO Generation for Power Estimation to On. Set Generate Power Estimate Scripts to ALL_NODES.
Figure 8. More EDA Netlist Writer Settings
- To run the Fitter, click Processing > Start > Start Fitter (Finalize).
- Create a representative testbench (.vt) that exercises the design functions appropriately.
- To specify the appropriate hierarchy level for signals in the output .vcd, add the following line to the project .qsf file:
set_global_assignment -name EDA_TEST_BENCH_DESIGN_INSTANCE_NAME <DUT instance path> -section_id eda_simulation
- After Fitter processing is complete, click Processing > Start > Start EDA Netlist Writer. EDA Netlist Writer generates the following files in /<project>/simulation/modelsim/power/:
  - <project>.vo (contains a reference to the .sdo file by default)
  - <project>_dump_all_vcd_nodes.tcl: specifies nodes to save in .vcd
  - <project>_v.sdo: back-annotated delay estimates
- Create a ModelSim® script (.do) to load the design and testbench, start ModelSim®, and then source the .do script.
- To specify the signals ModelSim® includes in the .vcd file, source *_dump_all_vcd_nodes.tcl in ModelSim®.
- To generate the .vcd file, simulate the test bench and netlist in ModelSim®. The .vcd file generates according to your specifications.
- Specify the .vcd as an input to power analysis, as Generating Signal Activity Data for Power Analysis describes.
1.3.2.1.3. Simulation Glitch Filtering
You can enable glitch filtering in the .vcd that you generate in a third-party simulator for use in power analysis by turning on the Perform glitch filtering on VCD files option.
The Power Analyzer defines a glitch as two signal transitions so closely spaced in time that the pulse, or glitch, occurs faster than the logic and routing circuitry can respond. The output of a transport delay model simulator contains glitches for some signals. The logic and routing structures of the device form a low-pass filter that filters out glitches that are tens to hundreds of picoseconds long, depending on the device family.
Some third-party simulators use different models than the transport delay model as the default model. Different models cause differences in signal activity and power estimation. The inertial delay model, which is the ModelSim default model, filters out more glitches than the transport delay model and usually yields a lower power estimate.
Glitch filtering in a simulator can also prevent a glitch on the output of one logic element (LE) or other circuit element from propagating to downstream circuit elements, so that the glitch does not affect simulated results. Without this filtering, a glitch on one signal can produce non-physical glitches on all downstream logic, which results in a signal toggle rate and a power estimate that are too high. Circuit elements in which every input transition produces an output transition, including multipliers and logic cells configured to implement XOR functions, are especially prone to glitches. Therefore, circuits with such functions can have power estimates that are too high when glitch filtering is not used.
The .vcd file reader performs glitch filtering that is complementary to simulation glitch filtering, but is often less precise. While the .vcd file reader can remove glitches on logic blocks, the file reader cannot determine how a given glitch potentially affects downstream logic and routing. Filtering the glitches during simulation automatically avoids spurious switching in downstream routing and logic.
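The idea of removing closely spaced transition pairs can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm used by the Power Analyzer or by any simulator: the function name `filter_glitches`, the single-signal list of transition times, and the fixed `min_pulse_ns` threshold are all hypothetical.

```python
def filter_glitches(times_ns, min_pulse_ns):
    """Drop pairs of transitions that form a pulse shorter than min_pulse_ns.

    times_ns: sorted transition times (in ns) of a single signal.
    min_pulse_ns: assumed minimum pulse width that the circuitry can pass.
    """
    kept = []
    for t in times_ns:
        if kept and t - kept[-1] < min_pulse_ns:
            # This transition and the previous one form a glitch: remove both.
            kept.pop()
        else:
            kept.append(t)
    return kept


# A 0.2 ns pulse at 10 ns is removed; the real edges at 0 ns and 20 ns remain.
print(filter_glitches([0.0, 10.0, 10.2, 20.0], min_pulse_ns=0.5))
```

Because transitions are removed in pairs, the final logic value of the signal is unchanged; only the toggle count drops, which is why filtering lowers the power estimate.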
1.3.2.1.4. Generating a .vcd in an EDA Simulation Tool
To create a .vcd for the design, follow these steps:
- Click Assignments > Settings.
- In the Category list, under EDA Tool Settings, click Simulation.
- In the Tool name list, select the EDA simulator.
- In the Format for output netlist list, select Verilog HDL, or SystemVerilog HDL, or VHDL.
- Turn on Generate Value Change Dump (VCD) file script.
  This option turns on the Map illegal HDL characters and Enable glitch filtering options.
  The Map illegal HDL characters option ensures that all signals have legal names and makes signal toggle rates available to the Power Analyzer.
  The Enable glitch filtering option directs the EDA Netlist Writer to perform glitch filtering when generating VHDL Output Files, Verilog Output Files, and the corresponding Standard Delay Format Output Files for use with other EDA simulation tools. This option is available regardless of whether or not you want to generate .vcd scripts.
  Note: For ModelSim® simulations, the +nospecify option in the vsim command disables the Specify path delays and timing checks option. By enabling glitch filtering on the Simulation page, the simulation models include specified path delays. Thus, ModelSim® might fail to simulate a design. As a best practice, remove the +nospecify option from the ModelSim® vsim command to ensure accurate simulation for power estimation.
- Click Script Settings. Select the signals that you want to write to the .vcd.
  - If you choose All signals, the generated script instructs the third-party simulator to write all connected output signals to the .vcd file.
  - If you choose All signals except combinational lcell outputs, the generated script instructs the third-party simulator to write all connected output signals to the .vcd, except logic cell combinational outputs.
  Note: The file can become extremely large if you write all output signals to the file, because the file size depends on the number of output signals being monitored and the number of transitions that occur.
- Click OK.
- In the Design instance name box, type a name for the testbench.
- Compile the design with the Intel® Quartus® Prime software, and generate the necessary EDA netlist and script that instructs the third-party simulator to generate a .vcd.
- In the third-party EDA simulation tool, call the generated script in the simulation tool before running the simulation.
- Perform the simulation.
1.3.2.1.5. Generating a .vcd from ModelSim Software
To generate a .vcd with the ModelSim® software, follow these steps:
- In the Intel® Quartus® Prime software, on the Assignments menu, click Settings.
- In the Category list, under EDA Tool Settings, click Simulation.
- In the Tool name list, select your preferred EDA simulator.
- In the Format for output netlist list, select Verilog HDL, or SystemVerilog HDL, or VHDL.
- Turn on Generate Value Change Dump (VCD) file script.
- To generate the .vcd script, perform a full compilation.
- In the ModelSim® software, compile the files necessary for simulation.
- Load your design by clicking Start Simulation on the Tools menu, or use the vsim command.
- Use the .vcd script created by the full compilation with the following command:
source <design>_dump_all_vcd_nodes.tcl
- Run the simulation (for example, run 2000ns or run -all).
- Quit the simulation using the quit -sim command, if required.
- Exit the ModelSim® software. If you do not exit the software, the ModelSim® software might end the writing process of the .vcd improperly, resulting in a corrupt .vcd.
1.3.2.2. Signal Activities from RTL (Functional) Simulation, Supplemented by Vectorless Estimation
In the functional simulation flow, simulation provides toggle rates and static probabilities for all pins and registers in your design. Vectorless estimation fills in the values for all the combinational nodes between pins and registers, giving good results. This flow usually provides a compilation time benefit when you use the third-party RTL simulator.
1.3.2.2.1. RTL Simulation Limitation
RTL simulation may not provide signal activities for all registers in the post-fitting netlist because synthesis loses some register names. For example, synthesis might automatically transform state machines and counters, thus changing the names of registers in those structures.
1.3.2.3. Signal Activities from Vectorless Estimation and User-Supplied Input Pin Activities
The vectorless estimation flow provides a low level of accuracy, because vectorless estimation for registers is not entirely accurate.
1.3.2.4. Signal Activities from User Defaults Only
The user defaults only flow provides the lowest degree of accuracy.
1.3.3. Specifying the Default Toggle Rate
You can specify the Default toggle rates for unspecified signals in your design for power analysis. The Power Analyzer uses the default toggle rate when no other method specifies the signal activity data.

You specify the toggle rate in absolute terms (transitions per second), or as a fraction of the clock rate in effect for each node. The toggle rate for a clock derives from the timing settings for the clock. For example, if a clock has an fMAX constraint of 100 MHz and the default relative toggle rate is 20%, nodes in this clock domain transition in 20% of the clock periods, which corresponds to 20 million transitions per second.
In some cases, the Power Analyzer cannot determine the clock domain for a node because the clock domain is ambiguous. For example, the Power Analyzer cannot determine a clock domain for a node unless you specify sufficient timing constraints for the clock domains. If the Power Analyzer cannot determine the clock domain for a node, the Power Analyzer substitutes and reports a toggle rate of zero.
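The arithmetic behind a relative default toggle rate can be written as a short Python sketch. The function name `default_toggle_rate` and the use of `None` to represent an undetermined clock domain are assumptions made for this illustration; they do not correspond to any Intel® Quartus® Prime API.

```python
def default_toggle_rate(clock_hz, relative_percent):
    """Absolute toggle rate (transitions/s) implied by a relative default.

    clock_hz: clock frequency of the node's clock domain, or None when the
              clock domain cannot be determined (hypothetical convention).
    relative_percent: default toggle rate as a percentage of the clock rate.
    """
    if clock_hz is None:
        # Mirrors the documented fallback: an unknown clock domain
        # results in a toggle rate of zero.
        return 0.0
    return clock_hz * relative_percent / 100.0


print(default_toggle_rate(100e6, 20))  # 100 MHz at 20%: 20 million transitions/s
print(default_toggle_rate(None, 20))   # ambiguous clock domain: 0.0
```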
1.3.4. Specifying Toggle Rates for Specific Nodes
You can assign toggle rates and static probabilities to individual nodes in the design. These assignments have the highest priority, overriding data from all other signal activity sources.
You must use the Assignment Editor or Tcl commands to create the Power Toggle Rate and Power Static Probability assignments. You can specify the power toggle rate as an absolute toggle rate in transitions per second using the Power Toggle Rate assignment, or you can use the Power Toggle Rate Percentage assignment to specify a toggle rate relative to the clock domain of the assigned node for a more specific assignment made in terms of hierarchy level.
Assigning toggle rates and static probabilities to individual nodes is appropriate for signals whose behavior you know. For example, if you know that a 100 MHz data bus or memory output produces data that is essentially random (uncorrelated in time), you can directly enter a 0.5 static probability and a toggle rate of 50 million transitions per second.
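The 50 million figure follows from basic probability: if successive values of a bit are independent and the bit is high with probability p, the bit changes between two consecutive cycles with probability 2p(1 - p). A minimal Python sketch, assuming one new value per clock cycle and using the hypothetical function name `random_data_activity`:

```python
def random_data_activity(clock_hz, p_high=0.5):
    """Return (toggle_rate, static_probability) for time-uncorrelated data.

    Assumes one independent new value per clock cycle, high with
    probability p_high.
    """
    # P(0 then 1) + P(1 then 0) for two independent samples.
    toggle_probability = 2.0 * p_high * (1.0 - p_high)
    return toggle_probability * clock_hz, p_high


# 100 MHz random data: 50 million transitions per second, static probability 0.5.
print(random_data_activity(100e6))
```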
The Power Analyzer treats bidirectional I/O pins differently. The combinational input port and the output pad for a pin share the same name. However, those ports might not share the same signal activities. For reading signal activity assignments, the Power Analyzer creates a distinct name <node_name~output> when configuring the bidirectional signal as an output and <node_name~result> when configuring the signal as an input. For example, if a design has a bidirectional pin named MYPIN, assignments for the combinational input use the name MYPIN~result, and the assignments for the output pad use the name MYPIN~output.
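As a small illustration of this naming convention, the following Python helper builds the two assignment names for a bidirectional pin. The helper name `bidir_activity_names` is hypothetical; only the `~result` and `~output` suffixes come from the text above.

```python
def bidir_activity_names(pin_name):
    """Names used for signal activity assignments on a bidirectional pin."""
    return {
        "input": f"{pin_name}~result",   # combinational input port
        "output": f"{pin_name}~output",  # output pad
    }


print(bidir_activity_names("MYPIN"))
```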
1.3.4.1. Clock Node Toggle Rates
For clock nodes, the Power Analyzer uses timing requirements to derive the toggle rate when neither simulation data nor user-entered signal activity data is available. fMAX requirements specify full cycles per second, but each cycle represents a rising transition and a falling transition. For example, a clock fMAX requirement of 100 MHz corresponds to 200 million transitions per second for the clock node.
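The factor of two can be expressed directly. The following Python sketch uses the hypothetical function name `clock_transitions_per_second` and only restates the arithmetic in the paragraph above.

```python
def clock_transitions_per_second(fmax_hz):
    """Each clock cycle contains one rising and one falling transition."""
    return 2 * fmax_hz


print(clock_transitions_per_second(100e6))  # 100 MHz: 200 million transitions/s
```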
1.3.5. Avoiding Simulation Node Name Mismatch
Node name mismatches happen when you apply a .vcd to entities other than the top-level entity. In a modular design flow, the node names in gate-level simulation files created in different Intel® Quartus® Prime projects might not match the node names in the current Intel® Quartus® Prime project.
For example, you may have a file named 8b10b_enc.vcd, which was generated in a separate project called 8b10b_enc while simulating the 8b10b encoder. If you import the .vcd into another project called Top, you might encounter name mismatches when applying the .vcd to the 8b10b_enc module in the Top project. This mismatch happens because the Intel® Quartus® Prime software might name all the combinational nodes in the 8b10b_enc.vcd differently than in the Top project.
You can avoid name mismatching with only RTL simulation data, in which register names do not change, or with an incremental compilation flow that preserves node names along with a gate-level simulation.
1.4. Viewing Power Analysis Reports
Following successful power analysis, click the Power Analyzer Reports button to view the Power Analysis section of the Compilation Report.
The Power Analysis reports contain the following sections:
Summary
The Summary section of the report shows the estimated total thermal power consumption of your design. This includes dynamic, static, and I/O thermal power consumption. The I/O thermal power includes the total I/O power drawn from the VCCIO and VCCPD power supplies and the power drawn from VCCINT in the I/O subsystem including I/O buffers and I/O registers. The report also includes a confidence metric that reflects the overall quality of the data sources for the signal activities. For example, a Low power estimation confidence value reflects that you have provided insufficient toggle rate data, or most of the signal activity information used for power estimation is from default or vectorless estimation settings. For more information about the input data, refer to the Power Analyzer Confidence Metric report.
Power Savings Summary
Lists any savings (in mW) and the type of savings method, such as SmartVID Power Savings.
Parallel Compilation
When you enable parallel compilation, the Parallel Compilation report lists the number of processors used during power analysis.
Settings
The Settings section of the report shows the Power Analyzer settings information of your design, including the default input toggle rates, operating conditions, and other relevant setting information.
Simulation Files Read
The Simulation Files Read section of the report lists the simulation output (.vcd) files used for power estimation. This section also includes the file ID, file type, entity, VCD start time, VCD end time, the unknown percentage, and the toggle percentage. The unknown percentage indicates the portion of the design module unused by the simulation vectors.
Operating Conditions Used
The Operating Conditions Used section of the report shows device characteristics, voltages, temperature, and cooling solution, if any, during the power estimation. This section also shows the entered junction temperature or auto-computed junction temperature during the power analysis.
Thermal Power Dissipated by Block
The Thermal Power Dissipated by Block section of the report shows estimated thermal dynamic power and thermal static power consumption categorized by atoms. This information provides you with estimated power consumption for each atom in your design.
By default, this section does not contain any data, but you can turn on the report with the Write power dissipation by block to report file option on the Power Analyzer Settings page.
Thermal Power Dissipation by Block Type (Device Resource Type)
This Thermal Power Dissipation by Block Type (Device Resource Type) section of the report shows the estimated thermal dynamic power and thermal static power consumption categorized by block types. This information is further categorized by estimated dynamic and static power and provides an average toggle rate by block type. Thermal power is the power dissipated as heat from the FPGA device.
Thermal Power Dissipation by Hierarchy
This Thermal Power Dissipation by Hierarchy section of the report shows estimated thermal dynamic power and thermal static power consumption categorized by design hierarchy. This information is further categorized by the dynamic and static power that was used by the blocks and routing in that hierarchy. This information is useful when locating modules with high power consumption in your design. (Available for Intel® Agilex™ devices.)
Core Dynamic Thermal Power Dissipation by Clock Domain
The Core Dynamic Thermal Power Dissipation by Clock Domain section of the report shows the estimated total core dynamic power dissipation by each clock domain, which provides you with the estimated power consumption for each clock domain in the design. If the clock frequency for a domain is unspecified by a constraint, the clock frequency is listed as “unspecified.” For all the combinational logic, the clock domain is listed as no clock with zero MHz.
Current Drawn from Voltage Supplies
The Current Drawn from Voltage Supplies section of the report lists the current drawn from each voltage supply. The VCCIO and VCCPD voltage supplies are further categorized by I/O bank and by voltage. This section also lists the minimum safe power supply size (current supply ability) for each supply voltage. Minimum current requirement can be higher than user mode current requirement in cases in which the supply has a specific power up current requirement that goes beyond user mode requirement, such as the VCCPD power rail in Stratix® III and Stratix® IV devices, and the VCCIO power rail in Stratix® IV devices.
The I/O thermal power dissipation on the summary page does not correlate directly to the power drawn from the VCCIO and VCCPD voltage supplies listed in this report. This is because the I/O thermal power dissipation value also includes portions of the VCCINT power, such as the I/O element (IOE) registers, which are modeled as I/O power, but do not draw from the VCCIO and VCCPD supplies.
The current drawn from the I/O voltage supplies (ICCIO and ICCPD), as reported in the Power Analyzer report, includes any current drawn through the I/O into off-chip termination resistors. This can result in ICCIO and ICCPD values that are higher than the reported I/O thermal power, because this off-chip current dissipates as heat elsewhere and does not factor in the calculation of device temperature. Therefore, total I/O thermal power does not equal the sum of current drawn from each VCCIO and VCCPD supply multiplied by VCCIO and VCCPD voltage.
For Arria® V SoC and Cyclone® V SoC devices, there is no standalone ICC_AUX_SHARED current drawn information. The ICC_AUX_SHARED is reported together with ICC_AUX.
Confidence Metric Details
The Confidence Metric is defined in terms of the total weight of signal activity data sources for both combinational and registered signals. Each signal has two data sources allocated to it: a toggle rate source and a static probability source.
The Confidence Metric Details section also indicates the quality of the signal toggle rate data used to compute a power estimate. The confidence metric is low if the signal toggle rate data comes from poor predictors of real signal toggle rates in the device during operation. Toggle rate data that comes from simulation or from user-entered assignments on specific signals or entities is reliable. Toggle rate data from default toggle rates (for example, 12.5% of the clock period) or vectorless estimation is relatively inaccurate. This section gives an overall confidence rating in the toggle rate data, from low to high. This section also summarizes how many pins, registers, and combinational nodes obtained their toggle rates from each of simulation, user entry, vectorless estimation, or default toggle rate estimations. This detailed information helps you understand how to increase the confidence metric, letting you determine your own confidence in the toggle rate data.
Signal Activities
The Signal Activities section lists toggle rates and static probabilities assumed by power analysis for all signals with fan-out and pins. This section also lists the signal type (pin, registered, or combinational) and the data source for the toggle rate and static probability. By default, this section does not contain any data, but you can turn on the report with the Write signal activities to report file option on the Power Analyzer Settings page.
Intel recommends that you keep the Write signal activities to report file option turned off for a large design because of the large number of signals present. You can use the Assignment Editor to specify that activities for individual nodes or entities are reported by assigning an on value to those nodes for the Power Report Signal Activities assignment.
Messages
The Messages section lists the messages that the Intel® Quartus® Prime software generates during the analysis.
1.5. Power Analysis in Modular Design Flows
In modular or hierarchical design flows, you develop each design block separately, and then instantiate these blocks into a higher-level design to form a complete design. The Intel® Quartus® Prime software supports simulation and power analysis of the top-level design or of individual blocks within the design.
You can associate multiple .vcd simulation output files with specific node names, enabling the integration of partial design simulations into a complete design power analysis. When specifying multiple .vcd files for a node, more than one simulation file can contain signal activity information for the same signal. In those cases, the Power Analyzer follows these rules:
- When you apply multiple .vcd files to the same design node, the Power Analyzer calculates the signal activity as the equal-weight arithmetic average of each .vcd.
- When you apply multiple simulation files to design nodes at different levels in the design hierarchy, the signal activity in the power analysis derives from the simulation file that applies to the most specific design node.
The following figure shows an example of a hierarchical design:
The top-level module of the design, called Top, consists of three 8b/10b decoders, followed by a mux. The output of the mux is then encoded again to produce the final output of the top-level module. An error-handling module handles any 8b/10b decoding errors. The Top module contains the top-level entity of the design and any logic not defined as part of another module. The design file for the top-level module can be a wrapper for the hierarchical entities or can contain its own logic.
The following usage scenarios show common ways that you can simulate the design and import the .vcd into the Power Analyzer:
1.5.1. Complete Design Simulation Power Analysis Flow
You can simulate the entire design and generate a .vcd from a third-party simulator. The Power Analyzer can then import the .vcd (specifying the top-level design). The resulting power analysis uses the signal activities information from the generated .vcd, including those that apply to submodules, such as decode [1-3], err1, mux1, and encode1.
1.5.2. Modular Design Simulation Power Analysis Flow
You can independently simulate the submodules of the top-level design, and then import all the resulting .vcd files into the Power Analyzer. For example, you can simulate the 8b10b_dec independent of the entire design and of mux, 8b10b_rxerr, and 8b10b_enc. You can then import the .vcd files generated from each simulation by specifying the appropriate instance name. For example, if the files produced by the simulations are 8b10b_dec.vcd, 8b10b_enc.vcd, 8b10b_rxerr.vcd, and mux.vcd, you can use the import specifications in the following table:
File Name | Entity |
---|---|
8b10b_dec.vcd | Top|8b10b_dec:decode1 |
8b10b_dec.vcd | Top|8b10b_dec:decode2 |
8b10b_dec.vcd | Top|8b10b_dec:decode3 |
8b10b_rxerr.vcd | Top|8b10b_rxerr:err1 |
8b10b_enc.vcd | Top|8b10b_enc:encode1 |
mux.vcd | Top|mux:mux1 |
The resulting power analysis applies the simulation vectors in each file to the assigned instance. Simulation provides signal activities for the pins and for the outputs of functional blocks. If the inputs to an instance are not input pins for the entire design, the simulation file associated with that instance does not provide signal activities for the inputs of that instance. For example, an input to an instance such as mux1 has its signal activity specified at the output of one of the decode instances.
1.5.3. Multiple Simulation Power Analysis Flow
You can perform multiple simulations of an entire design or specific modules of a design. For example, in the process of verifying the top-level design, you can have three different simulation testbenches: one for normal operation, and two for corner cases. Each of these simulations produces a separate .vcd. In this case, apply the different .vcd file names to the same top-level entity, as shown in the following table.
File Name | Entity |
---|---|
normal.vcd | Top |
corner1.vcd | Top |
corner2.vcd | Top |
The resulting power analysis uses an arithmetic average of the signal activities calculated from each simulation file to obtain the final signal activities. For example, if a signal err_out has a toggle rate of zero transitions per second in normal.vcd, 50 transitions per second in corner1.vcd, and 70 transitions per second in corner2.vcd, the final toggle rate in the power analysis is (0 + 50 + 70) / 3 = 40 transitions per second.
If you do not want the Power Analyzer to read information from multiple instances and take an arithmetic average of the signal activities, use a .vcd that includes only signals from the instance that you care about.
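The averaging rule can be checked with a few lines of Python. This is only a sketch of the arithmetic described above, not the Power Analyzer's implementation; the function name is hypothetical.

```python
def average_toggle_rate(rates_per_file):
    """Arithmetic mean of one signal's toggle rate across several .vcd files."""
    return sum(rates_per_file) / len(rates_per_file)


# Toggle rates of err_out (transitions per second), taken from the text.
err_out = {"normal.vcd": 0.0, "corner1.vcd": 50.0, "corner2.vcd": 70.0}

final_rate = average_toggle_rate(list(err_out.values()))
print(final_rate)  # 40.0
```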
1.5.4. Overlapping Simulation Power Analysis Flow
You can perform a simulation on the entire design, and more exhaustive simulations on a submodule, such as 8b10b_rxerr. The following table lists the import specification for overlapping simulations:
File Name | Entity |
---|---|
full_design.vcd | Top |
error_cases.vcd | Top|8b10b_rxerr:err1 |
In this case, the software uses signal activities from error_cases.vcd for all the nodes in the generated .vcd and uses signal activities from full_design.vcd for only those nodes that do not overlap with nodes in error_cases.vcd. In general, the more specific hierarchy (the most bottom-level module) derives signal activities for overlapping nodes.
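The precedence rule for overlapping nodes can be sketched as follows. The node names and toggle rates are hypothetical, and the function only illustrates the "most specific hierarchy wins" rule stated above; it is not how the software is implemented.

```python
def resolve_activities(imports):
    """imports: list of (entity_path, {node_name: toggle_rate}).

    Returns {node_name: (toggle_rate, source_entity)}. Entities deeper in
    the hierarchy override shallower ones for nodes that overlap.
    """
    resolved = {}
    # Apply shallow entities first so that deeper entities overwrite them.
    for entity, activities in sorted(imports, key=lambda item: item[0].count("|")):
        for node, rate in activities.items():
            resolved[node] = (rate, entity)
    return resolved


# Hypothetical signal activities (transitions per second).
full_design = ("Top", {
    "Top|8b10b_rxerr:err1|err_out": 10.0,
    "Top|mux:mux1|sel": 5.0,
})
error_cases = ("Top|8b10b_rxerr:err1", {
    "Top|8b10b_rxerr:err1|err_out": 60.0,
})

resolved = resolve_activities([error_cases, full_design])
print(resolved["Top|8b10b_rxerr:err1|err_out"])  # (60.0, 'Top|8b10b_rxerr:err1')
print(resolved["Top|mux:mux1|sel"])              # (5.0, 'Top')
```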
1.5.5. Partial Design Simulation Power Analysis Flow
You can perform a simulation in which only part of the simulation time is applicable to signal activity calculation. For example, suppose you run a simulation for 10,000 clock cycles and hold the chip in reset for the first 2,000 clock cycles. If the Power Analyzer performs the signal activity calculation over all 10,000 cycles, the toggle rates are only 80% of their steady-state value (because the chip is in reset for the first 20% of the simulation). In this case, you must specify the useful parts of the .vcd for power analysis. The Limit VCD Period option enables you to specify a start and end time when performing signal activity calculations.
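The 80% figure, and the window you would enter as start and end time, follow from simple arithmetic. The sketch below assumes that no node toggles during reset and uses a hypothetical 10 ns clock period, which the text does not specify.

```python
def measured_fraction(total_cycles, reset_cycles):
    """Fraction of the steady-state toggle rate obtained when averaging over
    the whole run, assuming no node toggles while the chip is in reset."""
    return (total_cycles - reset_cycles) / total_cycles


def vcd_window_ns(total_cycles, reset_cycles, clock_period_ns):
    """Start and end time (ns) of the useful part of the simulation."""
    return reset_cycles * clock_period_ns, total_cycles * clock_period_ns


print(measured_fraction(10_000, 2_000))   # 0.8
print(vcd_window_ns(10_000, 2_000, 10))   # (20000, 100000)
```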
1.5.5.1. Specifying Start and End Time for Signal Activity Calculations
To specify a start and end time for signal activity calculations using the Limit VCD period option, follow these steps:
- In the Intel® Quartus® Prime software, click Assignments > Settings.
- Under the Category list, click Power Analyzer Settings.
- Turn on the Use input file(s) to initialize toggle rates and static probabilities during power analysis option.
- Click Add.
- In the File name and Entity fields, browse to the necessary files.
- Under Simulation period, turn on VCD file and Limit VCD period options.
- In the Start time and End time fields, specify the desired start and end time.
- Click OK.
You can also use the following Tcl or .qsf assignment to specify .vcd files:
set_global_assignment -name POWER_INPUT_FILE_NAME "test.vcd" -section_id test.vcd
set_global_assignment -name POWER_VCD_FILE_START_TIME "10 ns" -section_id test.vcd
set_global_assignment -name POWER_VCD_FILE_END_TIME "1000 ns" -section_id test.vcd
set_instance_assignment -name POWER_READ_INPUT_FILE test.vcd -to test_design
1.5.6. Vectorless Estimation Power Analysis Flow
Vectorless estimation statistically estimates the signal activity of a node based on the signal activities of nodes feeding that node, and on the actual logic function that the node implements. Vectorless estimation cannot derive signal activities for primary inputs. Vectorless estimation is accurate for combinational nodes, but not for registered nodes. Therefore, the Power Analyzer requires simulation data for at least the registered nodes and I/O nodes for accuracy.
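To give a feel for what "statistically estimates the signal activity of a node based on the nodes feeding it" can mean, the sketch below propagates static probabilities (the probability that a signal is 1) through two-input gates. It assumes statistically independent inputs and is a textbook illustration only; the source does not describe the Power Analyzer's actual algorithm, and the function names are hypothetical.

```python
def and_prob(p_a, p_b):
    """P(a AND b = 1) for independent inputs."""
    return p_a * p_b


def or_prob(p_a, p_b):
    """P(a OR b = 1) for independent inputs."""
    return p_a + p_b - p_a * p_b


def xor_prob(p_a, p_b):
    """P(a XOR b = 1) for independent inputs."""
    return p_a * (1 - p_b) + (1 - p_a) * p_b


# Two inputs that are each high half of the time.
print(and_prob(0.5, 0.5), or_prob(0.5, 0.5), xor_prob(0.5, 0.5))  # 0.25 0.75 0.5
```

Note that the input probabilities must come from somewhere, which is consistent with the statement that vectorless estimation cannot derive signal activities for primary inputs.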
1.6. Scripting Support
You can run procedures and create settings described in this chapter in a Tcl script. Alternatively, you can run procedures at a command prompt. For more information about scripting command options, refer to the Intel® Quartus® Prime Command-Line and Tcl API Help browser. To run the Help browser, type the following command at the command prompt:
quartus_sh --qhelp
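1.6.1. Running the Power Analyzer from the Command Line
To list the command-line options that the quartus_pow executable supports, type either of the following commands at a command prompt:
quartus_pow --help
or
quartus_sh --qhelp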
The following examples show how to use the quartus_pow executable. Type each command at a system command prompt:
- To instruct the Power Analyzer to generate an EPE file:
quartus_pow sample --output_epe=sample.csv
- To instruct the Power Analyzer to generate an EPE file without performing the power estimate:
quartus_pow sample --output_epe=sample.csv --estimate_power=off
- To instruct the Power Analyzer to use a .vcd as input (sample.vcd):
quartus_pow sample --input_vcd=sample.vcd
- To instruct the Power Analyzer to use two .vcd files as input files (sample1.vcd and sample2.vcd), perform glitch filtering on the .vcd, and use a default input I/O toggle rate of 10,000 transitions per second:
quartus_pow sample --input_vcd=sample1.vcd --input_vcd=sample2.vcd \
--vcd_filter_glitches=on --default_input_io_toggle_rate=10000transitions/s
- To instruct the Power Analyzer not to use an input file, and to specify a default input I/O toggle rate of 60%, with vectorless estimation off, and a default toggle rate of 20% on all remaining signals:
quartus_pow sample --no_input_file --default_input_io_toggle_rate=60% \
--use_vectorless_estimation=off --default_toggle_rate=20%
The quartus_pow executable creates a report file, <revision name>.pow.rpt, in the main project directory. The report file contains the same information as the Power Analyzer Compilation Report.
1.7. Power Analysis Revision History
The following revision history applies to this chapter:
Document Version | Intel® Quartus® Prime Version | Changes |
---|---|---|
2020.12.07 | 20.3.0 | Added note to the Specifying the Default Toggle Rate topic. |
2019.12.04 | 19.1.0 |
|
2019.08.02 | 19.1.0 |
|
2019.07.03 | 19.1.0 |
|
2018.09.24 | 18.1.0 |
|
2018.06.11 | 18.0.0 |
|
2017.05.08 | 17.0.0 | Removed references to PowerPlay name. Power analysis occurs in the Intel Quartus Prime Power Analyzer. |
2015.11.02 | 15.1.0 | Changed instances of Quartus II to Intel Quartus Prime. |
2014.12.15 | 14.1.0 |
|
2014.08.18 | 14.0a10.0 | Updated "Current Drawn from Voltage Supplies" to clarify that for SoC devices or for Arria V SoC and Cyclone V SoC devices, there is no standalone ICC_AUX_SHARED current drawn information. The ICC_AUX_SHARED is reported together with ICC_AUX. |
November 2012 | 12.1.0 |
|
June 2012 | 12.0.0 |
|
November 2011 | 10.1.1 |
|
December 2010 | 10.1.0 |
|
July 2010 | 10.0.0 |
|
November 2009 | 9.1.0 |
|
March 2009 | 9.0.0 |
|
November 2008 | 8.1.0 |
|
May 2008 | 8.0.0 |
|
2. Power Optimization
This chapter describes the power-driven compilation feature and flow in detail, as well as low power design techniques that can further reduce power consumption in your design. The techniques primarily target Arria®, Stratix®, and Cyclone® series of devices. These devices utilize a low-k dielectric material that dramatically reduces dynamic power and improves performance. Arria® series, Stratix® IV, and Stratix® V device families include efficient logic structures called adaptive logic modules (ALMs) that obtain maximum performance while minimizing power consumption. Cyclone® device families offer the optimal blend of high performance and low power in a low-cost FPGA.
This chapter focuses on design optimization options and techniques that help reduce core dynamic power and I/O power. In addition to these techniques, there are additional power optimization techniques available for specific devices, including Programmable Power Technology and Device Speed Grade Selection.
2.1. Factors Affecting Power Consumption
- Design Activity and Power Analysis
- Device Selection
- Environmental Conditions: The main environmental parameters affecting junction temperature are operating temperature and the cooling solution.
- Device Resource Usage: Power consumption depends on the number and types of device resources that a design uses.
- Signal Activity
2.1.1. Design Activity and Power Analysis
Power consumption of a device also depends on the design's activity over time. Static power (PSTATIC) is the thermal power that a chip dissipates independent of user clocks. PSTATIC includes leakage power from all FPGA functional blocks, except for I/O DC bias power and transceiver DC bias power, which are accounted for in the I/O and transceiver sections. Dynamic power is the additional power consumption of a device due to signal activity or switching.
2.1.2. Device Selection
Device families have different power characteristics. Many parameters affect the device family power consumption, including choice of process technology, supply voltage, electrical design, and device architecture.
Power consumption also varies in a single device family. A larger device with more transistors consumes more static power than a smaller device in the same family. In devices that employ global routing architectures, dynamic power can also increase with device size.
The choice of device package also affects the ability of the device to dissipate heat, and you may need to use a different cooling solution to comply with junction temperature constraints.
Process variation can affect power consumption. Process variation primarily impacts static power, because sub-threshold leakage current varies exponentially with changes in transistor threshold voltage. Therefore, you must consult device specifications for static power, and not rely on empirical observation. Process variation has a weak effect on dynamic power.
2.1.3. Environmental Conditions
The following table lists the environmental conditions that influence power consumption.
Environmental Condition | Description |
---|---|
Airflow | Measures how quickly the device replaces heated air from the vicinity of the device with air at ambient temperature. You can either specify airflow as “still air” when you are not using a fan, or as the linear feet per minute rating of the fan in the system. Higher airflow decreases thermal resistance. |
Heat Sink and Thermal Compound | A heat sink allows more efficient heat transfer from the device to the surrounding area because of its large surface area exposed to the air. The thermal compound that interfaces the heat sink to the device also influences the rate of heat dissipation. The case-to-ambient thermal resistance (θCA) parameter describes the cooling capacity of the heat sink and thermal compound employed at a given airflow. Larger heat sinks and more effective thermal compounds reduce θCA. |
Junction Temperature | The junction temperature of a device is equal to TJunction = TAmbient + PThermal · θJA, in which θJA is the total thermal resistance from the device transistors to the environment, in degrees Celsius per watt. The value θJA is equal to the sum of the junction-to-case (package) thermal resistance (θJC), and the case-to-ambient thermal resistance (θCA) of the cooling solution. |
Board Thermal Model | The junction-to-board thermal resistance (θJB) is the thermal resistance of the path through the board, in degrees Celsius per watt. To compute junction temperature, you can use this board thermal model along with the board temperature, the top-of-chip θJA and ambient temperatures. |
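The junction temperature relationship in the table can be evaluated directly. The sketch below uses hypothetical values for ambient temperature, thermal power, and thermal resistances; take real values from the device and heat sink specifications.

```python
def junction_temperature(t_ambient_c, p_thermal_w, theta_jc, theta_ca):
    """TJunction = TAmbient + PThermal * θJA, with θJA = θJC + θCA.

    Temperatures are in degrees Celsius, power in watts, and thermal
    resistances in degrees Celsius per watt.
    """
    theta_ja = theta_jc + theta_ca
    return t_ambient_c + p_thermal_w * theta_ja


# Hypothetical example: 25 °C ambient, 5 W thermal power,
# θJC = 0.5 °C/W and θCA = 3.5 °C/W.
print(junction_temperature(25.0, 5.0, 0.5, 3.5))  # 45.0
```

Because a larger heat sink or higher airflow reduces θCA, it lowers the computed junction temperature for the same thermal power.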
2.1.4. Device Resource Usage
2.1.4.1. Number, Type, and Loading of I/O Pins
2.1.4.2. Number and Type of Hard Logic Blocks
A design with more logic elements (LEs), multiplier elements, memory blocks, transceiver blocks, or HPS system tends to consume more power than a design with fewer circuit elements. The operating mode of each circuit element also affects its power consumption.
For example, a DSP block performing 18 × 18 multiplications and a DSP block performing multiply-accumulate operations consume different amounts of dynamic power, because of different amounts of charging internal capacitance on each transition. The operating mode of a circuit element also affects static power.
2.1.4.3. Number and Type of Global Signals
Global signal networks span large portions of the device and have high capacitance, resulting in significant dynamic power consumption. The type of global signal is important as well. Global clocks cover the entire device, whereas quadrant clocks only span one-fourth of the device. For example, Stratix® V devices support global clocks and quadrant (regional) clocks. Clock networks that span smaller regions have lower capacitance and tend to consume less power. The location of the logic array blocks (LABs) driven by the clock network can also have an impact because the Intel® Quartus® Prime software automatically disables unused branches of a clock.
2.1.5. Signal Activity
The behavior of each signal in the design is an important factor in estimating power consumption. To get accurate results from the power analysis, the signal activity must represent the actual operating behavior of the design.
The two most important behaviors of a signal are toggle rate and static probability.
2.1.5.1. Toggle Rate
Dynamic power increases linearly with the toggle rate as you charge the board trace model more frequently for logic and routing. The Intel® Quartus® Prime software models full rail-to-rail switching. For high toggle rates, especially on circuit output I/O pins, the circuit can transition before fully charging the downstream capacitance. The result is a slightly conservative prediction of power by the Power Analyzer.
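The linear dependence on toggle rate can be illustrated with the usual CMOS approximation, in which each full rail-to-rail transition of a capacitance C at supply voltage V dissipates about C·V²/2. This formula and the numeric values are assumptions for illustration; they are not taken from this guide and do not represent the Power Analyzer's device models.

```python
def dynamic_power_w(capacitance_f, voltage_v, toggle_rate_hz):
    """Approximate switching power: 0.5 * C * V^2 per transition,
    multiplied by the number of transitions per second."""
    return 0.5 * capacitance_f * voltage_v ** 2 * toggle_rate_hz


# Hypothetical node: 10 pF load, 1.2 V supply.
p_100m = dynamic_power_w(10e-12, 1.2, 100e6)  # 100 million transitions/s
p_200m = dynamic_power_w(10e-12, 1.2, 200e6)  # doubling the toggle rate
print(p_100m, p_200m)
```

Doubling the toggle rate doubles the estimate, which is the linear relationship described above.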
2.1.5.2. Static Probability
The static probability of input signals impacts the design's static power consumption, due to state-dependent leakage in routing and logic. This effect becomes more important for smaller geometries. In output I/O standards that drive termination resistors, the static power also depends on the static probability on I/O pins.
2.2. Power Dissipation
The following figure shows the power dissipation of Stratix® and Cyclone® devices in different designs. The analysis considers a fixed clock rate of 100 MHz and exhibits varied logic resource utilization across available resources.
Notes:
- These results originate from 103 designs.
- These results originate from 96 designs.
- In designs using DSP blocks, DSPs consumed 5% of core dynamic power.
In Stratix® and Cyclone® device families, a series of column and row interconnect wires of varying lengths provide signal interconnections between logic array blocks (LABs), memory block structures, and digital signal processing (DSP) blocks or multiplier blocks. These interconnects dissipate the largest component of device power.
FPGA combinational logic is another source of power consumption. For more information about ALMs and LEs in Cyclone® or Stratix® devices, refer to the respective device handbook.
Memory and clock resources are other major consumers of power in FPGAs. Stratix® devices feature the TriMatrix memory architecture. TriMatrix memory includes 512-bit M512 blocks, 4-Kbit M4K blocks, and 512-Kbit M-RAM blocks, which are configurable to support many features. Stratix® IV TriMatrix on-chip memory is an enhancement based upon the Stratix® II FPGA TriMatrix memory and includes three sizes of memory blocks: MLAB blocks, M9K blocks, and M144K blocks. Stratix® IV and Stratix® V devices feature Programmable Power Technology, an advanced architecture that enables a smooth trade-off between speed and power. The core of each Stratix® IV and Stratix® V device is divided into tiles, each of which may be put into a high-speed or low-power mode. The primary benefit of Programmable Power Technology is to reduce static power, with a secondary benefit being a small reduction in dynamic power. Cyclone® IV GX devices have 9-Kbit M9K memory blocks.
2.3. Design Space Explorer II for Power-Driven Optimization
The DSE II offers two options in Exploration mode that target power optimization: Power (High Effort) and Power (Aggressive). In both cases, the target is an overall improvement in the design's power; specifically, reducing the total thermal power in the design.
When the optimization targets power, the DSE II runs the Intel® Quartus® Prime Power Analyzer for every group of settings. The resultant reports help you debug the design and determine trade-offs between power requirements and performance optimization.
2.4. Power-Driven Compilation
Intel® Quartus® Prime software settings that control power-driven compilation are located in the Power optimization during synthesis list in the Advanced Settings (Synthesis) dialog box, and the Power optimization during fitting list in the Advanced Fitter Settings dialog box. The following sections describe these power optimization options at the Analysis and Synthesis and Fitter levels.
2.4.1. Power-Driven Synthesis
The Power Optimization During Synthesis logic option determines how aggressively Analysis & Synthesis optimizes the design for power. To access this option at a project level, click Assignments > Settings > Compiler Settings > Advanced Settings (Synthesis).
Settings | Description | Optimization Techniques Included |
---|---|---|
Off | The Compiler does not perform netlist, placement, or routing optimizations to minimize power. | - |
Normal compilation (Default) | The Compiler applies low compute effort algorithms to minimize power through netlist optimizations that do not reduce design performance. |
|
Extra effort | Besides the techniques in the Normal compilation setting, the Compiler applies high-compute-effort algorithms to minimize power through netlist optimizations. Selecting this option might impact performance. |
|
You can also control memory optimization options from the Intel® Quartus® Prime Settings dialog box. The Default Parameters page allows you to edit the Low_Power_Mode parameter. The settings for this parameter are equivalent to the values of the Power Optimization During Synthesis logic options. The Low_Power_Mode parameter always takes precedence over the Optimize Power for Synthesis option for power optimization on memory.
Parameter Value | Equivalent Setting in Power Optimization During Synthesis Logic Option |
---|---|
None | Off |
Auto | Normal compilation |
All | Extra effort |
2.4.1.1. Memory Block Optimization
Memory blocks can represent a large fraction of total design dynamic power. Minimizing the number of memory blocks accessed during each clock cycle can significantly reduce memory power.
In the default implementation of a simple dual-port memory block, write-clock enable signals and read-clock enable signals connect to VCC, making both read and write memory ports active during each clock cycle.
Memory transformation moves the read-enable and write-enable signals to the respective read-clock enable and write-clock enable signals. This technique reduces the design’s memory power consumption, because memory ports are shut down when they are not accessed.
For Stratix® IV and Stratix® V devices, the memory transformation takes place at the Fitter level when you select the Normal compilation setting for the power optimization option.
In Cyclone® IV GX and Stratix® IV devices, the read-during-write behavior impacts the power of single-port and bidirectional dual-port RAMs. As a best practice, you can allow optimization by setting the read-during-write parameter to “Don’t care” at the HDL level, and set the read-enable signal to the inversion of the existing write-enable signal (if one exists). This allows the core of the RAM to shut down, which prevents switching, saving a significant amount of power.
2.4.1.2. Power-Aware Logic Mapping
2.4.1.3. Power-Aware Memory Balancing
The Compiler includes this optimization technique when the Power Optimization During Synthesis logic option is set to Extra effort.
The following figure is an example of a 4k × 4 (4k deep and 4 bits wide) memory implementation in two different configurations using M4K memory blocks available in some Stratix® devices.
The minimum logic area implementation configures M4K blocks as 4k × 1. The Intel® Quartus® Prime software uses this implementation as the default, because the resulting design has the minimum logic area (0 logic cells) and the highest speed. However, all four M4K blocks are active on each memory access, which increases RAM power.
The minimum RAM power implementation configures four M4K blocks as 1k × 4 for optimal power saving. The RAM IP core includes an address decoder to select which of the four M4K blocks is active on a given cycle, based on the state of the top two user address bits. The RAM IP core implements a multiplexer to feed the downstream logic by choosing the appropriate M4K output. This implementation reduces RAM power because only one M4K block is active on any cycle, but it requires extra logic cells, costing logic area and potentially impacting design performance.
There is a trade-off between power saved by accessing fewer memories and power consumed by the extra decoder and multiplexor logic. The Intel® Quartus® Prime software automatically balances the power savings against the costs to choose the lowest power configuration for each logical RAM. The benchmark data shows that the power-driven synthesis can reduce memory power consumption by as much as 60% in Stratix® devices.
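The difference between the two configurations of the 4k × 4 memory can be counted with a short sketch. The function is hypothetical and only models how many M4K blocks are clocked per access; it does not model the power of the extra decoder and multiplexer logic.

```python
def blocks_for_config(logical_depth, logical_width, block_depth, block_width):
    """Return (total_blocks, blocks_active_per_access, decoder_address_bits)."""
    depth_slices = -(-logical_depth // block_depth)   # ceiling division
    width_slices = -(-logical_width // block_width)
    total = depth_slices * width_slices
    # Only the depth slice selected by the upper address bits is accessed.
    active = width_slices
    decoder_bits = (depth_slices - 1).bit_length()
    return total, active, decoder_bits


# 4k x 4 logical memory built from M4K blocks.
print(blocks_for_config(4096, 4, 4096, 1))  # (4, 4, 0)  minimum logic area
print(blocks_for_config(4096, 4, 1024, 4))  # (4, 1, 2)  minimum RAM power
```

Both configurations use four blocks, but the 1k × 4 configuration activates only one block per access and needs the top two address bits for the decoder, matching the description above.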
You can also set the MAXIMUM_DEPTH parameter manually to configure the memory for low power optimization. This technique is the same as the power-aware memory balancer, but it is manual rather than automatic like the Extra effort setting in the Power optimization list. The MAXIMUM_DEPTH parameter always takes precedence over the Optimize Power for Synthesis options for memory power optimization. You can set the MAXIMUM_DEPTH parameter for memory modules manually in the Intel® FPGA IP instantiation or in the IP Catalog.
2.4.2. Power-Driven Fitter
Option | Description |
---|---|
Off | The Fitter does not perform optimizations to minimize power. |
Normal compilation (Default) | The Fitter applies low compute effort algorithms to minimize power through placement and routing optimizations. These techniques do not reduce design performance. Includes DSP optimizations that create power-efficient DSP block configurations for DSP functions. |
Extra effort | Besides the optimization techniques of the Normal Compilation option, the Fitter applies high compute effort algorithms to minimize power through placement and routing optimizations. These techniques might impact performance. The Extra effort setting for the Fitter requires extensive effort to optimize the design for power and can increase compilation time. |
For Stratix® IV and Stratix® V devices, the Normal compilation setting enables the Programmable Power Technology to configure tiles as high-speed mode or low-power mode. Programmable Power Technology is always turned ON even when the OFF setting is selected for the Power optimization option. Tiles are the combination of LAB and MLAB pairs (including the adjacent routing associated with LAB and MLAB), which can be configured to operate in high-speed or low-power mode. This level of power optimization does not have any effect on the fitting, timing results, or compile time.
With the Extra effort setting, the Fitter works to minimize power even after the design meets timing requirements, by moving logic closer together during placement to localize high-toggling nets and by choosing routes with low capacitance.
The Extra effort setting uses a Value Change Dump (.vcd) file that guides the Fitter to fully optimize the design for power, based on the signal activity of the design. The best power optimization during fitting results from using the most accurate signal activity information. If there is no .vcd file, the Intel® Quartus® Prime software estimates the signal activities from the settings in the Power Analyzer Settings page in the Settings dialog box, such as assignments, clock assignments, and vectorless estimation values. The benchmark data shows that the power-driven Fitter technique can reduce power consumption by as much as 19% in Stratix® devices. On average, you can reduce core dynamic power by 16% with the Extra effort synthesis and Extra effort fitting settings, as compared to the Off settings in both synthesis and Fitter options for power-driven compilation.
2.4.3. Area-Driven Synthesis
The Intel® Quartus® Prime software provides Speed, Balanced, or Area for the Optimization Technique option. You can also specify this logic option for specific modules in your design with the Assignment Editor in cases where you want to reduce area using the Area setting (potentially at the expense of register-to-register timing performance) while leaving the default Optimization Technique setting at Balanced (for the best trade-off between area and speed for certain device families). The Speed Optimization Technique can increase the resource usage of your design if the constraints are too aggressive and can also result in increased power consumption.
The benchmark data shows that the area-driven technique can reduce power consumption by as much as 31% in Stratix® devices and as much as 15% in Cyclone® devices.
2.4.4. Gate-Level Register Retiming
The Perform gate-level register retiming option in the Intel® Quartus® Prime software enables the movement of registers across combinational logic to balance timing, allowing the software to trade off the delay between critical and noncritical paths.
Retiming uses fewer registers than pipelining. In this example of gate-level register retiming, the 10 ns critical delay is reduced by moving the register relative to the combinational logic, resulting in the reduction of data depth and switching activity.
Gate-level register retiming makes changes at the gate level. If you are using an atom netlist from a third-party synthesis tool, you must also select the Perform WYSIWYG primitive resynthesis option to undo the atom primitives to gates mapping (so that register retiming can be performed), and then to remap gates to Intel primitives.
When using Intel® Quartus® Prime integrated synthesis, retiming occurs during synthesis before the design is mapped to Intel primitives. The benchmark data shows that the combination of WYSIWYG remapping and gate-level register retiming techniques can reduce power consumption by as much as 6% in Stratix® devices and as much as 21% in Cyclone® devices.
2.4.5. Intel Quartus Prime Compiler Settings
To set the optimization mode on the Intel® Quartus® Prime software, click Assignments > Settings > Compiler Settings.
The two power optimization modes direct the Compiler to prioritize one optimization metric.
Power (High effort—increases runtime)
High effort modes enable additional optimizations that increase compilation time and do not affect design performance. High Power Effort mode guides the Compiler to spend additional compilation time reducing routing utilization, which saves dynamic power.
Power (Aggressive—increases runtime, reduces performance)
Aggressive modes increase compilation time and make trade-offs that may harm other optimization metrics (performance, area, etc.). In Aggressive Power mode, the Compiler attempts to reduce the routing usage of signals with the highest specified (via Signal Activity File) or estimated toggle rates, saving additional dynamic power but potentially affecting performance.
2.4.6. Assignment Editor Options
The Optimization Technique logic option specifies the overall optimization goal for Analysis & Synthesis: attempt to maximize performance or minimize logic usage.

The Power Optimization During Synthesis logic option determines how aggressively Analysis & Synthesis optimizes the design for power.

Settings | Description | Optimization Techniques Included |
---|---|---|
Off | The Compiler does not perform netlist, placement, or routing optimizations to minimize power. | - |
Normal compilation (Default) | The Compiler applies low compute effort algorithms to minimize power through netlist optimizations that do not reduce design performance. |
|
Extra effort | Besides the techniques in the Normal compilation setting, the Compiler applies high-compute-effort algorithms to minimize power through netlist optimizations. Selecting this option might impact performance. |
|
2.5. Design Guidelines
2.5.1. Clock Power Management
The Intel® Quartus® Prime software optimizes clock routing power automatically, enabling only those portions of the clock network that are necessary to feed downstream registers.
2.5.1.1. Clock Enable in Memory Blocks
When a memory block is clocked, a sequence of timed events occurs within the block to execute a read or write. The circuitry that the clock controls consumes the same amount of power, independent of changes in address or data from one cycle to the next. Thus, the toggle rate of input data and the address bus have no impact on memory power consumption.
The key to reducing memory power consumption is to reduce the number of memory clocking events. You can achieve this reduction through network-wide clock gating, or on a per-memory basis through use of the clock enable signals on the memory ports.
The clock enable signal enables the memory only when necessary, and shuts down for the rest of the time, reducing the overall memory power consumption. You include these enable signals when generating the memory block function.
For example, consider a design that contains a 32-bit-wide M4K memory block in ROM mode that is running at 200 MHz. Assume that the downstream logic requires the output of this block only approximately every four cycles. When clocked on every cycle, this memory block consumes 8.45 mW of dynamic power. By adding a small amount of control logic to generate a read clock enable signal for the memory block only on the relevant cycles, the power can be cut by about 75% to 2.15 mW.
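The figures in this example can be cross-checked with a little arithmetic. The sketch assumes that memory dynamic power scales with the fraction of cycles on which the block is clocked; the source does not explain why the quoted 2.15 mW is slightly above that ideal value.

```python
def ideal_power_mw(always_on_mw, active_fraction):
    """Power if dynamic power scaled exactly with the clocked fraction."""
    return always_on_mw * active_fraction


def reduction_percent(before_mw, after_mw):
    """Percentage reduction from before_mw to after_mw."""
    return 100.0 * (before_mw - after_mw) / before_mw


ideal = ideal_power_mw(8.45, 1 / 4)          # clocked one cycle in four
documented = reduction_percent(8.45, 2.15)   # figures quoted in the text
print(round(ideal, 4), round(documented, 1))  # 2.1125 74.6
```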
You can also use the MAXIMUM_DEPTH parameter in your memory IP core to save power in Cyclone® IV GX, Stratix® IV, and Stratix® V devices; however, this approach might increase the number of LEs required to implement the memory and affect design performance.
The Intel® Quartus® Prime software automatically chooses the best design memory configuration for optimal power. However, you can set the MAXIMUM_DEPTH parameter for memory modules during the IP core instantiation.
2.5.1.1.1. Memory Power Reduction Example
M4K Configuration | Number of M4K Blocks | ALUTs |
---|---|---|
4K × 1 (Default setting) | 36 | 0 |
2K × 2 | 36 | 40 |
1K × 4 | 36 | 62 |
512 × 9 | 32 | 143 |
256 × 18 | 32 | 302 |
128 × 36 | 32 | 633 |
Using the MAXIMUM_DEPTH parameter can save power. For all implementations, a user-provided read enable signal is present to indicate when read data is required. Using this power-saving technique can reduce power consumption by as much as 60%.
As the memory depth becomes more shallow, memory dynamic power decreases because unaddressed M4K blocks can be shut off using a decoded combination of address bits and the read enable signal. For a 128-deep memory block, power used by the extra LEs starts to outweigh the power gain achieved by using a more shallow memory block depth. The power consumption of the memory blocks and associated LEs depends on the memory configuration.
2.5.1.2. LAB Clock Power
To reduce LAB-wide clock power consumption without disabling the entire clock tree, use the LAB-wide clock enable to gate the LAB-wide clock. The Intel® Quartus® Prime software automatically promotes register-level clock enable signals to the LAB-level. A shared gated clock controls all registers within an LAB that share a common clock and clock enable. To take advantage of these clock enables, use a clock enable construct in the relevant HDL code for the registered logic.
2.5.1.2.1. LAB-Wide Clock Enable Example
IF clk'event AND clk = '1' THEN
   IF logic_is_enabled = '1' THEN
      reg <= value;
   ELSE
      reg <= reg;
   END IF;
END IF;
2.5.1.3. Clock Enables
Use clock enables instead of gated clocks:
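// Gated clock (avoid): logic is inserted in the clock path
assign clk_gate = clk1 & gateA & gateB;
always @(posedge clk_gate) begin
  sr[N-1:1] <= sr[N-2:0];
  sr[0] <= din1;
end

// Clock enable (preferred): the clock itself is not gated
assign enable = gateA & gateB;
always @(posedge clk2) begin
  if (enable) begin
    sr[N-1:1] <= sr[N-2:0];
    sr[0] <= din2;
  end
end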

To reduce LAB-wide clock power consumption without disabling the entire clock tree, use the LAB-wide clock enable to gate the LAB-wide clock:
always @(posedge clk) begin
    if (ena)
        temp <= dataa;
    else
        temp <= temp;
end
2.5.1.4. Global Signals
Intel® FPGAs have different kinds of global signal resources available. Global signals can span the entire chip or smaller regions. Choose the clock networks that can cover all the fanout on a specific domain. For example, you can reduce clock power by switching from a clock network that spans the entire chip to one quarter of the chip, provided all the fanout for that clock is within that region of the chip.
2.5.1.4.1. Viewing Clock Details in the Chip Planner
- Open the Chip Planner (Tools > Chip Planner).
- In the Task pane, under Clock Reports, double-click Report Clock Details.
Figure 28. Chip Planner Task Pane
Figure 29. Report Clock Details
- Click OK. The Report pane generates the Clock folder.
- Expand the Clock folder and select Used spine clock regions to highlight them in the Chip Planner.
- In the Layers Settings pane, turn on Regional/Periphery clock region to see whether the used spine clock regions fall within it.
Figure 30. Clock Highlight in Chip Planner
This example uses a regional clock region instead of a global signal.
2.5.1.5. Merge Clocks
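Evaluate the possibility of merging clocks and PLLs in the design. In the following example, merging two clocks and two PLLs into one clock and one PLL lowers the reported power from 6.079 W to 5.46 W, a reduction of about 10%.
Design | 2 Clocks and 2 PLLs | 1 Clock and 1 PLL |
---|---|---|
Oc_dma_stamp25 | 6.079 W | 5.46 W |
- 2 clocks and 2 PLLs
  Clk1: 350 MHz, fanout 46788
  Clk2: 365 MHz, fanout 2450
- 1 clock and 1 PLL (merged clocks)
  clk: 365 MHz, fanout 51277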
2.5.2. Pipelining and Retiming
For example, consider a 2-input XOR gate where one input changes from 1 to 0, and moments later the other input changes from 0 to 1. For a short time, both inputs are 0 (low), resulting in 0 (low) at the output of the XOR gate. Then, when the second input transition takes place, the XOR gate output returns to 1 (high). Therefore, before the output becomes stable, the delay between the input transitions produces a glitch in the output.
A glitch can propagate to subsequent logic and create unnecessary switching activity, increasing power consumption. Circuits with many XOR functions, such as arithmetic circuits or cyclic redundancy check (CRC) circuits, tend to have many glitches if there are several levels of combinational logic between registers.
Registers stop glitches from propagating through combinational paths. Pipelining is a technique that breaks combinational paths by inserting registers. By reducing logic-level numbers between registers, pipelining can result in higher clock speed operations. However, pipelining increases the latency of a circuit in terms of the number of clock cycles to a first result.
The following figure shows how pipelining breaks a long combinational path.
This reduction in switching activity lowers power dissipation in combinational logic. However, for designs with few glitches, pipelining can increase power consumption by adding unnecessary registers. Pipelining can also increase resource utilization. Benchmark data shows that pipelining can reduce dynamic power consumption by as much as 30% in Cyclone® and Stratix® devices.
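As a minimal sketch of the technique, the following hypothetical module splits a 32-bit XOR reduction (parity) into two register stages. The module and signal names are illustrative and do not come from the guide. Whether this saves power in a real design depends on how much glitching the unpipelined logic produces, as noted above.

```verilog
// Illustrative only: 32-bit parity split into two pipeline stages.
// Glitches inside each 8-input XOR group stop at the stage-1 registers
// instead of rippling through the whole 32-input XOR tree.
module parity_pipelined (
    input  wire        clk,
    input  wire [31:0] din,
    output reg         parity
);
    reg [3:0] partial;  // stage-1 registers, one per byte

    always @(posedge clk) begin
        // Stage 1: four independent 8-input XOR reductions.
        partial[0] <= ^din[7:0];
        partial[1] <= ^din[15:8];
        partial[2] <= ^din[23:16];
        partial[3] <= ^din[31:24];

        // Stage 2: combine the registered partial results.
        parity <= ^partial;
    end
endmodule
```

The result for a given din value appears two clock edges after that value is sampled, which is the added latency that pipelining trades for shorter combinational paths.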
2.5.3. Architectural Optimization
The Stratix® device family allows you to efficiently target small, medium, and large memories with the TriMatrix memory architecture. Each TriMatrix memory block is optimized for a specific function. M512 memory blocks are more power-efficient than the distributed memory structures in some competing FPGAs. With M4K memory blocks you can implement buffers for a wide variety of applications, including processor code storage, large look-up table implementation, and large memory applications. The M-RAM blocks are useful in applications when storing a large volume of data on-chip is necessary. Effective utilization of these memory blocks can have a significant impact on power reduction in the design.
The latest Stratix® and Cyclone® device families have configurable M9K memory blocks that provide memory functions such as RAM, FIFO buffers, and ROM.
2.5.4. I/O Power Guidelines
Nonterminated I/O Standards
Nonterminated I/O standards such as LVTTL and LVCMOS have a rail-to-rail output swing. The voltage difference between logic-high and logic-low signals at the output pin is equal to the VCCIO supply voltage. If the capacitive loading at the output pin is known, the following expression determines the dynamic power consumed in the I/O buffer:
P = 0.5 × F × C × V²
where:
- F is the output transition frequency
- C is the total load capacitance being switched
- V is equal to VCCIO supply voltage
Transistor-to-transistor logic (TTL) I/O buffers consume very little static power. As a result, the total power that an LVTTL or LVCMOS output consumes is highly dependent on load and switching frequency.
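As a rough worked example, assume the usual switched-capacitance relation P = 0.5 × F × C × V² and illustrative values that are not from this guide: an output transition frequency F of 100 MHz and a load capacitance C of 10 pF. The dynamic power of one output at two VCCIO levels is then approximately:

```latex
\begin{aligned}
P_{3.3\,\mathrm{V}} &= 0.5 \times (100 \times 10^{6}\,\mathrm{Hz}) \times (10 \times 10^{-12}\,\mathrm{F}) \times (3.3\,\mathrm{V})^{2} \approx 5.45\,\mathrm{mW} \\
P_{1.8\,\mathrm{V}} &= 0.5 \times (100 \times 10^{6}\,\mathrm{Hz}) \times (10 \times 10^{-12}\,\mathrm{F}) \times (1.8\,\mathrm{V})^{2} = 1.62\,\mathrm{mW}
\end{aligned}
```

Because power scales with V², the 1.8 V case uses about 30% of the 3.3 V figure for the same load and frequency.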
Resistively Terminated I/O Standards
In resistively terminated I/O standards like SSTL and HSTL, the output load voltage swings by a small amount around a bias point. The dynamic power equation above is valid as well, but V is the actual load voltage swing. This voltage is much smaller than VCCIO, resulting in lower dynamic power compared with nonterminated I/O under similar conditions.
Resistively terminated I/O standards dissipate significant static (frequency-independent) power, because the I/O buffer is constantly driving current into the resistive termination network. However, the lower dynamic power of these I/O standards means they often have lower total power than LVCMOS or LVTTL for high-frequency applications. As a best practice, when using resistively terminated standards choose the lowest drive strength I/O setting that meets the speed and waveform requirements to minimize I/O power.
You can save a small amount of static power by connecting unused I/O banks to the lowest possible VCCIO voltage.
2.5.5. Memory Optimization (M20K/MLAB)
Some guidelines to optimize the use of memories are:
- Port shallow memories from M20K to MLAB. For example, implement in HDL with the ramstyle attribute:
  (* ramstyle = "MLAB" *) reg [0:7] my_ram[0:63];
- Avoid read-during-write behavior and set it to Don’t care (at the HDL level) wherever possible. Read-during-write behavior impacts the power of single-port and bidirectional dual-port RAMs. Don’t care allows an optimization that sets the read-enable signal to the inversion of the existing write-enable signal (if one exists). This allows the core of the RAM to shut down, which prevents switching and saves a significant amount of power.
- Pack input/output registers in M20K.
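One way to express the Don’t care setting at the HDL level is the no_rw_check value of the ramstyle synthesis attribute. The following single-port RAM is a minimal sketch; the module name, port names, and sizes are hypothetical. Confirm the attribute's effect for your device family in the Compilation Report, because this sketch does not guarantee a specific mapping.

```verilog
// Illustrative only: single-port RAM whose design does not rely on
// read-during-write results. Names and sizes are hypothetical.
module sp_ram_dont_care #(
    parameter DATA_W = 16,
    parameter ADDR_W = 9
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] wdata,
    output reg  [DATA_W-1:0] rdata
);
    // "no_rw_check" tells synthesis that read-during-write output is a
    // don't care, so no extra logic is needed to match simulation behavior.
    (* ramstyle = "no_rw_check" *) reg [DATA_W-1:0] mem [0:(1 << ADDR_W)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= wdata;
        // In simulation this returns the old word during a write; in
        // hardware the value read during a write must be treated as unknown.
        rdata <= mem[addr];
    end
endmodule
```

Logic downstream of rdata should ignore the output in any cycle that follows a write, because the simulated and implemented values can differ.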
2.5.5.1. Implementation
Memory Block | Depth (bits) | Programmable Width |
---|---|---|
MLAB | 32 | x16, x18, or x20 |
MLAB | 64 | x8, x9, x10 |
M20K | 512 | x40, x32 |
M20K | 1K | x20, x16 |
M20K | 2K | x10, x8 |
M20K | 4K | x5, x4 |
M20K | 8K | x2 |
M20K | 16K | x1 |

2.5.5.2. Rd/Wr Enables

2.5.6. DDR Memory Controller Settings
Low Power Mode Settings
- Enable Auto Power-Down—directs the controller to place the memory device in power-down mode after a specific number of idle controller clock cycles. You can configure the idle wait time. All ranks must be idle to enter auto power-down.
- Auto Power-Down Cycles—specifies the number of cycles the controller must be idle before entering the power-down state. You determine the number based on the traffic pattern. If the number is too small, the controller enters power-down too frequently, which reduces efficiency. The Intel® Arria® 10 device family supports from 1 to 65534 cycles.
2.5.7. DSP Implementation
When you maximize the packing of DSP blocks, you reduce logic utilization and power consumption and increase efficiency. The HDL coding style gives you control of the DSP resources available in the FPGA.
Implement Multiplier + Accumulator in 1 DSP
always @(posedge clk) begin
    if (ena) begin
        dataout <= dataa * datab + datac * datad;
    end
end
Implement multiplication in 2 DSPs and the adder in LABs
// Registering each product separately keeps the adder outside the DSP block.
always @(posedge clk) begin
    if (ena) begin
        mult1 <= dataa * datab;
        mult2 <= datac * datad;
    end
end

always @(posedge clk) begin
    if (ena) begin
        dataout <= mult1 + mult2;
    end
end
2.5.8. Reducing High-Speed Tile (HST) Usage
- In the Advanced Fitter Settings pane, the Programmable Power Technology Optimization logic option controls how the Fitter configures tiles to operate in high-speed mode or low-power mode. Select Minimize Power Only.
Figure 36. Programmable Power Technology Optimization
- Identify entity modules that use HST by plotting entity modules and the HST heatmap on the Chip Planner, and modify the floorplan to reduce usage.
Figure 37. Entity Modules and HST Heatmap on the Chip Planner
2.5.9. Unused Transceiver Channels
Transceivers in the device degrade over time unless you preserve them. The Intel® Quartus® Prime software generates a warning message if a design contains unused transceiver channels.
You do not need to preserve transceivers under 8 Gbps. For transceivers over 8 Gbps, the best practice is to preserve them if there is a possibility of future usage. Otherwise, you can turn the transceivers off. You enable unused transceivers through dynamic reconfiguration or a new device programming file.
2.5.10. Periphery Power reduction XCVR Settings
2.5.10.1. Transceiver Settings
- Use the minimum VCCR/T possible (depending on data rate).
- Certain devices have DFE on by default. If possible, turn it off; whether you can depends on how lossy the channel is.
- Turn off PDN compensation. This setting induces jitter, so check that the system tolerates it.
- Use one equalizer stage.
DFE | Adaptation | Equalizer Stage | Transmitter High-Speed Compensation |
---|---|---|---|
Disabled | Disabled | Non-S1 Mode | Disabled |
Disabled | Disabled | Non-S1 Mode | Enabled |
Disabled | Disabled | N/A | Enabled |
2.5.10.2. I/O Current Strength
2.6. Power Optimization Advisor
The Power Optimization Advisor organizes the recommendations into stages that suggest the implementation order. Each recommendation includes a description, summary of the effect of the recommendation, and the action required to make the appropriate setting.
An icon indicates whether each recommended setting is made in the current project. Checkmark icons appear next to recommendations that are already implemented, and warning icons appear next to recommendations that are not followed for this compilation. Information icons indicate general suggestions.
Recommendations include a link to the location in the Intel® Quartus® Prime GUI where you can change the setting. After implementing the recommended changes, recompile your design. You can verify power results with the Power Analyzer.
2.6.1. Set Realistic Timing Constraints
2.6.1.1. Find Timing Information
- To find false or multicycle paths, click Report Ignored Constraints in the Timing Analyzer Tasks pane.
Figure 39. Report Ignored Constraints
- To see a list of the 10 paths with the highest delay in the design, in the Reports pane find Fitter Summary Report > Estimate Delay Added for Hold Timing > Details.
2.6.2. Appropriate Device Family
2.6.3. Dynamic Power

2.6.4. Static Power
Small Device
Use the smallest device that can fit your design.
2.6.5. Appropriate I/O Standards
2.6.6. Use RAM Blocks
2.6.7. Shut Down RAM Blocks
2.6.8. Clock Enables on Logic
2.6.9. Pipeline Logic to Reduce Glitching
Long chains of cascaded logic blocks can create glitches due to path delay differences between the input signals. Inserting flip-flops to cut these long chains stops glitches from propagating to subsequent logic cells.
Circuits that make heavy use of XOR functions (for example, cyclic redundancy check) tend to glitch significantly when cascaded. Add pipeline registers or re-architect the logic to reduce signal toggling.
Glitch Prone Design

2.7. Power Optimization Revision History
The following revision history applies to this chapter:
Document Version | Intel® Quartus® Prime Version | Changes |
---|---|---|
2020.12.07 | 18.1.0 | Added note to the Toggle Rate topic. |
2019.08.02 | 18.1.0 | |
2018.09.24 | 18.1.0 | |
2018.06.11 | 18.0.0 | |
2018.05.07 | 18.0.0 | |
2016.10.31 | 16.1.0 | |
2015.11.02 | 15.1.0 | Changed instances of Quartus II to Intel® Quartus® Prime. |
2014.12.15 | 14.1.0 | |
2014.06.30 | 14.0.0 | Updated the format. |
May 2013 | 13.0.0 | Added a note to “Memory Power Reduction Example” on Qsys and SOPC Builder power savings limitation for on-chip memory block. |
June 2012 | 12.0.0 | Removed survey link. |
November 2011 | 10.0.2 | Template update. |
December 2010 | 10.0.1 | Template update. |
July 2010 | 10.0.0 | |
November 2009 | 9.1.0 | |
March 2009 | 9.0.0 | |
November 2008 | 8.1.0 | |
May 2008 | 8.0.0 | |
A. Intel Quartus Prime Standard Edition User Guides
Refer to the following user guides for comprehensive information on all phases of the Intel® Quartus® Prime Standard Edition FPGA design flow.