Agilex™ 7 Power Management User Guide

ID 683373
Date 11/25/2024
Public

5.1.2.4. Fault Management and Error Reporting

The SDM firmware can detect errors, faults, and warnings on the PMBus during the initialization and monitor states. The firmware analyzes each error and places it in the Error Message Queue (EMQ). During configuration, the CONFIG_STATUS mailbox command reports the error to you.

In master mode, while in the monitor state, the SDM firmware queries the voltage regulator with the STATUS_BYTE command every 500 ms. A nonzero STATUS_BYTE value indicates an error, fault, or warning in the voltage regulator. The firmware reports the error through the EMQ and asserts the SEU_ERROR pin to notify you.
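The master-mode monitor loop described above can be sketched as follows. This is a minimal illustrative model, not the SDM firmware: the callbacks read_status_byte and assert_seu_error, the plain list standing in for the EMQ, and the bounded max_polls count (so the loop can be exercised in a test) are all hypothetical.

```python
import time
from typing import Callable, List


def poll_status_byte(
    read_status_byte: Callable[[], int],   # hypothetical: issues STATUS_BYTE (78h)
    emq: List[int],                        # hypothetical stand-in for the EMQ
    assert_seu_error: Callable[[], None],  # hypothetical: drives the SEU_ERROR pin
    period_s: float = 0.5,                 # 500 ms polling interval
    max_polls: int = 1,                    # bounded here; the firmware loop does not stop
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll STATUS_BYTE periodically and report any nonzero value."""
    for i in range(max_polls):
        status = read_status_byte() & 0xFF
        if status != 0:
            # Nonzero means an error, fault, or warning in the regulator.
            emq.append(status)
            assert_seu_error()
        if i + 1 < max_polls:
            sleep(period_s)
```

For example, feeding the readings 0x00, 0x04, 0x00 through three polls queues only the 0x04 value and asserts the error pin once.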

In slave mode, the SDM firmware asserts the PWRMGT_ALERT signal whenever an error occurs. The external PMBus master must then initiate the Alert Response Address (ARA) flow to handshake with the FPGA and read the error from the firmware.
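The ARA handshake that the external PMBus master performs can be modeled as shown below. This sketch follows the general SMBus alert convention (every alerting device answers a read from address 0x0C with its own 7-bit address in bits [7:1], open-drain arbitration lets the lowest address win, and the winner releases its alert). The AlertingDevice class, the service_alerts helper, and the idea that the FPGA is one such device are assumptions for illustration; how the master then reads the error from the firmware is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

ARA_ADDRESS = 0x0C  # SMBus Alert Response Address (7-bit 0b0001100)


@dataclass
class AlertingDevice:
    address: int         # 7-bit target address (hypothetical values in examples)
    alert: bool = False  # True while the device drives the alert line


def ara_read(devices: List[AlertingDevice]) -> Optional[int]:
    """Model one read from ARA_ADDRESS.

    Every device with a pending alert returns its own address; wired-AND
    arbitration on SDA lets the lowest address win. The winner then releases
    its alert. Returns the byte the master sees (address in bits [7:1],
    bit 0 left at 0 here), or None if no device responds.
    """
    pending = [d for d in devices if d.alert]
    if not pending:
        return None
    winner = min(pending, key=lambda d: d.address)
    winner.alert = False
    return winner.address << 1


def service_alerts(devices: List[AlertingDevice]) -> List[int]:
    """Repeat ARA reads while any device still asserts its alert."""
    serviced = []
    while any(d.alert for d in devices):
        byte = ara_read(devices)
        serviced.append(byte >> 1)
        # Hypothetical next step: read the error from the device at this address.
    return serviced
```

With devices at 0x40 and 0x25 both alerting, the master services 0x25 first and then 0x40, after which the alert line is released.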

The STATUS_BYTE Polling

STATUS_BYTE polling is an optional feature. To change the STATUS_BYTE polling setting, refer to the Specifying Power Management and VID Parameters and Option section and Table: Power Management and VID Parameters.

Figure 28. Fault Management and Error Reporting
Note: STATUS_BYTE polling every 500 ms in master mode applies only when you select a voltage regulator that Altera has validated. If you select "Others" in the Quartus® Prime GUI, STATUS_BYTE polling is disabled.

The following table defines the error that each bit of the returned STATUS_BYTE value indicates.

Table 26. STATUS_BYTE Error Definition

Command            Bit      Error Definition
STATUS_BYTE (78h)  Bit[7]   Busy, unable to respond
                   Bit[6]   Off, not enabled
                   Bit[5]   Output overvoltage fault occurred
                   Bit[4]   Output overcurrent fault occurred
                   Bit[3]   Input undervoltage fault occurred
                   Bit[2]   Temperature fault or warning occurred
                   Bit[1]   Communication, memory, or logic fault occurred
                   Bit[0]   A fault not listed above occurred

Each set bit in the STATUS_BYTE indicates a separate error in the voltage regulator, and the firmware reports each one to the EMQ. For example, a STATUS_BYTE value of 0x06 (0b0000_0110) indicates that the voltage regulator has:
  • Communication, memory, or logic fault (Bit[1])
  • Temperature fault or warning (Bit[2])

Consequently, the firmware creates two separate EMQ entries, one for each identified error or fault.
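The bit-by-bit decoding described above can be illustrated with a short sketch that turns one STATUS_BYTE value into one entry per set bit, using the definitions from Table 26. The function name decode_status_byte and the lowest-bit-first ordering of the entries are illustrative choices; the actual EMQ entry format and ordering used by the SDM firmware are not specified here.

```python
from typing import List

# Bit definitions from Table 26 (STATUS_BYTE, command 78h).
STATUS_BYTE_BITS = {
    7: "Busy, unable to respond",
    6: "Off, not enabled",
    5: "Output overvoltage fault occurred",
    4: "Output overcurrent fault occurred",
    3: "Input undervoltage fault occurred",
    2: "Temperature fault or warning occurred",
    1: "Communication, memory, or logic fault occurred",
    0: "A fault not listed above occurred",
}


def decode_status_byte(value: int) -> List[str]:
    """Return one description per set bit, lowest bit first."""
    return [
        desc
        for bit, desc in sorted(STATUS_BYTE_BITS.items())
        if value & (1 << bit)
    ]
```

Decoding 0x06 yields two entries, for Bit[1] and Bit[2], matching the example above; a value of 0x00 yields no entries.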

The Importance of Safety Limit Settings in the Voltage Regulator

You must program the voltage regulator's non-volatile memory (NVM) correctly so that its fault and warning flags are not asserted under expected operating conditions.

For Agilex™ 7 SmartVID devices, VCC and VCCP operate within the 0.70 V to 0.90 V range. The following example settings work for this range. You can revise them based on your system requirements.

Setting               Reference   Example Value
VOUT_OV_WARN_LIMIT    VID_MAX     927 mV
VOUT_OV_FAULT_LIMIT   VID_MAX     930 mV
VOUT_MAX              VID_MAX     950 mV
VOUT_UV_WARN_LIMIT    VID_MIN     690 mV
VOUT_UV_FAULT_LIMIT   VID_MIN     680 mV

Set each limit outside the expected operating range, but within the absolute maximum ratings of the device. For more information, refer to the Agilex™ 7 FPGAs and SoCs Device Data Sheet: F-Series and I-Series.
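Before programming the regulator NVM, you could sanity-check candidate limits against the rule above with a sketch like the following. It treats VID_MIN and VID_MAX as the 700 mV and 900 mV ends of the SmartVID operating range. The function check_limits is hypothetical, the warning-inside-fault ordering check is an assumed convention rather than a stated requirement, and abs_max_mv is a placeholder you must take from the device data sheet.

```python
from typing import Dict, List

# Example settings from this section, in millivolts.
EXAMPLE_LIMITS_MV = {
    "VOUT_OV_WARN_LIMIT": 927,
    "VOUT_OV_FAULT_LIMIT": 930,
    "VOUT_MAX": 950,
    "VOUT_UV_WARN_LIMIT": 690,
    "VOUT_UV_FAULT_LIMIT": 680,
}


def check_limits(
    limits: Dict[str, int], vid_min_mv: int, vid_max_mv: int, abs_max_mv: int
) -> List[str]:
    """Return a list of problems; an empty list means the limits look sane."""
    problems = []
    # Upper limits must sit above the highest expected operating voltage.
    for name in ("VOUT_OV_WARN_LIMIT", "VOUT_OV_FAULT_LIMIT", "VOUT_MAX"):
        if limits[name] <= vid_max_mv:
            problems.append(f"{name} must be above VID_MAX ({vid_max_mv} mV)")
    # Lower limits must sit below the lowest expected operating voltage.
    for name in ("VOUT_UV_WARN_LIMIT", "VOUT_UV_FAULT_LIMIT"):
        if limits[name] >= vid_min_mv:
            problems.append(f"{name} must be below VID_MIN ({vid_min_mv} mV)")
    # Assumed convention: a warning trips before the matching fault.
    if limits["VOUT_OV_WARN_LIMIT"] > limits["VOUT_OV_FAULT_LIMIT"]:
        problems.append("VOUT_OV_WARN_LIMIT exceeds VOUT_OV_FAULT_LIMIT")
    if limits["VOUT_UV_WARN_LIMIT"] < limits["VOUT_UV_FAULT_LIMIT"]:
        problems.append("VOUT_UV_WARN_LIMIT is below VOUT_UV_FAULT_LIMIT")
    # No limit may exceed the device absolute maximum rating.
    for name, mv in limits.items():
        if mv > abs_max_mv:
            problems.append(f"{name} exceeds the absolute maximum ({abs_max_mv} mV)")
    return problems
```

For example, check_limits(EXAMPLE_LIMITS_MV, 700, 900, abs_max_mv) returns an empty list for the example settings as long as abs_max_mv is at least 950 mV, while raising VOUT_UV_WARN_LIMIT to 710 mV would be flagged because it falls inside the operating range.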