Intel Agilex® 7 Power Management User Guide

ID 683373
Date 12/04/2023
Public

5.1.2.4. Fault Management and Error Reporting

The SDM firmware can detect errors, faults, and warnings on the PMBus throughout the initialization and monitor states. The firmware analyzes each error and places it in the Error Message Queue (EMQ). During configuration, the CONFIG_STATUS mailbox command notifies you of the error.

In master mode, while in the monitor state, the SDM firmware queries the voltage regulator with a STATUS_BYTE command every 500 ms. A nonzero STATUS_BYTE value indicates an error, fault, or warning within the voltage regulator. The firmware reports the error through the EMQ and asserts the SEU_ERROR pin to notify you.
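The master-mode polling behavior described above can be modeled roughly as follows. This is a minimal host-side sketch for illustration only, not the SDM firmware implementation: the names `poll_once`, `monitor`, `read_status_byte`, `emq`, and `assert_seu_error` are hypothetical stand-ins, and the EMQ is represented by a plain Python list.

```python
import time

POLL_INTERVAL_S = 0.5  # 500 ms polling period, as stated in the text


def poll_once(read_status_byte, emq, assert_seu_error):
    """Run one polling cycle; return True if an error was reported.

    read_status_byte and assert_seu_error are hypothetical callables
    standing in for the PMBus read and the SEU_ERROR pin.
    """
    status = read_status_byte()
    if status != 0:
        # A nonzero STATUS_BYTE indicates an error, fault, or warning.
        emq.append(status)
        assert_seu_error()
        return True
    return False


def monitor(read_status_byte, emq, assert_seu_error, cycles, sleep=time.sleep):
    """Poll for a fixed number of cycles, waiting 500 ms between polls."""
    for _ in range(cycles):
        poll_once(read_status_byte, emq, assert_seu_error)
        sleep(POLL_INTERVAL_S)


# Demo with simulated readings and a no-op sleep so it returns immediately.
readings = iter([0x00, 0x06, 0x00])
queue, pin_events = [], []
monitor(lambda: next(readings), queue, lambda: pin_events.append(True),
        cycles=3, sleep=lambda seconds: None)
```

In the demo, only the second simulated reading is nonzero, so the sketch queues one value and signals the pin once.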

In slave mode, the SDM firmware asserts the PWRMGT_ALERT signal whenever an error occurs. The external PMBus master must then initiate the Alert Response Address (ARA) flow to handshake with the FPGA and read the error from the firmware.

The STATUS_BYTE Polling

The STATUS_BYTE polling is an optional feature. To change the setting of the STATUS_BYTE polling, refer to the Specifying Power Management and VID Parameters and Option section and Table: Power Management and VID Parameters.

Figure 28. Fault Management and Error Reporting
Note: Polling of the STATUS_BYTE every 500 ms in master mode applies only when you select a voltage regulator that has been validated by Intel®. If you select "Others" in the Intel® Quartus® Prime GUI, STATUS_BYTE polling is disabled.

The following table shows the error that each returned bit of the STATUS_BYTE indicates.

Table 26. STATUS_BYTE Error Definition
Command: STATUS_BYTE (78h)
Error definition:
Bit[7]: Busy, unable to respond
Bit[6]: Off, not enabled
Bit[5]: Output overvoltage fault occurred
Bit[4]: Output overcurrent fault occurred
Bit[3]: Input undervoltage fault occurred
Bit[2]: Temperature fault or warning occurred
Bit[1]: Communication, memory, or logic fault occurred
Bit[0]: A fault or warning not listed above occurred

Each bit returned by the STATUS_BYTE indicates a different error in the voltage regulator, and the firmware reports each one to the EMQ. For example, a STATUS_BYTE value of 0x6 (b'0000_0110) means that the voltage regulator has a communication, memory, or logic fault (Bit[1]) and a temperature fault or warning (Bit[2]). The firmware places two error entries in the EMQ, one for each error or fault that occurred.
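The bit-by-bit interpretation above can be sketched as a small decoder. This is an illustrative host-side example only: the function name `decode_status_byte` and the table name `STATUS_BYTE_BITS` are hypothetical, and the bit meanings are taken from Table 26.

```python
# Bit meanings from Table 26 (STATUS_BYTE, command code 78h).
STATUS_BYTE_BITS = {
    7: "Busy, unable to respond",
    6: "Off, not enabled",
    5: "Output overvoltage fault occurred",
    4: "Output overcurrent fault occurred",
    3: "Input undervoltage fault occurred",
    2: "Temperature fault or warning occurred",
    1: "Communication, memory, or logic fault occurred",
    0: "A fault or warning not listed above occurred",
}


def decode_status_byte(value):
    """Return one (bit, description) pair per set bit, highest bit first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("STATUS_BYTE must fit in 8 bits")
    return [
        (bit, STATUS_BYTE_BITS[bit])
        for bit in range(7, -1, -1)
        if value & (1 << bit)
    ]


# The example from the text: 0x6 sets Bit[2] and Bit[1].
entries = decode_status_byte(0x06)
for bit, description in entries:
    print(f"Bit[{bit}]: {description}")
```

For the value 0x6, the decoder yields two entries, matching the two EMQ entries described in the text.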

The Importance of Safety Limits Settings in the Voltage Regulator

You must program the non-volatile memory (NVM) in the voltage regulator correctly so that error flags are not asserted incorrectly under the expected operating conditions.

For Intel Agilex® 7 SmartVID devices, VCC and VCCP operate within the 0.70 V to 0.90 V voltage range. The following example settings work for this voltage range. You may revise the settings based on your system requirements.

VOUT_OV_WARN_LIMIT to VID_MAX 927 mV
VOUT_OV_FAULT_LIMIT to VID_MAX 930 mV
VOUT_MAX to VID_MAX 950 mV
VOUT_UV_WARN_LIMIT to VID_MIN 690 mV
VOUT_UV_FAULT_LIMIT to VID_MIN 680 mV

The limits should be wider than the expected operating conditions but within the absolute maximum ratings for the device. For more information, refer to the Intel Agilex® 7 FPGAs and SoCs Device Data Sheet: F-Series and I-Series.
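As a rough illustration of the guidance above, the following sketch checks a set of limits against the 0.70 V to 0.90 V operating range. It is not an Intel tool: the function name `check_limits` and its parameters are hypothetical, the ordering checks (fault limits at or beyond warning limits) are an assumption about sensible settings, and `abs_max_mv` has no default because the absolute maximum rating must come from the device data sheet.

```python
# Example settings from the text, in millivolts.
EXAMPLE_LIMITS_MV = {
    "VOUT_OV_WARN_LIMIT": 927,
    "VOUT_OV_FAULT_LIMIT": 930,
    "VOUT_MAX": 950,
    "VOUT_UV_WARN_LIMIT": 690,
    "VOUT_UV_FAULT_LIMIT": 680,
}


def check_limits(limits_mv, vid_min_mv=700, vid_max_mv=900, abs_max_mv=None):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    uv_fault = limits_mv["VOUT_UV_FAULT_LIMIT"]
    uv_warn = limits_mv["VOUT_UV_WARN_LIMIT"]
    ov_warn = limits_mv["VOUT_OV_WARN_LIMIT"]
    ov_fault = limits_mv["VOUT_OV_FAULT_LIMIT"]
    vout_max = limits_mv["VOUT_MAX"]

    # Limits must sit outside the expected operating range.
    if not uv_warn < vid_min_mv:
        problems.append("VOUT_UV_WARN_LIMIT is not below the operating range")
    if not ov_warn > vid_max_mv:
        problems.append("VOUT_OV_WARN_LIMIT is not above the operating range")
    # Assumption: fault limits are at or beyond the warning limits.
    if not uv_fault <= uv_warn:
        problems.append("VOUT_UV_FAULT_LIMIT is above VOUT_UV_WARN_LIMIT")
    if not ov_fault >= ov_warn:
        problems.append("VOUT_OV_FAULT_LIMIT is below VOUT_OV_WARN_LIMIT")
    # VOUT_MAX must not block voltages inside the operating range.
    if not vout_max >= vid_max_mv:
        problems.append("VOUT_MAX is below the top of the operating range")
    # Optional check against a data sheet value supplied by the caller.
    if abs_max_mv is not None and vout_max > abs_max_mv:
        problems.append("VOUT_MAX exceeds the absolute maximum rating")
    return problems


print(check_limits(EXAMPLE_LIMITS_MV))  # prints []
```

The example settings pass every check. Supplying `abs_max_mv` with the value from your device data sheet adds the upper-bound check described in the text.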