Intel® Agilex™ Power Management User Guide

ID 683373
Date 9/06/2022
Public

5.1.2.4. Fault Management and Error Reporting

The SDM firmware can detect errors, faults, and warnings on the PMBus throughout the initialization and monitor states. The firmware analyzes each error and places it in the Error Message Queue (EMQ). During configuration, the CONFIG_STATUS mailbox command notifies you of the error.

In master mode, while in the monitor state, the SDM firmware queries the voltage regulator with a STATUS_BYTE command every 500 ms. A nonzero value returned by STATUS_BYTE indicates an error, fault, or warning within the voltage regulator. The firmware reports the error through the EMQ and asserts the SEU_ERROR pin to notify you of the error.
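The master-mode polling behavior described above can be illustrated with a minimal Python sketch. This is not the SDM firmware; it only models the control flow. The names `poll_status_byte`, `read_status_byte`, and `report_error` are hypothetical, and the reporting callback stands in for the EMQ entry and the SEU_ERROR assertion.

```python
import time
from typing import Callable

POLL_INTERVAL_S = 0.5  # 500 ms, the master-mode polling period stated above


def poll_status_byte(
    read_status_byte: Callable[[], int],
    report_error: Callable[[int], None],
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll STATUS_BYTE `cycles` times and report every nonzero reading.

    `read_status_byte` and `report_error` are hypothetical callbacks:
    the first stands in for the PMBus read, the second for the EMQ
    entry and SEU_ERROR assertion. Returns the count of nonzero readings.
    """
    nonzero_readings = 0
    for _ in range(cycles):
        status = read_status_byte() & 0xFF  # STATUS_BYTE is a single byte
        if status != 0:
            # Any nonzero value means an error, fault, or warning.
            report_error(status)
            nonzero_readings += 1
        sleep(POLL_INTERVAL_S)
    return nonzero_readings
```

Passing the `sleep` function as a parameter keeps the sketch easy to exercise without waiting for real time to elapse.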

In slave mode, PWRMGT_ALERT is an optional pin. If the PWRMGT_ALERT pin is connected, the SDM firmware asserts the ALERT_n signal whenever an error occurs. The external PMBus master must then initiate the ARA flow to handshake with the FPGA and read the error from the firmware. If the PWRMGT_ALERT pin is not connected, the external PMBus master sends the STATUS_BYTE command to the FPGA device every 200 ms to check for errors.

The STATUS_BYTE Polling

Figure 30. Fault Management and Error Reporting
Note: In master mode, the 500 ms STATUS_BYTE polling applies when you select a voltage regulator that has been validated by Intel®. If you select "Others" in the Intel® Quartus® Prime GUI, STATUS_BYTE polling is disabled.

The following table lists the error indicated by each bit of the STATUS_BYTE return value.

Table 22.   STATUS_BYTE Error Definition
Command             Error Definition
STATUS_BYTE (78h)   Bit[7]: Busy, unable to respond
                    Bit[6]: Off, not enabled
                    Bit[5]: Output overvoltage fault occurred
                    Bit[4]: Output overcurrent fault occurred
                    Bit[3]: Input undervoltage fault occurred
                    Bit[2]: Temperature fault or warning occurred
                    Bit[1]: Communication, memory, or logic fault occurred
                    Bit[0]: A fault occurred that is not listed above

Each bit returned by STATUS_BYTE indicates a different error in the voltage regulator, and the firmware reports each one in the EMQ. For example, a STATUS_BYTE read value of 0x6 (b'0000_0110) indicates that the voltage regulator has a communication, memory, or logic fault (Bit[1]) and a temperature fault or warning (Bit[2]). The firmware places two error entries in the EMQ, one for each error or fault that occurred.
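As an illustration of the bit definitions in Table 22, the following Python sketch decodes a STATUS_BYTE value into one message per asserted bit. The function name `decode_status_byte` and the message format are assumptions made for this example; they do not represent the firmware's EMQ entry format.

```python
from typing import List

# Bit meanings taken from Table 22 (STATUS_BYTE, command code 78h).
STATUS_BYTE_BITS = {
    7: "Busy, unable to respond",
    6: "Off, not enabled",
    5: "Output overvoltage fault occurred",
    4: "Output overcurrent fault occurred",
    3: "Input undervoltage fault occurred",
    2: "Temperature fault or warning occurred",
    1: "Communication, memory, or logic fault occurred",
    0: "A fault occurred that is not listed above",
}


def decode_status_byte(value: int) -> List[str]:
    """Return one message per asserted bit, highest bit first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("STATUS_BYTE must fit in one byte")
    return [
        f"Bit[{bit}]: {text}"
        for bit, text in sorted(STATUS_BYTE_BITS.items(), reverse=True)
        if value & (1 << bit)
    ]


# The 0x6 example from the text yields two entries (Bit[2] and Bit[1]).
for message in decode_status_byte(0x06):
    print(message)
```

A value of zero decodes to an empty list, which matches the rule that only a nonzero STATUS_BYTE indicates an error.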

You must program the non-volatile memory (NVM) in the voltage regulator correctly to ensure that error flags are not asserted incorrectly under the expected operating conditions.

For Intel® Agilex™ SmartVID devices, VCC and VCCP operate within the 0.70 V to 0.90 V voltage range. The following example settings work for this voltage range. You may revise the settings based on your system requirements.

VOUT_OV_WARN_LIMIT to VID_MAX 920mV
VOUT_OV_FAULT_LIMIT to VID_MAX 930mV
VOUT_MAX to VID_MAX 950mV
VOUT_UV_WARN_LIMIT to VID_MIN 690mV
VOUT_UV_FAULT_LIMIT to VID_MIN 680mV

The limits should be wider than the expected operating conditions, but within the absolute maximum rating for the device. For more information, refer to the Intel® Agilex™ Device Data Sheet.
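One way to sanity-check a set of limits before programming the regulator NVM is sketched below in Python. The ordering rule it enforces (undervoltage fault below undervoltage warning, both below the operating range, and the overvoltage limits above it in ascending order up to VOUT_MAX) is an assumption consistent with the example values above. The function name `check_limits` is hypothetical, and the sketch does not check the absolute maximum rating, which must come from the device data sheet.

```python
from typing import Dict, List

# Example values from the text, in millivolts.
EXAMPLE_LIMITS_MV = {
    "VOUT_UV_FAULT_LIMIT": 680,
    "VOUT_UV_WARN_LIMIT": 690,
    "VOUT_OV_WARN_LIMIT": 920,
    "VOUT_OV_FAULT_LIMIT": 930,
    "VOUT_MAX": 950,
}
VID_MIN_MV = 700  # lower end of the 0.70 V to 0.90 V operating range
VID_MAX_MV = 900  # upper end of the operating range


def check_limits(limits: Dict[str, int], vid_min_mv: int, vid_max_mv: int) -> List[str]:
    """Return a list of ordering problems; an empty list means none found."""
    # Assumed ordering: each entry must be strictly below the next one.
    ladder = [
        ("VOUT_UV_FAULT_LIMIT", limits["VOUT_UV_FAULT_LIMIT"]),
        ("VOUT_UV_WARN_LIMIT", limits["VOUT_UV_WARN_LIMIT"]),
        ("VID_MIN", vid_min_mv),
        ("VID_MAX", vid_max_mv),
        ("VOUT_OV_WARN_LIMIT", limits["VOUT_OV_WARN_LIMIT"]),
        ("VOUT_OV_FAULT_LIMIT", limits["VOUT_OV_FAULT_LIMIT"]),
        ("VOUT_MAX", limits["VOUT_MAX"]),
    ]
    problems = []
    for (low_name, low), (high_name, high) in zip(ladder, ladder[1:]):
        if not low < high:
            problems.append(
                f"{low_name} ({low} mV) must be below {high_name} ({high} mV)"
            )
    return problems


print(check_limits(EXAMPLE_LIMITS_MV, VID_MIN_MV, VID_MAX_MV))
```

A warning limit that falls inside the operating range, for example VOUT_OV_WARN_LIMIT set to 890 mV, is flagged because it could assert during normal SmartVID operation.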
