Nios® V Processor: Lockstep Implementation User Guide

ID 833274
Date 4/17/2025
Public

6.7.5. UC_05: Fail Safe after Fault Discrimination and Functional Downgrade

This use case is an advanced scenario for improved availability. It is based on a scheme similar to UC_04. The difference is that the system can extend its availability by using the CONTEXT information of the fRSmartComp comparator.
This use case outlines the following steps:
  1. The comparator flags a mismatch caused by a fault in one of the two CPUs.
  2. The fault is categorized as WARNING by fRSmartComp.
  3. The System Supervisor uses the WARNING information to put the system in a temporary safe state.
  4. fRSmartComp generates the first CPU reset request to the Reset Controller.
  5. The Reset Controller triggers a warm reset to the two CPUs, restarting them.
  6. If it is a transient fault, the following steps occur:
    1. The fault disappears.
    2. The processor application restarts typical execution.
  7. If it is a permanent fault in the typical application only, the following steps occur:
    1. The fault is detected again, causing a second reset.
    2. The System Supervisor reads the fRSmartComp CONTEXT information for the affected comparator slice.
    3. If applicable, the System Supervisor can restart the HOST CPU in safe mode (limited functionality), which avoids using the faulty resource.
    4. The Reset Controller triggers a warm reset to the two CPUs, starting the safe mode application.
    5. The fault no longer appears, because the faulty resource is not used.
    6. The processor application restarts in safe mode.
  8. If it is a permanent fault in both typical and safe mode application, the following steps occur:
    1. The fault is detected again, causing a third reset.
    2. The programmable reset counter threshold (RST_COUNT, configured as 2) is met.
    3. The fault is categorized as ERROR by fRSmartComp.
    4. fRSmartComp sets the primary OKNOK output to NOT_OK.
    5. The System Supervisor uses the NOT_OK status to put the system in the safe state.
    6. The system remains permanently in the safe state.
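The escalation above can be modeled in a few lines of C to check how each fault type maps to a reset count and a final state. This is a minimal sketch, assuming that a safe mode exists for the faulty resource (the "if applicable" case) and that every mismatch leads to exactly one warm reset. All identifiers are hypothetical; this is not the fRSmartComp or Reset Controller programming interface.

```c
/* Hypothetical model of the UC_05 escalation sequence. */

#define RST_COUNT 2 /* reset counter threshold, as in CONF_3 */

typedef enum { FAULT_TRANSIENT, FAULT_PERM_TYPICAL, FAULT_PERM_BOTH } fault_kind;
typedef enum { OUT_TYPICAL_APP, OUT_SAFE_MODE_APP, OUT_SAFE_STATE } outcome;

/* Does the fault still cause a lockstep mismatch?
 * resets: warm resets already applied; safe_mode: safe mode app is running. */
int mismatch(fault_kind f, int resets, int safe_mode)
{
    switch (f) {
    case FAULT_TRANSIENT:    return resets == 0; /* gone after one reset      */
    case FAULT_PERM_TYPICAL: return !safe_mode;  /* avoided by safe mode      */
    case FAULT_PERM_BOTH:    return 1;           /* present in both apps      */
    }
    return 1;
}

/* Runs the sequence, stores the number of resets, returns the final outcome. */
outcome run_uc05(fault_kind f, int *resets)
{
    int safe_mode = 0;
    *resets = 0;
    while (mismatch(f, *resets, safe_mode)) {
        (*resets)++;               /* each mismatch leads to a warm reset      */
        if (*resets > RST_COUNT)   /* third reset: ERROR, OKNOK = NOT_OK       */
            return OUT_SAFE_STATE; /* system kept permanently in safe state    */
        if (*resets == 2)          /* second reset: supervisor reads CONTEXT   */
            safe_mode = 1;         /* and restarts the CPUs in safe mode       */
    }
    return safe_mode ? OUT_SAFE_MODE_APP : OUT_TYPICAL_APP;
}
```

With RST_COUNT set to 2, a transient fault ends in the typical application after one reset, a permanent fault in the typical application ends in safe mode after two resets, and a permanent fault in both applications ends in the safe state after three resets.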

To simplify the flowchart, the following fRSmartComp configuration is labeled CONF_3:

  • Configure ALARM severity.
    1. Set ALARM1 to WARNING
    2. Set ALARM16 to ERROR
    3. Set ALARM18 to ERROR
  • Set RST_COUNT to 2 so that alarms are generated after three resets.
    1. Transient fault results in one reset.
    2. Permanent fault in typical application results in two resets.
    3. Permanent fault in safe mode application results in three resets.
  • Set RRACM to 2'b10 to enable the CPU reset request after a mismatch.
  • If the INTREQ signal is used, set the INTREQ configuration to 6'b011001 to generate an interrupt upon WARNING.
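The CONF_3 settings can be captured as C constants, which also shows how the Verilog-style sized literals translate: 2'b10 is 2, and 6'b011001 is 25 (0x19). This is a sketch only: the struct, its field names, and the severity enum values are hypothetical and do not describe the actual fRSmartComp register map or offsets.

```c
#include <stdint.h>

/* Hypothetical encoding of alarm severity (not the real register values). */
typedef enum { SEV_WARNING, SEV_ERROR } alarm_severity;

/* Hypothetical container for the CONF_3 settings; the field names follow
 * the parameter names in this guide, but the layout is illustrative only. */
typedef struct {
    alarm_severity alarm1;  /* ALARM1 severity  */
    alarm_severity alarm16; /* ALARM16 severity */
    alarm_severity alarm18; /* ALARM18 severity */
    uint8_t rst_count;      /* RST_COUNT: alarms after rst_count + 1 resets */
    uint8_t rracm;          /* RRACM, 2-bit field */
    uint8_t intreq;         /* INTREQ, 6-bit field (only if INTREQ is used) */
} frsmartcomp_conf;

const frsmartcomp_conf CONF_3 = {
    .alarm1    = SEV_WARNING,
    .alarm16   = SEV_ERROR,
    .alarm18   = SEV_ERROR,
    .rst_count = 2,
    .rracm     = 0x2,  /* 2'b10: CPU reset request after a mismatch */
    .intreq    = 0x19, /* 6'b011001: interrupt upon WARNING */
};

/* Number of resets after which the ERROR alarm is raised. */
unsigned resets_until_error(const frsmartcomp_conf *c)
{
    return (unsigned)c->rst_count + 1u;
}
```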

The design of the safe mode is up to you, because it depends closely on the system design and the application. Note that the safe mode applies only to the processor software application; it does not involve FPGA reconfiguration.

For example, if the fault was detected in Data TCM1, the safe mode application may avoid that memory and run the software application from other memories. This allows the system to continue operating in a functionally reduced mode, maintaining a degree of availability.
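A minimal sketch of the restart decision the System Supervisor might make before the warm reset in step 7.4, using the Data TCM1 example. The decoding of CONTEXT into a faulty_resource value, the alternative memory region REGION_ALT_RAM, and every other identifier below are hypothetical; how CONTEXT maps to resources depends on the comparator slices in your system.

```c
/* Hypothetical resource decoded from the fRSmartComp CONTEXT information. */
typedef enum { RES_UNKNOWN, RES_DATA_TCM1 } faulty_resource;

/* Hypothetical memory regions available to the application. */
typedef enum { REGION_DATA_TCM1, REGION_ALT_RAM } data_region;

typedef enum { APP_TYPICAL, APP_SAFE_MODE } app_image;

typedef struct {
    app_image app;    /* application to start after the warm reset */
    data_region data; /* where that application keeps its data     */
} restart_plan;

/* reset_count: resets seen so far for this fault.
 * res: faulty resource decoded from CONTEXT. */
restart_plan choose_restart(unsigned reset_count, faulty_resource res)
{
    restart_plan plan = { APP_TYPICAL, REGION_DATA_TCM1 };
    /* A second reset suggests a permanent fault: switch to safe mode,
     * but only when a safe mode exists for the affected resource. */
    if (reset_count == 2 && res == RES_DATA_TCM1) {
        plan.app = APP_SAFE_MODE;
        plan.data = REGION_ALT_RAM; /* run without Data TCM1 */
    }
    return plan;
}
```

If no safe mode exists for the decoded resource, the sketch keeps the typical application; the fault then repeats, the third reset exceeds the reset counter threshold, and fRSmartComp raises ERROR as described in step 8.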

Figure 28. Fail Safe after Functional Downgrade Flowchart Diagram