AN 737: SEU Detection and Recovery in Intel Arria 10 Devices
1. SEU Detection and Recovery in Intel Arria 10 Devices
This application note describes the implementation of Intel® Arria® 10 single event upset (SEU) detection and recovery features by presenting the following information:
- Error detection and correction feature architecture in Intel® Arria® 10 devices.
- General implementation guidelines for error detection cyclic redundancy check (EDCRC) and error correction feature.
- General implementation guidelines for embedded memory error correction code (ECC) feature.
- Intel® Arria® 10 EDCRC reference design with detailed development flow.
1.1. Intel Arria 10 Error Detection and Correction Feature Architecture
1.1.1. Error Detection and Correction for CRAM
1.1.1.1. Error Detection Cyclic Redundancy Check
In user mode, the contents of the configured configuration RAM (CRAM) bits can be affected by soft errors. These soft errors, which are caused by an ionizing particle, are not common in Intel FPGA devices. However, high-reliability applications that require error-free device operation may require your design to consider these errors.
The hardened on-chip EDCRC circuitry allows you to perform the following operations without any impact on the fitting or performance of the device:
- Auto-detection of cyclic redundancy check (CRC) errors during configuration.
- Optional detection and identification of soft errors (SEU and multiple bit upset) in user mode.
- Fast soft error detection. The error detection speed is improved.
- Two types of check-bits:
- Frame-based check-bits—stored in CRAM and used to verify the integrity of the frame.
- Column-based check-bits—stored in registers and used to protect the integrity of all frames.
During error detection in user mode, a number of EDCRC engines run in parallel for Intel® Arria® 10 devices. The number of error detection CRC engines depends on the frame length—total bits in a frame.
Each column-based error detection CRC engine reads 128 bits from each frame and processes within four cycles. To detect errors, the error detection CRC engine needs to read back all frames.
Name | Description |
---|---|
Error message registers (EMR) | Contains error details for single-bit and double-adjacent errors. The error detection circuitry updates this register each time the circuitry detects an error. |
User update register | This register is automatically updated with the contents of the EMR one clock cycle after the contents of this register are validated. The user update register includes a clock enable, which must be asserted before its contents are written to the user shift register. This requirement ensures that the user update register is not overwritten when its contents are being read by the user shift register. |
User shift register | This register allows user logic to access the contents of the user update register via the core interface. You can use the Error Message Register Unloader Intel FPGA IP core to shift out the EMR information through the user shift register. |
JTAG update register | This register is automatically updated with the contents of the EMR one clock cycle after the content of this register is validated. The JTAG update register includes a clock enable, which must be asserted before its contents are written to the JTAG shift register. This requirement ensures that the JTAG update register is not overwritten when its contents are being read by the JTAG shift register. |
JTAG shift register | This register allows you to access the contents of the JTAG update register via the JTAG interface using the SHIFT_EDERROR_REG JTAG instruction. |
Hard Processor System (HPS) update register | This register is automatically updated with the contents of the EMR one clock cycle after the content of this register is validated. The HPS update register includes a clock enable, which must be asserted before its contents are written to the HPS shift register. This requirement ensures that the HPS update register is not overwritten when its contents are being read by the HPS shift register. |
HPS shift register | This register allows you to access the contents of the HPS update register via the HPS interface. |
1.1.1.1.1. Column-Based and Frame-Based Check-Bits
EDCRC Check-Bits Updates
Frame-based check-bits are calculated on-chip during configuration. Column-based check-bits are updated after configuration.
When you enable the EDCRC feature, after the device enters user mode, the EDCRC function starts reading CRAM frames. The data collected from the read-back frame is validated against the frame-based check-bits.
After the initial frame-based verification is completed, the column-based check-bits are calculated based on the respective column CRAM. The EDCRC hard block recalculates the column-based check-bits in any of the following scenarios:
- FPGA re-configuration
- After successful partial reconfiguration (PR) session
- After configuration via protocol (CvP) session
1.1.1.1.2. Error Message Register
The EMR contains information on the error type, the location of the error, and the actual syndrome. This register is 78 bits wide in Intel® Arria® 10 devices. The EMR does not identify the location bits for uncorrectable errors. The location of the errors consists of the frame number, double word location and bit location within the frame and column.
You can shift out the contents of the register through the following:
- EMR Unloader IP core—core interface
- SHIFT_EDERROR_REG JTAG instruction—JTAG interface
- HPS Shift register—HPS interface
Name | Width (Bits) | Description |
---|---|---|
Frame Address | 16 | Frame Number of the error location |
Column-Based Double Word | 2 | There are 4 double words per frame in a column. It indicates the double word location of the error |
Column-Based Bits | 5 | Error location within 32-bit double word |
Column-Based Type | 3 | Types of error shown in Table 3 |
Frame-Based syndrome register | 32 | Contains the 32-bit CRC signature calculated for the current frame. If the CRC value is 0, the CRC_ERROR pin is driven low to indicate no error. Otherwise, the pin is pulled high. |
Frame-Based Double Word | 10 | Double word location within the CRAM frame. |
Frame-Based Bit | 5 | Error location within 32-bit double word |
Frame-Based Type | 3 | Types of error shown in Table 3 |
Reserved | 1 | Reserved bit |
Column-Based Check-Bits Update | 1 | Logic high if an error is encountered during the column check-bits update stage. The CRC_ERROR pin is asserted and stays high until the FPGA is reconfigured. |
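As a rough illustration of the field layout in the table above, the following Python sketch splits a 78-bit EMR value into its named fields. It assumes the EMR is held as an integer in which bit 77 is the most significant Frame Address bit and bit 0 is the Column-Based Check-Bits Update flag, which mirrors the out[77..0] indexing used in the .jam example in this document. The names EMR_FIELDS and decode_emr are illustrative only and are not part of any Intel tool.

```python
from typing import Dict

EMR_WIDTH = 78

# (field name, most significant bit, width in bits), listed MSB first.
# Assumption: bit numbering follows the out[77..0] indexing of the .jam example.
EMR_FIELDS = [
    ("frame_address", 77, 16),
    ("column_double_word", 61, 2),
    ("column_bit", 59, 5),
    ("column_type", 54, 3),
    ("frame_syndrome", 51, 32),
    ("frame_double_word", 19, 10),
    ("frame_bit", 9, 5),
    ("frame_type", 4, 3),
    ("reserved", 1, 1),
    ("column_check_bits_update", 0, 1),
]


def decode_emr(emr: int) -> Dict[str, int]:
    """Split a 78-bit EMR value into the fields listed in the table."""
    if not 0 <= emr < (1 << EMR_WIDTH):
        raise ValueError("EMR value must fit in 78 bits")
    fields = {}
    for name, msb, width in EMR_FIELDS:
        lsb = msb - width + 1
        fields[name] = (emr >> lsb) & ((1 << width) - 1)
    return fields


# Example: build a value field by field, then decode it again.
example = (0x1234 << 62) | (0b10 << 60) | (0x1F << 55) | (0b001 << 52) \
    | (0xDEADBEEF << 20) | (0x3FF << 10) | (0x0A << 5) | (0b010 << 2) | 1
decoded = decode_emr(example)
```

The field widths in EMR_FIELDS sum to 78, so every EMR bit belongs to exactly one field.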
Error Type in EMR
Error Types | Bit 2 | Bit 1 | Bit 0 | Description |
---|---|---|---|---|
Frame-based | 0 | 0 | 0 | No error |
Frame-based | 0 | 0 | 1 | Single-bit error |
Frame-based | 0 | 1 | X | Double-adjacent error |
Frame-based | 1 | 1 | 1 | Uncorrectable error |
Column-based | 0 | 0 | 0 | No error |
Column-based | 0 | 0 | 1 | Single-bit error |
Column-based | 0 | 1 | X | Double-adjacent error in a same frame |
Column-based | 1 | 0 | X | Double-adjacent error in a different frame |
Column-based | 1 | 1 | 0 | Double-adjacent error in a different frame |
Column-based | 1 | 1 | 1 | Uncorrectable error |
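The error type encoding can be expressed as a small lookup. The Python sketch below maps a 3-bit type value to the descriptions in the table above. The function name describe_error_type is hypothetical, and frame-based encodings that the table does not list (binary 100, 101, and 110) are reported as not listed rather than guessed.

```python
def describe_error_type(type_bits: int, column_based: bool) -> str:
    """Return the table description for a 3-bit EMR error type field."""
    if not 0 <= type_bits <= 0b111:
        raise ValueError("type field is 3 bits wide")
    if type_bits == 0b000:
        return "No error"
    if type_bits == 0b001:
        return "Single-bit error"
    if type_bits == 0b111:
        return "Uncorrectable error"
    if type_bits in (0b010, 0b011):  # 0 1 X
        if column_based:
            return "Double-adjacent error in a same frame"
        return "Double-adjacent error"
    # Remaining encodings: 1 0 X and 1 1 0
    if column_based:
        return "Double-adjacent error in a different frame"
    return "Not listed for frame-based type"
```

For example, describe_error_type(0b110, column_based=True) yields the column-based "different frame" description, while the same value with column_based=False falls outside the frame-based rows of the table.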
1.1.1.2. Recovering from CRC Errors
Intel® Arria® 10 devices support the internal scrubbing capability. The internal scrubbing feature corrects correctable CRAM upsets automatically when an upset is detected. However, internal scrubbing cannot restore the FPGA to a known good state. The time between the error and the completion of scrubbing can be tens of milliseconds, which represents thousands of clock cycles during which corrupted data may have been written to memory or status registers. It is good practice to always follow any SEU event with a soft reset to bring the FPGA operation to a known good state.
If a soft-reset is unable to bring the FPGA to a known good state, you can reconfigure the device to rewrite the CRAM and reinitialize the design registers. The system that hosts the Intel® Arria® 10 device must control the device reconfiguration. When reconfiguration completes successfully, the Intel® Arria® 10 device operates as intended.
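The escalation described above (soft reset first, full reconfiguration only if needed) can be sketched as host-side control logic. The Python below is a minimal sketch under stated assumptions: the three callables soft_reset, is_known_good, and reconfigure are hypothetical hooks that your system would have to provide, and nothing here corresponds to an Intel API.

```python
from typing import Callable


def recover_from_seu(
    soft_reset: Callable[[], None],
    is_known_good: Callable[[], bool],
    reconfigure: Callable[[], None],
) -> str:
    """Apply a soft reset first; reconfigure only if that is not enough."""
    soft_reset()
    if is_known_good():
        return "soft-reset"
    # The host system controls reconfiguration, which rewrites the CRAM
    # and reinitializes the design registers.
    reconfigure()
    return "reconfigured"


# Usage with stand-in hooks that only record which actions were taken.
calls = []
outcome = recover_from_seu(
    soft_reset=lambda: calls.append("soft_reset"),
    is_known_good=lambda: False,
    reconfigure=lambda: calls.append("reconfigure"),
)
```

How is_known_good is decided (for example, by a design-specific self-check) is outside the scope of this note and is left to the system designer.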
1.1.2. Memory Blocks Error Correction Code Support
ECC allows you to detect and correct data errors at the output of the memory. ECC can perform single-error correction, double-adjacent-error correction, and triple-adjacent-error detection in a 32-bit word. However, ECC cannot detect four or more errors.
The M20K blocks have built-in support for ECC when in x32-wide simple dual-port mode:
- The M20K runs slower than non-ECC simple-dual port mode when ECC is engaged. However, you can enable optional ECC pipeline registers before the output decoder to achieve higher performance compared to non-pipeline ECC mode at the expense of one cycle of latency.
- The M20K ECC status is communicated with two ECC status flag signals—e (error) and ue (uncorrectable error). The status flags are part of the regular output from the memory block. When ECC is engaged, you cannot access two of the parity bits because the ECC status flag replaces them.
1.2. Guidelines for Error Detection CRC and Error Correction Feature
1.2.1. Error Detection
1.2.1.1. Enabling Error Detection
There are two methods to turn on the Intel® Arria® 10 error detection CRC feature, depending on your application needs.
- If your design detects and reads the EMR using user logic, instantiate the EMR Unloader IP core, which automatically turns on the EDCRC feature.
- If you want to monitor SEU with an external host and do not need to read the EMR from user logic, you can turn on the EDCRC feature by enabling the CRC_ERROR pin in your Intel® Quartus® Prime project.
1.2.1.1.1. Enabling the Error Detection CRC_ERROR Pin
To enable the CRC_ERROR pin for external host monitoring purpose, perform the following steps:
- On the Assignments menu, click Device.
- Click Device and Pin Options and select the Error Detection CRC at the left panel.
- Check the Enable Error Detection CRC_ERROR pin.
- Select the EDCRC clock divisor from the Divide error check frequency by list.
Note: This option gives you the flexibility to run the EDCRC at a slower speed. However, Intel recommends that you select the smallest EDCRC clock divisor, because a high divisor can lengthen the error detection time. Refer to the SEU Mitigation chapter of the Arria 10 handbook for the detection time specification.
- Check the Enable open drain on CRC_ERROR pin if you have an external pull up resistor on your board.
- Click OK.
1.2.1.2. Reading EMR
1.2.1.2.1. Reading EMR using EMR Unloader IP Core
You can instantiate the EMR Unloader IP core to detect SEU and unload EMR content in user logic. The EDCRC feature will be turned on automatically when the EMR Unloader IP core is instantiated. EMR Unloader IP core helps you to read the EMR whenever there is an SEU event by:
- Unloading the EMR via core logic
- Accessing the hard CRC Block
- Providing access to the user logic to read the EMR data
1.2.1.2.2. Reading EMR using HPS
The FPGA Manager in the HPS has the ability to monitor the CRC_ERROR status pin and to retrieve the error symptom, location and type. You can choose to enable the CRC error interrupt from the FPGA Manager, followed by CRC error information extraction from respective registers.
1.2.1.2.3. Reading EMR using JTAG Interface
To unload the contents of the EMR using a JTAG port, use the SHIFT_EDERROR_REG JTAG instruction. This JTAG instruction connects the EMR to the JTAG pin in the error detection block between the TDI and TDO pins. You can execute the instruction whenever the CRC_ERROR pin goes high. You must unload the contents of the EMR before the register is overwritten by the information of the next CRC error.
JTAG Instruction | Instruction Code | Description |
---|---|---|
SHIFT_EDERROR_REG | 00 0001 0111 | The JTAG instruction connects the EMR to the JTAG pin in the error detection block between TDI and TDO pins. |
The following shows the Jam™ Standard Test and Programming Language (STAPL) Format File (.jam) used to execute the SHIFT_EDERROR_REG JTAG instruction to unload the contents of the EMR.
Example of .jam File to Unload the Contents of the EMR for Arria 10 Device
ACTION UNLOAD_EMR = EXECUTE;
DATA EMR_DATA;
BOOLEAN out[78];
ENDDATA;
PROCEDURE EXECUTE USES EMR_DATA;
DRSTOP IDLE;
IRSTOP IDLE;
STATE IDLE;
IRSCAN 10, $017;
WAIT IDLE, 10 CYCLES, 1 USEC, IDLE;
DRSCAN 78,$0, CAPTURE out[77..0];
WAIT IDLE, 10 CYCLES, 25 USEC, IDLE;
PRINT " ";
PRINT "Data read out from the ";
PRINT "EMR_Register :" , out[77], out[76], out[75], out[74], out[73], out[72], out[71], out[70], out[69], out[68], out[67], out[66], out[65], out[64], out[63], out[62], " ", out[61], out[60], " ", out[59], out[58], out[57], out[56], out[55], " ", out[54], out[53], out[52], " ", out[51], out[50], out[49], out[48], out[47], out[46], out[45], out[44], out[43], out[42], out[41], out[40], out[39], out[38], out[37], out[36], out[35], out[34], out[33], out[32], out[31], out[30], out[29], out[28], out[27], out[26], out[25], out[24], out[23], out[22], out[21], out[20], " ", out[19], out[18], out[17], out[16], out[15], out[14], out[13], out[12], out[11], out[10], " ", out[9], out[8], out[7], out[6], out[5], " ", out[4], out[3], out[2], " ", out[1], " ", out[0];
'PRINT " ";
PRINT "Frame Address :", out[77], out[76], out[75], out[74], out[73], out[72], out[71], out[70], out[69], out[68], out[67], out[66], out[65], out[64], out[63], out[62];
PRINT "Column-Based Double Word Location :", out[61], out[60];
PRINT "Column-Based Bit :", out[59], out[58], out[57], out[56], out[55];
PRINT "Column-Based Type :", out[54], out[53], out[52];
PRINT "Frame-Based Syndrome :" , out[51], out[50], out[49], out[48], out[47], out[46], out[45], out[44], out[43], out[42], out[41], out[40], out[39], out[38], out[37], out[36], out[35], out[34], out[33], out[32], out[31], out[30], out[29], out[28], out[27], out[26], out[25], out[24], out[23], out[22], out[21], out[20];
PRINT "Frame-Based Double Word Location :", out[19], out[18], out[17], out[16], out[15], out[14], out[13], out[12], out[11], out[10];
PRINT "Frame-Based Bit :", out[9], out[8], out[7], out[6], out[5];
PRINT "Frame-Based Type :", out[4], out[3], out[2];
PRINT "Reserved bit :", out[1];
PRINT "Column-based EDCRC Check Bits Update:", out[0];
STATE IDLE;
EXIT 0;
ENDPROC;
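As a quick cross-check of the instruction table against the .jam example, the Python sketch below converts the 10-bit instruction code 00 0001 0111 into the length and hexadecimal operand used by the IRSCAN 10, $017 statement. The variable names are illustrative only.

```python
# Instruction code for SHIFT_EDERROR_REG as listed in the table.
INSTRUCTION_CODE = "00 0001 0111"

bits = INSTRUCTION_CODE.replace(" ", "")
ir_length = len(bits)             # number of bits shifted into the IR
ir_value = int(bits, 2)           # numeric value of the instruction
jam_operand = f"${ir_value:03X}"  # hexadecimal literal in .jam notation

print(f"IRSCAN {ir_length}, {jam_operand};")
```

The printed statement matches the IRSCAN line in the example, which confirms that the table and the .jam file describe the same instruction.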
1.2.2. Enabling Error Correction (Internal Scrubbing)
Intel® Arria® 10 devices support the internal scrubbing feature, which automatically scrubs away the flipped bit induced by an SEU. To enable the internal scrubbing feature, follow these steps:
- On the Assignments menu, click Device.
- Click Device and Pin Options and select the Error Detection CRC tab.
- Turn on Enable internal scrubbing.
- Click OK.
1.2.3. Interpreting CRC_ERROR
It is important to determine the error type when an SEU is detected. This section explains the CRC_ERROR pin behavior and how to interpret whether the error type is correctable or uncorrectable.
1.2.3.1. CRC_ERROR Pin Behavior
The Intel® Arria® 10 fast EDCRC feature runs all the column-based check-bits engines in parallel. When an SEU is detected, the column-based check-bits assert CRC_ERROR, and the detected frame location is then passed to the frame-based check-bits to further localize the affected bit. This process causes the CRC_ERROR pin to assert twice: the column-based check-bits assert the first CRC_ERROR pulse, and the frame-based check-bits assert the second pulse.
In Intel® Arria® 10 devices, as soon as an SEU is detected, CRC_ERROR is asserted high and remains high until the EMR is ready to be read. You can unload the EMR data as soon as the CRC_ERROR pin goes low. Once the EMR data is unloaded, you can determine the error type and the affected location. With this information, you can decide how your system should respond to the specific SEU event.
In the rare event of an uncorrectable and un-locatable error, the CRC_ERROR signal is asserted only once. The frame-based check-bits do not assert a second pulse because the location of the uncorrectable error cannot be determined. The statistical likelihood of uncorrectable multi-bit SEU is less than one in 10,000 years for a device in typical environmental conditions.
Example of CRC_ERROR pin behavior for column-based/frame-based check-bits with a single pulse observed in one SEU event.
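An external host that samples the CRC_ERROR pin can tell these situations apart by counting pulses. The Python sketch below is a minimal illustration that assumes the pin has already been sampled into a sequence of 0/1 levels fast enough to capture every pulse. The function name summarize_crc_error and the sampling approach are assumptions, not part of the device interface.

```python
from typing import Sequence, Tuple


def summarize_crc_error(samples: Sequence[int]) -> Tuple[int, bool]:
    """Return (pulse_count, still_high) for a sampled CRC_ERROR trace."""
    pulses = 0
    previous = 0
    for level in samples:
        if level and not previous:  # rising edge starts a new pulse
            pulses += 1
        previous = level
    still_high = bool(samples) and bool(samples[-1])
    return pulses, still_high


# Two pulses: column-based detection followed by frame-based localization.
two_pulse_trace = [0, 1, 1, 0, 0, 1, 1, 0]
# One pulse only: nothing follows the first assertion.
one_pulse_trace = [0, 1, 1, 0, 0, 0]
```

A trace that ends with the pin still high indicates that the EMR is not yet ready to be read, or that the pin is held high as described for the check-bits update error.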
1.2.3.2. Correctable and Uncorrectable Error
When an SEU is detected, you can read the EMR data to determine whether the error is correctable or uncorrectable. Intel recommends that you use the EMR Unloader IP core in your design. The EMR Unloader IP core interprets the error and reports it at the output.
Case | EDCRC Operation | CRC_ERROR Pulse | Column-Based Field | Frame-Based Field | Correctable | Remark |
---|---|---|---|---|---|---|
A | Frame-based check-bits | 1 | All 0's | Type = 3'b001, or Type = 3'b010 & bit ≠ 5'h1F, or Type = 3'b011 & bit = 5'h1F | Yes | Error will be corrected if internal scrubbing is turned on |
B | Frame-based check-bits | 1 | All 0's | Type = 3'b111, or Type = 3'b010 & bit = 5'h1F, or Type = 3'b011 & bit ≠ 5'h1F. The EMR Unloader IP core sets Type = 3'b111 if any of these conditions is met | No | The frame-based check-bits retry two times and then enter the dead state, where CRC_ERROR stays high until FPGA reconfiguration |
C | Stuck in dead state | 1 pulse and stay high after 2nd assertion | EMR Unloader IP core sets Type = 3'b111 | EMR Unloader IP core sets Type = 3'b111 | No | CRC_ERROR stays high until FPGA reconfiguration. Refer to Case B to understand how the EDCRC can get stuck in the dead state |
D | Column-based check-bits | 1 | Type = 3'b111, or Type = 3'b010 & bit = 5'h1F, or Type = 3'b011 & bit ≠ 5'h00. The EMR Unloader IP core sets Type = 3'b111 if any of these conditions is met | All 0's | No | Detected uncorrectable error during column-based check-bits |
E | Column-based check-bits and frame-based check-bits | 2 | Any type except: Type = 3'b111, or Type = 3'b010 & bit = 5'h1F, or Type = 3'b011 & bit ≠ 5'h00 | Type = 3'b111, or Type = 3'b010 & bit = 5'h1F, or Type = 3'b011 & bit ≠ 5'h1F. The EMR Unloader IP core sets Type = 3'b111 if any of these conditions is met | No | Detected uncorrectable error |
F | Column-based check-bits and frame-based check-bits | 2 | Any type except: Type = 3'b111, or Type = 3'b010 & bit = 5'h1F, or Type = 3'b011 & bit ≠ 5'h00 | Any type except: Type = 3'b111, or Type = 3'b010 & bit = 5'h1F, or Type = 3'b011 & bit ≠ 5'h1F | Yes | Error will be corrected if internal scrubbing is turned on |
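The conditions in the table can be written as two small predicates, one per EMR field group. The Python sketch below is one possible reading of the table, not the EMR Unloader IP core's actual implementation. The function names are hypothetical, and field combinations that the table does not list (for example, a valid column-based field with an all-zero frame-based field) are handled by the same rule as an assumption.

```python
def column_field_uncorrectable(err_type: int, bit: int) -> bool:
    """Uncorrectable conditions for the column-based Type and Bits fields."""
    return (
        err_type == 0b111
        or (err_type == 0b010 and bit == 0x1F)
        or (err_type == 0b011 and bit != 0x00)
    )


def frame_field_uncorrectable(err_type: int, bit: int) -> bool:
    """Uncorrectable conditions for the frame-based Type and Bit fields."""
    return (
        err_type == 0b111
        or (err_type == 0b010 and bit == 0x1F)
        or (err_type == 0b011 and bit != 0x1F)
    )


def is_correctable(column_type: int, column_bit: int,
                   frame_type: int, frame_bit: int) -> bool:
    """Return True if neither field group reports an uncorrectable error."""
    if column_type == 0 and frame_type == 0:
        raise ValueError("EMR reports no error")
    return not (
        column_field_uncorrectable(column_type, column_bit)
        or frame_field_uncorrectable(frame_type, frame_bit)
    )
```

For instance, Case A with a frame-based single-bit error is is_correctable(0, 0, 0b001, 3), which returns True, and Case D with a column-based Type of 3'b111 is is_correctable(0b111, 0, 0, 0), which returns False.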
1.3. Guidelines for Embedded Memory ECC Feature
The Intel® Agilex™, Intel® Stratix® 10, Intel® Arria® 10, and Intel® Cyclone® 10 GX FIFO Intel® FPGA IP cores support embedded memory ECC for M20K memory blocks. The built-in ECC feature in the Intel® Agilex™, Intel® Stratix® 10, Intel® Arria® 10, and Intel® Cyclone® 10 GX devices can perform:
- Single-error detection and correction
- Double-adjacent-error detection and correction
- Triple-adjacent-error detection
You can turn on the FIFO embedded ECC feature by enabling the enable_ecc parameter in the FIFO Intel® FPGA IP GUI.
When you enable the ECC feature, a 2-bit wide error correction status port (eccstatus[1:0]) is created in the generated FIFO entity. These status bits indicate whether the data read from the memory has a correctable error that was corrected, an uncorrectable (fatal) error, or no error.
- 00: No error
- 01: Illegal
- 10: A correctable error occurred and the error has been corrected at the outputs; however, the memory array has not been updated.
- 11: An uncorrectable error occurred and uncorrectable data appears at the output
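For software that logs or post-processes captured eccstatus values (for example, from a Signal Tap export), the encoding above can be captured in a lookup table. The Python sketch below is illustrative only; the names ECC_STATUS and describe_eccstatus are assumptions.

```python
# Meaning of eccstatus[1:0] as listed above.
ECC_STATUS = {
    0b00: "No error",
    0b01: "Illegal",
    0b10: "Correctable error, corrected at the outputs (memory array not updated)",
    0b11: "Uncorrectable error, uncorrectable data at the output",
}


def describe_eccstatus(eccstatus: int) -> str:
    """Return the meaning of a 2-bit eccstatus value."""
    if not 0 <= eccstatus <= 0b11:
        raise ValueError("eccstatus is 2 bits wide")
    return ECC_STATUS[eccstatus]


def data_is_usable(eccstatus: int) -> bool:
    """Data at the output is valid for 'no error' and 'corrected' only."""
    return eccstatus in (0b00, 0b10)
```

Because a corrected error leaves the memory array unchanged, a status of 10 can reappear on every subsequent read of the same location until that location is rewritten.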
1.4. Intel Arria 10 EDCRC Reference Design
The EDCRC reference design demonstrates the following SEU detection and recovery tasks for Intel® Arria® 10 devices:
- Instantiating various SEU-related IP cores such as the EMR Unloader IP core, Advanced SEU Detection IP core, and Fault Injection IP core
- Demonstrating how the Advanced SEU Detection IP core retrieves the SMH information from the EPCQ-L with the Serial Flash Controller IP core
- Integrating the reference design into your system and characterizing your system response to an SEU event with the Intel Fault Injection feature
1.4.1. System Requirements
This reference design is targeted for the following hardware and software:
- Intel® Arria® 10 development kit that is using 10AX115S2F45I2SG device.
- Intel® Quartus® Prime software version 16.0
Note: You must have a licensed version of the Intel® Quartus® Prime software to generate SMH files.
1.4.2. Creating Intel Arria 10 SEU Fault Injection and Hierarchy Tagging Design with Qsys
The a10-seu.zip reference design consists of:
- a10_seu.qar—the project archive file
- top.v—the top level module of the project
- top.sdc—the timing constraint file
- top.stp—the Signal Tap file
In this design, you use Platform Designer (Standard), formerly Qsys, to connect the Intel SEU-related IP cores together. The IP cores to be connected are the EMR Unloader IP core, Fault Injection IP core, and Advanced SEU Detection IP core. Some other IP cores are also needed to complete the design: the Altera IOPLL IP core, the Avalon-ST Splitter, and the Serial Flash Controller IP core.
1.4.2.1. Starting Intel Quartus Prime Software and Opening the Reference Design Project
To open the Intel® Quartus® Prime project, perform the following steps:
- In the Intel® Quartus® Prime software, click Open Existing Project on the splash screen, or on the File menu, click Open Project. The Open Project dialog box appears.
- Browse to the <qar file directory> where you store your .qar file.
- Select the file a10_seu.qar and click Open.
- Change the destination folder name if required, or leave it default as <qar file directory>/a10_seu_restored. Click OK.
1.4.2.2. Creating New Qsys System
To create a new Qsys system, click Qsys on the Tools menu in the Intel® Quartus® Prime software. Qsys starts and displays the System Contents tab.
1.4.2.2.1. Specifying Target FPGA and Clock Settings
To specify target FPGA and clock settings in Qsys, perform the following steps:
- On the View menu, click Device Family and select the device family that matches the Intel® Arria® 10 device you are targeting. A warning appears if the selected device family does not match the Intel® Quartus® Prime project settings, so make sure that the device selected in the Intel® Quartus® Prime project settings matches the Device Family selected in Qsys.
- On the System Contents tab, double-click the clk_0 component. In the Parameters tab for clk_0, set the Clock frequency to 50 MHz.
Next, you begin to add other IP cores to the Qsys system.
1.4.2.2.2. Adding Altera IOPLL IP Core
You must instantiate Altera IOPLL IP core in this reference design to generate 3 different clock sources, 10MHz, 20MHz and 100MHz. To add the Altera IOPLL IP core, perform the following steps:
- On the IP Catalog Tab, expand Basic Functions, expand Clock; PLLs and Resets, PLL, and then click Altera IOPLL.
- Click Add. The Altera IOPLL parameter editor appears.
- On PLL tab, at General section, set the Reference Clock Frequency to 50.
- Uncheck Enable locked output port.
- At Output Clocks section, set Number Of Clocks to 3.
- Set the clocks as follows:
- For outclk0, set the Clock Name to clk_100 and set the Desired Frequency to 100MHz.
- For outclk1, set the Clock Name to clk_20 and set the Desired Frequency to 20MHz.
- For outclk2, set the Clock Name to clk_10 and set the Desired Frequency to 10MHz.
- Click Finish to return to Qsys.
- On System Contents tab, an instance of the iopll_0 appears in the system contents table.
- Connect the clk port of the clk_0 clock source to the refclk port of the iopll_0.
- Connect the clk_reset port of the clk_0 clock source to the reset port of the iopll_0.
- Double click the outclk2 of the iopll_0 at Export column to export outclk2 as the clock source for other component outside of this Qsys system. Rename the exported signal as clk_10.
- Double click the outclk0 of the iopll_0 at Export column to export outclk0 as the clock source for other component outside of this Qsys system. Rename the exported signal as clk_100.
1.4.2.2.3. Adding EMR Unloader IP Core
You must instantiate EMR Unloader IP core to unload the EMR whenever there is SEU event. To add the EMR Unloader IP core, perform the following steps:
- On the IP Catalog tab, expand Basic Functions, expand Configuration and Programming, and then click Altera Error Message Register Unloader.
- Click Add. The Altera Error Message Register Unloader parameter editor appears.
- In CRC error check clock divisor list, select 2.
- Check the Input clock is driven from Internal Oscillator. This reference example uses Internal Oscillator to drive EMR Unloader IP core.
- Click Finish to return to Qsys. On System Contents tab, an instance of the emr_unloader2_0 appears in the system contents table.
- Connect the clk_reset port of the clk_0 clock source to the reset port of emr_unloader2_0.
- Double click the crcerror, and emr_read of emr_unloader2_0 at Export column to export them for external access. Leave the name as default.
1.4.2.2.4. Adding Advanced SEU Detection IP Core
You must instantiate the Advanced SEU Detection (ASD) IP core for sensitivity processing and to validate the hierarchy tagging feature. To add the ASD IP core, perform the following steps:
- On the IP Catalog tab, expand Basic Functions, expand Configuration and Programming, and then click Altera Advanced SEU Detection.
- Click Add. The Altera Advanced SEU Detection parameter editor appears.
- Leave the CRC error cache depth list default selection at 8.
- Set Largest ASD region ID used to 3.
- Check the Use on-chip sensitivity processing.
- Set Memory interface address width to 32.
- Set Sensitivity Data start address to 0x02000000.
- Click Finish to return to Qsys. On System Contents tab, an instance of the adv_seu_detection_0 appears in the system contents table.
- Connect the clk_reset port of the clk_0 clock source to the reset port of the adv_seu_detection_0.
- Double click the cache_comparison_off, and errors port of adv_seu_detection_0 at Export column to export them for external access, leave the name default.
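As a sanity check on the address settings above, the Python sketch below works out where the Sensitivity Data start address falls within the flash device. It assumes that the start address is a byte address and that the "1024" in EPCQ-L1024 (the configuration device used later in this reference design) denotes a density of 1024 Mbit; both points are assumptions and should be confirmed against the device documentation.

```python
SENSITIVITY_DATA_START = 0x02000000  # value set in the ASD parameter editor
ADDRESS_WIDTH_BITS = 32              # Memory interface address width
FLASH_DENSITY_MBIT = 1024            # assumption: EPCQ-L1024 holds 1024 Mbit

MIB = 1024 * 1024
flash_bytes = FLASH_DENSITY_MBIT * MIB // 8        # total flash size in bytes
start_mib = SENSITIVITY_DATA_START // MIB           # start offset in MiB
fits_in_address_width = SENSITIVITY_DATA_START < (1 << ADDRESS_WIDTH_BITS)
fits_in_flash = SENSITIVITY_DATA_START < flash_bytes
space_after_start_mib = (flash_bytes - SENSITIVITY_DATA_START) // MIB

print(f"start offset: {start_mib} MiB, room above it: {space_after_start_mib} MiB")
```

Under these assumptions, the sensitivity data begins 32 MiB into a 128 MiB device, which leaves the lower address range for the configuration image.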
1.4.2.2.5. Adding Fault Injection IP Core
You must instantiate the Fault Injection IP core to inject the fault to the CRAM. The faults can be a Single Bit Error (SBE), Double Adjacent Error (DAE) or Uncorrectable Multi Bit Error (UMBE). To add Fault Injection IP core, perform the following steps:
- On the IP Catalog tab, expand Basic Functions, expand Configuration and Programming, and then click Altera Fault Injection.
- Click Add. The Altera Fault Injection parameter editor appears.
- Click Finish to return to Qsys. On the System Contents tab, an instance of the fault_injection_0 appears in the system contents table.
- Connect clk_reset port of the clk_0 clock source to the reset port of the fault_injection_0.
- Connect intosc port of the fault_injection_0 to clock port of emr_unloader2_0.
- Connect intosc port of the fault_injection_0 to clock port of adv_seu_detection_0.
- Connect crcerror_pin port of emr_unloader2_0 to crcerror_pin port of fault_injection_0.
- Double click the error_injected, and error_scrubbed of the fault_injection_0 at Export column to export them for external access, leave the name default.
1.4.2.2.6. Adding Avalon-ST Splitter
EMR Unloader core sends the EMR data to the downstream IP cores with Avalon-ST protocol. Both ASD IP core and Fault Injection IP core require EMR data from EMR Unloader core. You need to instantiate the Avalon-ST Splitter to distribute the EMR data from EMR Unloader to ASD IP core and Fault Injection IP core. To add the Avalon-ST Splitter, perform the following steps:
- On the IP Catalog tab, expand Basic Functions, expand Bridges and Adaptors, expand Streaming, and click Avalon-ST Splitter.
- Click Add. The Avalon-ST Splitter parameter editor appears.
- Set NUMBER_OF_OUTPUTS to 3.
- Check only USE_VALID, USE_ERROR and USE_DATA, uncheck all other check boxes.
- Set DATA_WIDTH to 119.
- Set ERROR_WIDTH to 1.
- Set BITS_PER_SYMBOL to 119.
- Click Finish to return to Qsys. On the System Contents tab, an instance of the st_splitter_0 appears in the system contents table.
- Connect clk_reset port of the clk_0 clock source to reset port of st_splitter_0.
- Connect intosc port of the fault_injection_0 to clk port of st_splitter_0.
- Connect avst_emr_src port of emr_unloader2_0 to in port of st_splitter_0.
- Connect out0 port of st_splitter_0 to avst_emr_snk port of adv_seu_detection_0.
- Connect out1 port of st_splitter_0 to avst_emr_snk port of fault_injection_0.
- Double click the out2 port of the st_splitter_0 at Export column to export it for external access, leave the name default. This port will be used for Signal Tap purpose to read the EMR value after the fault injection.
1.4.2.2.7. Adding Serial Flash Controller
You must use the Serial Flash Controller IP core to access to the EPCQ-L1024 that stores the SMH file in this reference design. The ASD IP core reads the SMH data from EPCQ-L1024 via Serial Flash Controller IP core. To add the Serial Flash Controller, perform the following steps:
- On the IP Catalog tab, expand Basic Functions, expand Configuration and Programming, and then click Altera Serial Flash Controller.
- Click Add. The Altera Serial Flash Controller parameter editor appears. Set the parameters as follows:
- On Configuration device type list, select EPCQL1024.
- On Choose I/O mode, select QUAD.
- On Number of Chip Selects used list, select 1.
- Click Finish to return to Qsys. On the System Contents tab, an instance of the epcq_controller_0 appears in the system contents table.
- Connect outclk1 port of iopll_0 to clock_sink port of epcq_controller_0.
Note: The Fmax for the Serial Flash Controller is 25 MHz.
- Connect clk_reset port of clk_0 clock source to reset port of epcq_controller_0.
- Connect asd_sp_master port of adv_seu_detection_0 to avl_mem port of epcq_controller_0.
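The connections made in the preceding Qsys steps can be collected into one list for review. The Python sketch below records each connection as a (source, destination) pair, then checks that every reset input is driven by clk_0.clk_reset and that the clock feeding the Serial Flash Controller respects its 25 MHz limit. It is a bookkeeping aid only; it does not use or emulate any Qsys scripting interface, and the helper names are hypothetical.

```python
# (source port, destination port), as described in the steps above.
CONNECTIONS = [
    ("clk_0.clk", "iopll_0.refclk"),
    ("clk_0.clk_reset", "iopll_0.reset"),
    ("clk_0.clk_reset", "emr_unloader2_0.reset"),
    ("clk_0.clk_reset", "adv_seu_detection_0.reset"),
    ("clk_0.clk_reset", "fault_injection_0.reset"),
    ("fault_injection_0.intosc", "emr_unloader2_0.clock"),
    ("fault_injection_0.intosc", "adv_seu_detection_0.clock"),
    ("emr_unloader2_0.crcerror_pin", "fault_injection_0.crcerror_pin"),
    ("clk_0.clk_reset", "st_splitter_0.reset"),
    ("fault_injection_0.intosc", "st_splitter_0.clk"),
    ("emr_unloader2_0.avst_emr_src", "st_splitter_0.in"),
    ("st_splitter_0.out0", "adv_seu_detection_0.avst_emr_snk"),
    ("st_splitter_0.out1", "fault_injection_0.avst_emr_snk"),
    ("iopll_0.outclk1", "epcq_controller_0.clock_sink"),
    ("clk_0.clk_reset", "epcq_controller_0.reset"),
    ("adv_seu_detection_0.asd_sp_master", "epcq_controller_0.avl_mem"),
]

PLL_OUTPUTS_MHZ = {"outclk0": 100, "outclk1": 20, "outclk2": 10}
SERIAL_FLASH_FMAX_MHZ = 25


def drivers_of(port: str) -> list:
    """Return every source connected to the given destination port."""
    return [src for src, dst in CONNECTIONS if dst == port]


reset_ports = [dst for _, dst in CONNECTIONS if dst.endswith(".reset")]
bad_resets = [p for p in reset_ports if drivers_of(p) != ["clk_0.clk_reset"]]

flash_clock_source = drivers_of("epcq_controller_0.clock_sink")[0]
flash_clock_mhz = PLL_OUTPUTS_MHZ[flash_clock_source.split(".")[1]]
flash_clock_ok = flash_clock_mhz <= SERIAL_FLASH_FMAX_MHZ
```

With the connections listed in this section, bad_resets is empty and the Serial Flash Controller runs from the 20 MHz output, below its stated Fmax.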
1.4.2.3. Generating Qsys System
To generate the Qsys system, perform the following steps:
- Click Generate HDL from Generate menu.
- Click Generate. Click Yes when the Save Changes? dialog box appears.
- Type asd_fi_system in the File name box and click Save. The Generate dialog box appears and system generation process begins.
- Click Close to close the dialog box.
- On the File menu, click Exit to close Qsys and return to the Intel® Quartus® Prime software.
You are ready to integrate the Qsys system into the Intel® Quartus® Prime project.
1.4.2.4. Integrating Qsys System into Quartus Prime Project
To complete the reference design, you must perform the following tasks:
- Generate In-System Source and Probe (ISSP) IP core
- Configure the Intel® Quartus® Prime project settings and add the following files (provided in the download package) to the project:
- top.v—instantiates the Qsys system module and connects all other IP cores
- top.stp—monitors key signals with the Signal Tap tool
- top.sdc—timing constraints
- Assign ASD regions to up counter and down counter
- Assign FPGA device and pin locations
- Compile the project
1.4.2.4.1. Generating In-System Source and Probe IP Core
To generate ISSP, perform the following steps:
- On IP Catalog, expand Basic Functions, expand Simulation; Debug and Verification, expand Debug and Performance and double click Altera In-System Sources and Probes.
- When the IP Parameter Editor appears, type issp in Entity name and click OK.
- Set Probe Port Width [0..511] to 0.
- Set Source Port Width [0..511] to 4.
- Leave all other settings at their defaults.
- On the Generate menu, click Generate HDL, and then click Generate.
- Click Close, and then click Exit on the File menu.
- Click Yes if prompted to add the Intel® Quartus® Prime IP File to the project.
1.4.2.4.2. Quartus Prime Project Settings
To configure the Intel® Quartus® Prime project settings and add the top-level file, Signal Tap file, and SDC file to the project, perform the following steps:
- Click Device at Assignments menu, and then click Device and Pin Options in Device dialog box.
- Under Configuration Category, select Active Serial x4 for the Configuration scheme.
- Under Error Detection CRC Category, check the Enable Error Detection CRC_ERROR pin.
-
Leave Enable internal scrubbing uncheck.
Note: You can enable Enable internal scrubbing during internal scrubbing feature tryout.
- Set the Divide error check frequency by list to 2.
- Check the Generate SEU sensitivity map file (.smh).
- Click OK to exit Device and Pin Options dialog box.
- Click OK again to exit Device dialog box.
- Click Settings at Assignments menu, select Files category at left panel, add top.v, top.stp and top.sdc to the project.
- Select TimeQuest Timing Analyzer category at left panel, add the top.sdc to SDC files to include in the project.
- Select Signal Tap Logic Analyzer category at left panel, check Enable Signal Tap Logic Analyzer and select the top.stp as the Signal Tap File name.
- Click OK to close the Settings window.
- Click Processing Menu, click Start > Analysis and Synthesis.
1.4.2.4.3. Assigning ASD Regions
This reference design uses 3 ASD regions. To assign the ASD regions, perform the following steps:
- In the Project Navigator window, select Hierarchy, expand top, right-click down_counter:down_counter_inst, select Design Partition, and then select Set as Design Partition.
- Repeat step 1 for up_counter:up_counter_inst to set it as a design partition.
- In the Design Partition window, set the Netlist Type and ASD Region for the following Partition Name:
1.4.2.4.4. Assigning FPGA Pin Location
To assign the clock source pin to your design, perform the following steps:
- Launch the Pin Planner from the Assignments menu.
- Assign AU33 to inclk input.
- Close the Pin Planner.
1.4.2.4.5. Compiling the Project
You must compile the project to generate the .sof file and .smh file. To compile the project, perform the following steps:
- On the Processing menu, click Start Compilation.
The full compilation begins and may take a while to complete.
- After the compilation completes, the .sof file and .smh file are available in the output_files folder. You need these files for hardware verification later.
1.4.3. Design Testing with Fault Injection Debugger
The following are the main steps to test your reference design:
- Convert the .sof file and .smh file to a .jic file.
- Program the .jic file into the EPCQ-L device.
- Launch the Signal Tap Logic Analyzer and the Fault Injection Debugger.
- Configure the Arria 10 device with the .sof file and read the .smh file with the Fault Injection Debugger.
- Start Signal Tap to monitor the signals and inject an error with the Fault Injection Debugger.
- Observe the Signal Tap output.
This section goes through some simple steps to inject faults into the CRAM. For more information about the Fault Injection Debugger, refer to the Fault Injection Debugger User Guide.
1.4.3.1. Converting .sof File and .smh File to .jic File
To program the .sof file and .smh file into the EPCQ-L device, you must convert them to a .jic file. The converted .jic file consists of:
- The bitstream for Intel® Arria® 10 FPGA configuration in Active Serial mode upon power-up
- The .smh file content at an offset that you define in the Convert Programming File tool
To convert, perform the following steps:
- Go to your output_files folder, duplicate the top.smh file, and rename the copy to top.hex.
Note: The .smh file is in standard Intel HEX format, that is, byte-addressed and little-endian. You may need to convert the .smh file to match the endianness of your system.
- On the File menu, launch the Convert Programming File tool.
- In the Output programming file section, select JTAG Indirect Configuration File (.jic) from the Programming file type list.
- Select EPCQL1024 from the Configuration device list.
- Select Active Serial x4 from the Mode list.
- Set the File name to output_files/top.jic.
Optionally, turn on Create Memory Map File (Generate top.map) and Create config data RPD (Generate top_auto.rpd).
- In the Input files to convert section, select Flash Loader in the File/Data area column.
- Click the Add Device button, select Arria 10 and 10AX115S2, and click OK.
- Select SOF Data in the File/Data area column, click the Add File button, and select top.sof in the output_files folder.
- Select top.sof under SOF Data, click the Properties button, turn on Compression, and click OK to close the SOF File Properties dialog box.
- Click Add Hex Data in the Input files to convert section.
- Select Relative addressing and set the start address to 0x2000000. Leave Big endian as the default selection for Endianness. Select top.hex from your output_files folder, and click OK.
The figure below shows the final setting for the .jic file generation. Verify the settings and click the Generate button.
- After the .jic file is generated successfully, click the Close button to close the Convert Programming File tool.
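The note above states that the .smh file uses the standard Intel HEX format and that you may need to change its endianness. The following Python sketch is not part of the reference design. It shows one way to sanity-check a copy of the file before conversion: it validates each Intel HEX record checksum, counts the data bytes, and checks that the data fits in the flash when placed at the 0x2000000 start address. The function names, the 4-byte word size used for byte swapping, and the flash capacity (1024 Mb, inferred from the EPCQL1024 device name) are assumptions made for illustration. The size check also assumes that the data records are contiguous.

```python
# Hypothetical helpers for inspecting an Intel HEX file such as top.hex.
# None of these names come from the Intel Quartus Prime tools.

FLASH_BYTES = 1024 * 1024 * 1024 // 8  # assumed capacity: 1024 Mb expressed in bytes
HEX_START = 0x2000000                  # start address used in the conversion steps


def parse_record(line):
    """Parse one Intel HEX record and return (address, record_type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record is too short")
    # All bytes of a record, including the checksum, sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    count = raw[0]
    address = int.from_bytes(raw[1:3], "big")
    record_type = raw[3]
    data = raw[4:-1]
    if len(data) != count:
        raise ValueError("byte count does not match data length")
    return address, record_type, data


def payload_size(lines):
    """Count data bytes in all data records (type 00) up to the end-of-file record (type 01)."""
    total = 0
    for line in lines:
        if not line.strip():
            continue
        _, record_type, data = parse_record(line)
        if record_type == 0x01:
            break
        if record_type == 0x00:
            total += len(data)
    return total


def fits_in_flash(size, start=HEX_START, capacity=FLASH_BYTES):
    """Return True if `size` bytes placed at `start` stay within the flash."""
    return start + size <= capacity


def swap_endianness(data, word_size=4):
    """Reverse the byte order inside each word; the 4-byte word size is an assumption."""
    if len(data) % word_size:
        raise ValueError("data length must be a multiple of the word size")
    return b"".join(
        data[i:i + word_size][::-1] for i in range(0, len(data), word_size)
    )
```

For example, `fits_in_flash(payload_size(open("output_files/top.hex")))` reports whether the sensitivity map data stays within the assumed flash capacity at the chosen offset.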
1.4.3.2. Programming .jic File into EPCQ-L
Before performing this task, ensure that your board configuration scheme is set to Active Serial by setting the MSEL[2:0] pins to b'010 or b'011. Refer to Configuration, Design Security, and Remote System Upgrades in Arria 10 Devices for more information.
To program the generated .jic file into the EPCQ-L, perform the following steps:
- On the Tools menu, launch the Programmer.
- Ensure that a valid programming cable is selected in Hardware Setup.
- Click the Auto Detect button. The detected JTAG chain should appear in the Programmer window.
- Select the Arria 10 FPGA, click the Change File button, and select the top.jic file in your output_files folder.
- Turn on Program/Configure for output_files/top.jic. Program/Configure for the Factory default SFL image is turned on automatically.
The diagram below shows the final setting of the Programmer.
- Click Start to program the top.jic file. This operation may take several minutes to complete.
1.4.3.3. Launching Signal Tap Logic Analyzer
To observe the signals monitored by the Signal Tap, you must launch the Signal Tap Logic Analyzer and start the Signal Tap operation before the fault injection operation. To launch the Signal Tap Logic Analyzer, perform the following steps:
- On the Tools menu, launch the Signal Tap Logic Analyzer.
- Make sure that the Hardware and Device are selected.
You cannot start the Signal Tap operation until the FPGA is configured.
1.4.3.4. Configuring Intel Arria 10 and Reading .smh File with Fault Injection Debugger
To configure the Intel® Arria® 10 with Fault Injection Debugger, perform the following steps:
- On the Tools menu, launch the Fault Injection Debugger.
- Make sure that a valid programming cable is selected in Hardware Setup.
- Click Auto Detect. The window should display the detected Intel® Arria® 10 device in the JTAG chain.
- Select the Arria 10 device, click Select File, select top.sof from the output_files folder, and click Open.
- Turn on Program/Configure.
- Click Start to start the configuration operation.
- Right-click the Arria 10 device and click Select SMH file.
- Select top.smh from the output_files folder and click Open.
- Right-click the Arria 10 device and click Show Device Sensitivity Map.
- Select ASD region(s) - 1 in the Sensitivity Map window as shown in the figure below.
- Close the Sensitivity Map window.
1.4.3.5. Injecting Error with Fault Injection Debugger
You can now inject the error to the CRAM with the Fault Injection Debugger. Prior to error injection, you must start the Signal Tap to monitor the targeted signals. Perform the following steps:
- In the Signal Tap Logic Analyzer window, select the Signal Tap instance and click Run Analysis on the Processing menu, or press F5.
- Go back to the Fault Injection Debugger window, turn on Inject Fault, and click Start. You may see the Intel® Quartus® Prime System message Injects 1 error (s) into device(s).
- Click Read EMR. The System message shows the injected error location as in the figure below.
The Signal Tap Logic Analyzer shows the error as a critical error and reports the affected region as 0x1. This should match the System message, which reports that the error is located in ASD region 1.
1.5. Implementing ECC Feature in Intel Arria 10 ROM Design
- Instantiate the RAM: 2-PORT IP with the following settings:
Parameters | Settings |
---|---|
Operation Mode | Select With one read port and one write port. |
Use different data width on different ports | Disable |
RAM Block Type | Select M20K. |
Create byte enable for port A and Create byte enable for port B | Disable |
Enable Error Correction Checking | Enable |
Do you want to specify the initial content of the memory? | Select Yes, use this file for the memory content data and specify the location of the file. |
- Connect the signals of the IP according to the following figure.
Figure 11. ROM with ECC Feature Using RAM: 2-PORT IP
1.5.1. Examples of Error Detection and Correction
Address | ROM content |
---|---|
00h | 32h |
01h | 33h |
02h | 34h |
⋮ | ⋮ |
1Dh | 4Fh |
1Eh | 50h |
1Fh | 51h |
Single-bit Error
The following figure shows an example single-bit error waveform after an SEU event affects the ROM content at address 1Fh. The waveform shows a two-clock-cycle latency on the output with respect to the associated read address. When the ROM content is free from bit flips, the eccstatus signal shows 2b’00. The ROM content at address 1Fh was initialized with data 51h using the .mif file, as shown in the Example of ROM Content Initialization table. The eccstatus signal shows 2b’10, indicating that a single-bit error is detected in the ROM content at address 1Fh. The IP corrects the error at the output.
Three Adjacent Bits Error
The following figure shows an example waveform for an error in three adjacent bits after a multi-bit upset (MBU) event affects the ROM content at address 1Fh. The waveform shows a two-clock-cycle latency on the output with respect to the associated read address. The ROM content at address 1Fh was initialized with data 51h using the .mif file, as shown in the Example of ROM Content Initialization table. The eccstatus signal shows 2b’11, indicating that an error in three adjacent bits is detected in the ROM content at address 1Fh, and uncorrectable data appears at the output.
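The two examples above can be summarized in a small host-side model, for instance when post-processing captured waveform samples. The following Python sketch is illustrative only. It decodes the three eccstatus values described in this section and pairs each read address with the status sample that appears two clock cycles later. The function names and the sample lists are hypothetical, and the value 2b’01 is reported as not covered because this section does not describe it.

```python
# Illustrative model of the eccstatus behavior described in this section.
# Function names and sample data are assumptions, not part of the IP.

READ_LATENCY = 2  # output lags the read address by two clock cycles

ECC_STATUS_MEANING = {
    0b00: "no error",
    0b10: "single-bit error detected and corrected at the output",
    0b11: "uncorrectable error, output data is not valid",
}


def decode_eccstatus(eccstatus):
    """Translate a 2-bit eccstatus value into the meaning given in this section."""
    if not 0 <= eccstatus <= 0b11:
        raise ValueError("eccstatus is a 2-bit value")
    # 0b01 is not described in this section, so it is not interpreted here.
    return ECC_STATUS_MEANING.get(eccstatus, "not covered in this section")


def pair_reads_with_status(addresses, statuses, latency=READ_LATENCY):
    """Pair each read address with the eccstatus sampled `latency` cycles later.

    Both lists hold one sample per clock cycle, starting at the same cycle.
    """
    return [
        (address, decode_eccstatus(statuses[i + latency]))
        for i, address in enumerate(addresses)
        if i + latency < len(statuses)
    ]
```

With per-cycle samples such as `addresses = [0x1E, 0x1F]` and `statuses = [0b00, 0b00, 0b00, 0b10]`, the second pair associates address 1Fh with the corrected single-bit error.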
1.6. Modifying Single-Device .jam Files for Use in a Multi-Device JTAG Chain
- Check the instruction register lengths of all the other devices in the JTAG chain.
- IR length:
- Intel® FPGA and CPLD devices: 10
- Hardware processor system (HPS) in Intel® SoC FPGA devices: 4
- DR length in any device: 1
- Locate the PROCEDURE EXECUTE line in the .jam file and add the statements from the following steps on new lines after it.
- If there are devices in the chain before the target device, add the following statements:
POSTIR <total IR length before the target device>; POSTDR <total DR length before the target device>;
- If there are devices in the chain after the target device, add the following statements:
PREIR <total IR length after the target device>; PREDR <total DR length after the target device>;
Other Devices Exist in JTAG Chain Before or After Target Device
For each example chain, add the codes after the PROCEDURE EXECUTE line:
- Download cable TDI → other device 1 (IR=10) → target device → download cable TDO:
POSTIR 10; POSTDR 1;
- Download cable TDI → target device → other device 1 (IR=10) → download cable TDO:
PREIR 10; PREDR 1;
- Download cable TDI → target device → other device 1 (IR=10) → other device 2 (IR=10) → download cable TDO:
PREIR 20; PREDR 2;
- Download cable TDI → other device 1 (IR=4) → target device → other device 2 (IR=10) → download cable TDO:
POSTIR 4; POSTDR 1; PREIR 10; PREDR 1;
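The rules above reduce to summing IR lengths and counting devices on each side of the target device. The following Python sketch is a minimal illustration of that arithmetic, not an Intel tool. It takes a chain description ordered from the download cable TDI to TDO and returns the statements to add after the PROCEDURE EXECUTE line. The ChainDevice class and the jam_chain_statements function are hypothetical names, and the sketch assumes a DR length of 1 for every non-target device, as listed above.

```python
from dataclasses import dataclass

DR_LENGTH = 1  # DR length of any non-target device, as listed in this section


@dataclass
class ChainDevice:
    """One device in the JTAG chain (hypothetical helper type)."""
    name: str
    ir_length: int          # instruction register length in bits
    is_target: bool = False


def jam_chain_statements(chain):
    """Return the statements to add for a chain ordered from TDI to TDO."""
    targets = [i for i, device in enumerate(chain) if device.is_target]
    if len(targets) != 1:
        raise ValueError("the chain must contain exactly one target device")
    index = targets[0]
    before = chain[:index]      # devices between TDI and the target
    after = chain[index + 1:]   # devices between the target and TDO

    statements = []
    if before:
        statements.append(f"POSTIR {sum(d.ir_length for d in before)};")
        statements.append(f"POSTDR {DR_LENGTH * len(before)};")
    if after:
        statements.append(f"PREIR {sum(d.ir_length for d in after)};")
        statements.append(f"PREDR {DR_LENGTH * len(after)};")
    return " ".join(statements)


if __name__ == "__main__":
    # Last example above: HPS (IR=4) -> target -> FPGA (IR=10).
    chain = [
        ChainDevice("other device 1", 4),
        ChainDevice("target device", 10, is_target=True),
        ChainDevice("other device 2", 10),
    ]
    print(jam_chain_statements(chain))
```

The IR length of the target device itself does not enter the calculation. It is included in the chain description only to keep the device list complete.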
1.7. Document Revision History for AN 737: SEU Detection and Recovery in Intel Arria 10 Devices
Document Version | Changes |
---|---|
2020.04.13 |
|
2019.08.09 | Added steps to modify .jam file for use in a multi-device JTAG chain. |
2018.09.04 |
|
Date | Version | Changes |
---|---|---|
March 2017 | 2017.03.15 | Rebranded as Intel. |
February 2017 | 2017.02.13 |
|
October 2016 | 2016.10.31 |
|
March 2016 | 2016.03.03 | Updated CRC_ERROR pin behavior when uncorrectable error cannot be located. |
March 2016 | 2016.03.02 | Initial release. |