Intel® Acceleration Stack User Guide: Intel® FPGA Programmable Acceleration Card N3000-N/2

ID 683362
Date 11/01/2021
Public

13.1. OPAE Handling of SEU

The OPAE daemon fpgad monitors for single event upset (SEU) events and records each occurrence in the log file /var/lib/opae/fpgad.log.

To start fpgad:
$ sudo systemctl start fpgad
  • Intel® MAX® 10 SEU:
    When an Intel® MAX® 10 SEU occurs, fpgad.log shows output similar to the following, with the error reported on bmc:
    $ tail -f /var/lib/opae/fpgad.log
    fpgad-vc: failed to get value object for sensor38.
    fpgad-vc: poll count = 1
    fpgad-vc: SEU error occurred on bmc @ 0000:b2:00.0
    fpgad-vc: failed to get value object for sensor15.
    fpgad-vc: failed to get value object for sensor38.
    
    You can ignore the message "failed to get value object for sensor". Sensors 15 and 38 report QSFP temperature, and this message indicates that the QSFP cable is not plugged in.
  • FPGA SEU:
    When an FPGA SEU occurs, fpgad.log shows output similar to the following, with the error reported on fpga:
    $ tail -f /var/lib/opae/fpgad.log
    fpgad-vc: failed to get value object for sensor38.
    fpgad-vc: poll count = 1
    fpgad-vc: SEU error occurred on fpga @ 0000:b2:00.0
    fpgad-vc: failed to get value object for sensor15.
    fpgad-vc: failed to get value object for sensor38.
    
    You can ignore the message "failed to get value object for sensor". Sensors 15 and 38 report QSFP temperature, and this message indicates that the QSFP cable is not plugged in.
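Because both kinds of SEU are logged in the same "SEU error occurred on ... @ ..." form, the relevant lines can be filtered from the sensor messages around them. The following is a minimal sketch, not part of the OPAE tools: the function name seu_events is hypothetical, and the pattern assumes the log format matches the sample output above exactly.

```shell
#!/bin/sh
# Hypothetical helper (not an OPAE command).
# Reads fpgad log lines on stdin and prints "<component> <PCIe address>"
# for every line of the form shown in the sample output:
#   fpgad-vc: SEU error occurred on <bmc|fpga> @ <address>
# All other lines (sensor warnings, poll count) are dropped.
seu_events() {
    sed -n 's/^.*SEU error occurred on \([a-z]*\) @ \([0-9a-fA-F:.]*\).*$/\1 \2/p'
}
```

For example, `sudo cat /var/lib/opae/fpgad.log | seu_events` would print one line per logged SEU, with bmc indicating an Intel® MAX® 10 SEU and fpga indicating an FPGA SEU.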
To recover from either an Intel® MAX® 10 or an FPGA SEU event, reset the Intel® FPGA PAC N3000-N/2 using the following command:
$ rsu bmcimg [PCIe B:D.F]
To test your system's response to an SEU event, Intel provides a mechanism to inject an error, which fpgad logs in a way similar to an SEU event.
  1. Start fpgad
    $ sudo systemctl start fpgad
  2. Terminal 2: Monitor fpgad.log
    $ sudo tail -f /var/lib/opae/fpgad.log
  3. Terminal 1: Inject the error
    $ sudo sh -c "echo 1 > /sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/errors/inject_error"
    Sample output:
    fpgad-vc: error interrupt event received.
    fpgad-vc: poll count = 1.
    fpgad-vc: detect inject_error 0x1 @ 0000:15:00.0
    fpgad-vc: detect catfatal_errors 0x800 @ 0000:15:00.0
    Note: poll count = 1 indicates that an error was detected.
  4. Terminal 1: Clear the error injection
    $ sudo sh -c "echo 0 > /sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/errors/inject_error"
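
The inject and clear commands in steps 3 and 4 differ only in the value written, so they can be wrapped in one small helper. This is a sketch under stated assumptions: the function name set_inject_error and the INJECT_PATH variable are hypothetical, the default path is the one used in the steps above (the device and FME indices may differ on your system), and the function must run with root privileges to write to sysfs.

```shell
#!/bin/sh
# Hypothetical helper (not an OPAE command).
# Default sysfs node copied from the steps above; override INJECT_PATH
# if your card enumerates under different indices.
INJECT_PATH="${INJECT_PATH:-/sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/errors/inject_error}"

# set_inject_error 1  -> inject the test error
# set_inject_error 0  -> clear the error injection
set_inject_error() {
    case "$1" in
        0|1)
            printf '%s\n' "$1" > "$INJECT_PATH"
            ;;
        *)
            echo "usage: set_inject_error 0|1" >&2
            return 2
            ;;
    esac
}
```

Restricting the argument to 0 or 1 keeps the helper to the two values this section documents.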