5. Containerized FPGA AI Suite SoC Design Example Quick-Start Tutorial

AN 1008: Using the FPGA AI Suite Docker* Image


ID 820119

Date 12/16/2024


Public

A newer version of this document is available.


This tutorial shows how to run the FPGA AI Suite SoC design example quick-start tutorial in a containerized FPGA AI Suite instance.

When you start the container, the FPGA AI Suite, OpenVINO™, and Python (openvino_env) environments are set up for you.

To run the FPGA AI Suite SoC example design streaming-to-memory (S2M) streaming demonstration application in a containerized FPGA AI Suite instance:

Set up the FPGA AI Suite Docker* image.
For instructions, refer to Setting Up the FPGA AI Suite Docker Image.
Start the FPGA AI Suite Docker* container.
For instructions, refer to Running the FPGA AI Suite Docker Container.

Starting the container sets all the required environment variables, including the COREDLA_WORK and COREDLA_ROOT environment variables.

The remaining steps are performed in the Windows command prompt session that you start in this step.
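Before you continue, you can confirm that the container set the variables that later steps rely on. The following POSIX shell sketch is one way to do that; the require_env function name is hypothetical and is not part of FPGA AI Suite.

```shell
#!/bin/sh
# Sketch only: require_env is a hypothetical helper, not an FPGA AI Suite tool.
# Prints NAME=value for each set variable and returns 1 if any is unset/empty.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    # POSIX sh has no ${!name}; eval reads the variable whose name is $name.
    # Pass only plain variable names.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    else
      echo "$name=$value"
    fi
  done
  return "$missing"
}
```

For example, run require_env COREDLA_ROOT COREDLA_WORK in the container shell. A nonzero exit status means that at least one of the variables is unset or empty.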

Confirm that the FPGA AI Suite compiler is working correctly by running the following command:

```
dla_compiler \
     --march $COREDLA_ROOT/example_architectures/AGX7_Performance.arch \
     --fanalyze-area
```

This command should generate output similar to the following example output:

```
Exporting input transform to file
Exporting output transform to file
Executing area estimate
Estimated area:
  ALMs: 56186
  ALUTs: 59999
  Registers: 217674
  DSPs: 602
  M20Ks: 1209
  Memory ALMs: 2426
```
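If you want to check the area estimate from a script rather than by eye, a small awk filter can extract a single resource count. This is a sketch that assumes the report is written to standard output in the "Resource: number" layout shown above; area_field is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: area_field is a hypothetical helper, not an FPGA AI Suite tool.
# It reads dla_compiler output on stdin and prints the number that follows
# "<field>: " in the "Estimated area:" report, assuming the layout shown above.
area_field() {
  awk -v f="$1" '
    /^Estimated area:/ { in_report = 1; next }
    in_report {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop the leading indentation
      if (index(line, f ": ") == 1) {   # line starts with "<field>: "
        print substr(line, length(f) + 3)
        exit
      }
    }'
}
```

For example: dla_compiler --march $COREDLA_ROOT/example_architectures/AGX7_Performance.arch --fanalyze-area 2>&1 | area_field ALMs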

Create a working directory for the FPGA AI Suite SoC example design files and copy the files into the working directory:
```
mkdir ~/coredla_work && cd ~/coredla_work

source dla_init_local_directory.sh
```
The FPGA AI Suite SoC example design files include precompiled bitstreams and SD card image files. Instructions for building the SD card image yourself are provided in the next step.
- Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit
  The files for this development kit are in the following location:
```
~/coredla_work/demo/ed4/agx7_soc_s2m
```
  The Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit files include the following files in subfolders:
  - bitstreams subfolder:
    AGX7_FP16_Generic/ AGX7_Performance/ AGX7_Small_NoSoftmax/
  - sd-card subfolder:
    coredla-image-agilex7_dk_si_agi027fa.cpio u-boot-spl-dtb.hex.jic coredla-image-agilex7_dk_si_agi027fa.wic u-boot-spl-dtb.hex.sof u-boot-spl-dtb.hex0
- Arria® 10 SX SoC FPGA Development Kit
  The files for this development kit are in the following location:
```
~/coredla_work/demo/ed4/a10_soc_s2m
```
  The Arria® 10 SX SoC FPGA Development Kit files include the following files in subfolders:
  - bitstreams subfolder:
    A10_FP16_Generic/ A10_Performance/ A10_Small_NoSoftmax/
  - sd-card subfolder:
    coredla-image-arria10.cpio coredla-image-arria10.wic
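Several later steps need a per-kit value: the demonstration directory, the bitstream folder for create_hps_image.sh -f, the -m machine name, and the .arch file. The following sketch gathers the values used in this tutorial into one shell function. The kit keys agx7 and a10, the function name, and the KIT_* variable names are hypothetical.

```shell
#!/bin/sh
# Sketch only: kit_settings and the KIT_* names are hypothetical. The paths,
# machine names, and .arch names are the ones used in this tutorial.
kit_settings() {
  case "$1" in
    agx7)
      KIT_DEMO_DIR="$HOME/coredla_work/demo/ed4/agx7_soc_s2m"
      KIT_ARCH="AGX7_Performance"
      KIT_MACHINE="agilex7_dk_si_agi027fa"
      ;;
    a10)
      KIT_DEMO_DIR="$HOME/coredla_work/demo/ed4/a10_soc_s2m"
      KIT_ARCH="A10_Performance"
      KIT_MACHINE="arria10"
      ;;
    *)
      echo "unknown kit: $1 (expected agx7 or a10)" >&2
      return 1
      ;;
  esac
  # Folder passed to create_hps_image.sh -f, and the matching .arch file name.
  KIT_BITSTREAM_DIR="$KIT_DEMO_DIR/bitstreams/$KIT_ARCH/"
  KIT_ARCH_FILE="$KIT_ARCH.arch"
}
```

For example, after kit_settings agx7 you could run ./create_hps_image.sh -f "$KIT_BITSTREAM_DIR" -o ../sd-card -u -m "$KIT_MACHINE" from ~/coredla_work/runtime.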
Build the SD card image:

Tip: Alternatively, you can use the .wic files provided in the sd-card subfolders (listed in the previous step) and skip this step.
1. Navigate to the runtime folder with the following command:
```
cd ~/coredla_work/runtime/
```
2. Build the image for your development kit with one of the following commands:
  - Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit
```
./create_hps_image.sh \
 -f ~/coredla_work/demo/ed4/agx7_soc_s2m/bitstreams/AGX7_Performance/ \
 -o ../sd-card -u \
 -m agilex7_dk_si_agi027fa
```
  - Arria® 10 SX SoC FPGA Development Kit
```
./create_hps_image.sh \
-f ~/coredla_work/demo/ed4/a10_soc_s2m/bitstreams/A10_Performance/ \
-o ../sd-card -u \
-m arria10
```
  If either of these commands fails with an error message about being unable to clone the linux-socfpga-lts package, complete the following steps:
  1. Review the error messages to determine the branch version of the linux-socfpga-lts repository required. Look for an error message similar to the following message:
```
ERROR: linux-socfpga-lts-6.6.22-lts-git-r0 do_fetch
```
    In this example message, the branch version is 6.6.22.
  2. Run the following commands:
```
rm -rf ~/coredla_work/runtime/build_Yocto

git clone https://github.com/altera-opensource/linux-socfpga.git \
    -b socfpga-<version>-lts \
    ~/coredla_work/runtime/build_Yocto/build/downloads/git2/github.com.altera-opensource.linux-socfpga.git
```
    where <version> is the branch version you determined earlier.
  3. Run the create_hps_image.sh command again.
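The branch name in the git clone command follows directly from the error message. The following sketch extracts it with sed, assuming the error text contains linux-socfpga-lts-<version>-lts as in the example above; socfpga_branch_from_error is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: socfpga_branch_from_error is a hypothetical helper. It assumes
# the BitBake error contains "linux-socfpga-lts-<version>-lts".
socfpga_branch_from_error() {
  local version
  version=$(printf '%s\n' "$1" |
    sed -n 's/.*linux-socfpga-lts-\([0-9][0-9.]*\)-lts.*/\1/p' | head -n 1)
  if [ -z "$version" ]; then
    echo "no linux-socfpga-lts version found" >&2
    return 1
  fi
  # Branch name format used by the git clone -b option above.
  echo "socfpga-${version}-lts"
}
```

For example, socfpga_branch_from_error 'ERROR: linux-socfpga-lts-6.6.22-lts-git-r0 do_fetch' prints socfpga-6.6.22-lts, which you can pass to git clone -b.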
[FPGA] Prepare the SD card for the FPGA development kit:
1. Write the SD card image to an SD card:
  1. Open Win32 Disk Imager.
  2. Select the SD card device, and then click the folder icon to open File Explorer and select the .wic image to write to the SD card.
    The .wic image is in the folder that you specified with the docker run -v option when you followed the instructions in Running the FPGA AI Suite Docker Container. In this example, the folder is C:\Users\<username>\<path-to-share>, which the container mounts at /mnt/host.
  3. Click Write and then click Yes in the pop-up window.
2. Eject the SD card device properly from Windows* to avoid any data corruption.
3. Ensure the FPGA development kit is powered off and insert the SD card into the FPGA development kit SD card slot.
[FPGA] Prepare and program the FPGA development kit:
- Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit
  1. With the FPGA development kit powered off, set switch S9 to [ON/ON/ON/X] to program the .jic file to the FPGA.
  2. Power on the development kit.
  3. Copy the .jic file to the Windows host with the following command:
```
cp ~/coredla_work/sd-card/u-boot-spl-dtb.hex.jic \
   /mnt/c/Users/<user>/Downloads/
```
  4. In a Windows* command prompt session, verify that the host system recognizes the FPGA development kit board with the following command:
```
C:\intelFPGA_pro\24.3\qprogrammer\quartus\bin64\quartus_pgm.exe -m jtag
```
    Take note of the cable/device number in the output of this command. You need this number in the next step.
  5. Program the FPGA device by running the following command at a Windows* command prompt:
```
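C:\intelFPGA_pro\24.3\qprogrammer\quartus\bin64\quartus_pgm.exe ^
   -m jtag ^
   -c <cable_number> ^
   -o "pvi;C:\Users\<user>\Downloads\u-boot-spl-dtb.hex.jic@<device_number>"
```
    where <cable_number> and <device_number> are the values that you noted in the previous step.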
  6. Power off the development kit and set switch S9 to [ON/OFF/OFF/X] to set the development kit board in fast mode.
  7. Power on the development kit.
- Arria® 10 SX SoC FPGA Development Kit
  Not required.
Obtain the FPGA development kit host name and IP address:
1. With the FPGA development kit powered on, start a minicom session by running the following command in the Windows* command prompt session where you started the container:
```
sudo minicom
```
2. In the minicom session, run the following command to get the host name of the FPGA development kit:
```
hostname
```
3. In the minicom session, run the following command to get the IP address of the FPGA development kit:
```
ping -c 4 <hostname>.local
```
  Where <hostname> is the host name you obtained in the previous step.
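The IP address appears in the first line of the ping output. The following sketch extracts it, assuming the usual "PING <name> (<address>)" first line that both iputils and BusyBox ping print; ping_ip is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: ping_ip is a hypothetical helper. It reads ping output on stdin
# and prints the IPv4 address in parentheses on the first line.
ping_ip() {
  sed -n '1s/^PING [^(]*(\([0-9][0-9.]*\)).*/\1/p'
}
```

For example: ping -c 4 <hostname>.local | ping_ip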

Install OpenVINO™ Model Zoo:

Start an Ubuntu command line session.

In the command line session, run the following commands:

```
cd ~/coredla_work/demo

git clone https://github.com/openvinotoolkit/open_model_zoo.git

cd open_model_zoo

git checkout 2023.3.0
```

Generate IR files for FPGA AI Suite using the OpenVINO™ Model Optimizer with the following commands:

```
omz_downloader --name resnet-50-tf \
  --output_dir $COREDLA_WORK/demo/models/

omz_converter --name resnet-50-tf \
  --download_dir $COREDLA_WORK/demo/models/ \
  --output_dir $COREDLA_WORK/demo/models/
```

These commands produce the following IR files in the $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32 directory:

```
resnet-50-tf.bin
resnet-50-tf.xml
resnet-50-tf.mapping
```
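The compile step reads the .xml file and, as with any OpenVINO IR, its matching .bin weights file. As a quick check before compiling, the following sketch verifies that both exist and are not empty; check_ir is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check_ir is a hypothetical helper. It checks that <model>.xml
# and <model>.bin exist and are non-empty in the given directory.
check_ir() {
  local dir="$1" model="$2" ext status=0
  for ext in xml bin; do
    if [ -s "$dir/$model.$ext" ]; then
      echo "found $dir/$model.$ext"
    else
      echo "missing or empty: $dir/$model.$ext" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: check_ir "$COREDLA_WORK/demo/models/public/resnet-50-tf/FP32" resnet-50-tf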

Compile the model for use on the FPGA device with the FPGA AI Suite compiler. The precompiled SD card image (.wic) provided with the FPGA AI Suite uses one of the following files as the IP architecture configuration file:

Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit
```
AGX7_Performance.arch
```
Arria® 10 SX SoC FPGA Development Kit
```
A10_Performance.arch
```

To create the AOT file for the M2M variant (which uses the dla_benchmark utility), run the following commands:

```
cd $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32

dla_compiler \
  --march $COREDLA_ROOT/example_architectures/<IP arch config file> \
  --network-file ./resnet-50-tf.xml \
  --foutput-format=open_vino_hetero \
  --o $COREDLA_WORK/demo/RN50_Performance_b1.bin \
  --batch-size=1 \
  --fanalyze-performance
```

where <IP arch config file> is one of the IP architecture configuration files listed earlier.

To create the AOT file for the S2M variant (which uses the streaming inference app), run the following commands:

```
cd $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32

dla_compiler \
  --march $COREDLA_ROOT/example_architectures/<IP arch config file> \
  --network-file ./resnet-50-tf.xml \
  --foutput-format=open_vino_hetero \
  --o $COREDLA_WORK/demo/RN50_Performance_no_folding.bin \
  --batch-size=1 \
  --fanalyze-performance \
  --ffolding-option=0
```

where <IP arch config file> is one of the IP architecture configuration files listed earlier.
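The two compile commands differ only in the output file name and the --ffolding-option=0 option. The following sketch wraps both in one function. The compile_rn50 name and the DLA_COMPILER override are hypothetical conveniences; every dla_compiler option is copied from the commands above. Setting DLA_COMPILER=echo prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: compile_rn50 and the DLA_COMPILER override are hypothetical.
# Every dla_compiler option below is copied from the two commands above.
# Run from $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32.
compile_rn50() {
  local variant="$1" arch_file="$2" out fold_opt=""
  case "$variant" in
    m2m) out="$COREDLA_WORK/demo/RN50_Performance_b1.bin" ;;
    s2m) out="$COREDLA_WORK/demo/RN50_Performance_no_folding.bin"
         fold_opt="--ffolding-option=0" ;;   # only the S2M build sets this
    *)   echo "variant must be m2m or s2m" >&2; return 1 ;;
  esac
  # ${fold_opt:+"$fold_opt"} expands to nothing for m2m (no empty argument).
  "${DLA_COMPILER:-dla_compiler}" \
    --march "$COREDLA_ROOT/example_architectures/$arch_file" \
    --network-file ./resnet-50-tf.xml \
    --foutput-format=open_vino_hetero \
    --o "$out" \
    --batch-size=1 \
    --fanalyze-performance \
    ${fold_opt:+"$fold_opt"}
}
```

For example, compile_rn50 s2m AGX7_Performance.arch builds the S2M variant for the Agilex™ 7 development kit.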

After you run these commands, the compiled models and demonstration files are in the following locations:

- Compiled models:
  - $COREDLA_WORK/demo/RN50_Performance_b1.bin
  - $COREDLA_WORK/demo/RN50_Performance_no_folding.bin
- Sample images:
  - $COREDLA_WORK/demo/sample_images/
- Architecture files:
  - $COREDLA_ROOT/example_architectures/AGX7_Performance.arch
  - $COREDLA_ROOT/example_architectures/A10_Performance.arch

(Optional) At this point, you can also try one of the following flows before continuing:
- FPGA AI Suite Architecture Generation Flow
- FPGA AI Suite IP Creation Flow
Copy the required demonstration files to the /home/root/resnet-50-tf folder on the SD card:
1. In the minicom session, create a directory to receive the model data and sample images:
```
mkdir ~/resnet-50-tf
```
2. On the development host, use the secure copy (scp) command to copy the data to the board:
```
TARGET_IP=<Development Kit Hostname>.local

TARGET="root@$TARGET_IP:~/resnet-50-tf"

demodir=$COREDLA_WORK/demo

scp $demodir/*.bin $TARGET/.

scp -r $demodir/sample_images/ $TARGET/.

scp $COREDLA_ROOT/example_architectures/<architecture file> $TARGET/.

scp $COREDLA_ROOT/build_os.txt $TARGET/../app/
```
  where <architecture file> is one of the following files, depending on your development kit:
  - Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit
```
AGX7_Performance.arch
```
  - Arria® 10 SX SoC FPGA Development Kit
```
A10_Performance.arch
```
3. [Optional] In the minicom session, run the sync command to ensure that the data is flushed to disk.
Verify the FPGA development kit device drivers. The device drivers should be loaded when the HPS boots.
Verify that the device drivers are initialized by checking that uio entries are listed in /sys/class/uio. On the FPGA development kit (for example, in the minicom session), run the following command:
```
ls /sys/class/uio
```
The command should show output similar to the following example:
```
uio0 uio1 uio2
```
If the drivers are not listed, refresh the modules by running the following command before checking again that the drivers are loaded:
```
uio-devices restart
```
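The check and the fallback above can be combined. In this sketch, count_uio (a hypothetical helper name) counts the uio* entries in /sys/class/uio, or in a directory that you pass, and returns a nonzero status when there are none.

```shell
#!/bin/sh
# Sketch only: count_uio is a hypothetical helper. It prints the number of
# uio* entries in the directory (default /sys/class/uio).
count_uio() {
  local dir="${1:-/sys/class/uio}" n=0 entry
  for entry in "$dir"/uio*; do
    [ -e "$entry" ] || continue   # no match: the glob stays literal
    n=$((n + 1))
  done
  echo "$n"
  [ "$n" -gt 0 ]                  # status 0 only if at least one device
}
```

For example, on the development kit: count_uio || { uio-devices restart; count_uio; }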

Run one of the demonstration applications:

Run the M2M demonstration application

The M2M data flow model uses the dla_benchmark demonstration application. The S2M bitstream supports both the M2M data flow model and the S2M data flow model.

You must know the host name of the FPGA development kit that you determined in an earlier step.

To run inference on the FPGA development kit:

Open an SSH connection to the FPGA development kit:
1. Start a new terminal session
2. Run the following command:
```
build-host:$ ssh <devkit_hostname>
```

In the SSH terminal, run the following commands:

```
export compiled_model=~/resnet-50-tf/RN50_Performance_b1.bin

export imgdir=~/resnet-50-tf/sample_images

export archfile=~/resnet-50-tf/<architecture file>

cd ~/app

export COREDLA_ROOT=/home/root/app

./dla_benchmark \
   -b=1 \
   -cm $compiled_model \
   -d=HETERO:FPGA,CPU \
   -i $imgdir \
   -niter=5 \
   -plugins_xml_file ./plugins.xml \
   -arch_file $archfile \
   -api=async \
   -groundtruth_loc $imgdir/TF_ground_truth.txt \
   -perf_est \
   -nireq=4 \
   -bgr
```

where <architecture file> is one of the following files, depending on your development kit:

Agilex™ 7 FPGA I-Series Transceiver-SoC Development Kit
```
AGX7_Performance.arch
```
Arria® 10 SX SoC FPGA Development Kit
```
A10_Performance.arch
```

The dla_benchmark command reports its progress in numbered steps. The last two steps generate output similar to the following example:

```
[Step 11/12] Dumping statistics report
count:             8 iterations
system duration:   174.3530 ms
IP duration:       112.1184 ms
latency:           79.9449 ms
system throughput: 45.8839 FPS
number of hardware instances: 1
number of network instances: 1
IP throughput per instance: 71.3531 FPS
IP throughput per fmax per instance: 0.3568 FPS/MHz
IP clock frequency: 200.0000 MHz
[Step 12/12] Dumping the output values
[ INFO ] Dumping result of Graph_0 to result.txt and result_tensor_boundaries.txt
```
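The throughput figures in this report are consistent with dividing the iteration count by the corresponding duration, and the per-fmax figure is consistent with dividing the IP throughput by the IP clock frequency. These relationships are inferred from the example numbers (with one hardware instance) rather than taken from the dla_benchmark documentation. The following awk sketch, with the hypothetical name throughput_check, recomputes the figures from the report's count, durations, and clock frequency.

```shell
#!/bin/sh
# Sketch only: throughput_check is a hypothetical helper. The formulas are
# inferred from the example report (one hardware instance), not documented:
#   throughput = count / duration, per-fmax = IP throughput / clock (MHz).
throughput_check() {
  # $1 = count, $2 = system duration (ms), $3 = IP duration (ms),
  # $4 = IP clock frequency (MHz)
  awk -v n="$1" -v sys_ms="$2" -v ip_ms="$3" -v fmax="$4" 'BEGIN {
    sys_fps = n / (sys_ms / 1000)
    ip_fps  = n / (ip_ms / 1000)
    printf "system throughput: %.4f FPS\n", sys_fps
    printf "IP throughput per instance: %.4f FPS\n", ip_fps
    printf "IP throughput per fmax per instance: %.4f FPS/MHz\n", ip_fps / fmax
  }'
}
```

With the example values, throughput_check 8 174.3530 112.1184 200 reproduces the reported 45.8839 FPS, 71.3531 FPS, and 0.3568 FPS/MHz.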

Run the S2M demonstration application
To run the S2M (streaming) mode demonstration application, you need two terminal sessions connected to the FPGA development kit.

You must know the host name of the FPGA development kit that you determined in an earlier step.
To run the streaming demonstration application:
1. Open an SSH connection to the SoC FPGA development kit:
  1. Start a new terminal session
  2. Run the following command:
```
build-host:$ ssh <devkit_hostname>
```
    Where <devkit_hostname> is the host name you determined earlier.
2. Repeat the previous step to open a second SSH connection to the FPGA development kit.
3. In the first terminal session, run the following commands:
```
export COREDLA_ROOT=/home/root/app

cd /home/root/app

./run_inference_stream.sh
```
4. In the other terminal session, run the following commands:
```
cd /home/root/app

./run_image_stream.sh
```
The first terminal session (where you ran the run_inference_stream.sh command) then shows output similar to the following example:
```
root@arria10-ea80b8d770e7:~/app# ./run_inference_stream.sh
Runtime arch check is enabled. Check started...
Runtime arch check passed.
Runtime build version check is enabled. Check started...
Runtime build version check passed.
Ready to start image input stream.
1 - class ID 683, score = 40.0146
2 - class ID 954, score = 92.8223
3 - class ID 968, score = 91.6016
4 - class ID 769, score = 96.4844
5 - class ID 872, score = 99.6094
6 - class ID 954, score = 92.8223
7 - class ID 683, score = 40.0146
8 - class ID 968, score = 91.6016
9 - class ID 769, score = 96.4844
10 - class ID 872, score = 99.6094
```
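The streaming application prints one line per classified image. If you save that output to a file, a short awk filter can summarize it. This sketch assumes the "<n> - class ID <id>, score = <score>" format shown above; summarize_stream is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: summarize_stream is a hypothetical helper. It reads saved
# run_inference_stream.sh output on stdin and prints, per class ID:
#   <class ID> <times seen> <highest score>
summarize_stream() {
  awk '
    /class ID [0-9]+, score = / {
      for (i = 1; i < NF; i++) if ($i == "ID") id = $(i + 1)
      sub(/,$/, "", id)                 # "683," becomes "683"
      score = $NF + 0
      count[id]++
      if (!(id in best) || score > best[id]) best[id] = score
    }
    END {
      for (id in count) printf "%s %d %.4f\n", id, count[id], best[id]
    }' | sort -n
}
```

For the example output above, each of the five class IDs (683, 769, 872, 954, and 968) appears twice.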

Exit the demonstration application by pressing CTRL+C.
Exit the Docker* container with the exit command.
You can restart the Docker* container with the docker start -i fpga-ai-suite-2024.3 command.

FPGA AI Suite Architecture Generation Flow

To generate an architecture that is optimized for a graph, the FPGA AI Suite architecture optimizer uses a base architecture and modifies parameters to achieve the highest throughput in frames per second (fps).

The best architecture is saved as an architecture description file with a file name based on the architecture parameters.

Example commands to generate the highest-performance architecture:

```
cd $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32

dla_compiler --gen-arch --mmax-resources=3651200,13272,8528 \
 --gen-min-sb=2048 --network-file resnet-50-tf.xml \
 --march $COREDLA_ROOT/example_architectures/AGX7_Performance.arch \
 --mmax-resources-alm-util=75 --fassumed-fmax-core=600
```

Example command to generate an architecture optimized for a target frame rate:

```
dla_compiler \
 --gen-arch \
 --gen-min-sb=2048 \
 --network-file resnet-50-tf.xml \
 --march=$COREDLA_ROOT/example_architectures/AGX7_Performance.arch \
 --mmax-resources-alm-util=75 \
 --mmax-resources=427200,2713,1518 \
 --fassumed-fmax-core=300 \
 --mtarget-fps=100.0
```

Important: This command is computationally expensive and can take several hours. If insufficient computing resources (such as memory) are available, WSL2 kills the process.
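The architecture optimizer names its output file after the architecture parameters, so the name is not known in advance. The following sketch picks the most recently modified .arch file in a directory to pass to the IP creation flow. It assumes that the generated file is written to the directory you ran dla_compiler from; newest_arch is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: newest_arch is a hypothetical helper. It prints the most
# recently modified *.arch file in the directory (default: current dir).
newest_arch() {
  local dir="${1:-.}" newest="" f
  for f in "$dir"/*.arch; do
    [ -e "$f" ] || continue
    # -nt is true when $f was modified more recently than $newest.
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no .arch file in $dir" >&2
    return 1
  fi
  echo "$newest"
}
```

For example, from $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32 you could pass --arch="$(newest_arch .)" to dla_create_ip.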

FPGA AI Suite IP Creation Flow

The FPGA AI Suite IP generation utility reads an input Architecture Description File (.arch) and places generated IP into an IP library that can be imported into Platform Designer or used directly in a pure RTL design.

To run the IP creation flow, run the following command. The generated_arch.arch file in this command is generated by compiler commands such as those in FPGA AI Suite Architecture Generation Flow.

```
cd $COREDLA_WORK/demo/models/public/resnet-50-tf/FP32

dla_create_ip \
 --flow create_ip \
 --arch=./generated_arch.arch \
 --overwrite \
 --ip_dir ./ip
```

The RTL that the create_ip flow generates can be loaded into Quartus® Prime Pro Edition and Platform Designer, where you can view the design and modify or extend the overall design. Quartus® Prime Pro Edition and Platform Designer are not provided as part of FPGA AI Suite and must be obtained separately.

The illustration that follows shows an example of how the FPGA AI Suite IP for an Agilex™ 7 I-Series SoC design looks in Platform Designer.

Figure 1. FPGA AI Suite IP Example in Platform Designer for an Agilex™ 7 I-Series SoC Design
