17.2. Using the FPGA AI Suite Software Emulation
If you do not have an FPGA board installed and you want to evaluate the accuracy of your machine learning model against the architecture file for the FPGA AI Suite IP of your choice, you can still run inference on the software emulation model for fast software development iterations.
The software emulation model is a C++ software model of the FPGA AI Suite IP that is bit-accurate (but not cycle-accurate). The emulation models the numeric details of the IP, including the behavior of the block floating point arithmetic when enabled, but it does not model relatively low-level transactions or the processing delays of modules.
The emulation of the FPGA AI Suite IP is accessible through the OpenVINO™ plugin interface.
The OpenVINO™ emulation plugin is enabled in the $COREDLA_ROOT/reference/plugins_emulation.xml plugins file. Because it uses the OpenVINO™ plugin architecture, it works with both the OpenVINO™ Python API and the C++ API. Because the emulation executes on the CPU and does not benefit from FPGA acceleration, it is much slower than inference on the FPGA. Typical inference times for a single image with ResNet50 are on the order of minutes, and the inference speed varies dramatically with the architecture configuration and the graph.
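Because the emulation is exposed through the standard OpenVINO™ plugin mechanism, you can also drive it directly from the OpenVINO™ Python API instead of through dla_benchmark. The following is a minimal sketch, assuming the OpenVINO™ 2.0 Python API (openvino.runtime). The "ARCH_PATH" property key used here to pass the architecture file is a hypothetical placeholder, not a documented name; check the FPGA AI Suite documentation for the actual plugin configuration.

import os
import numpy as np
from openvino.runtime import Core

core = Core()
# Register the emulation plugin from the plugins file shipped with the suite.
core.register_plugins(os.path.expandvars(
    "$COREDLA_ROOT/reference/plugins_emulation.xml"))

# NOTE: "ARCH_PATH" is a hypothetical property key for illustration only;
# consult the FPGA AI Suite documentation for how to pass the .arch file.
core.set_property("FPGA", {"ARCH_PATH": os.path.expandvars(
    "$COREDLA_ROOT/example_architectures/AGX7_Performance.arch")})

model = core.read_model(os.path.expandvars(
    "$COREDLA_WORK/demo/models/public/resnet-50-tf/FP32/resnet-50-tf.xml"))
compiled = core.compile_model(model, "HETERO:FPGA,CPU")

# The emulation is slow: run a single dummy image (batch size 1).
request = compiled.create_infer_request()
dummy = np.zeros(compiled.input(0).shape, dtype=np.float32)
request.infer({0: dummy})
print(request.get_output_tensor(0).data.shape)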
- Build the runtime with the following commands:
cd $COREDLA_WORK/runtime
rm -rf build_Release
./build_runtime.sh -target_emulation
- Run inference with the -niter=1 and -nireq=1 options (because the software model is slow) by using the following commands:
modeldir=$COREDLA_WORK/demo/models/public
imagedir=$COREDLA_WORK/demo/sample_images
curarch=$COREDLA_ROOT/example_architectures/AGX7_Performance.arch
gnd=$imagedir/TF_ground_truth.txt
cd $COREDLA_WORK/runtime/build_Release/dla_benchmark
./dla_benchmark \
   -b 1 `# Batch size of 1 (a single image)` \
   -niter 1 `# Run only a single iteration` \
   -nireq 1 `# Running emulator: so -nireq=1` \
   -m $modeldir/resnet-50-tf/FP32/resnet-50-tf.xml `# Same as when running on hardware - specify the graph` \
   -d HETERO:FPGA,CPU `# Same as when running on hardware - use FPGA if possible, fall back to CPU` \
   -i $imagedir `# Same as when running on hardware - specify image directory` \
   -arch_file $curarch `# Same as when running on hardware - specify .arch file` \
   -dump_output `# Dump output result.txt file` \
   -plugins emulation `# Use the software emulator` \
   -groundtruth_loc $gnd `# Location of the ground truth file for $imagedir`
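With -dump_output, dla_benchmark writes its inference results to a result.txt file that you can post-process yourself (dla_benchmark also reports accuracy directly when -groundtruth_loc is given). The sketch below compares dumped predictions against the ground truth file. The layout assumed here for both files (one "<image> <class_id>" pair per line) is an illustration only; inspect your actual files and adapt the parsing.

import os

def load_labels(path):
    # Assumed layout: one "<image> <class_id>" pair per line.
    labels = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                labels[parts[0]] = parts[1]
    return labels

predictions = load_labels(os.path.expandvars(
    "$COREDLA_WORK/runtime/build_Release/dla_benchmark/result.txt"))
ground_truth = load_labels(os.path.expandvars(
    "$COREDLA_WORK/demo/sample_images/TF_ground_truth.txt"))

matches = sum(1 for name, cls in predictions.items()
              if ground_truth.get(name) == cls)
print(f"top-1 matches: {matches}/{len(predictions)}")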
Use Mixed Precision to Improve Accuracy
When the ML model is sensitive to a high precision input, you can bootstrap the precision of the initial layers in the ML graph. Less commonly, some graphs may benefit from higher precision for intermediate or final layers. You must manually modify the OpenVINO™ IR to use the mixed precision feature.
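The OpenVINO™ IR is a plain XML file in which each layer port carries a precision attribute, so the manual edit can be scripted. The following is an illustrative sketch only: the assumption that promoting the output-port precision attribute of the initial layers is what the FPGA AI Suite honors for mixed precision is not confirmed here, and the number of layers to promote is a placeholder. Verify both against the IP documentation.

import xml.etree.ElementTree as ET

IR_XML = "resnet-50-tf.xml"   # path to your IR; the .bin weights file is untouched
NUM_LAYERS = 3                # how many initial layers to promote (placeholder)

tree = ET.parse(IR_XML)
root = tree.getroot()

# ASSUMPTION: editing the port "precision" attribute is what the mixed
# precision feature expects; consult the FPGA AI Suite documentation.
layers = root.find("layers")
for layer in list(layers)[:NUM_LAYERS]:
    output = layer.find("output")
    if output is None:
        continue
    for port in output.findall("port"):
        if port.get("precision") == "FP16":
            port.set("precision", "FP32")
            print(f'promoted {layer.get("name")} port {port.get("id")} to FP32')

tree.write("resnet-50-tf_mixed.xml")

The script rewrites only the graph description (.xml); keep the original .bin weights file alongside the modified IR.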