2.2. Model Performance
- Arria® 10: 265 MHz
- Agilex™ 7: 400 MHz
The performance results for the designs that follow were achieved using the dla_build_example_design.py script that is included with the FPGA AI Suite. The script uses a standard (-2) speed bin with a single seed and high-effort compiler settings.
- Arria® 10 runtime host: CentOS7 host on an Intel® Xeon® processor E5-1650 @ 3.6 GHz
- Agilex™ 7 runtime host: SLES12 host on an Intel® Xeon® processor E5-1650 @ 3.5 GHz
set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS
set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF
The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
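To see which architectures are available locally, the directory can be listed from a script. The following is a minimal Python sketch, not part of the FPGA AI Suite. The function name list_example_architectures is hypothetical, and the sketch assumes that each architecture is stored as a file with a .arch suffix.

```python
import os
from pathlib import Path


def list_example_architectures(coredla_root):
    """Return the sorted architecture names under example_architectures/.

    Assumption: each architecture is a file with a .arch suffix.
    Other files (such as the README) are ignored.
    """
    arch_dir = Path(coredla_root) / "example_architectures"
    if not arch_dir.is_dir():
        return []
    return sorted(p.stem for p in arch_dir.glob("*.arch"))


if __name__ == "__main__":
    root = os.environ.get("COREDLA_ROOT")
    if root is None:
        print("COREDLA_ROOT is not set")
    else:
        for name in list_example_architectures(root):
            print(name)
```

If the suffix assumption holds, the printed names can be matched against the Architecture column of the tables that follow.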
Details - FPGA AI Suite 2024.1
Architecture | fMAX | ALMs | DSPs | M20Ks | Registers |
---|---|---|---|---|---|
A10_FP16_Generic | 306 MHz | 27.8 k | 186 | 498 | 72 k |
A10_FP16_Performance | 280 MHz | 84.2 k | 1162 | 1482 | 254 k |
A10_Small_NoSoftmax | 345 MHz | 14.6 k | 80 | 247 | 40 k |
A10_Small_Softmax | 325 MHz | 15.7 k | 90 | 255 | 43 k |
A10_Generic | 299 MHz | 30.5 k | 202 | 617 | 81 k |
A10_Performance | 275 MHz | 57.8 k | 650 | 948 | 171 k |
AGX7_FP16_Generic | 600 MHz | 32.1 k | 186 | 501 | 96 k |
AGX7_FP16_Performance | 600 MHz | 99.3 k | 1162 | 1495 | 331 k |
AGX7_Small_NoSoftmax | 616 MHz | 16.3 k | 80 | 296 | 54 k |
AGX7_Small_Softmax | 616 MHz | 17.7 k | 90 | 304 | 60 k |
AGX7_Generic | 616 MHz | 35.1 k | 202 | 759 | 117 k |
AGX7_Performance | 581 MHz | 62.4 k | 650 | 1240 | 202 k |
AGX7_Performance_Giant | 565 MHz | 117.9 k | 1546 | 2333 | 407 k |
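The rows in these tables mix plain integers with values that carry a "k" or "MHz" suffix. The following Python sketch shows one possible way to turn a row into numbers for further analysis. The helper names parse_value and parse_row and the column labels are illustrative assumptions, not part of the FPGA AI Suite.

```python
def parse_value(cell):
    """Convert a cell such as '57.8 k', '275 MHz', or '650' to a number."""
    text = cell.strip()
    if text.endswith("MHz"):
        return float(text[:-3])
    if text.endswith("k"):
        # '57.8 k' means 57.8 thousand; round to the nearest integer.
        return round(float(text[:-1]) * 1000)
    return int(text)


def parse_row(line, columns):
    """Split a pipe-delimited row into (architecture name, {column: value})."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, values = cells[0], cells[1:]
    return name, {col: parse_value(v) for col, v in zip(columns, values)}


# Column labels are chosen here for illustration only.
COLUMNS = ["fmax_mhz", "alms", "dsps", "m20ks", "registers"]

name, row = parse_row(
    "A10_Performance | 275 MHz | 57.8 k | 650 | 948 | 171 k |", COLUMNS
)
print(name, row)
```

Values with a "k" suffix are rounded in the source table, so the parsed ALM and register counts are approximate.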
public/mobilenet-v1-1.0-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 1181 | 89 | 71.2 | 89.5 |
A10_FP16_Performance | 84.2 k | 1162 | 4685 | 294 | 71.2 | 89.5 |
A10_Small_NoSoftmax | 14.6 k | 80 | 1142 | 98 | 69.8 | 89.1 |
A10_Small_Softmax | 15.7 k | 90 | 1080 | 93 | 69.6 | 89.1 |
A10_Generic | 30.5 k | 202 | 985 | 131 | 69.6 | 89.1 |
A10_Performance | 57.8 k | 650 | 2619 | 297 | 70.0 | 89.0 |
AGX7_FP16_Generic | 32.1 k | 186 | 2262 | 171 | 71.2 | 89.5 |
AGX7_FP16_Performance | 99.3 k | 1162 | 8921 | 560 | 71.2 | 89.5 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 2788 | 169 | 70.9 | 89.6 |
AGX7_Small_Softmax | 17.7 k | 90 | 2790 | 169 | 70.9 | 89.5 |
AGX7_Generic | 35.1 k | 202 | 3327 | 257 | 70.9 | 89.5 |
AGX7_Performance | 62.4 k | 650 | 6095 | 390 | 70.9 | 89.5 |
AGX7_Performance_Giant | 117.9 k | 1546 | 9130 | 1511 | 70.9 | 89.6 |
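One way to compare architectures of different sizes is to normalize throughput by DSP count. The following Python sketch does this for a few rows copied from the table above. The metric and the helper name fps_per_dsp are illustrative choices, not a figure of merit defined by the FPGA AI Suite.

```python
# (architecture, DSPs, throughput in fps) copied from the
# public/mobilenet-v1-1.0-224 table.
ROWS = [
    ("A10_Small_NoSoftmax", 80, 98),
    ("A10_Performance", 650, 297),
    ("AGX7_Small_NoSoftmax", 80, 169),
    ("AGX7_Performance", 650, 390),
    ("AGX7_Performance_Giant", 1546, 1511),
]


def fps_per_dsp(rows):
    """Return (architecture, fps per DSP) pairs, highest ratio first."""
    ratios = [(name, fps / dsps) for name, dsps, fps in rows]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in fps_per_dsp(ROWS):
        print(f"{name:24s} {ratio:.2f} fps/DSP")
```

For these rows, the small architectures rank highest on this ratio, while the largest architecture delivers the highest absolute throughput. Which one suits a design depends on the available device resources and the required frame rate.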
public/mobilenet-v2
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 1996 | 81 | 71.8 | 89.6 |
A10_FP16_Performance | 84.2 k | 1162 | 3608 | 193 | 71.8 | 89.6 |
A10_Small_NoSoftmax | 14.6 k | 80 | 2417 | 86 | 70.1 | 88.6 |
A10_Small_Softmax | 15.7 k | 90 | 2318 | 82 | 70.0 | 88.6 |
A10_Generic | 30.5 k | 202 | 850 | 106 | 70.0 | 88.6 |
A10_Performance | 57.8 k | 650 | 2090 | 197 | 69.6 | 88.3 |
AGX7_FP16_Generic | 32.1 k | 186 | 3644 | 148 | 71.8 | 89.6 |
AGX7_FP16_Performance | 99.3 k | 1162 | 6968 | 373 | 71.8 | 89.6 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 4549 | 140 | 71.6 | 89.6 |
AGX7_Small_Softmax | 17.7 k | 90 | 4556 | 140 | 71.8 | 89.4 |
AGX7_Generic | 35.1 k | 202 | 2701 | 202 | 71.8 | 89.4 |
AGX7_Performance | 62.4 k | 650 | 5802 | 279 | 71.7 | 89.4 |
AGX7_Performance_Giant | 117.9 k | 1546 | 6766 | 1154 | 71.7 | 89.4 |
public/mobilenet-v2-1.4-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 2214 | 66 | 74.8 | 91.9 |
A10_FP16_Performance | 84.2 k | 1162 | 4990 | 165 | 74.8 | 91.9 |
A10_Generic | 30.5 k | 202 | 1594 | 84 | 73.2 | 90.9 |
A10_Performance | 57.8 k | 650 | 2983 | 170 | 72.4 | 90.3 |
AGX7_FP16_Generic | 32.1 k | 186 | 4097 | 122 | 74.8 | 91.9 |
AGX7_FP16_Performance | 99.3 k | 1162 | 8808 | 293 | 74.8 | 91.9 |
AGX7_Generic | 35.1 k | 202 | 4111 | 148 | 74.7 | 91.8 |
AGX7_Performance | 62.4 k | 650 | 7314 | 244 | 74.7 | 91.8 |
AGX7_Performance_Giant | 117.9 k | 1546 | 7817 | 882 | 74.6 | 91.8 |
public/mobilenet-v3-large-1.0-224-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 2071 | 93 | 75.8 | 92.1 |
A10_FP16_Performance | 84.2 k | 1162 | 6625 | 141 | 75.8 | 92.1 |
AGX7_FP16_Generic | 32.1 k | 186 | 3797 | 170 | 75.8 | 92.1 |
AGX7_FP16_Performance | 99.3 k | 1162 | 11156 | 238 | 75.8 | 92.1 |
AGX7_Generic | 35.1 k | 202 | 4625 | 185 | 72.3 | 90.7 |
AGX7_Performance | 62.4 k | 650 | 11150 | 238 | 72.3 | 90.5 |
AGX7_Performance_Giant | 117.9 k | 1546 | 8720 | 370 | 72.4 | 90.6 |
public/resnet-50-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 1560 | 16 | 76.8 | 92.9 |
A10_FP16_Performance | 84.2 k | 1162 | 6595 | 93 | 76.8 | 92.9 |
A10_Small_NoSoftmax | 14.6 k | 80 | 2024 | 17 | 76.6 | 92.7 |
A10_Small_Softmax | 15.7 k | 90 | 1912 | 16 | 76.4 | 92.6 |
A10_Generic | 30.5 k | 202 | 1367 | 31 | 76.4 | 92.6 |
A10_Performance | 57.8 k | 650 | 4294 | 97 | 76.5 | 92.7 |
AGX7_FP16_Generic | 32.1 k | 186 | 3002 | 32 | 76.8 | 92.9 |
AGX7_FP16_Performance | 99.3 k | 1162 | 11555 | 163 | 76.8 | 92.9 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 5983 | 28 | 77.0 | 92.9 |
AGX7_Small_Softmax | 17.7 k | 90 | 5985 | 28 | 77.1 | 92.9 |
AGX7_Generic | 35.1 k | 202 | 4310 | 62 | 77.1 | 92.9 |
AGX7_Performance | 62.4 k | 650 | 10094 | 143 | 76.9 | 92.9 |
AGX7_Performance_Giant | 117.9 k | 1546 | 8227 | 242 | 76.9 | 92.8 |
Resnet50 v1 (Caffe)
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 1464 | 20 | 74.4 | 91.4 |
A10_FP16_Performance | 84.2 k | 1162 | 6999 | 113 | 74.4 | 91.4 |
A10_Small_NoSoftmax | 14.6 k | 80 | 1419 | 21 | 73.8 | 91.2 |
A10_Small_Softmax | 15.7 k | 90 | 1340 | 20 | 73.9 | 91.0 |
A10_Generic | 30.5 k | 202 | 1372 | 38 | 73.9 | 91.0 |
A10_Performance | 57.8 k | 650 | 4387 | 118 | 73.9 | 91.1 |
AGX7_FP16_Generic | 32.1 k | 186 | 2820 | 38 | 74.4 | 91.4 |
AGX7_FP16_Performance | 99.3 k | 1162 | 11985 | 193 | 74.4 | 91.4 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 4197 | 37 | 74.1 | 91.4 |
AGX7_Small_Softmax | 17.7 k | 90 | 4197 | 37 | 74.2 | 91.3 |
AGX7_Generic | 35.1 k | 202 | 4602 | 75 | 74.2 | 91.3 |
AGX7_Performance | 62.4 k | 650 | 10435 | 168 | 74.0 | 91.4 |
AGX7_Performance_Giant | 117.9 k | 1546 | 8299 | 268 | 74.1 | 91.4 |
intel/unet-camvid-onnx-0001
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] |
---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 430 | 0.55 |
A10_FP16_Performance | 84.2 k | 1162 | 2147 | 3.57 |
AGX7_FP16_Generic | 32.1 k | 186 | 812 | 1.07 |
AGX7_FP16_Performance | 99.3 k | 1162 | 4331 | 7.20 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 1133 | 1.10 |
AGX7_Small_Softmax | 17.7 k | 90 | 1135 | 1.10 |
AGX7_Generic | 35.1 k | 202 | 1310 | 2.13 |
AGX7_Performance | 62.4 k | 650 | 3743 | 6.37 |
AGX7_Performance_Giant | 117.9 k | 1546 | 5672 | 12.24 |
public/yolo-v3-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 718 | 2.1 | 62.27 | 31.58 |
A10_FP16_Performance | 84.2 k | 1162 | 3074 | 13.5 | 62.25 | 31.58 |
A10_Generic | 30.5 k | 202 | 651 | 4.0 | 62.07 | 31.26 |
A10_Performance | 57.8 k | 650 | 1777 | 11.5 | 62.25 | 31.32 |
AGX7_FP16_Generic | 32.1 k | 186 | 1392 | 4.1 | 62.27 | 31.58 |
AGX7_FP16_Performance | 99.3 k | 1162 | 6268 | 27.6 | 62.25 | 31.58 |
AGX7_Generic | 35.1 k | 202 | 1861 | 8.1 | 62.28 | 31.49 |
AGX7_Performance | 62.4 k | 650 | 2695 | 11.9 | 62.22 | 31.47 |
AGX7_Performance_Giant | 117.9 k | 1546 | 5206 | 31.8 | 62.25 | 31.46 |
public/yolo-v3-tiny-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 548 | 19 | 35.79 | 14.77 |
A10_FP16_Performance | 84.2 k | 1162 | 2280 | 57 | 35.81 | 14.78 |
A10_Generic | 30.5 k | 202 | 773 | 37 | 35.76 | 14.78 |
A10_Performance | 57.8 k | 650 | 1369 | 43 | 35.71 | 14.70 |
AGX7_FP16_Generic | 32.1 k | 186 | 1065 | 37 | 35.79 | 14.77 |
AGX7_FP16_Performance | 99.3 k | 1162 | 4540 | 113 | 35.81 | 14.78 |
AGX7_Generic | 35.1 k | 202 | 2011 | 68 | 35.76 | 14.74 |
AGX7_Performance | 62.4 k | 650 | 1595 | 40 | 35.73 | 14.72 |
AGX7_Performance_Giant | 117.9 k | 1546 | 5234 | 113 | 35.81 | 14.75 |
public/yolo-v8-nano detection
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|
A10_FP16_Performance | 84.2 k | 1162 | 3407 | 53 | 51.15 | 36.52 |
A10_Generic | 30.5 k | 202 | 923 | 20 | 50.62 | 36.05 |
A10_Performance | 57.8 k | 650 | 2306 | 39 | 50.59 | 36.03 |
AGX7_FP16_Performance | 99.3 k | 1162 | 6159 | 96 | 51.15 | 36.52 |
AGX7_Generic | 35.1 k | 202 | 2481 | 51 | 51.14 | 36.50 |
AGX7_Performance | 62.4 k | 650 | 6276 | 100 | 51.10 | 36.48 |
public/yolo-v8-nano classification
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Performance | 84.2 k | 1162 | 3306 | 442 | 67.92 | 87.72 |
A10_Generic | 30.5 k | 202 | 572 | 183 | 66.06 | 87.22 |
A10_Performance | 57.8 k | 650 | 993 | 223 | 65.94 | 87.06 |
AGX7_FP16_Performance | 99.3 k | 1162 | 8687 | 1161 | 67.92 | 87.72 |
AGX7_Generic | 35.1 k | 202 | 5604 | 963 | 67.96 | 87.86 |
AGX7_Performance | 62.4 k | 650 | 10355 | 1384 | 67.72 | 87.72 |
public/squeezenet1.1
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 326 | 113 | 58.5 | 81.1 |
A10_FP16_Performance | 84.2 k | 1162 | 2316 | 465 | 58.5 | 81.1 |
A10_Small_NoSoftmax | 14.6 k | 80 | 371 | 126 | 58.9 | 80.9 |
A10_Small_Softmax | 15.7 k | 90 | 350 | 119 | 58.1 | 81.1 |
A10_Generic | 30.5 k | 202 | 499 | 274 | 58.1 | 81.1 |
A10_Performance | 57.8 k | 650 | 1337 | 461 | 58.7 | 81.1 |
AGX7_FP16_Generic | 32.1 k | 186 | 631 | 219 | 58.5 | 81.1 |
AGX7_FP16_Performance | 99.3 k | 1162 | 4553 | 915 | 58.5 | 81.1 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 929 | 221 | 58.5 | 81.0 |
AGX7_Small_Softmax | 17.7 k | 90 | 930 | 222 | 58.5 | 81.0 |
AGX7_Generic | 35.1 k | 202 | 1754 | 545 | 58.5 | 81.0 |
AGX7_Performance | 62.4 k | 650 | 2108 | 424 | 58.4 | 81.0 |
AGX7_Performance_Giant | 117.9 k | 1546 | 3732 | 980 | 58.3 | 81.1 |
public/i3d_rgb_tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 27.8 k | 186 | 218 | 0.31 | 65.79 | 82.89 |
A10_FP16_Performance | 84.2 k | 1162 | 1187 | 1.92 | 65.79 | 82.89 |
A10_Small_NoSoftmax | 14.6 k | 80 | 235 | 0.32 | 66.01 | 83.55 |
A10_Small_Softmax | 15.7 k | 90 | 221 | 0.31 | 65.35 | 83.55 |
A10_Generic | 30.5 k | 202 | 349 | 0.68 | 66.23 | 83.11 |
A10_Performance | 57.8 k | 650 | 1097 | 1.89 | 66.67 | 83.77 |
AGX7_FP16_Generic | 32.1 k | 186 | 438 | 0.60 | 65.79 | 82.89 |
AGX7_FP16_Performance | 99.3 k | 1162 | 2389 | 3.86 | 65.79 | 82.89 |
AGX7_Small_NoSoftmax | 16.3 k | 80 | 491 | 0.58 | 65.35 | 83.11 |
AGX7_Small_Softmax | 17.7 k | 90 | 492 | 0.58 | 65.57 | 83.11 |
AGX7_Generic | 35.1 k | 202 | 745 | 1.36 | 65.57 | 83.11 |
AGX7_Performance | 62.4 k | 650 | 2316 | 3.74 | 65.13 | 83.11 |
AGX7_Performance_Giant | 117.9 k | 1546 | 2685 | 4.39 | 65.79 | 82.89 |