2.2. Model Performance
- Intel® Arria® 10: 265 MHz
- Intel Agilex® 7: 400 MHz
The performance results for the designs below were achieved using the dla_build_example_design.py script that is included with the Intel® FPGA AI Suite. The script uses a standard (-2) speed bin with a single seed and does not use high-effort compiler settings. The runtime host runs CentOS 7 on an Intel® Xeon® processor E5-1650 @ 3.5 GHz. This design uses a dedicated DDR interface for the IP. Performance varies with the clock speed, the DDR latency and bandwidth, and, depending on the graph, the host CPU speed.
The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
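
To check which example architectures a given installation provides, the directory can be listed directly. The following is a minimal sketch: the helper name `list_example_architectures` is hypothetical and not part of the Intel FPGA AI Suite, and it makes no assumption about file names or extensions inside the directory.

```python
from pathlib import Path


def list_example_architectures(coredla_root):
    """Return the sorted entry names under <coredla_root>/example_architectures/.

    `coredla_root` is the value of the COREDLA_ROOT environment variable.
    This helper is illustrative only; it is not an Intel FPGA AI Suite API.
    """
    arch_dir = Path(coredla_root) / "example_architectures"
    if not arch_dir.is_dir():
        raise FileNotFoundError(f"No such directory: {arch_dir}")
    return sorted(entry.name for entry in arch_dir.iterdir())
```

A typical call would pass `os.environ["COREDLA_ROOT"]` as the argument and print each returned name.
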
Details - Intel FPGA AI Suite V2023.1
Architecture | fMAX | ALMs | DSPs | M20Ks | Registers |
---|---|---|---|---|---|
A10_FP16_Generic | 315 MHz | 25.6 k | 162 | 485 | 68 k |
A10_FP16_Performance | 281 MHz | 79.7 k | 1114 | 1444 | 244 k |
A10_Small_NoSoftmax | 356 MHz | 14.9 k | 80 | 247 | 42 k |
A10_Small_Softmax | 353 MHz | 16.1 k | 90 | 255 | 45 k |
A10_Generic | 283 MHz | 27.3 k | 178 | 598 | 74 k |
A10_Performance | 300 MHz | 52.6 k | 602 | 910 | 160 k |
AGX7_FP16_Generic | 600 MHz | 29. k | 162 | 489 | 105 k |
AGX7_FP16_Performance | 600 MHz | 91.1 k | 1114 | 1477 | 315 k |
AGX7_Small_NoSoftmax | 600 MHz | 16.9 k | 80 | 296 | 57 k |
AGX7_Small_Softmax | 600 MHz | 18.3 k | 90 | 304 | 65 k |
AGX7_Generic | 600 MHz | 30.7 k | 178 | 751 | 110 k |
AGX7_Performance | 600 MHz | 54.5 k | 602 | 1222 | 189 k |
AGX7_Performance_NoPrelu_NoEltwise | 600 MHz | 80.8 k | 1162 | 2717 | 317 k |
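
To compare these resource figures programmatically, strings such as `25.6 k` and `315 MHz` first need converting to numbers. The sketch below assumes the six-column row layout of the table above; the function names and dictionary keys are hypothetical choices made for illustration.

```python
def parse_count(text):
    """Convert a resource string such as '25.6 k' or '162' to an integer."""
    text = text.strip()
    if text.endswith("k"):
        # A trailing 'k' means thousands, e.g. '68 k' -> 68000.
        return round(float(text[:-1]) * 1000)
    return int(text)


def parse_fmax_mhz(text):
    """Convert a string such as '315 MHz' to a float in MHz."""
    value, unit = text.split()
    if unit != "MHz":
        raise ValueError(f"Unexpected frequency unit: {unit!r}")
    return float(value)


def parse_resource_row(row):
    """Split one pipe-delimited row of the resource table into a dict."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    name, fmax, alms, dsps, m20ks, registers = cells
    return {
        "architecture": name,
        "fmax_mhz": parse_fmax_mhz(fmax),
        "alms": parse_count(alms),
        "dsps": parse_count(dsps),
        "m20ks": parse_count(m20ks),
        "registers": parse_count(registers),
    }


# Example row copied from the table above.
row = "A10_FP16_Generic | 315 MHz | 25.6 k | 162 | 485 | 68 k |"
parsed = parse_resource_row(row)
print(parsed)
```

Because the table rounds ALM and register counts to the nearest hundred or thousand, the parsed values carry the same precision as the printed figures and no more.
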
public/mobilenet-v1-1.0-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 1203 | 91 | 71.2 | 89.5 |
A10_FP16_Performance | 79.7 k | 1114 | 4738 | 293 | 71.2 | 89.5 |
A10_Small_NoSoftmax | 14.9 k | 80 | 1176 | 101 | 69.9 | 89.1 |
A10_Small_Softmax | 16.1 k | 90 | 1173 | 100 | 69.9 | 89.1 |
A10_Generic | 27.3 k | 178 | 1156 | 122 | 69.9 | 89.1 |
A10_Performance | 52.6 k | 602 | 2869 | 321 | 69.6 | 88.9 |
AGX7_FP16_Generic | 29. k | 162 | 2241 | 169 | 71.2 | 89.5 |
AGX7_FP16_Performance | 91.1 k | 1114 | 9089 | 562 | 71.2 | 89.5 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 2719 | 165 | 70.8 | 89.5 |
AGX7_Small_Softmax | 18.3 k | 90 | 2729 | 165 | 70.9 | 89.4 |
AGX7_Generic | 30.7 k | 178 | 4029 | 238 | 70.9 | 89.4 |
AGX7_Performance | 54.5 k | 602 | 5920 | 373 | 70.9 | 89.5 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 10497 | 441 | 70.9 | 89.5 |
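
Raw throughput favors the larger architectures, so it can help to normalize by resource use when choosing between them. The sketch below ranks three rows copied from the table above by throughput per DSP block. This metric is an illustrative choice and is not one defined by the Intel FPGA AI Suite; the variable and function names are hypothetical.

```python
# (architecture, DSPs, throughput in fps) copied from the
# public/mobilenet-v1-1.0-224 table above.
rows = [
    ("A10_FP16_Generic", 162, 91),
    ("A10_FP16_Performance", 1114, 293),
    ("A10_Performance", 602, 321),
]


def fps_per_dsp(dsps, fps):
    """Throughput normalized by the number of DSP blocks used."""
    return fps / dsps


# Sort from most to least throughput per DSP block.
ranked = sorted(rows, key=lambda r: fps_per_dsp(r[1], r[2]), reverse=True)

for name, dsps, fps in ranked:
    print(f"{name}: {fps_per_dsp(dsps, fps):.3f} fps/DSP")
```

The same function applies unchanged to the other model tables, since each lists DSPs and throughput for every architecture.
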
public/mobilenet-v2
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 2030 | 82 | 71.8 | 89.6 |
A10_FP16_Performance | 79.7 k | 1114 | 3699 | 192 | 71.7 | 89.6 |
A10_Small_NoSoftmax | 14.9 k | 80 | 2470 | 88 | 70.2 | 88.6 |
A10_Small_Softmax | 16.1 k | 90 | 2476 | 88 | 70.0 | 88.6 |
A10_Generic | 27.3 k | 178 | 992 | 99 | 70.0 | 88.6 |
A10_Performance | 52.6 k | 602 | 2308 | 211 | 70.1 | 88.1 |
AGX7_FP16_Generic | 29. k | 162 | 3609 | 146 | 71.8 | 89.6 |
AGX7_FP16_Performance | 91.1 k | 1114 | 7104 | 369 | 71.7 | 89.6 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 4460 | 137 | 71.6 | 89.7 |
AGX7_Small_Softmax | 18.3 k | 90 | 4468 | 137 | 71.6 | 89.6 |
AGX7_Generic | 30.7 k | 178 | 3246 | 187 | 71.6 | 89.6 |
AGX7_Performance | 54.5 k | 602 | 5773 | 270 | 71.8 | 89.4 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 9760 | 275 | 71.8 | 89.4 |
public/mobilenet-v2-1.4-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 2233 | 66 | 74.8 | 91.8 |
A10_FP16_Performance | 79.7 k | 1114 | 5094 | 163 | 74.8 | 91.9 |
A10_Generic | 27.3 k | 178 | 1636 | 77 | 73.2 | 91.0 |
A10_Performance | 52.6 k | 602 | 3285 | 182 | 72.2 | 90.4 |
AGX7_FP16_Generic | 29. k | 162 | 4030 | 119 | 74.8 | 91.8 |
AGX7_FP16_Performance | 91.1 k | 1114 | 8990 | 289 | 74.8 | 91.9 |
AGX7_Generic | 30.7 k | 178 | 4495 | 140 | 74.7 | 91.8 |
AGX7_Performance | 54.5 k | 602 | 7590 | 245 | 74.6 | 91.7 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 11780 | 252 | 74.6 | 91.7 |
public/mobilenet-v3-large-1.0-224-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 2090 | 81 | 75.8 | 92.1 |
A10_FP16_Performance | 79.7 k | 1114 | 12817 | 29 | 75.8 | 92.1 |
AGX7_FP16_Generic | 29. k | 162 | 3693 | 143 | 75.8 | 92.1 |
AGX7_FP16_Performance | 91.1 k | 1114 | 17656 | 39 | 75.8 | 92.1 |
AGX7_Generic | 30.7 k | 178 | 5077 | 133 | 72.1 | 90.8 |
AGX7_Performance | 54.5 k | 602 | 17037 | 38 | 72.5 | 90.5 |
public/resnet-50-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 1606 | 17 | 76.8 | 92.9 |
A10_FP16_Performance | 79.7 k | 1114 | 6619 | 93 | 76.8 | 92.9 |
A10_Small_NoSoftmax | 14.9 k | 80 | 2086 | 17 | 76.6 | 92.7 |
A10_Small_Softmax | 16.1 k | 90 | 2071 | 17 | 76.4 | 92.6 |
A10_Generic | 27.3 k | 178 | 1350 | 29 | 76.4 | 92.6 |
A10_Performance | 52.6 k | 602 | 4652 | 104 | 76.6 | 92.7 |
AGX7_FP16_Generic | 29. k | 162 | 3023 | 32 | 76.8 | 92.9 |
AGX7_FP16_Performance | 91.1 k | 1114 | 11575 | 163 | 76.8 | 92.9 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 5846 | 27 | 77.0 | 92.9 |
AGX7_Small_Softmax | 18.3 k | 90 | 5847 | 27 | 77.0 | 92.9 |
AGX7_Generic | 30.7 k | 178 | 4387 | 60 | 77.0 | 92.9 |
AGX7_Performance | 54.5 k | 602 | 10301 | 145 | 76.9 | 92.8 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 13697 | 208 | 76.9 | 92.8 |
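
The paired A10 and AGX7 rows make it straightforward to compute the throughput ratio between the two device families for a given architecture. The sketch below does this for four pairs copied from the table above; the dictionary layout and function name are hypothetical. The ratio folds together every difference between the two builds, including the clock frequencies listed at the top of this section, so it should not be read as a per-clock comparison.

```python
# (A10 fps, AGX7 fps) pairs copied from the public/resnet-50-tf table above.
resnet50_fps = {
    "FP16_Generic": (17, 32),
    "FP16_Performance": (93, 163),
    "Generic": (29, 60),
    "Performance": (104, 145),
}


def speedup(a10_fps, agx7_fps):
    """Ratio of Agilex 7 throughput to Arria 10 throughput."""
    return agx7_fps / a10_fps


speedups = {name: speedup(*pair) for name, pair in resnet50_fps.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```
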
Resnet50 v1 (Caffe)
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 1507 | 20 | 74.4 | 91.4 |
A10_FP16_Performance | 79.7 k | 1114 | 7031 | 113 | 74.4 | 91.4 |
A10_Small_NoSoftmax | 14.9 k | 80 | 1462 | 22 | 73.9 | 91.2 |
A10_Small_Softmax | 16.1 k | 90 | 1451 | 21 | 73.8 | 91.2 |
A10_Generic | 27.3 k | 178 | 1365 | 36 | 73.8 | 91.2 |
A10_Performance | 52.6 k | 602 | 4739 | 127 | 74.2 | 91.2 |
AGX7_FP16_Generic | 29. k | 162 | 2822 | 38 | 74.4 | 91.4 |
AGX7_FP16_Performance | 91.1 k | 1114 | 11948 | 191 | 74.4 | 91.4 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 4090 | 36 | 74.1 | 91.4 |
AGX7_Small_Softmax | 18.3 k | 90 | 4093 | 36 | 74.2 | 91.3 |
AGX7_Generic | 30.7 k | 178 | 4703 | 72 | 74.2 | 91.3 |
AGX7_Performance | 54.5 k | 602 | 10514 | 168 | 74.0 | 91.3 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 14288 | 228 | 74.0 | 91.3 |
intel/unet-camvid-onnx-0001
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] |
---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 435 | 0.56 |
A10_FP16_Performance | 79.7 k | 1114 | 2126 | 3.53 |
AGX7_FP16_Generic | 29. k | 162 | 785 | 1.03 |
AGX7_FP16_Performance | 91.1 k | 1114 | 4244 | 7.05 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 1090 | 1.05 |
AGX7_Small_Softmax | 18.3 k | 90 | 1074 | 1.04 |
AGX7_Generic | 30.7 k | 178 | 1216 | 1.98 |
AGX7_Performance | 54.5 k | 602 | 1771 | 3.01 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 5517 | 7.68 |
public/yolo-v3-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | COCO AP | mAP |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 728 | 2.2 | 31.58 | 62.24 |
A10_FP16_Performance | 79.7 k | 1114 | 3078 | 13.5 | 31.57 | 62.24 |
A10_Generic | 27.3 k | 178 | 591 | 3.7 | 31.33 | 62.21 |
A10_Performance | 52.6 k | 602 | 1783 | 11.6 | 31.37 | 62.20 |
AGX7_FP16_Generic | 29. k | 162 | 1309 | 3.9 | 31.58 | 62.24 |
AGX7_FP16_Performance | 91.1 k | 1114 | 6281 | 27.6 | 31.57 | 62.24 |
AGX7_Generic | 30.7 k | 178 | 1622 | 7.0 | 31.49 | 62.23 |
AGX7_Performance | 54.5 k | 602 | 2405 | 10.6 | 31.47 | 62.24 |
public/yolo-v3-tiny-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | COCO AP | mAP |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 544 | 18.7 | 14.77 | 35.81 |
A10_FP16_Performance | 79.7 k | 1114 | 2287 | 56.7 | 14.77 | 35.81 |
A10_Generic | 27.3 k | 178 | 684 | 32.4 | 14.78 | 35.69 |
A10_Performance | 52.6 k | 602 | 1408 | 44.7 | 14.69 | 35.69 |
AGX7_FP16_Generic | 29. k | 162 | 1041 | 35.7 | 14.77 | 35.81 |
AGX7_FP16_Performance | 91.1 k | 1114 | 4531 | 112.3 | 14.77 | 35.81 |
AGX7_Generic | 30.7 k | 178 | 1886 | 63.6 | 14.73 | 35.76 |
AGX7_Performance | 54.5 k | 602 | 1503 | 37.2 | 14.72 | 35.73 |
public/squeezenet1.1
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 1006 | 113 | 58.5 | 81.1 |
A10_FP16_Performance | 79.7 k | 1114 | 7904 | 280 | 58.5 | 81.0 |
A10_Small_NoSoftmax | 14.9 k | 80 | 762 | 129 | 58.8 | 80.9 |
A10_Small_Softmax | 16.1 k | 90 | 762 | 128 | 58.2 | 80.7 |
A10_Generic | 27.3 k | 178 | 12011 | 61 | 58.2 | 80.7 |
A10_Performance | 52.6 k | 602 | 5415 | 373 | 58.2 | 80.8 |
AGX7_FP16_Generic | 29. k | 162 | 1858 | 209 | 58.5 | 81.1 |
AGX7_FP16_Performance | 91.1 k | 1114 | 11269 | 400 | 58.5 | 81.0 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 2088 | 206 | 58.4 | 81.1 |
AGX7_Small_Softmax | 18.3 k | 90 | 2104 | 206 | 58.4 | 81.1 |
AGX7_Generic | 30.7 k | 178 | 17916 | 46 | 58.4 | 81.1 |
AGX7_Performance | 54.5 k | 602 | 8701 | 309 | 58.1 | 81.1 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 14859 | 264 | 58.1 | 81.1 |
public/i3d_rgb_tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.6 k | 162 | 113 | 0.16 | 64 | 85 |
A10_FP16_Performance | 79.7 k | 1114 | 1207 | 1.92 | 65 | 85 |
A10_Small_NoSoftmax | 14.9 k | 80 | 121 | 0.17 | 64 | 86 |
A10_Small_Softmax | 16.1 k | 90 | 120 | 0.17 | 65 | 86 |
A10_Generic | 27.3 k | 178 | 165 | 0.32 | 64 | 85 |
A10_Performance | 52.6 k | 602 | 591 | 1.01 | 63 | 87 |
AGX7_FP16_Generic | 29. k | 162 | 218 | 0.30 | 64 | 85 |
AGX7_FP16_Performance | 91.1 k | 1114 | 2403 | 3.82 | 65 | 85 |
AGX7_Small_NoSoftmax | 16.9 k | 80 | 239 | 0.28 | 65 | 86 |
AGX7_Small_Softmax | 18.3 k | 90 | 239 | 0.28 | 65 | 85 |
AGX7_Generic | 30.7 k | 178 | 363 | 0.65 | 65 | 85 |
AGX7_Performance | 54.5 k | 602 | 602 | 0.96 | 65 | 86 |
AGX7_Performance_NoPrelu_NoEltwise | 80.8 k | 1162 | 1920 | 2.35 | 65 | 86 |