2.2. Model Performance
- Intel® Arria® 10: 265 MHz
- Intel Agilex® 7: 400 MHz
The performance results for the designs that follow were achieved using the dla_build_example_design.py script that is included with the Intel® FPGA AI Suite. The script uses a standard (-2) speed bin with a single seed and uses high-effort compiler settings.
- Intel® Arria® 10 runtime host: CentOS7 host on an Intel® Xeon® processor E5-1650 @ 3.6 GHz
- Intel Agilex® 7 runtime host: SLES12 host on an Intel® Xeon® processor E5-1650 @ 3.5 GHz
set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS
set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF
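
The two assignments above can be checked against a Quartus settings file before a compile. The following is a minimal sketch, assuming the settings are applied through the project's .qsf file (the text does not say how the build script sets them). The helper name `ensure_assignments` is hypothetical and is not part of the Intel FPGA AI Suite.

```python
# Assumption: the two assignments quoted in the text, one per line,
# as they would appear in a Quartus .qsf settings file.
REQUIRED_ASSIGNMENTS = [
    "set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS",
    "set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF",
]


def ensure_assignments(qsf_text: str) -> str:
    """Return qsf_text with any missing required assignment appended.

    Hypothetical helper for illustration only.
    """
    # Normalize whitespace so that differently spaced lines still match.
    present = {" ".join(line.split()) for line in qsf_text.splitlines()}
    missing = [a for a in REQUIRED_ASSIGNMENTS if a not in present]
    if not missing:
        return qsf_text
    # Make sure the existing text ends with a newline before appending.
    body = qsf_text if qsf_text.endswith("\n") or not qsf_text else qsf_text + "\n"
    return body + "\n".join(missing) + "\n"


if __name__ == "__main__":
    example = 'set_global_assignment -name FAMILY "Arria 10"\n'
    print(ensure_assignments(example))
```

Calling the function a second time on its own output returns the text unchanged, so it is safe to run repeatedly on the same file contents.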
The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
Details - Intel FPGA AI Suite V2023.3
Architecture | fMAX | ALMs | DSPs | M20Ks | Registers |
---|---|---|---|---|---|
A10_FP16_Generic | 324 MHz | 26. k | 162 | 491 | 68 k |
A10_FP16_Performance | 276 MHz | 80.7 k | 1114 | 1469 | 244 k |
A10_Small_NoSoftmax | 346 MHz | 14.7 k | 80 | 247 | 40 k |
A10_Small_Softmax | 347 MHz | 16.1 k | 90 | 255 | 43 k |
A10_Generic | 298 MHz | 28.7 k | 178 | 610 | 75 k |
A10_Performance | 301 MHz | 54.1 k | 602 | 935 | 161 k |
AGX7_FP16_Generic | 600 MHz | 29.3 k | 162 | 510 | 96 k |
AGX7_FP16_Performance | 600 MHz | 94.3 k | 1114 | 1531 | 314 k |
AGX7_Small_NoSoftmax | 616 MHz | 16.6 k | 80 | 307 | 54 k |
AGX7_Small_Softmax | 618 MHz | 17.9 k | 90 | 315 | 64 k |
AGX7_Generic | 610 MHz | 32.6 k | 178 | 776 | 117 k |
AGX7_Performance | 568 MHz | 56.5 k | 602 | 1277 | 194 k |
AGX7_Performance_NoPrelu_NoEltwise | 585 MHz | 87.4 k | 1162 | 2795 | 298 k |
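
To compare the two device families, the fMAX column can be paired up by architecture variant. The sketch below copies the fMAX values from the table above and computes the Agilex 7 to Arria 10 ratio for each variant that exists on both devices. The function name `family_ratio` is an illustrative choice, not a tool from the suite.

```python
# fMAX values in MHz, copied from the resource table above.
FMAX_MHZ = {
    "A10_FP16_Generic": 324,
    "A10_FP16_Performance": 276,
    "A10_Small_NoSoftmax": 346,
    "A10_Small_Softmax": 347,
    "A10_Generic": 298,
    "A10_Performance": 301,
    "AGX7_FP16_Generic": 600,
    "AGX7_FP16_Performance": 600,
    "AGX7_Small_NoSoftmax": 616,
    "AGX7_Small_Softmax": 618,
    "AGX7_Generic": 610,
    "AGX7_Performance": 568,
    "AGX7_Performance_NoPrelu_NoEltwise": 585,
}


def family_ratio(fmax):
    """Map each variant present on both families to AGX7 fMAX / A10 fMAX."""
    ratios = {}
    for name, mhz in fmax.items():
        if not name.startswith("AGX7_"):
            continue
        variant = name[len("AGX7_"):]
        a10_mhz = fmax.get("A10_" + variant)
        if a10_mhz is not None:  # skip variants with no Arria 10 counterpart
            ratios[variant] = mhz / a10_mhz
    return ratios


RATIOS = family_ratio(FMAX_MHZ)

if __name__ == "__main__":
    for variant, ratio in sorted(RATIOS.items()):
        print(f"{variant}: {ratio:.2f}x")
```

AGX7_Performance_NoPrelu_NoEltwise is skipped because the table lists no Arria 10 counterpart for it.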
public/mobilenet-v1-1.0-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 1237 | 93 | 71.2 | 89.5 |
A10_FP16_Performance | 80.7 k | 1114 | 4661 | 288 | 71.2 | 89.5 |
A10_Small_NoSoftmax | 14.7 k | 80 | 1145 | 98 | 69.8 | 89.1 |
A10_Small_Softmax | 16.1 k | 90 | 1153 | 98 | 69.6 | 89.1 |
A10_Generic | 28.7 k | 178 | 1212 | 128 | 69.6 | 89.1 |
A10_Performance | 54.1 k | 602 | 2881 | 322 | 70.0 | 89.0 |
AGX7_FP16_Generic | 29.3 k | 162 | 2242 | 169 | 71.2 | 89.5 |
AGX7_FP16_Performance | 94.3 k | 1114 | 8954 | 554 | 71.2 | 89.5 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 2789 | 169 | 70.9 | 89.6 |
AGX7_Small_Softmax | 17.9 k | 90 | 2809 | 169 | 70.9 | 89.5 |
AGX7_Generic | 32.6 k | 178 | 4072 | 241 | 70.9 | 89.5 |
AGX7_Performance | 56.5 k | 602 | 6211 | 391 | 70.9 | 89.5 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 11156 | 469 | 70.9 | 89.5 |
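
One way to read a per-model table is to normalize throughput by DSP count. The sketch below parses a few rows copied from the public/mobilenet-v1-1.0-224 table in the same pipe-delimited layout and computes frames per second per DSP block. The metric and the function names are illustrative assumptions rather than figures reported by the suite.

```python
# A subset of rows copied verbatim from the public/mobilenet-v1-1.0-224 table.
ROWS = """\
A10_FP16_Performance | 80.7 k | 1114 | 4661 | 288 | 71.2 | 89.5 |
A10_Performance | 54.1 k | 602 | 2881 | 322 | 70.0 | 89.0 |
AGX7_FP16_Performance | 94.3 k | 1114 | 8954 | 554 | 71.2 | 89.5 |
AGX7_Performance | 56.5 k | 602 | 6211 | 391 | 70.9 | 89.5 |
"""


def parse_row(line):
    """Split one pipe-delimited row into (name, dsps, ddr_mb_s, fps)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, _alms, dsps, ddr, fps = cells[:5]
    return name, int(dsps), float(ddr), float(fps)


def fps_per_dsp(rows_text):
    """Return {architecture: throughput in fps divided by DSP count}."""
    result = {}
    for line in rows_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, dsps, _ddr, fps = parse_row(line)
        result[name] = fps / dsps
    return result


EFFICIENCY = fps_per_dsp(ROWS)

if __name__ == "__main__":
    for name, value in sorted(EFFICIENCY.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.3f} fps per DSP")
```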
public/mobilenet-v2
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 2081 | 84 | 71.8 | 89.6 |
A10_FP16_Performance | 80.7 k | 1114 | 3644 | 189 | 71.8 | 89.6 |
A10_Small_NoSoftmax | 14.7 k | 80 | 2415 | 86 | 70.1 | 88.6 |
A10_Small_Softmax | 16.1 k | 90 | 2442 | 86 | 70.0 | 88.6 |
A10_Generic | 28.7 k | 178 | 1041 | 104 | 70.0 | 88.6 |
A10_Performance | 54.1 k | 602 | 2316 | 212 | 69.6 | 88.3 |
AGX7_FP16_Generic | 29.3 k | 162 | 3613 | 146 | 71.8 | 89.6 |
AGX7_FP16_Performance | 94.3 k | 1114 | 7100 | 369 | 71.8 | 89.6 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 4535 | 139 | 71.6 | 89.6 |
AGX7_Small_Softmax | 17.9 k | 90 | 4565 | 140 | 71.8 | 89.4 |
AGX7_Generic | 32.6 k | 178 | 3319 | 192 | 71.8 | 89.4 |
AGX7_Performance | 56.5 k | 602 | 5780 | 271 | 71.7 | 89.4 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 10036 | 279 | 71.7 | 89.4 |
public/mobilenet-v2-1.4-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 2290 | 68 | 74.8 | 91.9 |
A10_FP16_Performance | 80.7 k | 1114 | 5024 | 161 | 74.8 | 91.9 |
A10_Generic | 28.7 k | 178 | 1711 | 81 | 73.2 | 90.9 |
A10_Performance | 54.1 k | 602 | 3294 | 182 | 72.4 | 90.3 |
AGX7_FP16_Generic | 29.3 k | 162 | 4032 | 119 | 74.8 | 91.9 |
AGX7_FP16_Performance | 94.3 k | 1114 | 8969 | 288 | 74.8 | 91.9 |
AGX7_Generic | 32.6 k | 178 | 4509 | 141 | 74.7 | 91.8 |
AGX7_Performance | 56.5 k | 602 | 7312 | 236 | 74.7 | 91.8 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 11641 | 249 | 74.7 | 91.8 |
public/mobilenet-v3-large-1.0-224-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 2139 | 83 | 75.8 | 92.1 |
A10_FP16_Performance | 80.7 k | 1114 | 12734 | 28 | 75.8 | 92.1 |
AGX7_FP16_Generic | 29.3 k | 162 | 3699 | 143 | 75.8 | 92.1 |
AGX7_FP16_Performance | 94.3 k | 1114 | 17615 | 39 | 75.8 | 92.1 |
AGX7_Generic | 32.6 k | 178 | 5095 | 135 | 72.3 | 90.7 |
AGX7_Performance | 56.5 k | 602 | 17561 | 39 | 72.3 | 90.5 |
public/resnet-50-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 1650 | 17 | 76.8 | 92.9 |
A10_FP16_Performance | 80.7 k | 1114 | 6557 | 92 | 76.8 | 92.9 |
A10_Small_NoSoftmax | 14.7 k | 80 | 2030 | 17 | 76.6 | 92.7 |
A10_Small_Softmax | 16.1 k | 90 | 2037 | 17 | 76.4 | 92.6 |
A10_Generic | 28.7 k | 178 | 1418 | 31 | 76.4 | 92.6 |
A10_Performance | 54.1 k | 602 | 4650 | 104 | 76.5 | 92.7 |
AGX7_FP16_Generic | 29.3 k | 162 | 3003 | 32 | 76.8 | 92.9 |
AGX7_FP16_Performance | 94.3 k | 1114 | 11546 | 163 | 76.8 | 92.9 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 5983 | 28 | 77.0 | 92.9 |
AGX7_Small_Softmax | 17.9 k | 90 | 6001 | 28 | 77.1 | 92.9 |
AGX7_Generic | 32.6 k | 178 | 4452 | 60 | 77.1 | 92.9 |
AGX7_Performance | 56.5 k | 602 | 10072 | 142 | 76.9 | 92.9 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 13490 | 205 | 76.9 | 92.9 |
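
Dividing the DDR1 column by the throughput column gives a rough figure for external-memory traffic per frame. This is a sketch under one assumption that the text does not confirm: that the DDR1 [MB/s] value is the average bandwidth over the same run that produced the fps figure. The values are copied from the public/resnet-50-tf table, and the names `RESNET50_TF` and `mb_per_frame` are illustrative.

```python
# (DDR1 bandwidth in MB/s, throughput in fps), copied from the
# public/resnet-50-tf table above.
RESNET50_TF = {
    "A10_FP16_Performance": (6557, 92),
    "A10_Performance": (4650, 104),
    "AGX7_FP16_Performance": (11546, 163),
    "AGX7_Performance": (10072, 142),
}


def mb_per_frame(ddr_mb_s, fps):
    """Approximate DDR traffic per inferred frame, in MB.

    Assumes the bandwidth figure is an average over the measured run.
    """
    return ddr_mb_s / fps


TRAFFIC = {name: mb_per_frame(*values) for name, values in RESNET50_TF.items()}

if __name__ == "__main__":
    for name, mb in TRAFFIC.items():
        print(f"{name}: {mb:.1f} MB/frame")
```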
Resnet50 v1 (Caffe)
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 1549 | 21 | 74.4 | 91.4 |
A10_FP16_Performance | 80.7 k | 1114 | 6958 | 111 | 74.4 | 91.4 |
A10_Small_NoSoftmax | 14.7 k | 80 | 1423 | 21 | 73.8 | 91.2 |
A10_Small_Softmax | 16.1 k | 90 | 1428 | 21 | 73.9 | 91.0 |
A10_Generic | 28.7 k | 178 | 1434 | 37 | 73.9 | 91.0 |
A10_Performance | 54.1 k | 602 | 4736 | 127 | 73.9 | 91.1 |
AGX7_FP16_Generic | 29.3 k | 162 | 2822 | 38 | 74.4 | 91.4 |
AGX7_FP16_Performance | 94.3 k | 1114 | 11937 | 191 | 74.4 | 91.4 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 4197 | 37 | 74.1 | 91.4 |
AGX7_Small_Softmax | 17.9 k | 90 | 4213 | 37 | 74.2 | 91.3 |
AGX7_Generic | 32.6 k | 178 | 4775 | 73 | 74.2 | 91.3 |
AGX7_Performance | 56.5 k | 602 | 10361 | 166 | 74.0 | 91.4 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 14296 | 228 | 74.0 | 91.4 |
intel/unet-camvid-onnx-0001
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] |
---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 455 | 0.58 |
A10_FP16_Performance | 80.7 k | 1114 | 2113 | 3.51 |
AGX7_FP16_Generic | 29.3 k | 162 | 812 | 1.07 |
AGX7_FP16_Performance | 94.3 k | 1114 | 4307 | 7.16 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 1136 | 1.10 |
AGX7_Small_Softmax | 17.9 k | 90 | 1139 | 1.10 |
AGX7_Generic | 32.6 k | 178 | 1298 | 2.11 |
AGX7_Performance | 56.5 k | 602 | 3670 | 6.24 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 5908 | 8.22 |
public/yolo-v3-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | COCO AP | mAP |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 758 | 2.3 | 31.58 | 62.27 |
A10_FP16_Performance | 80.7 k | 1114 | 3026 | 13.3 | 31.58 | 62.25 |
A10_Generic | 28.7 k | 178 | 648 | 4.0 | 31.26 | 62.07 |
A10_Performance | 54.1 k | 602 | 1910 | 12.4 | 31.32 | 62.25 |
AGX7_FP16_Generic | 29.3 k | 162 | 1391 | 4.1 | 31.58 | 62.27 |
AGX7_FP16_Performance | 94.3 k | 1114 | 6248 | 27.5 | 31.58 | 62.25 |
AGX7_Generic | 32.6 k | 178 | 1842 | 8.0 | 31.49 | 62.28 |
AGX7_Performance | 56.5 k | 602 | 2640 | 11.6 | 31.47 | 62.22 |
public/yolo-v3-tiny-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | COCO AP | mAP |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 576 | 20 | 14.77 | 35.79 |
A10_FP16_Performance | 80.7 k | 1114 | 2244 | 56 | 14.78 | 35.81 |
A10_Generic | 28.7 k | 178 | 766 | 36 | 14.78 | 35.76 |
A10_Performance | 54.1 k | 602 | 1512 | 48 | 14.70 | 35.71 |
AGX7_FP16_Generic | 29.3 k | 162 | 1074 | 37 | 14.77 | 35.79 |
AGX7_FP16_Performance | 94.3 k | 1114 | 4539 | 113 | 14.78 | 35.81 |
AGX7_Generic | 32.6 k | 178 | 2007 | 68 | 14.74 | 35.76 |
AGX7_Performance | 56.5 k | 602 | 1570 | 39 | 14.72 | 35.73 |
public/squeezenet1.1
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 1034 | 116 | 58.5 | 81.1 |
A10_FP16_Performance | 80.7 k | 1114 | 7827 | 278 | 58.5 | 81.1 |
A10_Small_NoSoftmax | 14.7 k | 80 | 742 | 125 | 58.9 | 80.9 |
A10_Small_Softmax | 16.1 k | 90 | 749 | 126 | 58.1 | 81.1 |
A10_Generic | 28.7 k | 178 | 12149 | 62 | 58.1 | 81.1 |
A10_Performance | 54.1 k | 602 | 5432 | 374 | 58.7 | 81.1 |
AGX7_FP16_Generic | 29.3 k | 162 | 1861 | 209 | 58.5 | 81.1 |
AGX7_FP16_Performance | 94.3 k | 1114 | 12364 | 439 | 58.5 | 81.1 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 2143 | 211 | 58.5 | 81.0 |
AGX7_Small_Softmax | 17.9 k | 90 | 2165 | 212 | 58.5 | 81.0 |
AGX7_Generic | 32.6 k | 178 | 17942 | 46 | 58.5 | 81.0 |
AGX7_Performance | 56.5 k | 602 | 9063 | 321 | 58.4 | 81.0 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 14917 | 265 | 58.4 | 81.0 |
public/i3d_rgb_tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 26. k | 162 | 231 | 0.33 | 65.79 | 82.89 |
A10_FP16_Performance | 80.7 k | 1114 | 1193 | 1.89 | 65.79 | 82.89 |
A10_Small_NoSoftmax | 14.7 k | 80 | 235 | 0.32 | 65.57 | 83.99 |
A10_Small_Softmax | 16.1 k | 90 | 236 | 0.33 | 66.01 | 83.55 |
A10_Generic | 28.7 k | 178 | 347 | 0.67 | 66.23 | 83.11 |
A10_Performance | 54.1 k | 602 | 1200 | 2.05 | 66.67 | 83.77 |
AGX7_FP16_Generic | 29.3 k | 162 | 438 | 0.60 | 65.79 | 82.89 |
AGX7_FP16_Performance | 94.3 k | 1114 | 2434 | 3.87 | 65.79 | 82.89 |
AGX7_Small_NoSoftmax | 16.6 k | 80 | 491 | 0.58 | 65.35 | 82.89 |
AGX7_Small_Softmax | 17.9 k | 90 | 493 | 0.58 | 65.57 | 83.11 |
AGX7_Generic | 32.6 k | 178 | 738 | 1.33 | 65.57 | 83.11 |
AGX7_Performance | 56.5 k | 602 | 2320 | 3.69 | 65.13 | 83.11 |
AGX7_Performance_NoPrelu_NoEltwise | 87.4 k | 1162 | 3857 | 4.73 | 65.13 | 83.11 |