Deploying facial recognition solutions often runs into two bottlenecks: network bandwidth and compute capacity. Both degrade deep learning inference throughput and latency, resulting in a less than optimal user experience. In addition, improving throughput typically requires scaling out, which adds deployment cost and complexity.
For facial recognition, a customized ResNet-32 model was optimized on Intel Caffe. Compared to FP32, Intel® DL Boost optimizations, delivered through Vector Neural Network Instructions (VNNI) for INT8 inference, increased deep learning inference throughput by 2.18x (see chart), while meeting the partner-specified latency target of under 10 ms.
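The INT8 speedup comes from quantizing FP32 weights and activations to 8-bit integers, which VNNI can then process several values at a time. A minimal sketch of the underlying idea, symmetric per-tensor INT8 quantization, is shown below; the helper names and values are illustrative, not Intel Caffe's actual API.

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization, the kind of
# transformation applied when converting an FP32 model for VNNI/INT8 inference.

def quantize_int8(values):
    """Map FP32 values onto INT8 [-127, 127] using one shared scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the quantized integers."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 0.9]          # toy FP32 weight tensor
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # → [50, -127, 3, 90]
print(max_err)  # small rounding error, bounded by scale / 2
```

Because each weight now occupies one byte instead of four, memory traffic drops and VNNI fused multiply-add instructions can operate on more values per cycle, which is the source of the throughput gain; the small rounding error is why INT8 models are validated against an accuracy target before deployment.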
The result: significantly improved deep learning inference throughput without impacting latency, delivering a better user experience.