Deploying facial recognition solutions in banks, government security agencies, and police stations faces two bottlenecks: network bandwidth and computing capability. Both negatively impact deep learning inference throughput and latency, resulting in a less-than-optimal user experience.
For facial recognition, a customized ResNet-50 model (FP32 and INT8) was optimized on Intel Caffe. Compared to FP32, Intel® DL Boost (delivered through Vector Neural Network Instructions (VNNI)/INT8) optimizations helped achieve a 3.3X speedup in inference latency at the same batch size on the same instance (see chart)1,2,3. The application also meets Cloudwalk's accuracy requirement of less than 0.03% accuracy loss.
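To illustrate the general idea behind INT8 inference, the sketch below shows symmetric per-tensor post-training quantization of FP32 weights to INT8 and the resulting round-trip error. This is a minimal, generic illustration of the technique, not Cloudwalk's actual pipeline or Intel Caffe's quantization flow; the function names and the use of NumPy are assumptions for clarity.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization (illustrative only;
# not Cloudwalk's or Intel Caffe's actual implementation).
import numpy as np

def quantize_int8(w):
    """Map FP32 values to INT8 using a symmetric per-tensor scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest integer and clip to the signed 8-bit range.
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(1000).astype(np.float32)
    q, s = quantize_int8(w)
    err = np.abs(dequantize(q, s) - w).max()
    # Worst-case error of round-to-nearest is half a quantization step.
    print(f"scale={s:.6f}, max abs error={err:.6f}")
```

Because each value is represented by one byte instead of four, INT8 models shrink memory traffic 4X, and VNNI fuses the multiply-accumulate of INT8 operands into a single instruction, which is where the latency gain comes from; the accuracy cost is bounded by the quantization step, which is why a sub-0.03% accuracy loss is achievable.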
Significantly reduced deep learning inference latency delivers a better user experience: Cloudwalk customers benefit from improved performance (i.e., lower latency) while maintaining SLAs for accuracy loss.