Deep-learning deployment on the edge for real-time inference can significantly reduce the cost of communicating with the cloud in terms of network bandwidth, network latency, and power consumption.
But edge devices also have limited memory, compute, and power. As a result, traditional 32-bit floating-point precision is often too computationally heavy for embedded deep-learning inference workloads.
The Intel® Distribution of OpenVINO™ toolkit offers a solution via int8 quantization—deep learning inference with 8-bit multipliers.
Join deep-learning expert Alex Kozlov for a closer look at achieving better performance with less overhead on Intel® CPUs, GPUs, and VPUs using the latest int8 calibration tool and runtime in the Intel Distribution of OpenVINO toolkit. He covers:
- New features such as asymmetric quantization, bias correction, and weight equalization that improve the accuracy of low-precision inference workloads
- How to make best use of enhanced capabilities in the Intel Distribution of OpenVINO toolkit for your AI applications
- Using int8 to accelerate computation performance, save memory bandwidth and power, and provide better cache locality
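To illustrate the core idea behind the techniques above, here is a minimal sketch of asymmetric int8 quantization using a per-tensor scale and zero-point. This is an illustrative example only, not the OpenVINO toolkit's internal implementation; the function names and the uint8 range are assumptions for demonstration.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map float values to unsigned 8-bit integers via a scale and zero-point.

    Asymmetric quantization uses a zero-point offset so that the full
    [min, max] range of the tensor maps onto [0, 255], rather than
    forcing the range to be symmetric around zero.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized tensor."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: a small tensor with an asymmetric value range.
x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q, scale, zero_point = quantize(x)
x_hat = dequantize(q, scale, zero_point)
# The reconstruction error is bounded by about half the quantization step.
```

Because each int8 value occupies a quarter of the memory of a float32, this mapping is what yields the bandwidth, power, and cache-locality benefits mentioned above, at the cost of a small, bounded rounding error.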
Get the Software
Download the latest version of the Intel® Distribution of OpenVINO™ toolkit so you can follow along during the webinar.
- Webinar Slides
- Introducing Int8 Quantization for Fast CPU Inference Using the OpenVINO Toolkit
- Using Low-Precision, 8-bit Integer Inference
- Inference Flow with the Intel Distribution of OpenVINO Toolkit
- OpenVINO Toolkit: Example of an Int8 Full Inference Flow
Machine-learning and deep-learning R&D engineer, Intel Corporation
Alexander has expertise in deep-learning object detection architectures, human action recognition approaches, and neural network compression techniques. Before Intel, he was a senior software engineer and researcher at Itseez* (since acquired by Intel), where he worked on computer-vision algorithms for advanced driver-assistance systems (ADAS). Alexander now focuses on deep-learning neural network compression methods and tools that produce more lightweight, hardware-friendly models. He holds a master's degree from the University of Nizhny Novgorod.