The second generation of Intel® Xeon® Scalable processors introduced a collection of features for deep learning, packaged together as Intel® Deep Learning Boost. These features include Vector Neural Network Instructions (VNNI), which increase throughput for inference applications that use INT8 convolutions by combining multiple machine instructions from previous generations into a single machine instruction.
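To make the fused operation concrete, the following is a minimal sketch in plain Python of the arithmetic that one such instruction performs per 32-bit lane. It assumes the unsigned 8-bit by signed 8-bit form (as in the `vpdpbusd` instruction), where four products are summed and added to a 32-bit accumulator. The function name `dot4_accumulate` is hypothetical, and the sketch does not model 32-bit overflow or the saturating variant.

```python
def dot4_accumulate(acc, a_u8, b_s8):
    """Emulate a fused INT8 multiply-accumulate, lane by lane.

    acc  : list of 32-bit accumulators, one per lane
    a_u8 : list of unsigned 8-bit values, four per lane
    b_s8 : list of signed 8-bit values, four per lane
    """
    if not (len(a_u8) == len(b_s8) == 4 * len(acc)):
        raise ValueError("each lane needs exactly four byte pairs")
    if not all(0 <= x <= 255 for x in a_u8):
        raise ValueError("a_u8 values must fit in an unsigned byte")
    if not all(-128 <= x <= 127 for x in b_s8):
        raise ValueError("b_s8 values must fit in a signed byte")

    out = []
    for lane, total in enumerate(acc):
        # Sum the four byte products that belong to this lane.
        for k in range(4 * lane, 4 * lane + 4):
            total += a_u8[k] * b_s8[k]
        out.append(total)
    return out


# Two lanes, starting from zeroed accumulators.
result = dot4_accumulate(
    [0, 0],
    [1, 2, 3, 4, 255, 255, 255, 255],
    [1, 1, 1, 1, -1, 2, 0, 1],
)
```

Without the fused instruction, the same result takes separate multiply, widen, and add steps, which is where the throughput gain comes from.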
Most deep learning models are built using 32-bit floating-point precision (FP32). Quantization is the process of representing the model in a lower-precision format, which uses less memory, with minimal loss of accuracy. In this context, the main focus is the INT8 representation, which needs one byte per value instead of the four bytes that FP32 requires.
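As an illustration of the idea, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is one common scheme and not necessarily the one used by any particular toolkit. The helper names `quantize_int8` and `dequantize`, the choice of a single scale per tensor, and the sample weights are all assumptions made for this example.

```python
def quantize_int8(values):
    """Map FP32-style values to INT8 using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Assumption: symmetric range [-127, 127]; fall back to 1.0 for all zeros.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from INT8 values."""
    return [q * scale for q in quantized]


weights = [0.4, -1.0, 0.25, 0.0]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is bounded by half of the scale factor.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
```

Each stored value now occupies one byte, and the small difference between `restored` and `weights` is the accuracy cost that quantization tries to keep minimal.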