Accelerate AI Inference without Sacrificing Accuracy

Overview

AI inference is often slow and memory-intensive because models are computationally complex and typically run at high numerical precision.

This session looks at a way to address these issues using quantization: the process of converting FP32 data to a lower-precision format (such as int8) to speed up inference and save memory bandwidth while maintaining accuracy.
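To make the idea concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not the method used by Intel Neural Compressor or shown in the session; the function names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions. It maps FP32 values to int8 with a single scale factor, then checks how much precision is lost on the way back.

```python
from array import array


def quantize_int8(values):
    """Symmetric per-tensor quantization (illustrative helper, not a library API)."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor: the largest magnitude maps to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights, stored as 32-bit floats.
weights = array("f", [0.5, -1.27, 0.03, 1.0, -0.64])

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Rounding to the nearest step bounds the error by half a step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("bytes per value:", weights.itemsize, "->", q_weights.itemsize)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each value shrinks from 4 bytes to 1 byte, which is where the memory-bandwidth saving comes from. Production tools usually go further (for example, calibration data and per-channel scales) to keep accuracy loss small.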

AI software engineers Neo Zhang and Severine Habert introduce the tools and techniques to quantize your AI models easily and quickly, including:

An overview of Intel® Neural Compressor and Intel® Deep Learning Boost
A demonstration showcasing an end-to-end pipeline to train a TensorFlow* model with a small Keras* dataset, followed by speeding it up using quantization
Performance comparisons of the FP32 and int8 models using the same script
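The last item, comparing both models with one script, can be sketched as a small timing harness. This is an assumption about the general approach, not the script from the session: `benchmark`, `fp32_predict`, and `int8_predict` are hypothetical names, and the two predict functions are placeholders standing in for real model calls, so their timings carry no meaning.

```python
import statistics
import time


def benchmark(predict, batch, warmup=3, runs=20):
    """Return the median latency in seconds of predict(batch)."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        predict(batch)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Placeholder callables; in practice these would wrap the FP32 and int8 models.
def fp32_predict(batch):
    return [sum(x * 0.5 for x in row) for row in batch]


def int8_predict(batch):
    return [sum(x * 64 for x in row) for row in batch]


# The same input and the same measurement code are used for both models.
batch = [[float(i % 7) for i in range(256)] for _ in range(32)]
results = {
    "fp32": benchmark(fp32_predict, batch),
    "int8": benchmark(int8_predict, batch),
}

for name, latency in results.items():
    print(f"{name}: {latency * 1000:.3f} ms per batch")
```

Running both models through identical measurement code keeps the comparison fair: the batch, warm-up, and run count are the same, so only the model differs.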

Get the Software

The Intel Neural Compressor is available as part of the AI Tools—eight tools and frameworks to accelerate end-to-end data science and analytics pipelines.

AI Tools

Accelerate data science and AI pipelines, from preprocessing through machine learning, and provide interoperability for efficient model development.
