Intel® Distribution of OpenVINO™ Toolkit Release Notes

ID 780177
Updated 12/18/2023
Version
Public

New and changed in 2023.2  

Summary of major features and improvements   


  • More Generative AI coverage and framework integrations to minimize code changes.
     
    • Expanded model support for direct PyTorch model conversion: automatically convert additional models directly from PyTorch or execute them via torch.compile with OpenVINO as the backend.
    • New and noteworthy models supported: we have enabled models used for chatbots, instruction following, code generation, and many more, including prominent models like LLaVA, chatGLM, Bark (text to audio), and LCM (Latent Consistency Models, an optimized version of Stable Diffusion).
    • Easier optimization and conversion of Hugging Face models: compress LLM models to Int8 with the Hugging Face Optimum command line interface and export models to the OpenVINO IR format.
    • OpenVINO is now available on Conan, a package manager that enables more seamless package management of large-scale projects for C and C++ developers.
       
  • Broader Large Language Model (LLM) support and more model compression techniques.
     
    • Accelerate inference for LLM models on Intel® Core™ CPU and iGPU with the use of Int8 model weight compression.
    • Expanded model support for dynamic shapes for improved performance on GPU. 
    • Preview support for Int4 model format is now included. Int4 optimized model weights are now available to try on Intel® Core™ CPU and iGPU, to accelerate models like Llama 2 and chatGLM2. 
    • The following Int4 model compression formats are supported for inference in runtime:  
      • Generative Pre-trained Transformer Quantization (GPTQ). GPTQ-compressed models are available through the Hugging Face repositories.
      • Native Int4 compression through Neural Network Compression Framework (NNCF).
         
  • More portability and performance to run AI at the edge, in the cloud, or locally.
     
    • In 2023.1 we announced full support for ARM architecture. We have now improved performance by enabling FP16 model formats for LLMs and integrating additional acceleration libraries to improve latency.
    • Intel announced the age of the AI PC with Intel® Core™ Ultra (codename Meteor Lake) featuring the Neural Processing Unit (NPU).  The install guide can be found in the documentation.
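
The Int8 and Int4 weight compression mentioned above stores weights as low-bit integers together with a floating-point scale and restores them at inference time. The pure-Python sketch below illustrates that general idea with symmetric, group-wise quantization. It is not the NNCF or OpenVINO Runtime implementation; the function names, the sample weights, and the group size are assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric quantization of one weight group to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for Int8, 7 for Int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def compress_weights(weights: List[float], bits: int, group_size: int):
    """Split a flat weight list into groups and quantize each group separately."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def decompress_weights(groups) -> List[float]:
    """Restore approximate floating-point weights from (integers, scale) pairs."""
    restored: List[float] = []
    for quantized, scale in groups:
        restored.extend(q * scale for q in quantized)
    return restored


if __name__ == "__main__":
    row = [0.12, -0.5, 0.33, 0.9, -0.02, 0.04, -0.01, 0.03]  # made-up weights
    for bits in (8, 4):
        restored = decompress_weights(compress_weights(row, bits, group_size=4))
        worst = max(abs(a - b) for a, b in zip(row, restored))
        print(f"Int{bits}: worst absolute error {worst:.4f}")
```

Using one scale per small group rather than one per tensor keeps the rounding error of each weight within half a quantization step of its own group, which is why group-wise schemes are common for 4-bit formats.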

Support Change and Deprecation Notices

  • The OpenVINO™ Development Tools package (pip install openvino-dev) is deprecated and will be removed from installation options and distribution channels with 2025.0. To learn more, refer to the OpenVINO Legacy Features and Components page. To ensure optimal performance, install the OpenVINO package (pip install openvino), which includes essential components such as OpenVINO Runtime, OpenVINO Converter, and Benchmark Tool.  
  • Tools:  

    • Deployment Manager is deprecated and will be removed in the 2024.0 release.  
    • Accuracy Checker is deprecated and will be discontinued with 2024.0.    
    • Post-Training Optimization Tool (POT)  is deprecated and will be discontinued with 2024.0.  
    • Model Optimizer is deprecated and will be fully supported up until the 2025.0 release. Model conversion to the OpenVINO IR format should be performed through OpenVINO Model Converter, which is part of the PyPI package. Follow the Model Optimizer to OpenVINO Model Converter transition guide for a smoother transition. Known limitations are TensorFlow models with TF1 control flow and object detection models. These limitations relate to a gap in TensorFlow direct conversion capabilities, which will be addressed in upcoming releases.
    • PyTorch 1.13 support is deprecated in Neural Network Compression Framework (NNCF).
  • Runtime:  

    • Intel® Gaussian & Neural Accelerator (Intel® GNA) will be deprecated in a future release. We encourage developers to use the Neural Processing Unit (NPU) for low-power systems such as Intel® Core™ Ultra or 14th generation and beyond.
    • OpenVINO C++/C/Python 1.0 APIs will be discontinued with 2024.0.  
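
For readers moving off the deprecated Model Optimizer, the conversion path recommended above can be sketched in Python as follows. This is a minimal sketch that assumes the openvino package is installed (pip install openvino); the helper name convert_to_ir and the file paths are hypothetical, and the TensorFlow limitations noted above still apply.

```python
def convert_to_ir(source_model_path: str, output_xml_path: str) -> str:
    """Convert a model file to OpenVINO IR with OpenVINO Model Converter.

    Both paths are placeholders for illustration, for example
    "model.onnx" and "model.xml".
    """
    # Imported lazily so this module can be loaded without OpenVINO installed.
    import openvino as ov

    # convert_model reads the original framework model into an ov.Model.
    ov_model = ov.convert_model(source_model_path)

    # save_model writes the .xml/.bin IR pair; FP16 weight compression is
    # requested explicitly here.
    ov.save_model(ov_model, output_xml_path, compress_to_fp16=True)
    return output_xml_path
```

The same package also ships the ovc command-line tool for users who prefer to convert models outside of Python.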

OpenVINO™ Development Tools 

  • List of components and their changes: 
    •  OpenVINO Model Converter tool now supports the original framework shape format.
    • Neural Network Compression Framework (NNCF)   
      • Added data-free Int4 weights compression support for LLMs in OpenVINO IR with nncf.compress_weights(). 
      • Improved quantization time of LLMs with NNCF PTQ API for nncf.quantize() and nncf.quantize_with_accuracy_control(). 
      • Added support for SmoothQuant and ChannelAlignment algorithms in NNCF HyperParameter Tuner for automatic optimization of their hyperparameters during quantization.
      • Added quantization support for IF operation of models in OpenVINO format to speed up such models.
      • NNCF Post-training Quantization for PyTorch backend is now supported with nncf.quantize() and the common implementation of quantization algorithms.  
      • Added support for PyTorch 2.1. PyTorch 1.13 support has been deprecated.  
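
The data-free Int4 weight compression listed above is exposed through nncf.compress_weights(). The sketch below shows one possible call on an OpenVINO model object. It assumes the nncf package is installed; the wrapper name compress_llm_weights_int4 is hypothetical, and the mode, group_size, and ratio values are illustrative assumptions rather than recommended settings.

```python
def compress_llm_weights_int4(ov_model, group_size: int = 128, ratio: float = 0.8):
    """Apply data-free Int4 weight compression to an OpenVINO model.

    `ov_model` is expected to be an openvino.Model, for example one returned
    by openvino.Core().read_model(...). No calibration dataset is needed.
    """
    # Imported lazily so this module can be loaded without NNCF installed.
    import nncf

    return nncf.compress_weights(
        ov_model,
        mode=nncf.CompressWeightsMode.INT4_SYM,  # symmetric 4-bit weights
        group_size=group_size,  # number of weights sharing one scale (assumed value)
        ratio=ratio,  # share of layers compressed to Int4 (assumed value)
    )
```

Lowering ratio or group_size generally trades model size for accuracy, so both values are worth validating against the target model.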

OpenVINO™ Runtime (previously known as Inference Engine) 

  • OpenVINO Common 
    • Operations for reference implementations updated from legacy API to API 2.0. 
    • Symbolic transformation introduced the ability to remove Reshape operations surrounding MatMul operations. 
  • OpenVINO Python API  
    • Better support for the ``openvino.properties`` submodule, which now allows the use of properties directly, without additional parentheses. Example use case: ``{openvino.properties.cache_dir: "./some_path/"}``.
    • Added missing properties: ``execution_devices`` and ``loaded_from_cache``. 
    • Improved error propagation on imports from OpenVINO package. 
  • AUTO device plug-in (AUTO)

    • Provided an additional option to improve performance of cumulative throughput (or MULTI), where part of the CPU resources can be reserved for GPU inference when GPU and CPU are both used for inference (using ov::hint::enable_cpu_pinning(true)). This avoids the performance issue of CPU resource contention, where there are not enough CPU resources to schedule tasks for GPU (PR#19214).
  • CPU  
    • Introduced support of GPTQ quantized Int4 models, with improved performance compared to Int8 weight compressed or FP16 models. In the CPU plugin, the gain in performance is achieved by FullyConnected acceleration with 4bit weight decompression (PR #20607). 
    • Improved performance of Int8 weight-compressed large language models on some platforms, such as 13th Gen Intel Core (PR #20607).  
    • Further reduced memory consumption of select large language models on CPU platforms with AMX and AVX512 ISA, by eliminating extra memory copy with unified weight layout (PR #19575).  
    • Fixed a performance issue observed in the 2023.1 release on selected Xeon CPU platforms, with improved thread workload partitioning matching L2 cache utilization (PR #20436).
    • Extended support of configuration (enable_cpu_pinning) on Windows platforms to allow fine-grained control over the CPU resources used for inference workloads, by binding inference threads to CPU cores (PR #19418).
    • Optimized YoloV8n and YoloV8s model performance for BF16/FP32 precision. 
    • Optimized Falcon model on 4th Gen Intel® Xeon® Scalable Processors. 
    • Enabled support for FP16 inference precision on ARM. 
  • GPU
    • Enhanced inference performance for Large Language Models. 
    • Introduced Int8 weight compression to boost LLM performance (PR #19548). 
    • Implemented Int4 GPTQ weight compression for improved LLM performance. 
    • Optimized constant weights for LLMs, resulting in better memory usage and faster model loading. 
    • Optimized gemm (general matrix multiply) and fc (fully connected) for enhanced performance on iGPU (PR #19780). 
    • Completed GPU plugin migration to API 2.0. 
    • Added support for oneDNN 3.3 version.
  • Model Import Updates  
    • TensorFlow Framework Support
      • Supported conversion of models from memory in keras.Model and tf.function formats (PR #19903).  
      • Supported TF 2.14 (PR #20385).
      • New operations supported.   
    • PyTorch Framework Support
      • Supported Int4 GPTQ models.
      • New operations supported.
    • ONNX Framework Support
      • Added support for ONNX version 1.14.1 (PR #18359).
      • New operations supported.
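
The OpenVINO Python API changes listed above (properties used without parentheses, plus the execution_devices and loaded_from_cache properties) can be combined as in the sketch below. It assumes the openvino package is installed; the helper name compile_with_cache, the model path, and the default device are hypothetical, and whether a given device reports both properties is an assumption to verify on the target hardware.

```python
def compile_with_cache(model_path: str, device: str = "CPU",
                       cache_dir: str = "./some_path/") -> dict:
    """Compile a model with model caching enabled and report cache usage."""
    # Imported lazily so this module can be loaded without OpenVINO installed.
    import openvino as ov
    import openvino.properties as props

    core = ov.Core()

    # Properties are used directly as dictionary keys, without calling them.
    config = {props.cache_dir: cache_dir}
    compiled_model = core.compile_model(model_path, device, config)

    # Query the two properties that this release adds to the Python API.
    return {
        "loaded_from_cache": compiled_model.get_property(props.loaded_from_cache),
        "execution_devices": compiled_model.get_property(props.execution_devices),
    }
```

Calling the helper twice with the same model and cache directory is one way to observe the effect of caching: the second call would be expected to report that the model was loaded from the cache.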

Distribution (where to download the release)  

The OpenVINO product selector tool provides easy access to the right packages that match your desired Operating System, version, and distribution options.   

OpenVINO Ecosystem  

OpenVINO Model Server  

Introduced an extension of the KServe gRPC API, enabling streaming input and output for servables with Mediapipe graphs. This extension ensures the persistence of Mediapipe graphs within a user session, improving processing performance. This enhancement supports stateful graphs, such as tracking algorithms, and enables the use of source calculators.

Mediapipe framework has been updated to version 0.10.3.  

model_api used in the OpenVINO inference Mediapipe calculator has been updated and included with all its features. 

Added a demo showcasing gRPC streaming with a Mediapipe graph.

Added parameters for gRPC quota configuration and changed the default gRPC channel arguments to add rate limits. This minimizes the risk of an uncontrolled flow of requests impacting the service.

Updated Python client requirements to support a wider range of Python versions, from 3.7 to 3.11.

Learn more about the changes in the Model Server repository.

Jupyter Notebook Tutorials  

The following notebooks have been updated or newly added:    

Added optimization support (8-bit quantization, weights compression) by NNCF for the following notebooks:  

Known Issues  

   

  • ID 118179
    • Component: Python API, Plugins
    • Description: When input byte sizes match, inference methods accept incorrect inputs in copy mode (share_inputs=False). Example: [1, 4, 512, 512] is allowed when [1, 512, 512, 4] is required by the model.
    • Workaround: Pass inputs whose shape and layout match those of the model.

  • ID 124181
    • Component: CPU plugin
    • Description: On CPU platforms with an L2 cache size of less than 256KB, such as the i3 series of 8th Gen Intel® Core™ platforms, some models may hang during model loading.
    • Workaround: Rebuild the software from OpenVINO master or use the next OpenVINO release.

  • ID 121959
    • Component: CPU plugin
    • Description: During inference using the latency hint on selected hybrid CPU platforms (such as 12th or 13th Gen Intel® Core™), latency sporadically increases because of the operating system scheduling of P-cores or E-cores during OpenVINO initialization.
    • Workaround: This will be fixed in the next OpenVINO release.

  • ID 123101
    • Component: GPU plugin
    • Description: The GPU plugin hangs on A770 Graphics (dGPU) with a large batch size (1750).
    • Workaround: Decrease the batch size, or wait for a fixed driver release.
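
For issue 118179 above, a caller can guard against the problem by comparing shapes explicitly before running inference, since two shapes with the same element count (and therefore the same byte size for the same element type) are not rejected in copy mode. The sketch below uses only the Python standard library; the helper names are hypothetical and the shapes come from the example in the issue description.

```python
from math import prod
from typing import Sequence


def same_element_count(actual: Sequence[int], expected: Sequence[int]) -> bool:
    """True when two shapes hold the same number of elements.

    With the same element type this also means the same byte size, which is
    the situation described in issue 118179.
    """
    return prod(actual) == prod(expected)


def check_input_shape(actual: Sequence[int], expected: Sequence[int]) -> None:
    """Raise ValueError unless the input shape matches the model shape exactly."""
    if tuple(actual) != tuple(expected):
        raise ValueError(
            f"input shape {list(actual)} does not match model shape {list(expected)}"
        )


if __name__ == "__main__":
    model_shape = [1, 512, 512, 4]  # shape required by the model
    wrong_input = [1, 4, 512, 512]  # same byte size, different layout

    print(same_element_count(wrong_input, model_shape))
    try:
        check_input_shape(wrong_input, model_shape)
    except ValueError as error:
        print(f"rejected: {error}")
```

In an application, the expected shape would come from the model's input description and the check would run before each inference call that passes inputs in copy mode.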

System Requirements  

Disclaimer: Certain hardware (including but not limited to GPU, GNA, and the latest CPUs) requires manual installation of specific drivers and/or other software components to work correctly and/or to utilize hardware capabilities at their best. This might require updates to the operating system, including but not limited to the Linux kernel; refer to the respective documentation for details. These modifications should be handled by the user and are not part of the OpenVINO installation.

Intel CPU processors with corresponding operating systems   

Intel Atom® processor with Intel® SSE4.2 support   

Intel® Pentium® processor N4200/5, N3350/5, N3450/5 with Intel® HD Graphics   

6th - 13th generation Intel® Core™ processors  

Intel® Core™ Ultra (code name Meteor Lake)  

Intel® Xeon® Scalable Processors (code name Skylake)   

2nd Generation Intel® Xeon® Scalable Processors (code name Cascade Lake)   

3rd Generation Intel® Xeon® Scalable Processors (code name Cooper Lake and Ice Lake)   

4th Generation Intel® Xeon® Scalable Processors (code name Sapphire Rapids)   

ARM and ARM64 CPUs; Apple M1, M2 and Raspberry Pi 

Operating Systems:  

  • Ubuntu* 22.04 long-term support (LTS), 64-bit (Kernel 5.15+)  

  • Ubuntu* 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)  

  • Ubuntu* 18.04 long-term support (LTS) with limitations, 64-bit (Kernel 5.4+)  

  • Windows* 10   

  • Windows* 11   

  • macOS* 10.15 and above, 64-bit  

  • macOS 11 and above, ARM64 

  • Red Hat Enterprise Linux* 8, 64-bit  

  • Debian 9 ARM64 and ARM 

  • CentOS 7 64-bit 

Intel® Processor Graphics with corresponding operating systems (GEN Graphics)   

Intel® HD Graphics   

Intel® UHD Graphics   

Intel® Iris® Pro Graphics   

Intel® Iris® Xe Graphics   

Intel® Iris® Xe Max Graphics   

Intel® Arc™ GPU Series

Intel® Data Center GPU Flex Series    

Intel® Data Center GPU Max Series    

Operating Systems:  

  • Ubuntu 22.04 long-term support (LTS), 64-bit  

  • Ubuntu 20.04 long-term support (LTS), 64-bit  

  • Windows 10, 64-bit   

  • Windows 11, 64-bit  

  • CentOS 7 

  • Red Hat Enterprise Linux 8, 64-bit  

 

Intel® Neural Processing Unit with corresponding operating systems  

Operating Systems:  

  • Ubuntu 22.04 long-term support (LTS), 64-bit  

  • Windows 11, 64-bit  

NOTES:  

  • This installation requires drivers that are not included in the Intel® Distribution of OpenVINO™ toolkit package. Users can access the NPU plugin through the OpenVINO archives on the download page.

  • A chipset that supports processor graphics is required for Intel® Xeon® processors. Processor graphics are not included in all processors. See  Product Specifications for information about your processor.   

  • Although this release works with Ubuntu 20.04 for discrete graphics cards, Ubuntu 20.04 is not the plan of record (POR) for discrete graphics drivers, so OpenVINO support is limited.

  • The following minimum (i.e., used for old hardware) OpenCL™ driver versions were used during OpenVINO internal validation: 22.43 for Ubuntu* 22.04, 21.48 for Ubuntu* 20.04, and 21.49 for Red Hat Enterprise Linux* 8.

Intel® Gaussian & Neural Accelerator   

Operating Systems:  

  • Ubuntu 22.04 long-term support (LTS), 64-bit  

  • Ubuntu 20.04 long-term support (LTS), 64-bit  

  • Windows 10, 64-bit   

  • Windows 11, 64-bit  

Operating systems and developer environment requirements:  

  • Linux OS   

    • Ubuntu 22.04 with Linux kernel 5.15+  
    • Ubuntu 20.04 with Linux kernel 5.15+  
    • Red Hat Enterprise Linux 8 with Linux kernel 5.4  
    • A Linux OS build environment needs these components:   
    • Higher versions of kernel might be required for 10th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ Processors S-Series Processors, 12th Gen Intel® Core™ Processors, 13th Gen Intel® Core™ Processors,  Intel® Core™ Ultra Processors, or 4th Gen Intel® Xeon® Scalable Processors to support CPU, GPU, GNA or hybrid-cores CPU capabilities.  
  • Windows 10 and 11 

  • macOS* 10.15 and above   

  • A macOS build environment requires these components:  

  • DL frameworks versions:  

    • TensorFlow 1.15, 2.12  
    • MxNet 1.9.0 
    • ONNX 1.14.1 
    • PaddlePaddle 2.4  
    • Note: This package can be installed on other versions of DL frameworks, but only the versions specified above are fully validated.

NOTE: OpenVINO Python binaries and binaries on Windows, CentOS 7, and macOS (x86) are built with oneTBB libraries, while other binaries on Ubuntu and Red Hat systems are built with legacy TBB, which is released by the OS distribution. OpenVINO can be built from source with either oneTBB or legacy TBB on all of the operating systems above. System compatibility and performance were improved on hybrid CPUs like 12th Gen Intel Core and above.

Included in This Release  

The Intel® Distribution of OpenVINO™ toolkit is available for download for three operating systems: Windows, Linux, and macOS.

  • Component: OpenVINO (Inference Engine) C++ Runtime and OpenVINO (Inference Engine) Headers
    • Description: Unified API to integrate the inference with application logic
    • License: Dual licensing: Intel® OpenVINO™ Distribution License (Version May 2021), Apache 2.0
    • Location: <install_root>/runtime/*, <install_root>/runtime/include/*
    • Windows: Yes, Linux: Yes, macOS: Yes

  • Component: OpenVINO (Inference Engine) Python API
    • License: Apache 2.0
    • Location: <install_root>/python/*
    • Windows: Yes, Linux: Yes, macOS: Yes

  • Component: OpenVINO (Inference Engine) Samples
    • Description: Samples that illustrate OpenVINO C++/Python API usage
    • License: Apache 2.0
    • Location: <install_root>/samples/*
    • Windows: Yes, Linux: Yes, macOS: Yes

  • Component: [Deprecated] Deployment Manager
    • Description: The Deployment Manager is a Python* command-line tool that creates a deployment package by assembling the model, IR files, your application, and associated dependencies into a runtime package for your target device.
    • License: Apache 2.0
    • Location: <install_root>/tools/deployment_manager/*
    • Windows: Yes, Linux: Yes, macOS: Yes

   

Helpful Links  

NOTE: Links open in a new window.  

Home Page  

Featured Documentation  

All Documentation, Guides, and Resources  

Community Forum  

Legal Information   

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.   

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.   

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.   

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.   

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.   

No computer system can be absolutely secure.   

Intel, Atom, Arria, Core, Movidius, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.   

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos   

Other names and brands may be claimed as the property of others.   

Copyright © 2023, Intel Corporation. All rights reserved.   

For more complete information about compiler optimizations, see our Optimization Notice.   

  

Product and Performance Information 

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex