Intel® Distribution of OpenVINO™ Toolkit Release Notes

ID 780177
Updated 4/9/2024
Version: Public


New and Changed in 2023.1 

Summary of major features and improvements  

More Generative AI options with Hugging Face and improved PyTorch model support. 

  • NEW: Your PyTorch solutions are now further enhanced with OpenVINO. You have more options and no longer need to convert to ONNX for deployment. Developers can use their API of choice, PyTorch or OpenVINO, for added performance benefits. Additionally, PyTorch models can be automatically imported and converted for quicker deployment, while OpenVINO tools remain available for advanced model compression and deployment advantages, ensuring flexibility and a range of options. 

  • torch.compile (preview) – OpenVINO is now available as a backend through PyTorch torch.compile, empowering developers to utilize the OpenVINO toolkit through PyTorch APIs (see the sketch after this list). This feature has also been integrated into the Automatic1111 Stable Diffusion Web UI, helping developers achieve accelerated performance for Stable Diffusion 1.5 and 2.1 on Intel CPUs and GPUs on both native Linux and Windows platforms. 

  • Optimum Intel – Hugging Face and Intel continue to enhance top generative AI models by optimizing execution, making your models run faster and more efficiently on both CPU and GPU. OpenVINO serves as a runtime for inference execution. New PyTorch auto import and conversion capabilities have been enabled, along with support for weights compression to achieve further performance gains. 
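
The snippet below is a minimal sketch of the torch.compile integration described above; the torchvision ResNet-50 model and input shape are illustrative assumptions, and any torch.nn.Module can be compiled the same way.

    import torch
    import torchvision
    import openvino.torch  # registers the "openvino" backend for torch.compile

    # Illustrative model; replace with your own torch.nn.Module
    model = torchvision.models.resnet50(weights=None).eval()

    # Route execution through OpenVINO; the first call triggers graph capture and compilation
    compiled_model = torch.compile(model, backend="openvino")

    with torch.no_grad():
        output = compiled_model(torch.rand(1, 3, 224, 224))

Device selection and other backend options are described in the OpenVINO torch.compile documentation.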

Broader LLM model support and more model compression techniques 

  • Enhanced performance and accessibility for Generative AI:  Runtime performance and memory usage have been significantly optimized, especially for Large Language models (LLMs). Models used for chatbots, instruction following, code generation, and many more, including prominent models like BLOOM, Dolly, Llama 2, GPT-J, GPTNeoX, ChatGLM, and Open-Llama have been enabled. 

  • Improved LLMs on GPU – Model coverage for dynamic shapes support has been expanded, further helping the performance of generative AI workloads on both integrated and discrete GPUs. Furthermore, memory reuse and weight memory consumption for dynamic shapes have been improved.   

  • Neural Network Compression Framework (NNCF) now includes an 8-bit weights compression method, making it easier to compress and optimize LLMs (see the sketch below). The SmoothQuant method has been added for more accurate and efficient post-training quantization of Transformer-based models. 
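
As a quick illustration of the 8-bit weights compression workflow, the sketch below assumes a hypothetical ONNX export of an LLM; any openvino.Model obtained through conversion, or read from an existing IR, can be compressed the same way.

    import nncf
    import openvino as ov

    # Hypothetical ONNX export of an LLM; the file name is illustrative
    model = ov.convert_model("llm.onnx")

    # 8-bit weights compression: weights are stored in INT8,
    # activations keep their original precision
    compressed_model = nncf.compress_weights(model)

    ov.save_model(compressed_model, "llm_int8.xml")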

More portability and performance to run ​AI at the edge, in the cloud or locally. 

  • NEW: Support for Intel® Core™ Ultra (codename Meteor Lake). This new generation of Intel CPUs is tailored to excel in AI workloads with a built-in inference accelerator. 

  • Integration with MediaPipe – Developers now have direct access to this framework for building multipurpose AI pipelines. Easily integrate with OpenVINO Runtime and OpenVINO Model Server to enhance performance for faster AI model execution. You also benefit from seamless model management and version control, as well as custom logic integration with additional calculators and graphs for tailored AI solutions. Lastly, you can scale faster by delegating deployment to remote hosts via gRPC/REST interfaces for distributed processing. 

Support Change and Deprecation Notices 

  • OpenVINO™ Development Tools (pip install openvino-dev) are currently being deprecated and will be removed from installation options and distribution channels with 2025.0. 

  • Runtime:
    •  Intel® Gaussian & Neural Accelerator (Intel® GNA) is being deprecated; the GNA plugin will be discontinued with 2024.0. 
    • The shared_memory argument for Python API inference methods is deprecated and replaced by a new share_inputs argument (see the sketch after this list). 
    • OpenVINO C++/C/Python 1.0 APIs will be discontinued with 2024.0. 
    • Python 3.7 will be discontinued with 2023.2 release.
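
For reference, the sketch below shows the new share_inputs argument that replaces the deprecated shared_memory argument; the model path, device, and input shape are illustrative assumptions.

    import numpy as np
    from openvino.runtime import Core

    core = Core()
    compiled = core.compile_model("model.xml", "CPU")  # illustrative IR path and device
    request = compiled.create_infer_request()

    data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # illustrative input shape

    # With share_inputs=True the input array is passed to the plugin
    # without an extra copy ("zero-copy" mode)
    results = request.infer({0: data}, share_inputs=True)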

OpenVINO™ Development Tools

  • List of components and their changes: 
    • A preview of the new OpenVINO converter tool (OVC) has been introduced. This tool offers functionality similar to Model Optimizer and is designed as its lightweight version, with the following differences:  
      • Pre-processing options such as layout, channel reversal, mean, and scale should be applied through the preprocessing API; they are not supported in OVC.
      • The model file is specified without the input_model parameter, and the source framework is detected automatically. 
    • Conversion API (Model Optimizer)   
      • convert_model Python API is now available in the openvino namespace.
      • The Model Optimizer tool generates an Intermediate Representation (IR) file with compressed weights by default; the --compress_to_fp16 option can be used to control this behavior. convert_model keeps the original weights for the generated OpenVINO Model object (a conversion sketch follows this list). 
    • Neural Network Compression Framework (NNCF)
      • Added SmoothQuant method for more accurate Post-training Quantization of Transformer-based models.  
      • Introduced new API nncf.compress_weights() and preliminary support for 8-bit weights compression method for OpenVINO and PyTorch LLMs.
      • Added a hyperparameter tuning method to Post-training Quantization. When enabled, it automatically finds hyperparameters for the most efficient quantization results. 
      • Extended Post-training Quantization for OpenVINO by ChannelAlignment algorithm for more accurate quantization results. 
      • Extended Post-training Quantization for PyTorch by Fast Bias Correction algorithm for more accurate quantization results. For more details, refer to NNCF Release Notes.  
    • Benchmark Tool enables you to estimate deep-learning inference performance on supported devices for both synchronous and asynchronous modes.  
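
A minimal sketch of the conversion flow described above is shown below; the ONNX file name is an illustrative assumption, and the source framework is detected automatically.

    import openvino as ov

    # convert_model is available directly in the openvino namespace;
    # the source framework (ONNX here) is detected from the file
    ov_model = ov.convert_model("model.onnx")  # illustrative model file

    # save_model writes the IR; FP16 weight compression is the default and can be disabled
    ov.save_model(ov_model, "model.xml", compress_to_fp16=True)

The resulting IR can then be profiled with the Benchmark Tool, for example: benchmark_app -m model.xml -d CPU.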

OpenVINO™ Runtime (previously known as Inference Engine)

  • Overall updates 
    • Proxy & hetero plugins have been migrated to API 2.0, providing enhanced compatibility and stability. 
    • Symbolic shape inference preview is now available, leading to improved performance for LLMs. 
    • OpenVINO's graph representation has been upgraded to opset12, introducing a new set of operations that offer enhanced functionality and optimizations. See the opset12 specification for the full list of operations.
  • OpenVINO Python API
    • Python Conversion API is now the primary conversion path, making it easier for Python developers to work with OpenVINO. 
    • Python API inference methods (such as InferRequest.infer and CompiledModel.__call__) have new parameters, share_inputs and share_outputs, which allow you to control memory sharing on inference inputs and outputs. Enabling shared memory modes results in a "zero-copy" approach, which reduces memory consumption and compute overhead by reducing the number of copies. 
    • The torchvision.transforms object has been added to OpenVINO pre-processing, which allows users to embed torchvision pre-processing into the IR. 
    • All Python tools related to OpenVINO have been moved into a single namespace, improving the user experience with better API readability. 
  • OpenVINO C API
    • The following C API 1.0 is removed from the 2023.1 release, as communicated in the 2023.0 release: 
      • NV12 and I420 color formats for legacy API  
      • Methods and functions enabling nv12 and i420 blobs creation 
      • InferRequest::SetBatch() method 
      • InferRequest::SetBlob() method, which allows setting pre-processing for a specific tensor in the infer request
      • Legacy properties: DYN_MATCH_LIMIT and DYN_BATCH_ENABLED 
    • The legacy C API is deprecated and will be removed in the 2024.0 release. Refer to the documentation for instructions on transitioning to the new C API.  
  • AUTO device plug-in (AUTO)
    • Improved support for configuring the execution hardware devices, such as CPU or GPU, through AUTO by leveraging ov::device::properties. 
  • Intel® CPU 
    • Enabled weights decompression support for LLMs. This implementation supports AVX2 and AVX-512 hardware targets on Intel® Core™ processors, improving latency mode (FP32 vs. FP32+INT8 weights). For 4th Generation Intel® Xeon® Scalable Processors (codename Sapphire Rapids), this INT8 decompression feature provides a performance improvement compared to pure BF16 inference. 
    • Improved performance of LLMs, particularly Transformer-based models, in the CPU plugin. Optimizing memory efficiency for output data between the CPU plugin and the inference request improves the performance of the matrix multiplication operator and of other operators such as LayerNorm and MVN. 
    • Reduced memory consumption for LLMs, especially for models with 7 billion parameters or above.  
    • Reduced overall model load and compile time, particularly for LLMs.  
    • Added a new capability that gives developers full control over tuning CPU core usage for inference, including P-cores, E-cores, and hyper-threading, on supported CPU platforms.  
  • Intel® GPU 
    • Improved performance of generative AI models. Inference performance of LLMs is significantly improved on both iGPU and dGPU. Stable Diffusion performance is also improved on iGPU. 
    • Dynamic shape support has been expanded to convolutional models by enabling more operators to support dynamism. Performance of dynamic shapes has also been improved for NLP models. 
    • Improved performance for Stable Diffusion models. 
    • Performance of transformer models, such as BERT-like models and vision transformer models, is improved on dGPU with the integration of oneDNN 3.2. 
  • Intel® Gaussian & Neural Accelerator (Intel® GNA) 
    • Introduced support for GNA 3.5, the new version of GNA hardware included in Intel® Core™ Ultra (codename Meteor Lake). 

    • C++ and Python automatic speech recognition samples are being deprecated and will be removed with 2024.0.  

    • Optimized data pre-processing using AVX instructions on CPU has improved the runtime performance. 

    • Introduced support for automatic layout conversion, which enables more TensorFlow models to work out of the box.  

  • Model Import Updates  
    • TensorFlow Framework Support
      • Added support for Switch/Merge operations, bringing TensorFlow 1.x control flow support closer to full compatibility and enabling more models. 
      • Added support for the TensorFlow 1 Checkpoint format. All native TensorFlow formats are now enabled. 
      • Added support for 12 new operations:  
        • UnsortedSegmentSum 

        • FakeQuantWithMinMaxArgs 

        • MaxPoolWithArgmax 

        • UnravelIndex 

        • AdjustContrastv2 

        • InvertPermutation 

        • CheckNumerics 

        • DivNoNan 

        • EnsureShape 

        • ShapeN 

        • Switch 

        • Merge

    • PyTorch Framework Support
      • OpenVINO now supports PyTorch models quantized in the source framework, including models produced with the Neural Network Compression Framework (NNCF). 
      • Added support for in-place operations on tensor aliases, improving accuracy for detection models. 
      • Added support for 43 new operations. To learn about PyTorch model conversion, see the OpenVINO documentation; a conversion sketch follows the operation list below.   
        • aten::concat 

        • aten::masked_scatter 

        • aten::linspace 

        • aten::view_as 

        • aten::std 

        • aten::outer 

        • aten::broadcast_to 

        • aten::all 

        • aten::embedding_bag 

        • aten::argmax 

        • aten::argmin 

        • aten::unflatten 

        • aten::item 

        • aten::frobenius_norm 

        • aten::__range_length 

        • aten::__derive_index 

        • aten::cdist 

        • aten::pairwise_distance 

        • aten::squeeze 

        • aten::LogSoftmax 

        • aten::_native_multi_head_attention 

        • aten::_shape_as_tensor 

        • aten::t 

        • aten::fft_rfftn 

        • aten::fft_irfft 

        • aten::pad 

        • aten::reflection_pad2d 

        • aten::fake_quantize_per_tensor_affine 

        • aten::fake_quantize_per_channel_affine 

        • aten::scatter 

        • aten::quantize_per_tensor 

        • aten::quantize_per_channel 

        • aten::dequantize 

        • aten::rand 

        • aten::randn 

        • aten::rand_like 

        • aten::randn_like 

        • aten::broadcast_to 

        • aten::_upsample_bilinear2d_aa 

        • aten::_upsample_bicubic2d_aa 

        • aten::randint 

        • aten::index_put_ 

        • aten::tensor_split 
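
As referenced in the PyTorch framework support notes above, the sketch below shows direct conversion of a PyTorch module without an intermediate ONNX export; the torchvision ResNet-50 model and input shape are illustrative assumptions.

    import torch
    import torchvision
    import openvino as ov

    # Illustrative torchvision model; any torch.nn.Module can be passed directly
    pt_model = torchvision.models.resnet50(weights=None).eval()
    example_input = torch.rand(1, 3, 224, 224)

    # Direct PyTorch conversion: no intermediate ONNX export is required
    ov_model = ov.convert_model(pt_model, example_input=example_input)
    ov.save_model(ov_model, "resnet50.xml")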

Distribution (where to download the release) 

The OpenVINO product selector tool (available at www.openvino.ai) provides easy access to the right packages that match your desired needs: OS, version, and distribution options.  

OpenVINO Ecosystem 

OpenVINO Model Server 

OpenVINO Model Server (OVMS) is a solution for serving models. The tool uses the same API endpoints as TensorFlow Serving and KServe while leveraging OpenVINO for inference execution. See the full release notes at https://github.com/openvinotoolkit/model_server/releases 

Open Model Zoo 

The following public models are deprecated and will be removed in the 2023.2 release: 

  • All Caffe models  
    • alexnet 

    • caffenet 

    • densenet-121 

    • face-detection-retail-0044 

    • googlenet-v1 

    • googlenet-v2 

    • mobilenet-ssd 

    • mobilenet-v1-1.0-224 

    • mobilenet-v2 

    • mtcnn 

    • pelee-coco 

    • se-inception 

    • se-resnet-50 

    • se-resnext-50 

    • shufflenet-v2-x0.5 

    • Sphereface 

    • squeezenet1.0 

    • squeezenet1.1 

    • ssd300 

    • ssd512 

    • vgg16 

    • vgg19 

  • All MXNet models  
    • brain-tumor-segmentation-0001 

    • mobilefacedet-v1-mxnet 

    • octave-resnet-26-0.25 

  • All Paddle models  
    • mobilenet-v3-large-1.0-224-paddle 

    • mobilenet-v3-small-1.0-224-paddle 

    • ocrnet-hrnet-w48-paddle 

  • DeblurGAN-v2

Jupyter Notebook Tutorials 

Since the 2023.0 release, a number of new notebooks have been added, along with 8-bit quantization tutorials for existing notebooks. 

Known Issues 

  

Jira ID: 118179 
Description: When input byte sizes match, inference methods accept incorrectly shaped inputs in copy mode (share_inputs=False). Example: [1, 4, 512, 512] is allowed when [1, 512, 512, 4] is required by the model. 
Component: Python API, Plugins 
Workaround: Pass inputs whose shape and layout match those of the model. 

Jira ID: 119142 
Description: Reading a TensorFlow model directly from memory using convert_model causes unpredictable behavior, such as inference result mismatches, wrong output names, and degraded performance. 
Component: Conversion API, TensorFlow FE 
Workaround: Serialize the TensorFlow model to a file and then pass the file to convert_model (see the sketch below). 
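
The sketch below illustrates the workaround for issue 119142; the Keras model and file paths are illustrative assumptions.

    import tensorflow as tf
    import openvino as ov

    # Illustrative Keras model; the workaround is to serialize it to disk first
    model = tf.keras.applications.MobileNet()
    model.save("saved_model_dir")  # writes a TensorFlow SavedModel directory

    # Convert from the serialized path instead of the in-memory object
    ov_model = ov.convert_model("saved_model_dir")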

 

System Requirements 

Disclaimer. Certain hardware (including but not limited to GPU, GNA, and the latest CPUs) requires manual installation of specific drivers and/or other software components to work correctly and/or to utilize hardware capabilities at their best. This might require updates to the operating system, including but not limited to the Linux kernel; please refer to its documentation for details. These modifications should be handled by the user and are not part of the OpenVINO installation.  

Intel CPU processors with corresponding operating systems  

Intel Atom® processor with Intel® SSE4.2 support  

Intel® Pentium® processor N4200/5, N3350/5, N3450/5 with Intel® HD Graphics  

6th - 13th generation Intel® Core™ processors 

Intel® Core™ Ultra (codename Meteor Lake) 

Intel® Xeon® Scalable Processors (code name Skylake)  

2nd Generation Intel® Xeon® Scalable Processors (code name Cascade Lake)  

3rd Generation Intel® Xeon® Scalable Processors (code name Cooper Lake and Ice Lake)  

4th Generation Intel® Xeon® Scalable Processors (code name Sapphire Rapids)  

Operating Systems: 

  • Ubuntu 22.04 long-term support (LTS), 64-bit (Kernel 5.15+) 

  • Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+) 

  • Ubuntu 18.04 long-term support (LTS) with limitations, 64-bit (Kernel 5.4+) 

  • Windows* 10  

  • Windows* 11  

  • macOS* 10.15 and above, 64-bit  

  • Red Hat Enterprise Linux* 8, 64-bit 

  • CentOS 7 

Intel® Processor Graphics with corresponding operating systems (GEN Graphics)  

Intel® HD Graphics  

Intel® UHD Graphics  

Intel® Iris® Pro Graphics  

Intel® Iris® Xe Graphics  

Intel® Iris® Xe Max Graphics  

Intel® Arc ™ GPU Series  

Intel® Data Center GPU Flex Series   

Operating Systems: 

  • Ubuntu* 22.04 long-term support (LTS), 64-bit 

  • Ubuntu* 20.04 long-term support (LTS), 64-bit 

  • Windows* 10, 64-bit  

  • Windows* 11, 64-bit 

  • Red Hat Enterprise Linux* 8, 64-bit 

NOTES: 

  • This installation requires drivers that are not included in the Intel® Distribution of OpenVINO™ toolkit package.  

  • A chipset that supports processor graphics is required for Intel® Xeon® processors. Processor graphics are not included in all processors. See  Product Specifications for information about your processor.  

  • Although this release works with Ubuntu 20.04 for discrete graphic cards, Ubuntu 20.04 is not POR for discrete graphics drivers, so OpenVINO support is limited.  

  • The following minimum (i.e., used for old hardware) OpenCL™ driver versions were used during OpenVINO internal validation: 22.43 for Ubuntu* 22.04, 21.48 for Ubuntu* 20.04, and 21.49 for Red Hat Enterprise Linux* 8.  

Intel® Gaussian & Neural Accelerator  

Operating Systems: 

  • Ubuntu* 22.04 long-term support (LTS), 64-bit 

  • Ubuntu* 20.04 long-term support (LTS), 64-bit 

  • Windows* 10, 64-bit  

  • Windows* 11, 64-bit 

Operating system and developer environment requirements: 

  • Linux* OS  
    • Ubuntu 22.04 with Linux kernel 5.15+ 

    • Ubuntu 20.04 with Linux kernel 5.15+ 

    • RHEL 8 with Linux kernel 5.4 

    • A Linux OS build environment needs these components:  

    • Higher kernel versions might be required for 10th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ S-Series Processors, 12th Gen Intel® Core™ Processors, 13th Gen Intel® Core™ Processors, Intel® Core™ Ultra Processors, or 4th Gen Intel® Xeon® Scalable Processors to support CPU, GPU, GNA, or hybrid-core CPU capabilities. 

  • Windows* 10 and 11

  • macOS* 10.15 and above  

  • DL frameworks versions: 

    • TensorFlow* 1.15, 2.12 

    • MxNet* 1.9 

    • ONNX* 1.14 

    • PaddlePaddle* 2.4 

    • Note: This package can be installed on other versions of the DL frameworks, but only the versions specified above are fully validated. 

NOTE: OpenVINO Python binaries and binaries for Windows, CentOS 7, and macOS (x86) are built with oneTBB libraries, while binaries for Ubuntu and Red Hat OS systems are built with the legacy TBB released by the OS distribution. OpenVINO can be built from source with either oneTBB or legacy TBB on all of the above OS systems. System compatibility and performance were improved on hybrid CPUs such as 12th Gen Intel Core and newer. 

Included in This Release 

The Intel® Distribution of OpenVINO™ toolkit is available for download for three operating systems: Windows*, Linux*, and macOS*. All components listed below are included in the Windows, Linux, and macOS packages. 

Component: OpenVINO (Inference Engine) C++ Runtime – unified API to integrate the inference with application logic – and OpenVINO (Inference Engine) Headers 
License: Dual licensing: Intel® OpenVINO™ Distribution License (Version May 2021) and Apache 2.0 
Location: <install_root>/runtime/*, <install_root>/runtime/include/* 

Component: OpenVINO (Inference Engine) Python API 
License: Apache 2.0 
Location: <install_root>/python/* 

Component: OpenVINO (Inference Engine) Samples – samples that illustrate OpenVINO C++/Python API usage 
License: Apache 2.0 
Location: <install_root>/samples/* 

Component: Deployment Manager – a Python* command-line tool that creates a deployment package by assembling the model, IR files, your application, and associated dependencies into a runtime package for your target device 
License: Apache 2.0 
Location: <install_root>/tools/deployment_manager/* 

Helpful Links 

NOTE: Links open in a new window. 

Home Page 

Featured Documentation 

All Documentation, Guides, and Resources 

Community Forum 

Legal Information  

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.  

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.  

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.  

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.  

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.  

No computer system can be absolutely secure.  

Intel, Atom, Arria, Core, Movidius, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.  

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos  

*Other names and brands may be claimed as the property of others.  

Copyright © 2023, Intel Corporation. All rights reserved.  

For more complete information about compiler optimizations, see our Optimization Notice.