Intel® Distribution of OpenVINO™ Toolkit

753640
9/18/2023

Introduction

This package contains the Intel® Distribution of OpenVINO™ Toolkit software version 2023.1 for Linux*, Windows* and macOS*.

Available Downloads

  • CentOS 7 (1908)*
    • Size: 49.7 MB
    • SHA1: 7D73F7BED5FF8C99646730D07D4C9A12D08B1DBA
  • Debian Linux*
    • Size: 33.5 MB
    • SHA1: A703CA79608AFE74986083B1841E53DEF0C2303A
  • Debian Linux*
    • Size: 27.3 MB
    • SHA1: C62CB77B0B302B0F292ED6D2F59C668463E97EC4
  • Red Hat Enterprise Linux 8*
    • Size: 45.7 MB
    • SHA1: C942DC207E210533BE78FA59730E101B8CB237BC
  • Ubuntu 18.04 LTS*
    • Size: 45.6 MB
    • SHA1: CDC66D9CAB268DCC77472A0AC67741146C7383EF
  • Ubuntu 20.04 LTS*
    • Size: 49.2 MB
    • SHA1: 7210DC0DAE282F453A961B89D50E5999BB0E07B7
  • Ubuntu 22.04 LTS*
    • Size: 49.9 MB
    • SHA1: 5E7BB817016EDE156FB142A1FBDA69B7D49A0500
  • macOS*
    • Size: 125.7 MB
    • SHA1: 89FF006302CDBF7B368C935FA5C66705449C9E76
  • macOS*
    • Size: 35.2 MB
    • SHA1: 3539701E406006DD3193FCC752C8888E90BEBB94
  • Windows 11*, Windows 10*
    • Size: 98.9 MB
    • SHA1: 266EC9DCCCD1B3ECAEB02ECE50766A1F1B9B57F9
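
The SHA1 values listed above can be used to check that a downloaded archive is intact. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of OpenVINO.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA-1 digest of the file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_published_sha1(path, published):
    """Compare a file's SHA-1 with a published value, ignoring case and padding."""
    return sha1_of_file(path) == published.strip().upper()
```

For example, calling matches_published_sha1 with the path of the downloaded Ubuntu 20.04 archive (the local file name depends on your download) and the string 7210DC0DAE282F453A961B89D50E5999BB0E07B7 should return True for an uncorrupted file.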

Detailed Description

Introduction

The Intel® Distribution of OpenVINO™ toolkit is an open-source solution for optimizing and deploying AI inference in domains such as computer vision, automatic speech recognition, natural language processing, recommendation systems, and now generative AI with the release of 2023.1. With its plug-in architecture, OpenVINO allows developers to write once and deploy anywhere. We are proud to announce the release of OpenVINO 2023.1, which introduces a range of new features, improvements, and deprecations aimed at enhancing the developer experience.

The OpenVINO™ toolkit: 

  • Enables the use of models trained with popular frameworks, such as TensorFlow* and PyTorch*. 
  • Optimizes inference of deep learning models by applying model retraining or fine-tuning, like post-training quantization. 
  • Supports heterogeneous execution across Intel hardware, using a common API for the Intel CPU, Intel® Integrated Graphics, Intel® Discrete Graphics, and other commonly used accelerators. 

New and Changed in 2023.1 

Summary of major features and improvements 

More Generative AI options with Hugging Face and improved PyTorch model support. 

  • NEW: Your PyTorch solutions are now even further enhanced with OpenVINO. You’ve got more options and you no longer need to convert to ONNX for deployment. Developers can now use their API of choice - PyTorch or OpenVINO for added performance benefits. Additionally, users can automatically import and convert PyTorch models for quicker deployment. You can continue to make the most of OpenVINO tools for advanced model compression and deployment advantages, ensuring flexibility and a range of options. 
  • torch.compile (preview) – OpenVINO is now available as a backend through PyTorch torch.compile, empowering developers to utilize OpenVINO toolkit through PyTorch APIs. This feature has also been integrated into the Automatic1111 Stable Diffusion Web UI, helping developers achieve accelerated performance for Stable Diffusion 1.5 and 2.1 on Intel CPUs and GPUs in both Native Linux* and Windows* OS platforms. 
  • Optimum Intel – Hugging Face* and Intel continue to enhance top generative AI models by optimizing execution, making your models run faster and more efficiently on both CPU and GPU. OpenVINO serves as a runtime for inferencing execution. New PyTorch auto import and conversion capabilities have been enabled, along with support for weights compression to achieve further performance gains. 
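
As a rough illustration of the torch.compile preview described above, the sketch below wraps a PyTorch module so that it is compiled with the OpenVINO backend. It assumes that PyTorch and OpenVINO 2023.1 are installed and that importing openvino.torch registers the "openvino" backend; the helper name compile_with_openvino is hypothetical.

```python
def compile_with_openvino(model):
    """Return `model` wrapped by torch.compile using the OpenVINO backend."""
    # Imports are local so this helper can be defined without PyTorch installed.
    import torch
    import openvino.torch  # noqa: F401  (assumed to register the "openvino" backend)

    return torch.compile(model, backend="openvino")
```

The returned object is called like the original module. Since the feature is a preview, behavior may change between releases.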

Broader LLM model support and more model compression techniques 

  • Enhanced performance and accessibility for Generative AI:  Runtime performance and memory usage have been significantly optimized, especially for Large Language models (LLMs). Models used for chatbots, instruction following, code generation, and many more, including prominent models like BLOOM, Dolly, Llama 2, GPT-J, GPTNeoX, ChatGLM, and Open-Llama have been enabled. 
  • Improved LLMs on GPU – Model coverage for dynamic shapes support has been expanded, further helping the performance of generative AI workloads on both integrated and discrete GPUs. Furthermore, memory reuse and weight memory consumption for dynamic shapes have been improved.   
  • Neural Network Compression Framework (NNCF) now includes an 8-bit weights compression method, making it easier to compress and optimize LLM models. SmoothQuant* method has been added for more accurate and efficient post-training quantization for Transformer-based models. 
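
To make the idea of 8-bit weights compression concrete, here is a small pure-Python sketch of one generic scheme: symmetric per-row quantization to signed 8-bit integers with a floating-point scale. This is an illustration of the general technique only and is an assumption; it is not NNCF's actual algorithm or API.

```python
def compress_row(weights):
    """Quantize one row of float weights to int8-range values plus a scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; use scale 1.0 for an all-zero row.
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def decompress_row(quantized, scale):
    """Recover approximate float weights from int8 values and their scale."""
    return [q * scale for q in quantized]
```

Each weight is stored in one byte instead of four (FP32) or two (FP16), at the cost of a rounding error of at most half a scale step per weight.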

More portability and performance to run ​AI at the edge, in the cloud or locally. 

  • NEW: Support for Intel® Core™ Ultra (codename Meteor Lake). This new generation of Intel CPUs is tailored to excel in AI workloads with built-in inference accelerators. 
  • Integration with MediaPipe* – Developers now have direct access to this framework for building multipurpose AI pipelines. Easily integrate with OpenVINO Runtime and OpenVINO Model Server to enhance performance for faster AI model execution. You also benefit from seamless model management and version control, as well as custom logic integration with additional calculators and graphs for tailored AI solutions. Lastly, you can scale faster by delegating deployment to remote hosts via gRPC/REST interfaces for distributed processing. 

Support Change and Deprecation Notices 

  • OpenVINO™ Development Tools (pip install openvino-dev) are currently being deprecated and will be removed from installation options and distribution channels with 2025.0. 
  • Tools: 
  • Runtime:
    • Intel® Gaussian & Neural Accelerator (Intel® GNA) is being deprecated; the GNA plugin will be discontinued with 2024.0.  
    • The shared_memory argument for Python API inference methods is deprecated and replaced by a new share_inputs argument. 
    • OpenVINO C++/C/Python 1.0 APIs will be discontinued with 2024.0. 
    • Python* 3.7 will be discontinued with the 2023.2 LTS release.
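
For the shared_memory deprecation noted above, migrating is mostly a matter of renaming the keyword argument. The sketch below assumes a compiled model object that accepts share_inputs when called, as the 2023.1 Python API does; the helper names are hypothetical.

```python
def load_compiled_model(model_path, device="CPU"):
    """Compile a model file for `device` (requires OpenVINO to be installed)."""
    import openvino as ov  # local import: only needed when this helper is called

    return ov.Core().compile_model(model_path, device)


def infer_with_shared_inputs(compiled_model, inputs):
    """Run inference, sharing input memory instead of copying it."""
    # Before: compiled_model(inputs, shared_memory=True)  (deprecated)
    return compiled_model(inputs, share_inputs=True)
```

When inputs are shared, the input buffers should not be modified while inference is running.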

OpenVINO™ Development Tools

  • List of components and their changes: 
    • A preview of the new OpenVINO converter tool (OVC) has been introduced. This tool offers functionality similar to Model Optimizer and is designed to be its lightweight version with the following differences:  
      • Pre-processing options like layout, channel reverse, mean, and scale are not supported in OVC and should be applied through the preprocess API instead.
      • The model file is specified without input_model parameter, and the framework is detected automatically. 
    • Conversion API (Model Optimizer)
      • convert_model Python API is now available in the openvino namespace.
      • The Model Optimizer tool generates an Intermediate Representation (IR) file with compressed weights by default. The --compress_to_fp16 option can be used to control this behavior. convert_model keeps the original weights for the generated OpenVINO Model object. 
    • Neural Network Compression Framework (NNCF)
      • Added SmoothQuant method for more accurate Post-training Quantization of Transformer-based models.  
      • Introduced new API nncf.compress_weights() and preliminary support for 8-bit weights compression method for OpenVINO and PyTorch LLMs.
      • Added Hyperparameters Tuning method into Post-training Quantization. When enabled, it automatically finds hyperparameters for the most efficient quantization results. 
      • Extended Post-training Quantization for OpenVINO by ChannelAlignment algorithm for more accurate quantization results. 
      • Extended Post-training Quantization for PyTorch by Fast Bias Correction algorithm for more accurate quantization results. For more details, refer to NNCF Release Notes.  
    • Benchmark Tool enables you to estimate deep-learning inference performance on supported devices for both synchronous and asynchronous modes. 
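
The conversion and compression items above can be combined into a short pipeline. The following sketch assumes OpenVINO 2023.1 and a matching NNCF release are installed, and that openvino.convert_model, nncf.compress_weights, and openvino.save_model are used as shown; the function name convert_and_compress and its parameters are illustrative.

```python
def convert_and_compress(pytorch_model, example_input, output_xml):
    """Convert a PyTorch model, compress its weights to 8 bits, and save the IR."""
    # Local imports so the function can be defined without the packages installed.
    import nncf
    import openvino as ov

    # convert_model is available directly in the openvino namespace.
    ov_model = ov.convert_model(pytorch_model, example_input=example_input)
    # Preliminary 8-bit weights compression for LLM-style models.
    compressed = nncf.compress_weights(ov_model)
    ov.save_model(compressed, output_xml)
    return compressed
```

The saved .xml file (with its accompanying .bin file) can then be loaded by OpenVINO Runtime on the target device.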

OpenVINO™ Runtime (previously known as Inference Engine)

  • Overall updates 
    • Proxy & hetero plugins have been migrated to API 2.0, providing enhanced compatibility and stability. 
    • Symbolic shape inference preview is now available, leading to improved performance for LLMs. 
    • OpenVINO's graph representation has been upgraded to opset12, introducing a new set of operations that offer enhanced functionality and optimizations. See the opset12 documentation for the full list of operations.
  • OpenVINO Python API
    • Python Conversion API is now the primary conversion path, making it easier for Python developers to work with OpenVINO. 
    • Python API inference methods (like InferRequest.infer and CompiledModel.__call__) have new parameters, share_inputs and share_outputs, which control memory sharing on the inputs and outputs of inference. Enabling the shared memory modes results in a “zero-copy” approach, which reduces memory consumption and compute overhead by reducing the number of copies. 
    • The torchvision.transforms object has been added to OpenVINO pre-processing, which allows users to embed torchvision pre-processing into IR. 
    • All python tools related to OpenVINO have been moved into a single namespace, improving user experience with better API readability. 
  • OpenVINO C API
    • The following C API 1.0 is removed from the 2023.1 release, as communicated in the 2023.0 release: 
      • NV12 and I420 color formats for legacy API  
      • Methods and functions enabling nv12 and i420 blobs creation 
      • InferRequest::SetBatch() method 
      • InferRequest::SetBlob() method, which allows setting pre-processing for a specific tensor in an infer request
      • Legacy properties: DYN_BATCH_LIMIT and DYN_BATCH_ENABLED 
    • The legacy C API is deprecated and will be removed in the 2024.0 release. See the instructions on transitioning to the new C API.  
  • AUTO device plug-in (AUTO)
    • Improved support of configuration through AUTO for the execution hardware devices, such as CPU or GPU, by leveraging ov::device::properties. 
  • Intel® CPU 
    • Enabled weights decompression support for LLMs. This implementation supports avx2 and avx512 HW targets for Intel® Core™ processors for improved latency mode (FP32 VS FP32+INT8 weights comparison). For 4th Generation Intel® Xeon® Scalable Processors (codename Sapphire Rapids) this INT8 decompression feature provides performance improvement, compared to pure BF16 inference. 
    • Improved performance of LLMs, particularly for Transformer-based models, in CPU plugin. By optimizing memory efficiency for output data between CPU plugin and the inference request, it improves performance of matrix multiplication operator, and other operators like LayerNorm and MVN. 
    • Reduced memory consumption for LLMs, especially for models with 7 billion parameters or above.  
    • Reduced overall model load and compile time, particularly for LLMs.  
    • Added new capability to provide full control for developers to tune the usage of the CPU cores for inference, including P-core, E-core, and hyper-threading cores based on supported CPU platforms. 
  • Intel® GPU 
    • Improved performance of generative AI models. Inference performance of LLMs is significantly improved on both iGPU and dGPU. Stable Diffusion performance is also improved on iGPU. 
    • Dynamic shape support is expanded to convolutional models by enabling more operators to support dynamism. Performance of dynamic shapes has also been improved for NLP models. 
    • Improved performance for StableDiffusion models. 
    • Performance of transformer models such as Bert-like models and vision transformer models is improved on dGPU with integration of oneDNN 3.2. 
  • Intel® Gaussian & Neural Accelerator (Intel® GNA) 
    • Introduced support for GNA 3.5, a new version of GNA HW included in Intel® Core™ Ultra (codename Meteor Lake). 
    • C++ and Python automatic speech recognition samples are being deprecated and will be removed with 2024.0.  
    • Optimized data pre-processing using AVX instructions on CPU has improved the runtime performance. 
    • Introduced support for automatic layout conversion, which allows more TensorFlow models to be supported out of the box.  
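
The CPU core control mentioned in the Intel® CPU notes above can be expressed as a configuration dictionary passed at compile time. In this sketch, the string keys SCHEDULING_CORE_TYPE and ENABLE_HYPER_THREADING, and their values, are assumed string forms of the corresponding ov::hint properties and are not stated in these release notes; the helper names are hypothetical.

```python
ALLOWED_CORE_TYPES = {"ANY_CORE", "PCORE_ONLY", "ECORE_ONLY"}


def build_cpu_core_config(core_type="PCORE_ONLY", hyper_threading=False):
    """Build a CPU plugin config dict (property names are assumptions)."""
    if core_type not in ALLOWED_CORE_TYPES:
        raise ValueError(f"unsupported core type: {core_type}")
    return {
        "SCHEDULING_CORE_TYPE": core_type,
        "ENABLE_HYPER_THREADING": "YES" if hyper_threading else "NO",
    }


def compile_on_cpu(model_path, config):
    """Compile a model for the CPU device with the given config."""
    import openvino as ov  # local import: only needed when this helper is called

    return ov.Core().compile_model(model_path, "CPU", config)
```

Whether these settings have an effect depends on the CPU platform, for example whether it has both P-cores and E-cores.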

 Known Issues 

  

  • Jira* ID 118179
    • Description: When input byte sizes match, inference methods accept incorrectly shaped inputs in copy mode (share_inputs=False). Example: [1, 4, 512, 512] is accepted when the model requires [1, 512, 512, 4].
    • Component: Python API, Plugins
    • Workaround: Pass inputs whose shape and layout match those of the model.
  • Jira* ID 119142
    • Description: Reading a TensorFlow model directly from memory using convert_model causes unpredictable behavior, such as inference result mismatches, wrong output names, and degraded performance.
    • Component: Conversion API, TensorFlow FE
    • Workaround: Serialize the TensorFlow model to a file and only then pass it to convert_model.
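
For issue 118179, the two shapes in the example hold the same number of elements, which is why a size-only check cannot tell them apart. A simple guard on the application side, sketched below with the standard library only, is to compare shapes explicitly before inference. The function names are illustrative, and the sketch assumes static (fully known) shapes.

```python
from math import prod


def same_byte_size(shape_a, shape_b, itemsize=4):
    """Return True when two shapes occupy the same number of bytes."""
    return prod(shape_a) * itemsize == prod(shape_b) * itemsize


def check_input_shape(actual_shape, expected_shape):
    """Raise ValueError unless the input shape matches the model's shape exactly."""
    if tuple(actual_shape) != tuple(expected_shape):
        raise ValueError(
            f"input shape {tuple(actual_shape)} does not match "
            f"model shape {tuple(expected_shape)}"
        )
```

Calling check_input_shape on each input before inference turns the silent mismatch into an explicit error.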

System Requirements 

Disclaimer. Certain hardware (including but not limited to GPU, GNA, and the latest CPUs) requires manual installation of specific drivers and/or other software components to work correctly and/or to utilize hardware capabilities at their best. This might require updates to the operating system, including but not limited to the Linux kernel; refer to the relevant documentation for details. These modifications should be handled by the user and are not part of the OpenVINO installation. 

Intel CPU processors with corresponding operating systems 

Intel Atom® processor with Intel® SSE4.2 support 

Intel® Pentium® processor N4200/5, N3350/5, N3450/5 with Intel® HD Graphics 

6th - 13th generation Intel® Core™ processors 

Intel® Core™ Ultra (codename Meteor Lake) 

Intel® Xeon® Scalable Processors (code name Skylake) 

2nd Generation Intel® Xeon® Scalable Processors (code name Cascade Lake) 

3rd Generation Intel® Xeon® Scalable Processors (code name Cooper Lake and Ice Lake) 

4th Generation Intel® Xeon® Scalable Processors (code name Sapphire Rapids) 

Operating Systems: 

  • Ubuntu* 22.04 long-term support (LTS), 64-bit (Kernel 5.15+) 
  • Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+) 
  • Ubuntu 18.04 long-term support (LTS) with limitations, 64-bit (Kernel 5.4+) 
  • Windows® 10 
  • Windows 11* 
  • macOS* 10.15 and above, 64-bit 
  • Red Hat Enterprise Linux* 8, 64-bit 
  • CentOS* 7 

Intel® Processor Graphics with corresponding operating systems (GEN Graphics) 

Intel® HD Graphics 

Intel® UHD Graphics 

Intel® Iris® Pro Graphics 

Intel® Iris® Xe Graphics 

Intel® Iris® Xe Max Graphics 

Intel® Arc™ GPU Series 

Intel® Data Center GPU Flex Series 

Operating Systems: 

  • Ubuntu* 22.04 long-term support (LTS), 64-bit 
  • Ubuntu* 20.04 long-term support (LTS), 64-bit 
  • Windows 10, 64-bit 
  • Windows 11, 64-bit 
  • Red Hat Enterprise Linux* 8, 64-bit 

NOTES: 

  • This installation requires drivers that are not included in the Intel® Distribution of OpenVINO™ toolkit package. 
  • A chipset that supports processor graphics is required for Intel® Xeon® processors. Processor graphics are not included in all processors. See Product Specifications for information about your processor. 
  • Although this release works with Ubuntu 20.04 for discrete graphics cards, Ubuntu 20.04 is not POR (plan of record) for discrete graphics drivers, so OpenVINO support is limited. 
  • The following minimum (i.e., used for old hardware) OpenCL™ driver versions were used during OpenVINO internal validation: 22.43 for Ubuntu* 22.04, 21.48 for Ubuntu* 20.04 and 21.49 for Red Hat Enterprise Linux* 8. 

Intel® Gaussian & Neural Accelerator 

Operating Systems: 

  • Ubuntu* 22.04 long-term support (LTS), 64-bit 
  • Ubuntu* 20.04 long-term support (LTS), 64-bit 
  • Windows 10, 64-bit 
  • Windows 11, 64-bit 

Operating system's and developer's environment requirements: 

  • Linux* OS  
    • Ubuntu 22.04 with Linux kernel 5.15+ 
    • Ubuntu 20.04 with Linux kernel 5.15+ 
    • RHEL 8 with Linux kernel 5.4 
    • A Linux OS build environment needs these components:  
    • Higher kernel versions might be required for 10th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ Processors S-Series Processors, 12th Gen Intel® Core™ Processors, 13th Gen Intel® Core™ Processors, Intel® Core™ Ultra Processors, or 4th Gen Intel® Xeon® Scalable Processors to support CPU, GPU, GNA or hybrid-cores CPU capabilities. 
  • Windows 10 and 11
  • macOS* 10.15 and above 
  • DL frameworks versions: 
    • TensorFlow* 1.15, 2.12 
    • MxNet* 1.9 
    • ONNX* 1.14 
    • PaddlePaddle* 2.4 
    • Note: This package can be installed on other versions of the DL frameworks, but only the versions specified above are fully validated. 
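
The kernel minimums listed above (for example, 5.15+ for Ubuntu 22.04) can be checked programmatically. The sketch below parses a kernel release string and compares it with a minimum version; the function names are illustrative, and the default minimum of (5, 15) is taken from the Ubuntu entries above.

```python
import re


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '5.15.0-84-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release, minimum=(5, 15)):
    """Return True when the kernel release is at least `minimum`."""
    return parse_kernel_version(release) >= minimum
```

On Linux, the release string can be obtained from platform.release() in the standard library; on other operating systems that string has a different meaning, so this check applies to Linux only.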

NOTE: OpenVINO Python binaries and binaries on Windows, CentOS 7, and macOS (x86) are built with oneTBB libraries, while other binaries on Ubuntu and Red Hat systems are built with the legacy TBB released by the OS distribution. OpenVINO can be built from source with either oneTBB or legacy TBB on all of the operating systems above. System compatibility and performance were improved on hybrid CPUs, such as 12th Gen Intel Core and above. 

Included in this release 

The Intel® Distribution of OpenVINO™ toolkit is available for download for three types of operating systems: Windows*, Linux*, and macOS*. 

  • OpenVINO (Inference Engine) C++ Runtime (unified API to integrate the inference with application logic) and OpenVINO (Inference Engine) Headers
    • License: dual licensing, Intel® OpenVINO™ Distribution License (Version May 2021) and Apache 2.0
    • Location: <install_root>/runtime/* (runtime), <install_root>/runtime/include/* (headers)
    • Windows: Yes; Linux: Yes; macOS: Yes
  • OpenVINO (Inference Engine) Python API
    • License: Apache 2.0
    • Location: <install_root>/python/*
    • Windows: Yes; Linux: Yes; macOS: Yes
  • OpenVINO (Inference Engine) Samples (samples that illustrate OpenVINO C++/Python API usage)
    • License: Apache 2.0
    • Location: <install_root>/samples/*
    • Windows: Yes; Linux: Yes; macOS: Yes
  • Deployment Manager (a Python* command-line tool that creates a deployment package by assembling the model, IR files, your application, and associated dependencies into a runtime package for your target device)
    • License: Apache 2.0
    • Location: <install_root>/tools/deployment_manager/*
    • Windows: Yes; Linux: Yes; macOS: Yes

 

What's included in the download package

  • Runtime/Inference Engine

You can choose how to install OpenVINO Runtime according to your operating system:

 
