New and Changed in 2023.1
Summary of major features and improvements
More Generative AI options with Hugging Face and improved PyTorch model support.
- NEW: Your PyTorch solutions are now even further enhanced with OpenVINO. You have more options and no longer need to convert to ONNX for deployment. Developers can now use their API of choice, PyTorch or OpenVINO, for added performance benefits. Additionally, PyTorch models can be automatically imported and converted for quicker deployment. You can continue to use OpenVINO tools for advanced model compression and deployment, ensuring flexibility and a range of options.
- torch.compile (preview) – OpenVINO is now available as a backend through PyTorch torch.compile, letting developers use the OpenVINO toolkit through PyTorch APIs (a minimal usage sketch follows this list). This feature has also been integrated into the Automatic1111 Stable Diffusion Web UI, helping developers achieve accelerated performance for Stable Diffusion 1.5 and 2.1 on Intel CPUs and GPUs on both native Linux and Windows platforms.
- Optimum Intel – Hugging Face and Intel continue to enhance top generative AI models by optimizing execution, making your models run faster and more efficiently on both CPU and GPU. OpenVINO serves as the runtime for inference execution. New PyTorch auto import and conversion capabilities have been enabled, along with support for weight compression for further performance gains.
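For illustration, here is a minimal sketch of the torch.compile integration, assuming PyTorch 2.x and the openvino package are installed; the ResNet-50 model and input shape are arbitrary examples, not part of this release:

```python
import torch
import torchvision.models as models
import openvino.torch  # registers the "openvino" backend for torch.compile

model = models.resnet50(weights="DEFAULT").eval()

# Route compilation through the OpenVINO backend; the first call triggers
# graph capture and OpenVINO compilation, later calls reuse the result.
compiled_model = torch.compile(model, backend="openvino")

with torch.no_grad():
    output = compiled_model(torch.randn(1, 3, 224, 224))
```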
Broader LLM model support and more model compression techniques
- Enhanced performance and accessibility for generative AI: Runtime performance and memory usage have been significantly optimized, especially for Large Language Models (LLMs). Models used for chatbots, instruction following, code generation, and more have been enabled, including prominent models like BLOOM, Dolly, Llama 2, GPT-J, GPT-NeoX, ChatGLM, and Open-Llama.
- Improved LLMs on GPU – Model coverage for dynamic shapes support has been expanded, further improving the performance of generative AI workloads on both integrated and discrete GPUs. Memory reuse and weight memory consumption for dynamic shapes have also been improved.
- Neural Network Compression Framework (NNCF) now includes an 8-bit weight compression method, making it easier to compress and optimize LLMs; a minimal usage sketch follows this list. The SmoothQuant method has been added for more accurate and efficient post-training quantization of Transformer-based models.
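As an illustration of the new weight compression flow, here is a minimal sketch; the IR file names are placeholders, and the code assumes NNCF 2.6+ alongside the 2023.1 Python namespace (import openvino as ov):

```python
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("llm.xml")  # placeholder path to a converted LLM IR

# Compress weights of eligible layers to 8-bit to shrink the model footprint;
# activations keep their original precision.
compressed = nncf.compress_weights(model)
ov.save_model(compressed, "llm_int8.xml")
```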
More portability and performance to run AI at the edge, in the cloud or locally.
- NEW: Support for Intel® Core™ Ultra (codename Meteor Lake). This new generation of Intel CPUs is tailored to excel in AI workloads with a built-in inference accelerator.
- Integration with MediaPipe – Developers now have direct access to this framework for building multipurpose AI pipelines. Easily integrate with OpenVINO Runtime and OpenVINO Model Server to enhance performance for faster AI model execution. You also benefit from seamless model management and version control, as well as custom logic integration with additional calculators and graphs for tailored AI solutions. Lastly, you can scale faster by delegating deployment to remote hosts via gRPC/REST interfaces for distributed processing.
Support Change and Deprecation Notices
- OpenVINO™ Development Tools (pip install openvino-dev) are currently being deprecated and will be removed from installation options and distribution channels with 2025.0.
- Tools:
- Accuracy Checker is deprecated and will be discontinued with 2024.0.
- Post-Training Optimization Tool (POT) has been deprecated and will be discontinued with 2024.0.
- Runtime:
- Intel® Gaussian & Neural Accelerator (Intel® GNA) is being deprecated; the GNA plugin will be discontinued with 2024.0.
- The shared_memory argument for Python API inference methods is deprecated and replaced by a new share_inputs argument.
- OpenVINO C++/C/Python 1.0 APIs will be discontinued with 2024.0.
- Python 3.7 will be discontinued with 2023.2 release.
OpenVINO™ Development Tools
- List of components and their changes:
- A preview of the new OpenVINO converter tool (OVC) has been introduced. It offers functionality similar to Model Optimizer and is designed as its lightweight version, with the following differences:
- Pre-processing options such as layout, channel reversal, mean, and scale should be applied through the preprocessing API; they are not supported in OVC.
- The model file is specified without the input_model parameter, and the framework is detected automatically.
- Conversion API (Model Optimizer)
- convert_model Python API is now available in the openvino namespace.
- The Model Optimizer tool generates an Intermediate Representation (IR) file with compressed weights by default; the --compress_to_fp16 option can be used to control this behavior. convert_model keeps the original weights for the generated OpenVINO Model object (see the conversion sketch after this list).
- Neural Network Compression Framework (NNCF)
- Added SmoothQuant method for more accurate Post-training Quantization of Transformer-based models.
- Introduced a new API, nncf.compress_weights(), with preliminary support for 8-bit weight compression of OpenVINO and PyTorch LLMs.
- Added a hyperparameter tuning method to Post-training Quantization. When enabled, it automatically finds hyperparameters for the most efficient quantization results.
- Extended Post-training Quantization for OpenVINO with the ChannelAlignment algorithm for more accurate quantization results.
- Extended Post-training Quantization for PyTorch with the Fast Bias Correction algorithm for more accurate quantization results. For more details, refer to the NNCF Release Notes.
- Benchmark Tool enables you to estimate deep-learning inference performance on supported devices for both synchronous and asynchronous modes.
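As referenced above, here is a minimal sketch of the new conversion flow, assuming the 2023.1 Python namespace (import openvino as ov) and using a torchvision model as a stand-in for any PyTorch module:

```python
import torch
import torchvision.models as models
import openvino as ov

pt_model = models.resnet18(weights="DEFAULT").eval()

# Direct PyTorch import: no intermediate ONNX export is required.
ov_model = ov.convert_model(pt_model, example_input=torch.randn(1, 3, 224, 224))

# save_model writes an IR with FP16 weight compression by default,
# while the in-memory ov_model keeps the original weights.
ov.save_model(ov_model, "resnet18.xml")
```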
OpenVINO™ Runtime (previously known as Inference Engine)
- Overall updates
- The proxy and hetero plugins have been migrated to API 2.0, providing enhanced compatibility and stability.
- Symbolic shape inference preview is now available, leading to improved performance for LLMs.
- OpenVINO's graph representation has been upgraded to opset12, introducing a new set of operations that offer enhanced functionality and optimizations.
- OpenVINO Python API
- Python Conversion API is now the primary conversion path, making it easier for Python developers to work with OpenVINO.
- Python API inference methods (such as InferRequest.infer and CompiledModel.__call__) have new parameters, share_inputs and share_outputs, which control memory sharing on the inputs and outputs of inference. Enabling shared memory modes results in a "zero-copy" approach, reducing memory consumption and compute overhead by avoiding extra copies (a minimal sketch follows this list).
- The torchvision.transforms object has been added to OpenVINO pre-processing, allowing users to embed torchvision pre-processing into the IR.
- All Python tools related to OpenVINO have been moved into a single namespace, improving the user experience with better API readability.
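A minimal sketch of the shared-memory inference mode, with a placeholder IR path and a single-input model assumed; the input shape is illustrative:

```python
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("model.xml", "CPU")  # placeholder IR path

data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# share_inputs=True lets the runtime read directly from `data` instead of
# copying it; keep `data` alive and unmodified until inference finishes.
results = compiled(data, share_inputs=True, share_outputs=True)
```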
- OpenVINO C API
- The following C API 1.0 functionality has been removed in the 2023.1 release, as communicated in the 2023.0 release:
- NV12 and I420 color formats for the legacy API
- Methods and functions for creating NV12 and I420 blobs
- The InferRequest::SetBatch() method
- The InferRequest::SetBlob() method, which allowed setting pre-processing for a specific tensor in an infer request
- Legacy properties: DYN_MATCH_LIMIT and DYN_BATCH_ENABLED
- The legacy C API is deprecated and will be removed in the 2024.0 release. See the documentation for instructions on transitioning to the new C API.
- AUTO device plug-in (AUTO)
- Improved configuration support through AUTO for execution hardware devices, such as CPU or GPU, by leveraging ov::device::properties.
- Intel® CPU
- Enabled weight decompression support for LLMs. This implementation supports AVX2 and AVX-512 hardware targets on Intel® Core™ processors for improved latency mode (FP32 vs. FP32+INT8 weights). For 4th Generation Intel® Xeon® Scalable Processors (codename Sapphire Rapids), this INT8 decompression feature provides a performance improvement compared to pure BF16 inference.
- Improved performance of LLMs, particularly Transformer-based models, in the CPU plugin. Optimizing memory efficiency for output data between the CPU plugin and the inference request improves the performance of the matrix multiplication operator as well as operators like LayerNorm and MVN.
- Reduced memory consumption for LLMs, especially for models with 7 billion parameters or more.
- Reduced overall model load and compile time, particularly for LLMs.
- Added a new capability giving developers full control over CPU core usage for inference, including P-cores, E-cores, and hyper-threading, on supported CPU platforms (a hedged configuration sketch follows this list).
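A hedged sketch of the CPU core-control hints, assuming the property helpers from openvino.runtime.properties available in 2023.x; the exact property names should be checked against the CPU plugin documentation, and the IR path is a placeholder:

```python
import openvino as ov
from openvino.runtime import properties

core = ov.Core()
compiled = core.compile_model(
    "model.xml",  # placeholder IR path
    "CPU",
    {
        # Schedule inference on performance cores only (alternatives include
        # ECORE_ONLY and ANY_CORE) and skip hyper-threading siblings.
        properties.hint.scheduling_core_type(): properties.hint.SchedulingCoreType.PCORE_ONLY,
        properties.hint.enable_hyper_threading(): False,
    },
)
```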
- Intel® GPU
- Improved performance of generative AI models. Inference performance of LLMs is significantly improved on both iGPU and dGPU, and Stable Diffusion performance is also improved on iGPU.
- Dynamic shape support has been expanded to convolutional models by enabling more operators to support dynamism. Performance of dynamic shapes has also been improved for NLP models.
- Improved performance for Stable Diffusion models.
- Performance of Transformer models, such as BERT-like models and vision transformers, is improved on dGPU with the integration of oneDNN 3.2.
- Intel® Gaussian & Neural Accelerator (Intel® GNA)
- Introduced support for GNA 3.5, the new version of GNA hardware included in Intel® Core™ Ultra (codename Meteor Lake).
- C++ and Python automatic speech recognition samples are being deprecated and will be removed with 2024.0.
- Optimized data pre-processing using AVX instructions on the CPU, improving runtime performance.
- Introduced support for automatic layout conversion, which enables more TensorFlow models out of the box.
- Model Import Updates
- TensorFlow Framework Support
- Added support for Switch/Merge operations, bringing TensorFlow 1.x control flow support closer to full compatibility and enabling more models.
- Added support for the TensorFlow 1 Checkpoint format. All native TensorFlow formats are now enabled.
- Added support for 12 new operations:
- UnsortedSegmentSum
- FakeQuantWithMinMaxArgs
- MaxPoolWithArgmax
- UnravelIndex
- AdjustContrastv2
- InvertPermutation
- CheckNumerics
- DivNoNan
- EnsureShape
- ShapeN
- Switch
- Merge
- PyTorch Framework Support
- OpenVINO now supports PyTorch models quantized in the source framework, including models produced with the Neural Network Compression Framework (NNCF).
- Added support for in-place operations on tensor aliases, improving accuracy for detection models.
- Added support for 43 new operations. To learn about PyTorch model conversion, see the documentation.
- aten::concat
- aten::masked_scatter
- aten::linspace
- aten::view_as
- aten::std
- aten::outer
- aten::broadcast_to
- aten::all
- aten::embedding_bag
- aten::argmax
- aten::argmin
- aten::unflatten
- aten::item
- aten::frobenius_norm
- aten::__range_length
- aten::__derive_index
- aten::cdist
- aten::pairwise_distance
- aten::squeeze
- aten::LogSoftmax
- aten::_native_multi_head_attention
- aten::_shape_as_tensor
- aten::t
- aten::fft_rfftn
- aten::fft_irfft
- aten::pad
- aten::reflection_pad2d
- aten::fake_quantize_per_tensor_affine
- aten::fake_quantize_per_channel_affine
- aten::scatter
- aten::quantize_per_tensor
- aten::quantize_per_channel
- aten::dequantize
- aten::rand
- aten::randn
- aten::rand_like
- aten::randn_like
- aten::_upsample_bilinear2d_aa
- aten::_upsample_bicubic2d_aa
- aten::randint
- aten::index_put_
- aten::tensor_split
Distribution (where to download the release)
The OpenVINO product selector tool (available at www.openvino.ai) provides easy access to the right packages that match your needs: OS, version, and distribution options.
- Added a conda-forge pre-release channel, simplifying OpenVINO pre-release installation with the command: conda install -c "conda-forge/label/openvino_dev" openvino
- Python API is now distributed as a part of conda-forge distribution, allowing users to access it using the command above.
- Runtime can now be installed and used via vcpkg C++ package manager, providing more flexibility in integrating OpenVINO into projects.
- The 2023.1 release is available via the following distribution channels:
- pypi.org: https://pypi.org/project/openvino-dev/
- DockerHub* https://hub.docker.com/u/openvino
- Release Archives specifically for C++ users can be found here: https://storage.openvinotoolkit.org/repositories/openvino/packages/
- APT & YUM
- Homebrew https://formulae.brew.sh/formula/openvino
- A new distribution channel has been introduced for C++ developers: Conda Forge.
OpenVINO Ecosystem
OpenVINO Model Server (OVMS) is a solution for serving models. The tool uses the same API endpoints as TensorFlow Serving and KServe while leveraging OpenVINO for inference execution (a minimal client sketch follows). See the full release notes at https://github.com/openvinotoolkit/model_server/releases
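A minimal client sketch against the KServe v2 REST endpoint; the server address, model name, and tensor metadata are placeholders for a locally deployed OVMS instance, not values from this release:

```python
import numpy as np
import requests

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
payload = {
    "inputs": [{
        "name": "input",                 # must match the served model's input name
        "shape": list(data.shape),
        "datatype": "FP32",
        "data": data.flatten().tolist(),
    }]
}

# KServe v2 inference endpoint: POST /v2/models/<model_name>/infer
resp = requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
print(resp.json()["outputs"][0]["shape"])
```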
The following public models are deprecated and will be removed in the 2023.2 release:
- All Caffe models:
  - alexnet
  - caffenet
  - densenet-121
  - face-detection-retail-0044
  - googlenet-v1
  - googlenet-v2
  - mobilenet-ssd
  - mobilenet-v1-1.0-224
  - mobilenet-v2
  - mtcnn
  - pelee-coco
  - se-inception
  - se-resnet-50
  - se-resnext-50
  - shufflenet-v2-x0.5
  - Sphereface
  - squeezenet1.0
  - squeezenet1.1
  - ssd300
  - ssd512
  - vgg16
  - vgg19
- All MXNet models:
  - brain-tumor-segmentation-0001
  - mobilefacedet-v1-mxnet
  - octave-resnet-26-0.25
- All Paddle models:
  - mobilenet-v3-large-1.0-224-paddle
  - mobilenet-v3-small-1.0-224-paddle
  - ocrnet-hrnet-w48-paddle
- DeblurGAN-v2
Jupyter Notebook Tutorials
Since the 2023.0 release, the following new notebooks have been added:
- 121-convert-to-openvino: Learn the OpenVINO model conversion API
- 122-quantizing-model-with-accuracy-control: Quantizing with Accuracy Control using NNCF
- 220-books-alignment-labse: Cross-lingual Books Alignment with Transformers
- 241-riffusion-text-to-music: Text-to-Music generation using Riffusion
- 242-freevc-voice-conversion: High-Quality Text-Free One-Shot Voice Conversion with FreeVC
- 243-tflite-selfie-segmentation: Selfie Segmentation using TFLite
- 244-named-entity-recognition: Named entity recognition with OpenVINO™
- 245-typo-detector: English typo detection in sentences with OpenVINO™
- 246-depth-estimation-videpth: Monocular Visual-Inertial Depth Estimation with OpenVINO™
- 247-code-language-id: Identify the programming language used in an arbitrary code snippet
- 248-stable-diffusion-xl: Image generation with Stable Diffusion XL
- 249-oneformer-segmentation: Universal segmentation with OneFormer
- 250-music-generation: Text-to-Music generation using MusicGen
- 251-tiny-sd-image-generation: Image generation with TinySD
- 252-fastcomposer-image-generation: FastComposer – personalized image generation without model fine-tuning
- 253-zeroscope-text2video: Text-to-Video generation using ZeroScope
Added tutorials for 8-bit quantization support for the following notebooks:
- 227-whisper-subtitles-generation: Generate subtitles for video with OpenAI Whisper and OpenVINO
- 228-clip-zero-shot-image-classification: Zero-shot Image Classification with OpenAI CLIP
- 237-segment-anything: Object masks from prompts with SAM and OpenVINO
- 239-image-bind: Binding multimodal data using ImageBind and OpenVINO™
Known Issues
| No. | Jira ID | Description | Component | Workaround |
|-----|---------|-------------|-----------|------------|
| 1 | 118179 | When input byte sizes match, inference methods accept incorrect inputs in copy mode (share_inputs=False). Example: [1, 4, 512, 512] is allowed when [1, 512, 512, 4] is required by the model. | Python API, Plugins | Pass inputs whose shape and layout match the model's. |
| 2 | 119142 | Reading a TensorFlow model directly from memory using convert_model causes unpredictable behavior, such as inference result mismatches, wrong output names, and degraded performance. | Conversion API | Serialize the TensorFlow model to a file first, then pass the file path to convert_model. |
System Requirements
Disclaimer. Certain hardware (including but not limited to GPU, GNA, and the latest CPUs) requires manual installation of specific drivers and/or other software components to work correctly and/or to utilize hardware capabilities at their best. This might require updates to the operating system, including but not limited to the Linux kernel; please refer to the respective documentation for details. These modifications must be handled by the user and are not part of the OpenVINO installation.
Intel CPU processors with corresponding operating systems
Intel Atom® processor with Intel® SSE4.2 support
Intel® Pentium® processor N4200/5, N3350/5, N3450/5 with Intel® HD Graphics
6th - 13th generation Intel® Core™ processors
Intel® Core™ Ultra (codename Meteor Lake)
Intel® Xeon® Scalable Processors (codename Skylake)
2nd Generation Intel® Xeon® Scalable Processors (codename Cascade Lake)
3rd Generation Intel® Xeon® Scalable Processors (codename Cooper Lake and Ice Lake)
4th Generation Intel® Xeon® Scalable Processors (codename Sapphire Rapids)
Operating Systems:
- Ubuntu 22.04 long-term support (LTS), 64-bit (Kernel 5.15+)
- Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
- Ubuntu 18.04 long-term support (LTS) with limitations, 64-bit (Kernel 5.4+)
- Windows* 10
- Windows* 11
- macOS* 10.15 and above, 64-bit
- Red Hat Enterprise Linux* 8, 64-bit
- CentOS 7
Intel® Processor Graphics with corresponding operating systems (GEN Graphics)
Intel® HD Graphics
Intel® UHD Graphics
Intel® Iris® Pro Graphics
Intel® Iris® Xe Graphics
Intel® Iris® Xe Max Graphics
Intel® Arc™ GPU Series
Intel® Data Center GPU Flex Series
Operating Systems:
- Ubuntu* 22.04 long-term support (LTS), 64-bit
- Ubuntu* 20.04 long-term support (LTS), 64-bit
- Windows* 10, 64-bit
- Windows* 11, 64-bit
- Red Hat Enterprise Linux* 8, 64-bit
NOTES:
- This installation requires drivers that are not included in the Intel® Distribution of OpenVINO™ toolkit package.
- A chipset that supports processor graphics is required for Intel® Xeon® processors. Processor graphics are not included in all processors. See Product Specifications for information about your processor.
- Although this release works with Ubuntu 20.04 for discrete graphics cards, Ubuntu 20.04 is not POR (plan of record) for discrete graphics drivers, so OpenVINO support is limited.
- The following minimum OpenCL™ driver versions (i.e., for older hardware) were used during OpenVINO internal validation: 22.43 for Ubuntu* 22.04, 21.48 for Ubuntu* 20.04, and 21.49 for Red Hat Enterprise Linux* 8.
Intel® Gaussian & Neural Accelerator
Operating Systems:
- Ubuntu* 22.04 long-term support (LTS), 64-bit
- Ubuntu* 20.04 long-term support (LTS), 64-bit
- Windows* 10, 64-bit
- Windows* 11, 64-bit
Operating system and developer environment requirements:
- Linux* OS
  - Ubuntu 22.04 with Linux kernel 5.15+
  - Ubuntu 20.04 with Linux kernel 5.15+
  - RHEL 8 with Linux kernel 5.4
  - A Linux OS build environment needs these components:
    - Python* 3.7-3.11
    - Intel® HD Graphics Driver, required for inference on GPU
    - Note: GNU Compiler Collection and CMake are needed for building from source:
      - GNU Compiler Collection (GCC)* 8.4 (RHEL 8), 9.3 (Ubuntu 20)
  - Higher kernel versions might be required for 10th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ Processors, 11th Gen Intel® Core™ S-Series Processors, 12th Gen Intel® Core™ Processors, 13th Gen Intel® Core™ Processors, Intel® Core™ Ultra Processors, or 4th Gen Intel® Xeon® Scalable Processors to support CPU, GPU, GNA, or hybrid-core CPU capabilities.
- Windows* 10 and 11
  - A Windows* OS build environment needs these components:
    - Intel® HD Graphics Driver, required only for GPU
- macOS* 10.15 and above
  - A macOS build environment requires these components:
- DL framework versions:
  - TensorFlow* 1.15, 2.12
  - MXNet* 1.9
  - ONNX* 1.14
  - PaddlePaddle* 2.4
  - Note: This package can be used with other versions of these DL frameworks, but only the versions specified above are fully validated.
- NOTE: OpenVINO Python binaries, and binaries on Windows, CentOS 7, and macOS (x86), are built with oneTBB libraries; binaries on Ubuntu and Red Hat OS systems are built with the legacy TBB released by the OS distribution. OpenVINO can be built from source with either oneTBB or legacy TBB on all of the above OS systems. System compatibility and performance have been improved on hybrid CPUs such as 12th Gen Intel Core and newer.
Included in This Release
The Intel® Distribution of OpenVINO™ toolkit is available for download for three operating systems: Windows*, Linux*, and macOS*.
| Component | License | Location | Windows | Linux | macOS |
|-----------|---------|----------|---------|-------|-------|
| OpenVINO (Inference Engine) C++ Runtime: unified API to integrate inference with application logic; OpenVINO (Inference Engine) Headers | Dual licensing: Intel® OpenVINO™ Distribution License (Version May 2021) | <install_root>/runtime/*; <install_root>/runtime/include/* | Yes | Yes | Yes |
| OpenVINO (Inference Engine) Python API | | <install_root>/python/* | Yes | Yes | Yes |
| OpenVINO (Inference Engine) Samples: samples that illustrate OpenVINO C++/Python API usage | Apache 2.0 | <install_root>/samples/* | Yes | Yes | Yes |
| Deployment Manager: a Python* command-line tool that creates a deployment package by assembling the model, IR files, your application, and associated dependencies into a runtime package for your target device | Apache 2.0 | <install_root>/tools/deployment_manager/* | Yes | Yes | Yes |
Helpful Links
Featured Documentation
All Documentation, Guides, and Resources
Legal Information
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at http://www.intel.com/ or from the OEM or retailer.
No computer system can be absolutely secure.
Intel, Atom, Arria, Core, Movidius, Xeon, OpenVINO, and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
*Other names and brands may be claimed as the property of others.
Copyright © 2023, Intel Corporation. All rights reserved.
For more complete information about compiler optimizations, see our Optimization Notice.