OpenVINO™ toolkit: An open source AI toolkit that makes it easier to write once, deploy anywhere.
OpenVINO™ Model Hub for AI Inference Benchmarks
Discover the performance difference the OpenVINO toolkit can deliver across AI models on Intel® hardware platforms, from the edge to AI PCs. Access the latest OpenVINO toolkit performance benchmarks for a select list of leading generative AI (GenAI) and large language models (LLMs) on Intel CPUs, built-in GPUs, NPUs, and accelerators.
Model Performance: Find out how top models perform on Intel hardware.
Hardware Comparison: Find the right Intel hardware platform for your solution.
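As a rough illustration of the kind of comparison the Model Hub reports, the sketch below times the same model on whichever OpenVINO device targets a system exposes. It is a minimal sketch, not a rigorous benchmark: `model.xml` is a hypothetical path to a model in OpenVINO IR format, and the input shape is assumed to be a typical [1, 3, 224, 224] image tensor.

```python
import time
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical IR model path

# Dummy input for an assumed [1, 3, 224, 224] image model.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

for device in ["CPU", "GPU", "NPU"]:
    if device not in core.available_devices:
        continue  # skip devices this system does not have
    compiled = core.compile_model(model, device)
    compiled(dummy)  # warm-up run, excluded from timing
    runs = 100
    t0 = time.perf_counter()
    for _ in range(runs):
        compiled(dummy)
    avg_ms = (time.perf_counter() - t0) / runs * 1000
    print(f"{device}: {avg_ms:.2f} ms average latency")
```

Published benchmarks also control for precision, batch size, and parallel streams, so treat a loop like this only as a first approximation.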
Develop with Intel® AI Products
Sign Up for Exclusive News, Tips & Releases
Be among the first to learn about everything new with the Intel® Distribution of OpenVINO™ toolkit. By signing up, you get early access to product updates and releases, exclusive invitations to webinars and events, training and tutorial resources, contest announcements, and other breaking news.
AI Glossary
Inference Engine: The processors that run AI inference (prediction), such as CPUs, GPUs, and accelerators.
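For example, OpenVINO's Python API can enumerate the inference devices it detects on a machine (a minimal sketch; the device names reported vary by system):

```python
import openvino as ov

core = ov.Core()
# Typical entries include "CPU", "GPU", and "NPU", depending on the hardware.
for device in core.available_devices:
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))
```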
GenAI Models (LLMs): AI models trained on extensive volumes of text data to understand and generate language.
- First Token Latency: The time to generate the first token after receiving a prompt.
- Second Token Latency: The average time to generate each subsequent token after the first.
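Both token metrics can be measured with a timer around any streaming generation API. In this minimal sketch, `generate_stream` is a hypothetical callable that yields tokens one at a time; it stands in for whatever streaming interface your LLM runtime provides.

```python
import time

def measure_token_latencies(generate_stream, prompt):
    # generate_stream is a hypothetical token-by-token generator;
    # we assume it yields at least one token for the prompt.
    start = time.perf_counter()
    # Record a timestamp as each token arrives.
    arrival = [time.perf_counter() for _ in generate_stream(prompt)]
    first_token_latency = arrival[0] - start
    # Average gap between consecutive tokens after the first.
    gaps = [b - a for a, b in zip(arrival, arrival[1:])]
    second_token_latency = sum(gaps) / len(gaps) if gaps else 0.0
    return first_token_latency, second_token_latency
```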
Traditional AI (Vision Models): AI models that use classical algorithms to interpret images.
- Throughput (Frames per Second [FPS]): The number of frames (images) processed per second.
- Latency (per Frame): The time to process each individual image.
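Both vision metrics fall out of one timing loop, as in this minimal sketch (`infer` is a hypothetical single-frame inference callable, such as a compiled OpenVINO model; `frames` is any iterable of preprocessed inputs):

```python
import time

def measure_vision_metrics(infer, frames):
    per_frame = []
    for frame in frames:
        t0 = time.perf_counter()
        infer(frame)  # run one inference on one frame
        per_frame.append(time.perf_counter() - t0)
    total = sum(per_frame)
    fps = len(per_frame) / total           # throughput (FPS)
    avg_latency = total / len(per_frame)   # latency per frame
    return fps, avg_latency
```

In this serial loop, FPS is just the inverse of average latency; with batching or multiple parallel inference requests, throughput can exceed 1 / latency, which is why benchmarks report both. The same pattern gives the NLP metrics below, with queries per second in place of FPS.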
Traditional AI (Natural Language Processing [NLP] Models): AI models that use rule-based or statistical methods for language tasks.
- Latency: The time the model takes to process a single query.
- Throughput (Queries per Second): The number of queries processed per second.
GenAI (Diffusion) Models: AI models that generate new data and content, such as images and text.
- Image-Generation Latency: The time taken to generate an image from the input.
- Throughput: The number of images generated per second.
FP32 (32-bit Floating Point): A high-precision format that uses 32 bits to represent real numbers; widely used in early AI models and in tasks requiring high accuracy.
FP16 (16-bit Floating Point): A lower-precision format (compared to FP32) that is often used to speed up computations and reduce memory use where the highest precision isn't needed.
BF16 (Bfloat16): A 16-bit floating point variant with the same dynamic range as FP32 but less precision than FP16. Common in modern training and inference due to its efficiency and suitability for large-scale models.
Int8 (8-bit Integer): A low-precision integer format typically used in inference to significantly speed up computation while reducing memory and power requirements. It requires the model to be quantized (see the sketch after this glossary) and is common on edge and mobile devices.
Int4 (4-bit Integer): A low-precision integer format sometimes used in lightweight inference applications where efficiency is prioritized over precision.
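The trade-off among these formats can be seen directly in a few lines of NumPy. This is a minimal sketch: the weight values are made up, and NumPy has no native bfloat16 type, so BF16 is omitted.

```python
import numpy as np

# Made-up weights to illustrate precision trade-offs.
w = np.array([0.12345678, -1.98765432, 3.14159265])

fp32 = w.astype(np.float32)  # keeps roughly 7 significant decimal digits
fp16 = w.astype(np.float16)  # keeps roughly 3 digits over a narrower range

# Symmetric int8 quantization: map the observed range onto [-127, 127].
scale = np.abs(w).max() / 127.0
q8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
recovered = q8.astype(np.float32) * scale  # dequantize; a small error remains

print(fp32, fp16, q8, recovered, sep="\n")
```

Int4 pushes the same idea further, with only 16 representable levels per scale, which is why it is reserved for workloads where efficiency outweighs precision.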
Resources
Community and Support
Explore ways to get involved and stay up to date with the latest announcements.
Get Started
Optimize, fine-tune, and run comprehensive AI inference using the included model optimizer, runtime, and development tools.
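A typical first step with the toolkit is converting a trained model to OpenVINO's intermediate representation (IR) and compiling it for a target device. A minimal sketch, assuming a hypothetical ONNX file `model.onnx` (`ov.convert_model` also accepts models from frameworks such as PyTorch and TensorFlow):

```python
import openvino as ov

# Convert a trained model (the path is hypothetical) to OpenVINO's format.
model = ov.convert_model("model.onnx")
ov.save_model(model, "model.xml")  # writes model.xml + model.bin (IR files)

# Compile for a device; "AUTO" lets OpenVINO pick the best one available.
compiled = ov.Core().compile_model(model, "AUTO")
```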
The productive, smart path to freedom from the economic and technical burdens of proprietary alternatives for accelerated computing.