OpenVINO™ toolkit: An open source AI toolkit that makes it easier to write once, deploy anywhere.
OpenVINO™ Model Hub for AI Inference Benchmarks
Discover the performance difference the OpenVINO toolkit can deliver across AI models on Intel® hardware platforms, from the edge to AI PCs. Access the latest OpenVINO toolkit performance benchmarks for a select list of leading generative AI (GenAI) and large language models (LLMs) on Intel CPUs, built-in GPUs, NPUs, and accelerators.
Model Performance: Find out how top models perform on Intel hardware.
Hardware Comparison: Find the right Intel hardware platform for your solution.
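As a rough illustration of the kind of comparison the Model Hub reports, the sketch below times the same model on whichever OpenVINO device targets a system exposes. It is a minimal sketch, not a rigorous benchmark: `model.xml` is a hypothetical path to a model in OpenVINO IR format, and the input shape is assumed to be a typical [1, 3, 224, 224] image tensor.

```python
import time
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical IR model path

# Dummy input for an assumed [1, 3, 224, 224] image model.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

for device in ["CPU", "GPU", "NPU"]:
    if device not in core.available_devices:
        continue  # skip devices this system does not have
    compiled = core.compile_model(model, device)
    compiled(dummy)  # warm-up run, excluded from timing
    runs = 100
    t0 = time.perf_counter()
    for _ in range(runs):
        compiled(dummy)
    avg_ms = (time.perf_counter() - t0) / runs * 1000
    print(f"{device}: {avg_ms:.2f} ms average latency")
```

Published benchmarks also control for precision, batch size, and parallel streams, so treat a loop like this only as a first approximation.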
Develop with Intel® AI Products
Sign Up for Exclusive News, Tips & Releases
Be among the first to learn about everything new with the Intel® Distribution of OpenVINO™ toolkit. By signing up, you get early access to product updates and releases, exclusive invitations to webinars and events, training and tutorial resources, contest announcements, and other breaking news.
AI Glossary
Inference Engine: The processors that run AI inference (prediction), such as CPUs, GPUs, and accelerators.
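For example, OpenVINO's Python API can enumerate the inference devices it detects on a machine (a minimal sketch; the device names reported vary by system):

```python
import openvino as ov

core = ov.Core()
# Typical entries include "CPU", "GPU", and "NPU", depending on the hardware.
for device in core.available_devices:
    print(device, "->", core.get_property(device, "FULL_DEVICE_NAME"))
```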
GenAI Models (LLMs): AI models trained on extensive volumes of text data to understand and generate language.
- First Token Latency: The time to generate the first token after receiving a prompt.
- Second Token Latency: The average time to generate each subsequent token after the first.
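Both token metrics can be measured with a timer around any streaming generation API. In this minimal sketch, `generate_stream` is a hypothetical callable that yields tokens one at a time; it stands in for whatever streaming interface your LLM runtime provides.

```python
import time

def measure_token_latencies(generate_stream, prompt):
    # generate_stream is a hypothetical token-by-token generator;
    # we assume it yields at least one token for the prompt.
    start = time.perf_counter()
    # Record a timestamp as each token arrives.
    arrival = [time.perf_counter() for _ in generate_stream(prompt)]
    first_token_latency = arrival[0] - start
    # Average gap between consecutive tokens after the first.
    gaps = [b - a for a, b in zip(arrival, arrival[1:])]
    second_token_latency = sum(gaps) / len(gaps) if gaps else 0.0
    return first_token_latency, second_token_latency
```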
Traditional AI (Vision Models): AI models that use classical algorithms to interpret images.
- Throughput (Frames per Second [FPS]): The number of frames (images) processed per second.
- Latency (per Frame): The time to process each individual image.
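Both vision metrics fall out of one timing loop, as in this minimal sketch (`infer` is a hypothetical single-frame inference callable, such as a compiled OpenVINO model; `frames` is any iterable of preprocessed inputs):

```python
import time

def measure_vision_metrics(infer, frames):
    per_frame = []
    for frame in frames:
        t0 = time.perf_counter()
        infer(frame)  # run one inference on one frame
        per_frame.append(time.perf_counter() - t0)
    total = sum(per_frame)
    fps = len(per_frame) / total           # throughput (FPS)
    avg_latency = total / len(per_frame)   # latency per frame
    return fps, avg_latency
```

In this serial loop, FPS is just the inverse of average latency; with batching or multiple parallel inference requests, throughput can exceed 1 / latency, which is why benchmarks report both. The same pattern gives the NLP metrics below, with queries per second in place of FPS.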
Traditional AI (Natural Language Processing [NLP] Models): AI models that use rule-based or statistical methods for language tasks.
- Latency: The time the model takes to process a single query.
- Throughput (Queries per Second): The number of queries processed per second.
GenAI (Diffusion) Models: AI models that generate new data and content, such as images and text.
- Image-Generation Latency: The time taken to generate an image from the input.
- Throughput: The number of images generated per second.
FP32 (32-bit Floating Point): A high-precision format that uses 32 bits to represent real numbers; widely used in early AI models and in tasks requiring high accuracy.
FP16 (16-bit Floating Point): A lower-precision format (compared to FP32) that is often used to speed up computations and reduce memory use where the highest precision isn't needed.
BF16 (Bfloat16): A 16-bit floating point variant with the same dynamic range as FP32 but less precision than FP16. Common in modern training and inference due to its efficiency and suitability for large-scale models.
Int8 (8-bit Integer): A low-precision integer format typically used in inference to significantly speed up computation while reducing memory and power requirements. It requires the model to be quantized (see the sketch after this glossary) and is common on edge and mobile devices.
Int4 (4-bit Integer): A low-precision integer format sometimes used in lightweight inference applications where efficiency is prioritized over precision.
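The trade-off among these formats can be seen directly in a few lines of NumPy. This is a minimal sketch: the weight values are made up, and NumPy has no native bfloat16 type, so BF16 is omitted.

```python
import numpy as np

# Made-up weights to illustrate precision trade-offs.
w = np.array([0.12345678, -1.98765432, 3.14159265])

fp32 = w.astype(np.float32)  # keeps roughly 7 significant decimal digits
fp16 = w.astype(np.float16)  # keeps roughly 3 digits over a narrower range

# Symmetric int8 quantization: map the observed range onto [-127, 127].
scale = np.abs(w).max() / 127.0
q8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
recovered = q8.astype(np.float32) * scale  # dequantize; a small error remains

print(fp32, fp16, q8, recovered, sep="\n")
```

Int4 pushes the same idea further, with only 16 representable levels per scale, which is why it is reserved for workloads where efficiency outweighs precision.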
Resources
Community and Support
Explore ways to get involved and stay up to date with the latest announcements.
Get Started
Optimize, fine-tune, and run comprehensive AI inference using the included model optimizer, runtime, and development tools.
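A typical first step with the toolkit is converting a trained model to OpenVINO's intermediate representation (IR) and compiling it for a target device. A minimal sketch, assuming a hypothetical ONNX file `model.onnx` (`ov.convert_model` also accepts models from frameworks such as PyTorch and TensorFlow):

```python
import openvino as ov

# Convert a trained model (the path is hypothetical) to OpenVINO's format.
model = ov.convert_model("model.onnx")
ov.save_model(model, "model.xml")  # writes model.xml + model.bin (IR files)

# Compile for a device; "AUTO" lets OpenVINO pick the best one available.
compiled = ov.Core().compile_model(model, "AUTO")
```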
The productive, smart path to freedom from the economic and technical burdens of proprietary alternatives for accelerated computing.