Support Knowledge Base

Is Parallelism Possible in OpenVINO™?

Content Type: Product Information & Documentation | Article ID: 000101300 | Last Reviewed: 03/25/2026

Description

Ran a Large Language Model (LLM) on a system with multiple GPUs.
Unable to find information on how to use multiple GPUs for LLM inference.

Resolution

Parallelism is possible in OpenVINO™. To distribute inference across multiple GPUs, use the heterogeneous execution (HETERO) plugin, which lets a single model run across multiple inference devices (CPU, GPU, NPU) at once.

Refer to the pipeline parallelism documentation for details on executing a model across multiple devices in OpenVINO.
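
As a rough illustration of the approach above, the following Python sketch compiles one model for two GPUs through the HETERO device with the pipeline-parallel distribution policy. It is a minimal sketch under stated assumptions, not an official recipe: the function names `hetero_pipeline_target` and `compile_across_gpus` are hypothetical helpers, the device IDs `GPU.0` and `GPU.1` and the model path are placeholders, and it assumes an OpenVINO release that supports the `MODEL_DISTRIBUTION_POLICY` property with the value `PIPELINE_PARALLEL`.

```python
from typing import Dict, List, Tuple


def hetero_pipeline_target(devices: List[str]) -> Tuple[str, Dict[str, str]]:
    """Build the HETERO device string and config for pipeline parallelism.

    Hypothetical helper: it only formats strings and does not need OpenVINO.
    """
    if len(devices) < 2:
        raise ValueError("pipeline parallelism needs at least two devices")
    # HETERO takes a comma-separated device list, e.g. "HETERO:GPU.1,GPU.0".
    device = "HETERO:" + ",".join(devices)
    # Assumption: the installed OpenVINO release accepts this property/value.
    config = {"MODEL_DISTRIBUTION_POLICY": "PIPELINE_PARALLEL"}
    return device, config


def compile_across_gpus(model_path: str, devices: List[str]):
    """Read a model and compile it for several devices via HETERO."""
    # Imported lazily so the helper above can be used without OpenVINO installed.
    import openvino as ov

    core = ov.Core()
    # Print what the runtime can see, e.g. ['CPU', 'GPU.0', 'GPU.1'].
    print("Available devices:", core.available_devices)

    device, config = hetero_pipeline_target(devices)
    model = core.read_model(model_path)
    return core.compile_model(model, device, config)
```

A call such as `compile_across_gpus("model.xml", ["GPU.1", "GPU.0"])` would return a compiled model, where `model.xml` stands in for your own model file. The order of the device list expresses device priority for the HETERO plugin, so it may be worth trying more than one ordering on your hardware.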

Related Products

This article applies to 1 product.

OpenVINO™ toolkit
