Support Knowledge Base

Is Parallelism Possible in OpenVINO™?

Content Type: Product Information & Documentation   |   Article ID: 000101300   |   Last Reviewed: 03/25/2026

Description

  • Ran a large language model (LLM) on multiple GPUs.
  • Unable to find information on how to use multiple GPUs with an LLM.

Resolution

Parallelism is possible in OpenVINO™. To distribute inference across multiple GPUs, use the OpenVINO heterogeneous (HETERO) plugin, which lets a single model run across multiple inference devices (CPU, GPU, NPU) at the same time.

Refer to the pipeline-parallelism documentation for details on executing a model across multiple devices in OpenVINO.
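
As an illustration of the approach described above, here is a minimal Python sketch that compiles a model through the HETERO plugin with the pipeline-parallel distribution policy. The helper names (hetero_device_name, compile_pipeline_parallel), the device names GPU.0 and GPU.1, and the model path are hypothetical and chosen only for this example. Pipeline parallelism is a preview feature, so the property name and value may differ between OpenVINO releases; check them against the documentation for your version.

```python
def hetero_device_name(devices):
    """Build a HETERO device string, e.g. 'HETERO:GPU.0,GPU.1'."""
    if not devices:
        raise ValueError("at least one device name is required")
    return "HETERO:" + ",".join(devices)


def compile_pipeline_parallel(model_path, devices):
    """Compile a model so that it is distributed across the given devices."""
    # Imported here so the helper above can be used without OpenVINO installed.
    import openvino as ov
    import openvino.properties.hint as hints

    core = ov.Core()
    # Assumption: the preview pipeline-parallel policy is available in the
    # installed OpenVINO release under this property name and value.
    config = {hints.model_distribution_policy: "PIPELINE_PARALLEL"}
    return core.compile_model(model_path, hetero_device_name(devices), config)
```

A call such as compile_pipeline_parallel("model.xml", ["GPU.0", "GPU.1"]) would then request a compiled model spread over both GPUs. The devices actually present on a system can be listed with ov.Core().available_devices.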
