How Does OpenVINO™ Allocate NPU Device Memory?

Content Type: Product Information & Documentation | Article ID: 000101553 | Last Reviewed: 02/18/2026

Description

Unable to determine whether OpenVINO™ provides an API for allocating NPU device memory.
Unable to determine whether a pre-allocated device memory address can be passed as an argument to RemoteContext::create_tensor(), given that one of the create_tensor() overloads takes an AnyMap parameter and supports a SHARED_BUF enum value.
Unable to determine whether the NPU plugin accepts a pre-allocated host pointer in create_tensor(), as the CPU and GPU plugins do.
Unable to determine whether OpenVINO internally copies the data into NPU device memory when compile_model() is called with the NPU device selected but ov::Tensor (rather than RemoteTensor) is used for the inputs and outputs.

Resolution

In OpenVINO, memory on NPU devices is allocated primarily through create_tensor() in conjunction with a RemoteContext. There is no public, supported API for allocating raw NPU device memory outside this mechanism; memory must be wrapped in the OpenVINO tensor abstraction to be compatible with the NPU plugin. The NPU plugin uses Level Zero APIs internally, so advanced users could in theory allocate memory externally with zeMemAllocDevice, but OpenVINO does not provide or support that path outside its RemoteTensor interface.

The SHARED_BUF property in the AnyMap is meant to allow importing external memory into the NPU plugin. For this to work, the pointer must refer to memory allocated with Level Zero-compatible APIs (for example, zeMemAllocShared()), not to an arbitrary buffer. The NPU plugin checks internally that the memory handle is a valid, device-accessible address, so users of SHARED_BUF must allocate the buffer through Level Zero APIs. Memory obtained from standard malloc or new will not work here.

Passing a pre-allocated host pointer to create_tensor() is not supported by the NPU plugin, because the plugin does not directly accept host-side memory pointers as device tensors. Unlike the CPU and GPU plugins, the NPU plugin expects a valid memory handle, not a raw pointer. Instead, create a RemoteTensor through a RemoteContext.

Yes, OpenVINO automatically copies data from host to device for inputs (and back for outputs) when standard ov::Tensor objects (host memory) are used with an NPU-compiled model. This copy happens internally as part of infer() or infer_async(). If performance is a concern and the host-device copies must be avoided, use RemoteTensors mapped to device memory instead.
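
The host-memory path described in the last answer can be sketched with the OpenVINO Python API. This is a minimal illustration, not a reference implementation: the function name infer_with_host_tensors and the model_path argument are hypothetical, and the sketch assumes a model with a single input of static shape and a machine with the NPU plugin installed.

```python
def infer_with_host_tensors(model_path, device="NPU"):
    """Run one inference using ordinary host-memory tensors.

    Hypothetical helper for illustration; assumes the model has a single
    input with a static shape.
    """
    # Imported here so the module still loads where OpenVINO is not installed.
    import numpy as np
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model(model_path, device)

    # Build a zero-filled host buffer that matches the model input.
    port = compiled.input(0)
    data = np.zeros(tuple(port.shape), dtype=port.element_type.to_dtype())

    request = compiled.create_infer_request()
    # A plain ov.Tensor lives in host memory, not in NPU device memory.
    request.set_input_tensor(ov.Tensor(data))

    # Per the article, the host-to-device copy for inputs (and the copy back
    # for outputs) is done internally during this call.
    request.infer()

    # The result is read back from a host-side output tensor.
    return request.get_output_tensor(0).data.copy()
```

Nothing in this path allocates device memory explicitly, which is why it pays for the copies on every inference. Avoiding them requires tensors created through the NPU RemoteContext, as described above.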
