In OpenVINO, memory on NPU devices is allocated through create_tensor() on a RemoteContext. There is no public API for allocating raw NPU device memory outside this mechanism; memory must be wrapped in the OpenVINO tensor abstraction to be usable by the NPU plugin. The NPU plugin is built on Level Zero internally, so advanced users could in theory allocate memory externally with zeMemAllocDevice, but OpenVINO does not provide a direct or supported API for raw device memory management outside its RemoteTensor interface.
The SHARED_BUF property in the AnyMap is meant for importing external memory into the NPU plugin. For this to work, the pointer must refer to memory allocated through Level Zero-compatible APIs (for example, zeMemAllocShared()), not an arbitrary buffer. The NPU plugin validates the memory handle to ensure it is a valid, device-accessible address, so anyone using SHARED_BUF must allocate the buffer through Level Zero. Memory from standard malloc or new won't work here.
For the NPU plugin, this API is not supported because the plugin does not accept host-side memory pointers directly as device tensors. The NPU expects a valid memory handle, not a raw pointer, so unlike the CPU and GPU plugins, its create_tensor() does not support raw host memory pointers. Instead, create a RemoteTensor through a RemoteContext.
Yes, OpenVINO automatically copies inputs from host to device (and outputs back) when standard ov::Tensor objects in host memory are used with an NPU-compiled model. The copy happens internally as part of infer() or start_async(). If performance is a concern and you want to avoid host-device copies, use RemoteTensors mapped to device memory instead.