Article ID: 000057525 Content Type: Maintenance & Performance Last Reviewed: 01/31/2023

Why Is the Model Load Time to GPU Longer Than to CPU?

Environment

OpenVINO™ toolkit, GPU plugin, CPU plugin

Summary

A quick step to improve model load time on a GPU

Description

Loading a model in Intermediate Representation (IR) format to a GPU takes longer than loading the same model to a CPU.

Resolution

Manually create a directory named cl_cache in your application's working directory.

The GPU driver uses this directory to store binary representations of the compiled kernels. This works on all supported operating systems.
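If you would rather have the application create the directory than do it by hand, a minimal Python sketch follows. The function name ensure_cl_cache is hypothetical. The sketch assumes the relevant location is the process's current working directory, which is not necessarily the folder that holds your script. Call it before the model is loaded to the GPU.

```python
from pathlib import Path
from typing import Optional


def ensure_cl_cache(workdir: Optional[Path] = None) -> Path:
    """Create a cl_cache directory in the application's working directory.

    ensure_cl_cache is a hypothetical helper, not part of OpenVINO.
    By default it uses the process's current working directory.
    """
    base = Path.cwd() if workdir is None else Path(workdir)
    cache_dir = base / "cl_cache"
    # exist_ok=True makes the call safe to repeat on every start-up.
    cache_dir.mkdir(exist_ok=True)
    return cache_dir
```

Because the call is idempotent, it can run unconditionally at start-up. An existing cache from earlier runs is left untouched.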

Additional information

Refer to this article for more information on managing the cl_cache.

Loading a model in IR format to a GPU takes longer than loading it to a CPU because the GPU stack is based on OpenCL*. As a result, GPU load time depends on how long the OpenCL* kernels take to compile.

When cl_cache is enabled, the first load of a model still takes a long time because the OpenCL* kernels must be compiled. However, each subsequent load of the same model is much faster because the driver reuses the compiled kernel binaries stored in cl_cache.
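To check that the cache is working, you can time two consecutive loads of the same model and look at what the first load leaves in cl_cache. The Python sketch below does this. It assumes a hypothetical load_model callable that wraps however your application loads the model to the GPU. It also assumes the driver writes its binaries as regular files directly inside cl_cache, which may not hold for every driver version.

```python
import time
from pathlib import Path
from typing import Callable, Tuple


def timed_load(load_model: Callable[[], object], cache_dir: Path) -> Tuple[float, int]:
    """Time one model load and count the files in cache_dir afterwards.

    load_model is a hypothetical placeholder for whatever call loads and
    compiles the model on the GPU in your application.
    """
    start = time.perf_counter()
    load_model()
    elapsed = time.perf_counter() - start
    # Count only regular files directly inside cache_dir; 0 if it does not exist.
    cached = sum(1 for p in cache_dir.iterdir() if p.is_file()) if cache_dir.is_dir() else 0
    return elapsed, cached
```

Call timed_load twice with the same model. If the cache is in use, the first call should report a longer time and a non-zero file count, and the second call should be noticeably faster. In recent OpenVINO Python releases, load_model could be something like `lambda: core.compile_model(model, "GPU")`, where core is an openvino.runtime.Core instance. Check this against the API version you are using.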
