Level Zero Immediate Command Lists

Summary of Default Behavior Change

This content applies to SYCL and C/C++/Fortran OpenMP offload programs that use the Level Zero plugin.

Starting with the 2023.2 compiler release, immediate command lists are the default submission mode on Intel® Data Center GPU Max Series running Linux. With the 2025.3 release, the L0 V2 adapter is the default for newer platforms such as Intel Arc B Series and Intel Core Ultra 200V Series. The L0 V2 adapter supports only immediate command lists, marking a shift toward standardizing immediate submission across newer hardware.

For platforms still using the L0 V1 adapter, immediate command lists are now the default for:

  • Intel® Data Center GPU Max Series
  • Intel® Data Center GPU Flex Series (starting 2025.0)
  • Intel® Arc™ A Series (except Intel Core Ultra 200 Series and Intel Core Ultra 100 Series iGPUs)

Here's a quick lookup table on Immediate Command Lists support and L0 adapter for various Intel platforms:

| Platform/GPU Series | Adapter Version | Default Submission Mode | Immediate Command List Support | Notes |
|---|---|---|---|---|
| Intel® Data Center GPU Max Series | L0 V1 | Immediate | Enabled | Best performance in most cases; can revert via environment variables if needed |
| Intel® Data Center GPU Flex Series (2025.0+) | L0 V1 | Immediate | Enabled | Performance tuning ongoing |
| Intel® Arc™ A Series (except Intel Core Ultra 200 Series / Intel Core Ultra 100 Series) | L0 V1 | Immediate | Enabled | ULLs supported; tuning limited |
| Intel® Arc™ B Series | L0 V2 | Immediate (only mode supported) | Required | L0 V2 adapter only supports immediate mode |
| Intel® Core Ultra 200V Series | L0 V2 | Immediate (only mode supported) | Required | L0 V2 adapter only supports immediate mode |
| Intel Core Ultra 200 Series / Intel Core Ultra 100 Series | L0 V1 | Regular | Not Supported | Immediate mode not default or supported |
| Other Platforms (older Intel GPUs) | L0 V1 | Regular | Not Supported | Immediate mode not default |

 

Level Zero Immediate Command Lists 

The Level Zero API provides two modes of submitting work to the GPU:

1. Regular command lists in combination with command queues

2. Immediate command lists where the command queue is implicit

In the first mode, programming (e.g. zeCommandListAppendLaunchKernel) and submission (zeCommandQueueExecuteCommandList) are decoupled. Upper layers of software such as the SYCL Level Zero plugin and the OpenMP target runtime control when the actual submission occurs.

The advantage of this mode for Intel® Data Center GPU Max Series is:

  • Submissions can be batched on the host, i.e., many operations may be collected in a command list and then submitted together, thus dividing the submission cost across many operations.

The disadvantages are:

  • Multiple command lists cannot run concurrently when only a single hardware queue is used.
  • Dependencies between operations in one SYCL queue can impede the progress of GPU operations in a different SYCL queue when both SYCL queues are mapped to the same underlying hardware queue, even if there are no dependencies across the queues.
  • Cache invalidation occurs at each submission.
  • Command lists must be managed (created and reset), which adds overhead.

In the second mode using immediate command lists, programming and submission occur together. The tradeoffs are different.

The advantages are:

  • Multiple command lists can run concurrently on a single hardware queue.
  • Allows batching of kernels on the GPU.

The disadvantage is:

  • Appending an operation to the command list incurs more host overhead because the actual submission to the GPU occurs immediately.

If an application uses only one SYCL queue (in-order or out-of-order) and has very short-running kernels (on the order of less than 10 microseconds), as is typical in some AI workloads, then host submission time becomes very important and immediate command lists may cause performance regressions. If you encounter this problem, we recommend using the environment variables to go back to regular command lists (see below).

Forcing Use of Immediate Command Lists

The platform defaults can be overridden by setting environment variables to enable immediate command lists for SYCL and OpenMP offload programs (including OpenMP applications that use “omp dispatch” to call MKL).

SYCL control:

SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

OpenMP control:

LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=all
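
As a minimal sketch, the two variables can be scoped to a single launch instead of being exported for the whole shell session. The wrapper name run_immediate is hypothetical and not part of any Intel tool; only the variable names and values come from the text above.

```shell
# Hypothetical helper: run one command with immediate command lists forced on
# for both the SYCL and the OpenMP offload runtimes. Using env keeps the
# settings out of the calling shell's own environment.
run_immediate() {
    env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 \
        LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=all \
        "$@"
}
```

A call such as "run_immediate ./my_app" (where ./my_app stands in for your own executable) then launches the program with both settings applied for that run only.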

Forcing Use of Regular Command Lists

The platform defaults can be overridden by setting environment variables to use regular command lists for SYCL and OpenMP offload programs (including OpenMP applications that use “omp dispatch” to call MKL).

SYCL control:

SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0

OpenMP control:

LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=0
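
When checking whether a workload with very short kernels regresses under immediate command lists, it can help to run the same binary under both submission modes. The following is a sketch under stated assumptions: the function name run_with_mode and its mode keywords "immediate" and "regular" are hypothetical conveniences, while the variable names and values are the ones listed in this article.

```shell
# Hypothetical helper: run a command under one submission mode.
#   $1           "immediate" or "regular"
#   remaining    the command to run, with its arguments
run_with_mode() {
    mode=$1
    shift
    case "$mode" in
        immediate) sycl=1; omp=all ;;   # values from "Forcing Use of Immediate Command Lists"
        regular)   sycl=0; omp=0 ;;     # values from "Forcing Use of Regular Command Lists"
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # env applies the settings to this one launch only.
    env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS="$sycl" \
        LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST="$omp" \
        "$@"
}
```

In bash, "time run_with_mode regular ./my_app" followed by "time run_with_mode immediate ./my_app" (with ./my_app standing in for your own program) gives a rough wall-clock comparison; repeated runs are advisable before drawing conclusions.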

Future Plans for Immediate Command Lists

Performance analysis and tuning of immediate command lists is ongoing. There are known optimization opportunities throughout the software stack that are being addressed. There is also work in progress to enable immediate command lists on other Intel® GPUs.

Future releases will also support a SYCL language-level queue property applicable to an individual SYCL queue to choose between regular and immediate command lists. Applications may use the property instead of relying on environment variables to select the submission model.
