Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 3/22/2024
Public

fsycl-pstl-offload

Enables the offloading of C++ standard parallel algorithms to a SYCL device. This is an experimental feature.

Syntax

Linux:

-fsycl-pstl-offload[=arg]

-fno-sycl-pstl-offload

Windows:

None

Arguments

arg

Is one of the following:

cpu

Tells the compiler to perform offloading to a SYCL CPU device.

gpu

Tells the compiler to perform offloading to a SYCL GPU device.

Default

-fno-sycl-pstl-offload

C++ standard parallel algorithms are not offloaded.

Description

This option enables offloading to a SYCL device for C++ standard parallel algorithms that are called with the std::execution::par_unseq policy. The offloaded algorithms are implemented via the oneAPI Data Parallel C++ Library (oneDPL). This option is an experimental feature.

If you do not specify arg, the compiler performs offloading to the default SYCL device.

oneDPL is required for offloading support. See the oneDPL documentation for information about how to make it available in the environment.

NOTE:

When using this option, you must also specify option -fsycl.
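
As a minimal sketch of the usage described above, the following function calls a standard algorithm with the std::execution::par_unseq policy on data held in a std::vector. The function name scale_in_place and the file name in the build comment are hypothetical. The build line uses only the options named in this topic and assumes that oneDPL is available in the environment.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical helper: multiplies every element of `data` by `factor`.
//
// Assumed build line for a program that contains this function
// (example.cpp is a hypothetical file name):
//   icpx -fsycl -fsycl-pstl-offload=gpu example.cpp
//
// With that option, this par_unseq call is a candidate for offloading to a
// SYCL GPU device. Without it, the same source runs on the host as usual.
void scale_in_place(std::vector<float>& data, float factor) {
    // The lambda captures `factor` by value, does not throw, and does not
    // allocate memory, in line with the SYCL kernel limitations.
    std::for_each(std::execution::par_unseq, data.begin(), data.end(),
                  [factor](float& x) { x *= factor; });
}
```

The source itself contains no SYCL-specific code. Under the assumptions above, only the compiler options decide whether the call is offloaded.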

The following are restrictions, requirements, and limitations when using option fsycl-pstl-offload:

  • Restrictions on callable objects passed to parallel algorithms

    Callable objects passed to parallel algorithms have the same limitations as SYCL kernels:

    • Exceptions are not allowed.

    • Dynamic memory allocation is not allowed.

    • Unsupported APIs from std cannot be used.

    For the complete list of kernel limitations, see the SYCL 2020 specification.

  • Data placement requirements

    • Only heap memory allocated with the standard C++ allocation facilities can be passed to the standard algorithms for offloading.

    • std::vector can also be used with parallel algorithms for offloading because it allocates its memory dynamically.

    • Memory allocated on the host stack cannot be used in offloaded parallel algorithms. This includes std::array objects and C-style arrays that live on the stack. To work around this, either make a "deep copy" of the data by capturing it by value in the algorithm's callable object, or allocate the std::array or C-style array on the heap.

    • Performance of memory allocations may be improved by using the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable. For more information about this environment variable, see Environment Variables on GitHub.

  • Other limitations:

    • Only a subset of standard C++ APIs can be used in callable objects passed to parallel algorithms. For the complete list, see the oneDPL documentation on Tested Standard C++ APIs.

    • Currently, this option is supported only on Linux.

    • The maximum supported memory alignment is 2048 bytes.

    • Option -fsycl-pstl-offload must be applied, with the same argument, to all translation units (TUs) in an executable or a dynamic library.
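
The callable and data placement restrictions above can be sketched as follows. All function names are hypothetical, and the code is an illustration under the stated restrictions rather than a required pattern. The first function copies stack data to heap memory owned by a std::vector, the second captures a stack-allocated std::array by value (the "deep copy" workaround), and the third reports an error case through a sentinel value because exceptions are not allowed in the callable.

```cpp
#include <algorithm>
#include <array>
#include <execution>
#include <vector>

// Data placement: copy a stack-resident array into heap memory owned by a
// std::vector, then run the algorithm on the vector's elements.
std::vector<int> doubled_from_stack(const std::array<int, 4>& stack_values) {
    std::vector<int> heap_values(stack_values.begin(), stack_values.end());
    std::transform(std::execution::par_unseq,
                   heap_values.begin(), heap_values.end(), heap_values.begin(),
                   [](int x) { return 2 * x; });
    return heap_values;
}

// Deep copy: `table` lives on the stack, so the lambda captures it by value.
// The callable then carries its own copy of the data.
std::vector<int> lookup(const std::vector<unsigned>& indices) {
    const std::array<int, 4> table{10, 20, 30, 40};
    std::vector<int> result(indices.size());
    std::transform(std::execution::par_unseq,
                   indices.begin(), indices.end(), result.begin(),
                   // The modulo keeps every index in range without throwing.
                   [table](unsigned i) { return table[i % table.size()]; });
    return result;
}

// Callable restriction: no exceptions. A zero input yields the sentinel
// value 0.0f instead of a throw.
std::vector<float> reciprocals(const std::vector<float>& values) {
    std::vector<float> result(values.size());
    std::transform(std::execution::par_unseq,
                   values.begin(), values.end(), result.begin(),
                   [](float v) { return v == 0.0f ? 0.0f : 1.0f / v; });
    return result;
}
```

In each function, the ranges passed to the algorithm belong to std::vector objects, and no callable refers to stack memory by reference or pointer.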

IDE Equivalent

None

Alternate Options

None