Intel® oneAPI DPC++ Library
Speed Up DPC++ (SYCL*) Kernels on CPUs, GPUs, and FPGAs
A Performance and Productivity Library for Accelerated Computing
The Intel® oneAPI DPC++ Library (oneDPL) is a companion to the Intel® oneAPI DPC++/C++ Compiler and provides an alternative for C++ developers who create heterogeneous applications and solutions. Its APIs are based on familiar standards—C++ STL, Parallel STL (PSTL), Boost.Compute, and SYCL*—to maximize productivity and performance across CPUs, GPUs, and FPGAs.
- Use the C++ STL API explicitly within accelerated DPC++ kernels
- Streamline cross-architecture programming with Boost.Compute and PSTL algorithm extensions
- Apply parallel algorithms in more situations with custom iterators
Download as Part of the Toolkit
oneDPL is included as part of the Intel® oneAPI Base Toolkit, which is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
Download the Stand-Alone Version
A stand-alone download of oneDPL is available. You can download binaries from Intel or choose your preferred repository.
Develop in the Cloud
Build and optimize oneAPI multiarchitecture applications using the latest optimized Intel® oneAPI and AI tools, and test your workloads across Intel® CPUs and GPUs. No hardware installations, software downloads, or configuration necessary. Free for 120 days with extensions possible.
Help oneDPL Evolve
oneDPL is part of the oneAPI industry standards initiative. We welcome you to participate.
Features
Inline Accelerator Targeting
Use device and host containers to target GPUs and FPGAs or run your code across multi-node CPUs.
Optimized C++ Standard Algorithms
Access parallelized C++17 algorithms and utilities for efficient application development and deployment on a variety of hardware.
Integrated with Intel® DPC++ Compatibility Tool
This library complements all Intel oneAPI DPC++ components to simplify migration of CUDA* applications to SYCL code.
Documentation & Code Samples
Documentation
Code Samples
Use the C++ Standard Template Library (STL) Extended API from oneDPL
See how to write a heterogeneous program that offloads to a CPU or GPU, using the upper_bound and reduce_by_segment APIs to implement both dense and sparse histograms.
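To make the two histogram styles concrete, here is a minimal host-only sketch in standard C++17. It is not the sample's code: the function names `dense_histogram` and `sparse_histogram` are hypothetical, and the loop in `sparse_histogram` is a serial stand-in for what a segmented reduction such as reduce_by_segment computes. A oneDPL version would typically pass an execution policy to the algorithm calls, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Dense histogram: one count per bin 0 .. num_bins-1, empty bins included.
// Assumes every value lies in [0, num_bins); smaller values would be
// counted in bin 0 and larger ones would be dropped.
std::vector<std::size_t> dense_histogram(std::vector<int> values,
                                         std::size_t num_bins) {
    assert(num_bins > 0);
    std::sort(values.begin(), values.end());

    // cumulative[b] = how many values are <= b, found by binary search.
    std::vector<std::size_t> cumulative(num_bins);
    for (std::size_t b = 0; b < num_bins; ++b) {
        auto it = std::upper_bound(values.begin(), values.end(),
                                   static_cast<int>(b));
        cumulative[b] = static_cast<std::size_t>(it - values.begin());
    }

    // Differences between neighboring cumulative counts are the bin counts.
    std::vector<std::size_t> counts(num_bins);
    std::adjacent_difference(cumulative.begin(), cumulative.end(),
                             counts.begin());
    return counts;
}

// Sparse histogram: (value, count) pairs only for values that occur.
// Each run of equal keys in the sorted input is collapsed into one pair.
std::vector<std::pair<int, std::size_t>> sparse_histogram(
    std::vector<int> values) {
    std::sort(values.begin(), values.end());
    std::vector<std::pair<int, std::size_t>> bins;
    for (int v : values) {
        if (bins.empty() || bins.back().first != v) {
            bins.emplace_back(v, std::size_t{0});
        }
        ++bins.back().second;
    }
    return bins;
}
```

The dense form suits a small, known range of bins. The sparse form avoids storing empty bins when the values are spread over a wide range.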
Use the extended API in oneDPL with the counting and zip iterator extensions to implement a stable sort by key algorithm that can be offloaded to a CPU or GPU.
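As a rough illustration of stable sort by key, the sketch below uses only standard C++17 on the host. It is not the sample's implementation: `stable_sort_by_key` is a hypothetical helper, the `std::iota` index vector stands in for a counting iterator, and the final gather loop stands in for moving keys and values together through a zip iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Reorders keys and values together so that keys are ascending.
// Entries with equal keys keep their original relative order.
template <typename Key, typename Value>
void stable_sort_by_key(std::vector<Key>& keys, std::vector<Value>& values) {
    assert(keys.size() == values.size());

    // Index sequence 0, 1, 2, ... (the role a counting iterator plays).
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Sort the indices by the key they refer to; stable_sort keeps ties
    // in index order, which is what makes the result stable.
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) {
                         return keys[a] < keys[b];
                     });

    // Gather keys and values in lockstep (the role a zip iterator plays).
    std::vector<Key> sorted_keys;
    std::vector<Value> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

Sorting indices instead of the data keeps the comparison cheap and leaves the values untouched until the final gather.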
Use the C++ STL and Parallel STL API for CPU & GPU Offload
This sample implements the Maxloc reduction search with direct SYCL code, and then simplifies it using oneDPL with three ways of passing data: in a standard container, in the SYCL buffer, or by using unified shared memory.
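The core of a maxloc reduction is a combine step over (value, index) pairs. The following is a minimal serial sketch in standard C++17, not the sample's code: the `MaxLoc` struct, `combine`, and `maxloc` are hypothetical names, and the rule that ties go to the smaller index is an assumption made here so the result is deterministic. The three ways of passing data are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A candidate maximum together with the position where it was found.
struct MaxLoc {
    double value;
    std::size_t index;
};

// Combine step of the reduction: keep the larger value; on a tie keep
// the smaller index. The operation is associative and commutative, so
// partial results can be combined in any grouping or order.
MaxLoc combine(const MaxLoc& a, const MaxLoc& b) {
    if (a.value > b.value) return a;
    if (b.value > a.value) return b;
    return a.index <= b.index ? a : b;
}

// Serial reduction over all (value, index) pairs of a non-empty input.
MaxLoc maxloc(const std::vector<double>& data) {
    assert(!data.empty());
    MaxLoc best{data[0], 0};
    for (std::size_t i = 1; i < data.size(); ++i) {
        best = combine(best, MaxLoc{data[i], i});
    }
    return best;
}
```

Because `combine` does not depend on evaluation order, the same operator could be handed to a parallel reduction without changing the answer.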
Learn how the oneDPL parallel STL policy and oneDPL algorithms help to accelerate a gamma correction's nonlinear operations to encode and decode the luminance of each pixel of an image.
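Gamma correction is a per-pixel power-law mapping, so it fits a single transform over the image. The sketch below is a host-only illustration in standard C++17 under stated assumptions: 8-bit grayscale pixels, the mapping out = 255 * (in / 255)^gamma, and the hypothetical helper names `gamma_correct` and `apply_gamma`. It is not taken from the sample, and a oneDPL version would typically add an execution policy to the `transform` call.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Maps one 8-bit pixel through the power-law curve out = in^gamma,
// computed on values normalized to [0, 1] and rounded back to 0..255.
std::uint8_t gamma_correct(std::uint8_t v, double gamma) {
    const double normalized = v / 255.0;
    const double corrected = std::pow(normalized, gamma);
    return static_cast<std::uint8_t>(std::lround(corrected * 255.0));
}

// Applies the same curve to every pixel in place. Each output depends
// only on its own input, so the pixels can be processed independently.
void apply_gamma(std::vector<std::uint8_t>& pixels, double gamma) {
    std::transform(pixels.begin(), pixels.end(), pixels.begin(),
                   [gamma](std::uint8_t v) { return gamma_correct(v, gamma); });
}
```

An exponent below 1 brightens mid-tones and an exponent above 1 darkens them, while 0 and 255 map to themselves in both cases.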
View the oneAPI Samples Catalog
Specifications
Processors:
- 6th generation Intel® Core™ processors or newer
- Intel® Xeon® processors
GPUs:
- Intel® UHD Graphics for 11th generation Intel processors or newer
- Intel® Iris® Xe graphics
- Intel® Arc™ graphics
- Intel® Data Center GPU Flex Series
- Intel® Data Center GPU Max Series
FPGAs:
- Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA
- Intel® Stratix® 10 FPGAs
Host and target operating systems:
- Windows
- Linux
Languages:
- SYCL
- C++
Development environments (optional):
- Microsoft Visual Studio*, Microsoft Visual Studio Code
- Eclipse* IDE
For more information, see the system requirements.
Get Help
Your success is our success. Access these forums when you need assistance.
Stay in the Know with All Things CODE
Sign up to receive the latest trends, tutorials, tools, training, and more to help you write better code optimized for CPUs, GPUs, FPGAs, and other accelerators, stand-alone or in any combination.