The goal of oneAPI is to deliver open, performant, productive cross-architecture programming for CPUs and accelerators. Achieving this goal is challenging because vendors today provide a diverse set of architectures and programming solutions. To be successful, we need a software design that supports many programming languages so that programmers have a choice. Similarly, we need to support many accelerators, both new and existing. In this blog, we discuss how to refactor the oneAPI design to make it easier to integrate new languages and new hardware platforms.
oneAPI was designed with a layered architecture: a productivity layer on top of a performance layer, both built on the foundation of a hardware platform abstraction. In the initial design, services needed in a language runtime, such as memory management and kernel submission, were contained in the performance layer and the platform layer. Going forward, we propose to introduce a new runtime layer to reduce code duplication, separate the language issues from the platform specifics, and simplify the support of new languages and new accelerators.
oneAPI Layered Architecture

|Layer|Component|Description|
|---|---|---|
|Productivity||The top of the stack contains frameworks, applications, and productivity languages like Python. TensorFlow and PyTorch are examples of machine learning frameworks.|
|Performance|Languages|Performance languages enable the programmer to deploy code targeting accelerators. The primary language of oneAPI is SYCL, a Khronos standard based on ISO C++. oneAPI also supports OpenMP with C++ and Fortran.|
|Performance|Libraries|Libraries deliver high-performance functions that are accessed through standard APIs. For example, the oneAPI Math Kernel Library (oneMKL) provides high-performance implementations of Basic Linear Algebra, as well as other common mathematics algorithms.|
|Runtime||A new unified runtime supplies common functionality needed to target programming constructs to a hardware platform.|
|Platform||The platform is a hardware abstraction layer that defines the minimum required set of capabilities to integrate a hardware device into oneAPI.|
Level Zero is the foundational hardware abstraction layer for oneAPI (spec.oneapi.io/level-zero/latest). Level Zero provides explicit, low-overhead control of the accelerator hardware; it defines the minimal set of APIs required to integrate an accelerator into oneAPI. For new hardware, Level Zero is a natural choice. For existing accelerators with well-established drivers, however, it is pragmatic to give integrators a choice: either create a tightly integrated implementation in Level Zero or directly reuse the accelerator’s existing driver stack.
Current oneAPI implementations support alternatives to Level Zero at the language layer. The SYCL standard explicitly supports multiple backends (five-outstanding-additions-sycl2020.html). The multiple backends are implemented with a plug-in architecture that maps the SYCL runtime to the hardware abstraction. However, this plug-in is specific to the SYCL implementation. The OpenMP mapping uses a similar plug-in idea. Requiring a plug-in for each combination of language and hardware platform causes considerable code duplication and makes it harder to introduce new languages and accelerators into oneAPI. We propose to introduce a new runtime layer into oneAPI, moving the plug-ins into a unified runtime.
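To make the duplication concrete, here is a rough counting sketch in Python. It assumes three language front ends and three platform back ends; `driver_a` and `driver_b` are placeholder names for existing driver stacks, not real oneAPI targets. With one plug-in per pair, the number of components grows as languages times platforms; with a unified runtime in between, it grows as languages plus platforms.

```python
from itertools import product

# Language front ends mentioned in this post.
languages = ["SYCL", "OpenMP", "Julia"]
# "driver_a" and "driver_b" are placeholders for existing driver stacks.
platforms = ["Level Zero", "driver_a", "driver_b"]

# Today: one plug-in for every (language, platform) combination.
per_pair_plugins = list(product(languages, platforms))

# Proposed: each language binds once to the unified runtime,
# and each platform supplies one adapter beneath it.
language_bindings = [(lang, "unified runtime") for lang in languages]
platform_adapters = [("unified runtime", plat) for plat in platforms]
unified_components = language_bindings + platform_adapters

print(len(per_pair_plugins))    # 9
print(len(unified_components))  # 6
```

Under this counting, adding a fourth language costs three more plug-ins in the per-pair design but only one more binding with a unified runtime.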
To support a language in oneAPI, we need three things:
1. We need programming language constructs to specify the code to be run on an accelerator. In SYCL, this is done by defining kernel functions that are submitted to a queue. OpenMP uses directives to define regions of code to offload.
2. We need a compiler that identifies the code to be offloaded to an accelerator and translates that source code to SPIR-V, the standard intermediate representation defined by Khronos (khronos.org/spir).
3. We need a mapping from the runtime primitives of the language (e.g., memory management, kernel invocation, error handling) to the hardware abstraction defined by the oneAPI platform layer.
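As a rough illustration of the third requirement, the Python sketch below maps language-level runtime primitives (memory allocation, kernel submission, error reporting) onto a small hardware abstraction. All class and method names here (`Platform`, `HostPlatform`, `Queue`, `PlatformError`) are hypothetical and are not part of any oneAPI, SYCL, or Level Zero API; the "device" simply runs the kernel on the host, one work-item at a time.

```python
from abc import ABC, abstractmethod


class PlatformError(RuntimeError):
    """Hypothetical error type surfaced from the platform to the language."""


class Platform(ABC):
    """Hypothetical hardware abstraction: the minimum a device must provide."""

    @abstractmethod
    def alloc(self, n):
        """Return a buffer of n elements."""

    @abstractmethod
    def launch(self, kernel, global_size, *args):
        """Run kernel once per work-item index in range(global_size)."""


class HostPlatform(Platform):
    """Stand-in device that executes kernels sequentially on the host."""

    def alloc(self, n):
        if n < 0:
            raise PlatformError("negative allocation size")
        return [0] * n

    def launch(self, kernel, global_size, *args):
        for i in range(global_size):
            kernel(i, *args)


class Queue:
    """Language-side object that forwards runtime primitives to a platform."""

    def __init__(self, platform):
        self.platform = platform

    def malloc(self, n):
        return self.platform.alloc(n)

    def submit(self, kernel, global_size, *args):
        self.platform.launch(kernel, global_size, *args)


def vector_add(i, a, b, out):
    # Kernel body: each work-item handles one element.
    out[i] = a[i] + b[i]


q = Queue(HostPlatform())
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
out = q.malloc(len(a))
q.submit(vector_add, len(a), a, b, out)
print(out)  # [11, 22, 33, 44]
```

The language side only ever talks to `Queue`; swapping `HostPlatform` for another `Platform` subclass would retarget the same kernel without touching the language-level code.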
The language definition is straightforward. For example, the Julia team recently adapted its portable kernel abstractions to support the key SYCL language features in the Julia language (github.com/JuliaGPU/oneAPI.jl).
The compiler is naturally based on LLVM, which has good support for identifying the host and device code and compiling each separately (github.com/intel/llvm/blob/sycl/sycl/doc/design/CompilerAndRuntimeDesign.md).
A new unified runtime would simplify the support of a new language by providing the services that languages need to target accelerators effectively. Examples of these services include a scheduler that load-balances kernel executions across a set of devices, and memory allocation algorithms that dynamically choose the best allocation policy for a particular hardware platform. The unified runtime would also provide an abstraction that enables a single language implementation to target different accelerators. As mentioned above, both SYCL and OpenMP support multiple platforms by using plug-ins. By refactoring the plug-ins into a unified runtime, we avoid duplication and cleanly isolate the hardware-specific platform layers from the language layer.
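One of the services named above, a load-balancing scheduler, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the policy is a simple greedy "most expensive kernel first, least-loaded device next" heuristic chosen for this example, the per-kernel cost estimates are assumed to be known in advance, and the function name `schedule` and the device names are hypothetical.

```python
import heapq


def schedule(kernels, devices):
    """Greedy least-loaded assignment (illustrative policy, not oneAPI's).

    kernels: list of (name, estimated_cost) pairs.
    devices: list of device names.
    Returns a dict mapping each device to the kernel names assigned to it.
    """
    # Min-heap of (current_load, tie_breaker, device).
    heap = [(0, idx, dev) for idx, dev in enumerate(devices)]
    heapq.heapify(heap)
    assignment = {dev: [] for dev in devices}

    # Place the most expensive kernels first, each on the least-loaded device.
    for name, cost in sorted(kernels, key=lambda k: k[1], reverse=True):
        load, idx, dev = heapq.heappop(heap)
        assignment[dev].append(name)
        heapq.heappush(heap, (load + cost, idx, dev))
    return assignment


kernels = [("k0", 4), ("k1", 3), ("k2", 2), ("k3", 1)]
plan = schedule(kernels, ["gpu0", "gpu1"])
print(plan)  # {'gpu0': ['k0', 'k3'], 'gpu1': ['k1', 'k2']}
```

In this example both devices end up with an estimated load of 5. Placing such a policy in a shared runtime means every language front end gets the same scheduling behavior without reimplementing it.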
The goal of oneAPI is to deliver open, performant, productive cross-architecture programming for CPUs and accelerators. We are working with the oneAPI community and taking the lessons from the first implementations to propose refactoring the oneAPI design, making it simpler to add new languages and new accelerators.
Learn more about oneAPI including notes from the advisory boards - oneapi.io.
Learn more about programming with oneAPI - Base Training Modules for Intel oneAPI.