Heterogeneous Processing Requires Data Parallelization: SYCL* and DPC++ Are a Good Start

Published: 05/21/2021  

Last Updated: 05/21/2021

James Reinders, editor emeritus, The Parallel Universe
@IntelDevTools


[Note that this article was originally published at The New Stack.]

It’s all about XPUs.

This is a wonderful time. Hardware innovation is leading to an explosion in CPUs, GPUs, FPGAs, DSPs, ASICs, and more. Collectively, these can be called XPUs: any type of "processing unit" (hardware) that can help applications compute.

The onslaught of XPUs means that developers are increasingly challenged to code for a larger collection of diverse processing units. Each new architecture costs them extra time and money to rewrite and test code in pursuit of better application performance. More than ever, to preserve developers' sanity and the maintainability of their code, it is paramount that the code they write applies to as many XPUs as possible. Moving to cross-architecture models for application development can save organizations significant time and money, and this becomes an even more pressing concern as heterogeneous computing grows in popularity.

A rethinking is underway today, because our world is rapidly becoming a world of XPUs that will eventually transform all of computing.

 

XPUs: Reinventing Software for Accelerated Compute

CUDA*, a widely used proprietary programming system, was designed for NVIDIA* GPUs and is effective on them. The OpenCL™ standard took an open approach and achieved a certain level of multivendor support, but it had shortcomings of its own, most notably being C-centric and failing to address the needs of C++ well.

CUDA and the OpenCL standard have served their purposes well. Going forward, developers need a truly open and multivendor approach to help deliver on the promises of XPUs.

 

Why SYCL* and Data Parallel C++ (DPC++) Offer the Best Path Forward

The lessons learned from both CUDA and the OpenCL standard set the stage for the emergence of a truly popular and open solution for data parallelism in heterogeneous systems, based on C++. That solution is SYCL*, a higher-level programming model that improves programming productivity across multiple hardware accelerators. It has quickly gained broad multivendor support, widespread interest, and the support of multiple serious compiler projects.

SYCL is important because effective programming in this increasingly heterogeneous world requires performant access to all XPUs. Only a truly open approach can provide that.

SYCL is an open standard for single-source C++ data-parallel programming of heterogeneous hardware, or XPUs. SYCL allows single-source compilation in C++ to target multiple devices on a system, rather than using C++ for the host and domain-specific kernel languages for the devices.

SYCL brings to C++ both kernel-style programming and a mechanism to locate, query, and use accelerators in a system. Kernel-based programming, an important style for harnessing data parallelism, is also supported in OpenCL and CUDA. The ability to enumerate and access accelerators in a standard way was previously introduced by OpenCL.
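To make the kernel idea concrete, here is a minimal host-only sketch in standard C++. It is not SYCL code and uses no SYCL API. The helper name for_each_index and the function vector_add are hypothetical, invented for this illustration. The point is only the shape of the style: the programmer writes the work for one element, and something else decides how the index space is traversed. In SYCL that "something else" is the runtime and the device; in this sketch it is a plain sequential loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a SYCL API): invokes `kernel` once for every
// index in [0, n). A data-parallel runtime could spread these invocations
// across an accelerator; this host-only sketch runs them one after another.
template <typename Kernel>
void for_each_index(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// Element-wise vector addition in kernel style. The lambda describes the
// work for a single index and contains no loop of its own.
// Assumption: `a` and `b` have the same length.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> c(a.size());
    for_each_index(a.size(), [&](std::size_t i) { c[i] = a[i] + b[i]; });
    return c;
}
```

Calling vector_add on two equal-length vectors returns their element-wise sum. Because each invocation touches only its own index, the invocations are independent of one another, which is the property that lets a kernel run in parallel on an XPU.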

Also take a look at Data Parallel C++ (DPC++), which provides an open implementation to the LLVM community, with ambitions to upstream everything into LLVM C++ compilers. DPC++ aims to implement SYCL with some extensions. DPC++ pioneered many features that are now in SYCL 2020, and therefore had a head start in implementing much of SYCL 2020 even before the standard was complete. Work remains to complete alignment with the entire SYCL 2020 specification; all the work is easy to observe in the very active open-source repository. DPC++ is used by Intel to target Intel® CPUs, GPUs, and FPGAs. DPC++ is also used by Codeplay* to target NVIDIA* GPUs. Another SYCL compiler, hipSYCL, supports AMD* CPUs and GPUs by connecting with AMD ROCm*. Having multiple open-source compilers for SYCL is fantastic for the community, and it demonstrates that SYCL has broad, diverse, and open support.

Over the course of 2019 and 2020, the author worked with a dedicated small team to create the first book about SYCL and DPC++. You can download a free copy from Apress*. Shortly after its publication, the Khronos Group* announced the finalized specification for SYCL 2020.

The recent ratification of the SYCL 2020 specification is a significant milestone. It is truly an open specification with a bright future, the product of years of development by many dedicated individuals from around the industry. Based on C++17, SYCL 2020 enables easier acceleration of standard C++ applications and drives a closer alignment with the ISO C++ roadmap. In its SYCL 2020 announcement, the Khronos Group highlighted a number of features, including support for Unified Shared Memory (USM), built-in reductions, extensive use of class template argument deduction (CTAD), and atomic operations that align with standard C++ atomics.
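For readers who have not met two of those terms, the following small sketch shows CTAD and a reduction in plain ISO C++17, the language level SYCL 2020 builds on. It is standard-library code only, not SYCL, and the function names sum_with_ctad and scaled_sum are hypothetical names chosen for this illustration. How SYCL 2020 applies the same ideas to its own classes and kernels is defined by the specification and is not shown here.

```cpp
#include <array>
#include <functional>
#include <numeric>
#include <utility>

// CTAD (class template argument deduction, C++17): the compiler deduces
// std::array<int, 4> from the initializers, so no template arguments are
// written out by hand.
inline int sum_with_ctad() {
    std::array values{1, 2, 3, 4};
    // std::reduce (C++17) combines all elements with the given operation.
    // It is allowed to group the elements in any order, which is what makes
    // a reduction a natural fit for parallel execution.
    return std::reduce(values.begin(), values.end(), 0, std::plus<>{});
}

// CTAD again: std::pair<int, double> is deduced from the two arguments.
inline double scaled_sum(double factor) {
    std::pair total_and_factor{sum_with_ctad(), factor};
    return total_and_factor.first * total_and_factor.second;
}
```

The practical benefit of CTAD is less boilerplate: types that the compiler can infer from the arguments no longer need to be repeated, which matters in template-heavy code of the kind heterogeneous programming tends to produce.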

 

XPUs Are the Future: Let’s Keep It Open for the Benefits of XPU Diversity and Programming Sanity

SYCL and DPC++ help you make effective use of XPUs. They are part of a broader push for XPU support that extends into libraries and all software development tools, building on the ambitions of SYCL and its compilers. That is the origin of the oneAPI industry initiative, which the author is passionate and excited to be a part of. Support for this goal, easing the challenges of using all XPUs openly, is driving interest in SYCL and oneAPI. A solid example is the oneAPI Deep Neural Network Library (oneDNN): initially highly optimized for Intel® processors, it now also accelerates the world’s fastest computer, which uses ARM* processors. As a result, oneDNN has strong ARM support now, too. The openness of SYCL and of the oneAPI libraries and tools is helping usher in a new era of openness and performance that gives us useful programming access to all XPUs.

Together, the software developer community has an opportunity to create standards, including SYCL, that serve the whole industry, and strongly support the adoption of heterogeneous programming (XPUs) and modern C++ as it embraces parallelism.

SYCL offers an open standard with broad support, lots of ability to participate, multiple open-source implementations, and seemingly infinite possibilities. DPC++ provides an open LLVM-based compiler to reduce the effort to support SYCL and encourage strong compatibility across XPUs. oneAPI offers a forum to discuss and drive open and performant access for XPUs into all aspects of software development.

Take the opportunity to get educated about SYCL, DPC++, and oneAPI because XPUs are the future of compute. Let's shape support for XPUs together, in the open, and enjoy the benefits of the enormous diversity in XPUs available for us to program effectively.
