Cross-Vendor Heterogeneous Computing

My teammates have been busy lately experimenting with different aspects of oneAPI, all of which demonstrate its promise. Tony Mongkolsmai asked, “Do I really need to learn CUDA*, ROCm*, and SYCL* to program for NVIDIA*, AMD*, and Intel® accelerators?” He then answers this question by showing how to compile and run the same SYCL code on GPUs from all three vendors, even running on Intel and NVIDIA GPUs at the same time. Next up, our editor emeritus, James Reinders, introduces a collection of resources to Migrate from CUDA* to C++ with SYCL*, complete with code examples and self-guided tutorials to help free your code from the constraints of vendor-specific tools and accelerators. Finally, Guy Tamir continues to release videos for his oneAPI Basics Training Series, which cover all aspects of oneAPI, from direct programming with SYCL to API-based programming with component libraries like the oneAPI Deep Neural Network Library (oneDNN). Guy also covers data science applications from a oneAPI perspective.

Speaking of data science, this issue is full of AI-related content, starting with our feature article: Supply Chain Optimization at Enterprise Scale. This article was coauthored by Ted Jones and Karl Eklund from Red Hat, and Karol Brejna and Piotr Grabuszynski from Intel. It describes how to leverage open-source AI technologies using the Red Hat OpenShift* Data Science platform with Intel-optimized software. This is followed by Optimizing Transformer Model Inference on Intel® Processors, which describes several optimizations to Google’s popular Transformer model for natural language processing. Next, Optimize Utility Maintenance Prediction for Better Service describes one of many practical AI reference kits developed in collaboration with Accenture. Accelerating Artificial Intelligence with Intel® End-to-End AI Optimization Kit shows how Intel-optimized software is democratizing AI.

From data science, we turn our attention to heterogeneous parallel programming using Fortran and the OpenMP* target offload API. Solving Heterogeneous Programming Challenges with Fortran and OpenMP* describes several compiler directives to offload parallel loops to an accelerator and move data between host and device memory, then closes with a brief overview of standard Fortran DO CONCURRENT loops. Solving Linear Systems Using oneMKL and OpenMP* Target Offloading shows how to dispatch oneMKL functions to an accelerator. The latter is a follow-up to my previous article, Accelerating LU Factorization Using Fortran, oneMKL, and OpenMP* in The Parallel Universe (Issue 51), but this time we move from a conceptual analysis of host-device data transfer to direct measurement of the performance advantage of minimizing data movement.

Finally, we close this issue with two oneAPI articles: Device Discovery with SYCL* and Zero in on Level Zero. In the former, John Pennycook and I show how to use the SYCL device discovery API to determine what accelerators are available in a system and to query their characteristics. The latter describes the Level Zero hardware abstraction layer that makes oneAPI so powerful.

As always, don’t forget to check out Tech.Decoded for more information on Intel solutions for code modernization, visual computing, data center and cloud computing, data science, systems and IoT development, and heterogeneous parallel programming with oneAPI.

Henry A. Gabb

April 2023

Henry A. Gabb, Senior Principal Engineer at Intel Corporation, is a longtime high-performance and parallel computing practitioner who has published numerous articles on parallel programming. He was editor/coauthor of “Developing Multithreaded Applications: A Platform Consistent Approach” and program manager of the Intel/Microsoft Universal Parallel Computing Research Centers.

LinkedIn | Twitter