Experiences with Adding SYCL Support to GROMACS
GROMACS is an open-source, high-performance molecular dynamics package primarily used for biomolecular simulations, accounting for approximately five percent of high-performance computing (HPC) use worldwide. Because of the extreme computing demands of molecular dynamics, significant effort is invested in improving the performance and scalability of simulations. Target hardware ranges from supercomputers to the laptops of individual researchers and of volunteers in distributed-computing projects such as Folding@home. The code has been designed for both portability and performance by explicitly adapting algorithms to SIMD and data-parallel processors. A SIMD intrinsic abstraction layer provides high CPU performance. Explicit GPU acceleration has long used CUDA* to target NVIDIA* devices and OpenCL™ for AMD and Intel devices.
In this talk, we discuss the experiences and challenges of adding support for the SYCL* platform to the established GROMACS codebase, and share lessons learned in porting and optimization. While OpenCL offers the benefit of using the same code to target different hardware, it suffers from several drawbacks that add significant development friction. Its separate-source model leads to code duplication and makes changes cumbersome. The need to write kernels in C99 while the rest of the codebase uses C++17 exacerbates these issues. Another problem is that OpenCL, while supported by most GPU vendors, is rarely any vendor's primary framework and thus does not receive first-class support or tuning effort. SYCL alleviates many of these issues by employing a single-source model based on modern standard C++. In addition to being the primary platform for Intel GPUs, the possibility of targeting AMD and NVIDIA GPUs through other implementations (for example, hipSYCL) might make it possible to reduce the number of separate GPU ports that have to be maintained.
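To illustrate the single-source model mentioned above, the following minimal sketch (our own example, not GROMACS code) shows host logic and a device kernel living in one C++17 translation unit, so they can share types and constants instead of duplicating them in a separate C99 kernel string as OpenCL requires:

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    std::vector<float> x(1024, 1.0f), y(1024, 2.0f);
    sycl::queue q;  // selects a default device (CPU or GPU)
    {
        sycl::buffer<float> xBuf(x.data(), x.size());
        sycl::buffer<float> yBuf(y.data(), y.size());
        q.submit([&](sycl::handler& cgh) {
            sycl::accessor xAcc(xBuf, cgh, sycl::read_only);
            sycl::accessor yAcc(yBuf, cgh, sycl::read_write);
            // The lambda below is the device kernel, compiled from the
            // same C++ source file as the host code around it.
            cgh.parallel_for(sycl::range<1>(x.size()),
                             [=](sycl::id<1> i) { yAcc[i] += 2.0f * xAcc[i]; });
        });
    }  // buffer destruction synchronizes and copies results back to y
}
```

Compared with OpenCL, there is no runtime kernel-source compilation step and no second language dialect: a template or helper function used by the kernel can be the same one the host code uses.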
Some design differences from OpenCL, such as dependency-driven directed acyclic graphs (DAGs) of tasks instead of in-order queues, made it necessary to reconsider GROMACS' task-scheduling approach and the architectural choices in the GPU backend. Additionally, supporting multiple GPU platforms presents the challenge of balancing performance (low-level, hardware-specific code) against maintainability (more generalization and code reuse). We discuss the limitations of the existing codebase and interoperability layers with regard to adding the new platform, compute-performance and latency comparisons, code-quality considerations, and the issues we encountered with the SYCL implementations tested. Finally, we discuss our goals for the next release cycle for the SYCL backend and for the overall architecture of the GPU acceleration code in GROMACS.
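The scheduling difference above can be sketched in a few lines (again our own illustrative example, not GROMACS code). SYCL queues are out-of-order by default: the runtime derives a task DAG from declared data dependencies, rather than executing submissions strictly in order as an in-order OpenCL queue would.

```cpp
#include <sycl/sycl.hpp>

// Kernel B runs after kernel A only because both declare accessors on
// the same buffer; independent submissions may execute concurrently.
void two_kernels(sycl::queue& q, sycl::buffer<float>& buf) {
    q.submit([&](sycl::handler& cgh) {  // kernel A: initialize
        sycl::accessor a(buf, cgh, sycl::write_only);
        cgh.parallel_for(buf.get_range(), [=](sycl::id<1> i) { a[i] = 1.0f; });
    });
    q.submit([&](sycl::handler& cgh) {  // kernel B: ordered after A via buf
        sycl::accessor a(buf, cgh, sycl::read_write);
        cgh.parallel_for(buf.get_range(), [=](sycl::id<1> i) { a[i] *= 2.0f; });
    });
    // Code written against in-order queues implicitly relies on submission
    // order; under SYCL, such ordering must come from accessors or events.
}
```

This is why a scheduler designed around in-order queues cannot simply assume that back-to-back submissions execute sequentially once ported to SYCL.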
Erik Lindahl received a PhD from the KTH Royal Institute of Technology in 2001 and performed post-doctoral research at the University of Groningen, Stanford University, and the Pasteur Institute. He is a professor of Biophysics at Stockholm University, with a second appointment as professor of Theoretical Biophysics at the KTH Royal Institute of Technology. Erik's research is focused on understanding the molecular mechanisms of membrane proteins, in particular ion channels, through a combination of molecular simulations and experimental work involving cryo-EM and electrophysiology. He has authored 130 scientific publications and is the recipient of an ERC Starting Grant. He heads the international GROMACS molecular simulation project, which is one of the leading scientific codes exploiting parallelism on all levels, from accelerators and assembly code to supercomputers and distributed computing. He is co-director of the Swedish e-Science Research Center (SeRC) as well as the Swedish National Bioinformatics Infrastructure, and lead scientist of the BioExcel Center of Excellence for Computational Biomolecular Research. His research has earned the Prix Jeune Chercheur Blaise Pascal, the Sven and Ebba-Christina Högberg prize, and the Wallenberg Consortium North prize.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.