By Viola Cavallini and Laura Cappelli
Introduction
The High-Luminosity Large Hadron Collider (HL-LHC) is set to come online towards the end of this decade. It will require significantly increased computing resources, which means that we need to investigate ways to use the computing power available more efficiently.
Within our project, we wanted to make some of the code used at CERN more portable, so that it can run on different hardware platforms. This portability would let us generalize the existing implementation and target heterogeneous platforms, including FPGAs, Intel® GPUs, NVIDIA GPUs, and other upcoming computing devices.
The specific code we were working on is used as part of the process to reconstruct particle-collision events within the Compact Muon Solenoid (CMS), one of the two large, general-purpose experiments on the Large Hadron Collider.
Intel® oneAPI Products Used
We started by porting the CUDA code to oneAPI, because code written in CUDA can only be executed on NVIDIA GPUs. Using oneAPI is beneficial because it allows the same code to run across various platforms and in development sandboxes such as the Intel® DevCloud.
The Intel® DPC++ Compatibility Tool has been extremely helpful with refactoring code from CUDA to oneAPI, a task that is especially challenging when the CUDA code was written by someone else.
Project Process
We started by studying oneAPI and understanding the algorithms developed for reconstructing events in the CMS Silicon Tracker.
The two of us divided the work by creating stand-alone versions of pieces of the code. We used the compatibility tool to obtain a first version of the translated code. We then reviewed all the resulting code and rewrote the pieces that the compatibility tool didn’t translate. With that completed, we tested the code with known inputs until it produced the expected results. Finally, we merged the stand-alone functions, recreating the functionality of the original CUDA code, but parallelized with oneAPI.
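The testing step described above can be sketched in plain C++. This is only an illustration under stated assumptions, not the CMS reconstruction code or any oneAPI API: the names `saxpy_reference`, `saxpy_ported`, `saxpy_element`, `all_close`, and `run_known_input_check` are hypothetical, and a simple `y = a * x + y` computation stands in for a real stand-alone function. The idea is to run a trusted reference and the ported version on the same known inputs and compare the results within a tolerance.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: the trusted result the port must reproduce.
std::vector<float> saxpy_reference(float a, const std::vector<float>& x,
                                   std::vector<float> y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
    return y;
}

// Hypothetical per-element body, isolated the way a device kernel body would be.
inline void saxpy_element(std::size_t i, float a, const float* x, float* y) {
    y[i] = a * x[i] + y[i];
}

// Hypothetical ported version (assumption: a plain loop drives the element
// body here; in a real port a parallel launch would take the loop's place).
std::vector<float> saxpy_ported(float a, const std::vector<float>& x,
                                std::vector<float> y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        saxpy_element(i, a, x.data(), y.data());
    }
    return y;
}

// True when both vectors have the same length and every element agrees
// within a relative tolerance (absolute for values smaller than 1).
bool all_close(const std::vector<float>& expected,
               const std::vector<float>& actual, float rel_tol) {
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float scale = std::max(1.0f, std::fabs(expected[i]));
        if (std::fabs(expected[i] - actual[i]) > rel_tol * scale) {
            return false;
        }
    }
    return true;
}

// Runs both versions on one small known input and compares the outputs.
bool run_known_input_check() {
    const std::vector<float> x{1.0f, 2.0f, 3.0f, 4.0f};
    const std::vector<float> y{10.0f, 20.0f, 30.0f, 40.0f};
    const float a = 2.0f;
    return all_close(saxpy_reference(a, x, y), saxpy_ported(a, x, y), 1e-5f);
}
```

Calling `run_known_input_check()` from a small `main` after each change to a translated piece gives a quick pass/fail signal. A tolerance is used instead of exact equality because a device may round floating-point operations differently from the host.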
Results So Far
oneAPI makes writing code for cross-architecture systems more efficient because you can write the algorithms once and run them on different architectures. Without this technology, the reconstruction algorithms would need to be rewritten multiple times: once for every architecture they run on. That process can be very expensive and time-intensive, and it requires expertise in each of the architectures involved.
Conclusion
While we did encounter some challenges understanding the code and working with a new programming model, we learned a lot from translating code written by someone else and adapting it to the oneAPI programming model, which is based on the Khronos* SYCL* standard.
Additional Resources
Project Details and Final Presentation
Intel® DevCloud
About Viola Cavallini
Viola Cavallini recently graduated with a degree in computer science at the University of Ferrara. She is now studying for a master’s degree in computer engineering at the same university. She studied data-acquisition software during her internship at the university and participated in the CERN openlab 2020 online summer internship program. She is working on an INFN project developing new data-acquisition software.
About Laura Cappelli
Laura Cappelli is a computer science student at Alma Mater Studiorum, University of Bologna. Her studies span heterogeneous systems and parallel programming models. Her current research focuses on the use of SYCL from the Khronos Group and oneAPI for performance portability, as well as their application to physics reconstruction algorithms. In 2020, Laura participated in the CERN openlab online summer internship program, and she is working in collaboration with physicists from the CMS Experiment at CERN.