GPU Porting & Optimization of a Lattice Boltzmann Flow Solver

Computational fluid dynamics (CFD) solvers are computationally demanding:

  • They require high grid resolutions to capture flow physics accurately
  • They must be parallelized to keep simulation times practical

The performance of a lattice Boltzmann flow solver was first optimized on an Intel® Xeon® CPU, yielding a 6x performance gain. Analysis tools such as Intel® Advisor and Intel® VTune™ Profiler guided these optimizations. The code was then ported to Intel® oneAPI using OpenMP* offload so that it can run on Intel® GPUs based on the Xe Architecture. The same analysis tools guided the GPU porting and optimization.

Significant performance gains came from optimizing data allocations and data movement between the host and the device. Multiple schemes with varying degrees of offload were also tested. For medium- to large-size grids, GPUs based on the Xe Architecture outperformed Intel® Xeon® platforms. The effects of further data-transfer optimizations, GPU tile scaling, and process scaling are also presented.



Dr. Dhiraj V. Patil is an associate professor at the Indian Institute of Technology (IIT) Dharwad. He has previously worked at IIT Mandi, the City University of New York (CUNY), and the University of Edinburgh, and has served as guest faculty at the Karlsruhe Institute of Technology (KIT) in Germany.

Dhiraj is skilled in numerical simulation, C++, and fluid dynamics. He earned his PhD in the Department of Aerospace Engineering at the Indian Institute of Science, Bangalore, with a thesis on CFD using the lattice Boltzmann method.