CFHall is a plasma simulation code based on a particle-in-cell (PIC) method. PIC codes combine the finite difference approach with "superparticles" for particle distribution modeling. Applications of PIC codes range from space weather modeling to the design of medical devices. Performance optimization in this context means more accurate physics and better engineering.
New methods of performance optimization in PIC codes are shared in a webinar given by Dr. Anastasia Perepelkina of the Keldysh Institute of Applied Mathematics.
Dr. Perepelkina shares the programming practices that she uses in her work on CFHall:
- Roofline analysis to identify the best optimization route
- Template C++ classes to hide the complexity of vectorization
- Data structures conducive to portable vectorization
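As a rough illustration of the last two practices, the sketch below pairs a structure-of-arrays particle container with a small template class that hides the vector width. It is not taken from CFHall: the names `Vec`, `ParticlesSoA`, and `push_positions` are hypothetical, and the lanes are plain loops that a compiler may auto-vectorize rather than hand-written intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width "vector" type: W lanes of T.
// The loops have a compile-time trip count, which compilers can usually
// auto-vectorize; the class could later be specialized with intrinsics.
template <typename T, std::size_t W>
struct Vec {
    std::array<T, W> lane{};

    static Vec load(const T* p) {
        Vec v;
        for (std::size_t i = 0; i < W; ++i) v.lane[i] = p[i];
        return v;
    }
    void store(T* p) const {
        for (std::size_t i = 0; i < W; ++i) p[i] = lane[i];
    }
    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        for (std::size_t i = 0; i < W; ++i) r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }
    friend Vec operator*(const Vec& a, T s) {
        Vec r;
        for (std::size_t i = 0; i < W; ++i) r.lane[i] = a.lane[i] * s;
        return r;
    }
};

// Structure-of-arrays storage (assumed layout): each attribute is contiguous,
// so W consecutive particles load into one Vec with unit stride.
struct ParticlesSoA {
    std::vector<float> x, vx;
    explicit ParticlesSoA(std::size_t n) : x(n, 0.0f), vx(n, 0.0f) {}
    std::size_t size() const { return x.size(); }
};

// Advance positions by x += vx * dt, W particles at a time.
template <std::size_t W>
void push_positions(ParticlesSoA& p, float dt) {
    assert(p.size() % W == 0 && "sketch assumes the count is padded to a multiple of W");
    for (std::size_t i = 0; i < p.size(); i += W) {
        auto x  = Vec<float, W>::load(&p.x[i]);
        auto vx = Vec<float, W>::load(&p.vx[i]);
        (x + vx * dt).store(&p.x[i]);
    }
}
```

The physics kernel only sees `Vec` arithmetic, so changing the width `W` for a different instruction set does not touch the kernel code, and the contiguous layout keeps every load unit-stride.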
She also discusses memory traffic optimization based on the LRnLA (locally-recursive, non-locally asynchronous) algorithms:
- Dependency graph decomposition
- Diamond and "torre" shapes of domain decomposition
- LRnLA algorithm ConeFold
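The idea of decomposing the dependency graph, rather than the spatial domain alone, can be sketched with a 1D three-point stencil. The example below uses plain time skewing with parallelogram tiles, a simpler stand-in for the diamond and "torre" shapes and not the ConeFold algorithm itself. All names are hypothetical, and every time level is kept in memory only to make the comparison easy; a production code would not store them all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // u[t][x], all time levels kept for clarity

// One update of a 3-point averaging stencil; boundary values are held fixed.
inline void update(Grid& u, std::size_t t, std::size_t x) {
    const std::size_t nx = u[t].size();
    if (x == 0 || x + 1 == nx) u[t][x] = u[t - 1][x];
    else u[t][x] = (u[t - 1][x - 1] + u[t - 1][x] + u[t - 1][x + 1]) / 3.0;
}

// Reference order: finish a whole time layer before starting the next.
void run_stepwise(Grid& u) {
    for (std::size_t t = 1; t < u.size(); ++t)
        for (std::size_t x = 0; x < u[t].size(); ++x) update(u, t, x);
}

// Tiled order: cut the (t, x) dependency graph into parallelograms of
// bt time steps by bx points in the skewed coordinate s = x + t.
// Node (t, s) depends on (t-1, s-2), (t-1, s-1), (t-1, s), which lie in the
// same tile or in a tile visited earlier, so this traversal is valid.
void run_skewed_tiles(Grid& u, std::size_t bt, std::size_t bx) {
    assert(!u.empty() && bt > 0 && bx > 0);
    const std::size_t nt = u.size(), nx = u[0].size();
    for (std::size_t t0 = 1; t0 < nt; t0 += bt)
        for (std::size_t s0 = 0; s0 < nx + nt; s0 += bx)
            for (std::size_t t = t0; t < std::min(t0 + bt, nt); ++t)
                for (std::size_t s = s0; s < s0 + bx; ++s) {
                    if (s < t) continue;            // left of the domain
                    const std::size_t x = s - t;
                    if (x < nx) update(u, t, x);    // skip points right of the domain
                }
}
```

Each tile advances a small block of points through several time steps before moving on, so data loaded into cache is reused across time steps instead of being streamed from main memory once per step. Both orders produce identical values because every node reads the same inputs.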
The non-trivial methods discussed in the talk pay off tremendously. With ConeFold, the bandwidth that limits the code moves from DDR memory to the L2 cache. ConeTorre tuning raises the arithmetic intensity, moving the algorithm up the roofline diagram by as much as an order of magnitude. The talk also connects these algorithms to the high-bandwidth memory (MCDRAM) on Intel® Xeon Phi™ processors.
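For readers unfamiliar with the roofline diagram, the model behind it fits in a few lines: attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity and memory bandwidth. The sketch below encodes that standard formula; the function names are hypothetical and no hardware figures from the talk are assumed.

```cpp
#include <algorithm>
#include <cassert>

// Attainable performance (GFLOP/s) under the roofline model:
// min(peak compute rate, arithmetic intensity * memory bandwidth).
double roofline_gflops(double peak_gflops, double bandwidth_gbs, double flops_per_byte) {
    assert(peak_gflops > 0 && bandwidth_gbs > 0 && flops_per_byte >= 0);
    return std::min(peak_gflops, bandwidth_gbs * flops_per_byte);
}

// Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound.
double ridge_point(double peak_gflops, double bandwidth_gbs) {
    assert(peak_gflops > 0 && bandwidth_gbs > 0);
    return peak_gflops / bandwidth_gbs;
}
```

With made-up placeholder values of 1000 GFLOP/s peak and 100 GB/s bandwidth, raising the arithmetic intensity from 0.5 to 5 FLOP/byte lifts the bound from 50 to 500 GFLOP/s. This is why a tenfold gain in intensity matters so much for a memory-bound code. Serving the data from a faster memory level raises the bandwidth term instead.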
Dr. Perepelkina's webinar is the third in the Modern Code Contributed Talks ("MC² Series"). Webinar registration, slides, and a video recording are available at colfaxresearch.com/mc2-003