As you may know, I’m an old-school HPC guy, not by choice but out of necessity. High-performance computing (HPC) has a double meaning, depending on who you’re talking to. On the one hand, it simply means improving application performance in the generic sense. Some of my friends on the compiler team see a 5% speedup as HPC. And, in their world, they’re right. In my world, HPC refers to computing on a grand scale: harnessing thousands of cores to get orders-of-magnitude speedups. (Think TOP500.)
That’s why I’m understandably excited about last month’s announcement of the Aurora* supercomputer, which Intel is building in collaboration with Argonne National Laboratory. (See U.S. Department of Energy and Intel to Deliver First Exascale Supercomputer for the whole story.) Aurora is expected to deliver exaFLOPS performance (i.e., a quintillion, or 10¹⁸, floating-point operations per second). Exascale systems will be essential for converged workflows, as we discussed in the last issue of The Parallel Universe.
Three articles in our current issue touch on optimizations that the push to exascale demands. Researchers at the Princeton Plasma Physics Laboratory are doing the type of science that will take advantage of an exascale system. Their article, Improving Performance by Vectorizing Particle-in-Cell Codes, describes how they fine-tuned one of their critical algorithms. How Effective Is Your Vectorization? shows how to take advantage of the information provided by Intel® Advisor. And Boost Performance for Hybrid Applications with Multiple Endpoints in Intel® MPI Library describes enhancements that improve the scalability of applications combining message passing and multithreading.
That’s enough about HPC. What else is in this issue? The feature article, Effectively Train and Execute Machine Learning and Deep Learning Projects on CPUs, describes the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) and how it’s used to accelerate AI frameworks. We also have two other articles that data scientists should find interesting: Parallelism in Python* Using Numba* and Boosting the Performance of Graph Analytics Workloads. The former provides practical advice on using the Numba compiler to significantly improve the performance of Python numerical kernels (a quick taste appears below). The latter describes a performance analysis of the GAP Benchmark Suite, a common benchmark for graph analytics. Finally, we close this issue with a review of the analysis tools in Intel® System Studio: Innovate System and IoT Apps.
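If you’d like a flavor of the Numba approach before reading the article, here’s a minimal sketch of my own (the function and data are illustrative assumptions, not taken from the article): decorating a numerical loop with @njit(parallel=True) asks Numba to JIT-compile it to native code and spread the iterations across cores.

```python
# Minimal, illustrative Numba sketch (not from the article): JIT-compile a
# numerical kernel and parallelize its loop across CPU cores.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sum_of_squares(a):
    total = 0.0
    for i in prange(a.size):   # prange marks the loop for parallel execution
        total += a[i] * a[i]   # Numba treats this as a parallel reduction
    return total

x = np.random.rand(10_000_000)
print(sum_of_squares(x))  # first call pays the JIT cost; later calls run at native speed
```

On a multicore machine, a kernel like this typically runs far faster than the equivalent pure-Python loop; the article goes into much more depth.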
As always, don’t forget to check out Tech.Decoded, Intel’s knowledge hub for developers, for more on solutions for code modernization, visual computing, data center and cloud computing, data science, and systems and IoT development. And if you haven’t already, be sure to subscribe to The Parallel Universe so you won’t miss a thing.
Henry A. Gabb
April 2019