Success Story: Tuning and Analysis Utilities (TAU) Integrates oneAPI to Support Argonne National Laboratory’s Aurora Supercomputer

TAU with oneAPI will enable code optimization for the new Exascale supercomputer to push computational science beyond current boundaries.

Challenge

One of the most important developments in modern High Performance Computing (HPC) is the Aurora supercomputer (Figure 1) at Argonne National Laboratory. Aurora is being built on the latest generation of Intel® technologies—Intel® Xeon® Scalable processors and Intel® Iris® Xe graphics. When deployed, Aurora will be one of America’s first exascale supercomputers, converging traditional HPC with artificial intelligence (AI) and data analytics. Writing and optimizing scalable, efficient code at exascale—or any scale—is difficult and time-consuming. Fortunately, the Tuning and Analysis Utilities (TAU) toolkit has long been available to help HPC developers optimize their codes. Additionally, the new oneAPI standard will help HPC developers write applications for multiple architectures using a single programming model. But to support optimization of software based on oneAPI, TAU must be integrated with the oneAPI software stack.


Figure 1. Artist rendering of Aurora (Argonne Leadership Computing Facility)

Solution

HPC developers need to balance competing objectives: achieve optimal multi-architecture performance from code, and make the code portable, maintainable, scalable, and power-efficient. To help developers optimize their code, the TAU toolset (Figure 2) provides tools for the following:

  • Instrumentation, measurement, analysis, and visualization
  • Portable performance profiling and tracing
  • Performance data management and data mining

TAU helps developers see exactly how their code is performing during every operation—irrespective of the hardware environment and programming model. In fact, TAU can be used on the same code across system architectures to evaluate how an application runs on different platforms.



Figure 2. TAU Architecture (TAU and E4S)

“TAU supports many different runtimes,” said Professor Sameer Shende (University of Oregon). Professor Shende has been working on TAU for 25 years and now leads the TAU development team. “Developers do not need to modify their code or instrument it before running TAU. They can simply launch their application with a TAU script, and TAU will report how the code is doing down to the level of statements, loops, or functions.”
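The workflow Shende describes can be sketched as follows (a minimal sketch, assuming TAU is installed and on the PATH; `./my_app` is a hypothetical application binary):

```shell
# Launch the unmodified application under TAU — no recompilation
# or source instrumentation required
tau_exec ./my_app

# TAU writes profile files (profile.*) to the working directory;
# summarize them with the text-based viewer ...
pprof

# ... or browse them graphically
paraprof
```

The `pprof` and `paraprof` viewers report time spent per function, loop, or statement, which is how TAU shows "how the code is doing" at that granularity.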

TAU is part of the Extreme-Scale Scientific Software Stack (E4S). The E4S is standardized software designed to enable exascale supercomputing. It includes software packages to support convergence of traditional simulation, modeling, and visualization workloads with AI, machine learning (ML), and data analytics.

Shende’s team is working to integrate the oneAPI programming model into TAU in time for Aurora’s launch. That means adding support for the oneAPI Level Zero specification and the oneAPI direct programming language, Data Parallel C++ (DPC++).
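With that support in place, profiling a DPC++ application through the Level Zero runtime would look roughly like this (a hedged sketch based on the `-l0` option described in TAU's documentation; `my_dpcpp_app` is a hypothetical program built with the oneAPI DPC++/C++ compiler):

```shell
# Build with the oneAPI DPC++/C++ compiler
icpx -fsycl my_dpcpp_app.cpp -o my_dpcpp_app

# Run under TAU with Level Zero instrumentation enabled,
# capturing kernel launches and data transfers on the GPU
tau_exec -l0 ./my_dpcpp_app
```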

TAU Components

TAU supports multiple programming models, languages, and libraries that developers use across multiple architectures, including the following:

  • Multiple parallel programming paradigms and models—MPI*, OpenMP*, Pthreads, OpenCL™ standard, and others in any mix
  • Multiple languages and libraries—Fortran, C, C++, Berkeley Unified Parallel C (UPC), and Python*
  • Various types of tracing and metrics—direct instrumentation, standard and user-defined program events, and more
  • Profiling and data management with databases and visualization utilities
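Because these layers can be combined "in any mix," a single TAU launch can capture several runtimes at once. For example (a sketch assuming a TAU build configured with MPI and OpenCL support; the binary name is hypothetical):

```shell
# Profile a 4-rank MPI application that also offloads work via OpenCL
mpirun -np 4 tau_exec -opencl ./my_hybrid_app
```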

The oneAPI programming model will allow TAU to extend its capabilities for developers building or modifying code based on oneAPI for Intel® architectures.

Integrating oneAPI

“I’ve been working on integrating oneAPI into TAU for about a year,” added Shende. “Since the specification is new and still in development, integrating the libraries and other support into TAU is an ongoing process.”

As with any development of new hardware and new software, independent software vendors (ISVs) and third-party developers, like Shende, work closely with the manufacturer (Intel in this case). Collaboration with Intel allows Shende to keep his code base as up to date with oneAPI development as possible, learn about capabilities in new hardware and software releases, and provide feedback to the Intel team. That level of collaboration is needed for efficient and effective integration of oneAPI into TAU.

“As a collaborator with Intel, we have access to early prototypes of GPUs. We also work with the Intel® DevCloud, where the latest hardware and oneAPI developer tools are available.”

“While TAU is a tool for any architecture, Intel and oneAPI are important to our development for several reasons besides the architecture on which Aurora is based,” explained Shende. “Intel has some very mature compilers and runtime systems. In particular, the Intel® oneAPI DPC++/C++ Compiler is one of only two compilers that support the OpenMP Tools (OMPT) interface. Plus, Intel has many interfaces in their runtime to support the capabilities TAU needs, such as OpenCL and MPI profiling and, of course, Level Zero.”

Additionally, Shende points out that because oneAPI is Intel’s main approach to development going forward, it has garnered support from other organizations, like Codeplay*, which have extended oneAPI to hardware from other vendors, such as NVIDIA GPUs.

“We see our partnership with Intel and their support for our research as being key to our success,” concluded Shende.

Enabling Technologies

Software:

  • Intel® oneAPI Base Toolkit
  • Intel® oneAPI HPC Toolkit
  • Intel® Math Kernel Library
  • Intel® Distribution for Python*
  • Intel® oneAPI DPC++/C++, C, and Fortran compilers
  • Intel® MPI Library

Hardware:

  • Intel® Core™ i7-1185G7 processor
  • Integrated Intel® Iris® Xe graphics
  • Intel DevCloud with 2nd generation Intel® Xeon® processors

Resources and Recommendations

Product and Performance Information

1. Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.