Supercomputers are critical for astronomical research and exploration, work that demands enormous amounts of compute. One such HPC project is a simulation-based exploration of the Big Bang conducted on the ATERUI II supercomputer, which is built on Intel® Xeon® Scalable processors.
Similarly, Intel® has taken part in transformation projects across the HPC community, such as Weather Prediction with AI & HPC, Computer-Aided Engineering for Motorcycle Design, and many more. To support such projects, our HPC Reference Stack (HPCRS) offers simple deployment of preinstalled software and performance optimizations for Intel® architecture, with the flexibility to run on premises or in the cloud.
We are proud to announce the HPC Reference Stack v3, which combines Intel’s optimizations for both HPC and AI. This release targets the performance benefits of 3rd Generation Intel® Xeon® Scalable Processors. The key features of the version 3 release are the latest Intel® oneMKL, the Intel® MPI Library with libfabric support, and a suite of Intel® Compilers, including Data Parallel C++. Another key advantage of HPCRS is its support for running Intel® VTune™ Profiler, which adds performance profiling capabilities.
Tools and Frameworks
Our HPC Reference Stack incorporates the latest versions of leading developer tools and frameworks to help HPC developers build novel applications using Intel optimizations for HPC and AI:
- CentOS*, an open source Linux* distribution.
- PyTorch*, an open source machine learning framework that accelerates the path from research prototyping to production deployment, optimized with the Intel® oneAPI Deep Neural Network Library (oneDNN).
- Horovod*, a framework for optimized distributed deep learning training with PyTorch.
- Intel® oneAPI Math Kernel Library (oneMKL), a math library highly optimized for the performance of mathematical functions.
- oneAPI Collective Communication Library (oneCCL), a library for scalable and efficient distributed training of deep neural networks.
- Intel® MPI Library, a multi-fabric message-passing library.
- Intel® oneAPI Threading Building Blocks (Intel® oneTBB), a widely used C++ library for task-based, shared-memory parallel programming on the host.
- Data Parallel C++ (DPC++), a high-level language designed for data parallel programming productivity.
- Omni-Path* fabric support, which makes it possible to run the container in a multi-node cluster environment.
- Spack*, a package management tool that supports multiple versions and configurations of software.
- Runtimes: Python* application and service execution support.
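To make the role of the distributed training components in the list above more concrete, here is a minimal pure-Python sketch of the averaging allreduce pattern that frameworks such as Horovod and collective libraries such as oneCCL are built around. It does not use either library's API. The function names (`allreduce_average`, `sgd_step`) and the gradient values are hypothetical and chosen only for illustration.

```python
from typing import List


def allreduce_average(worker_grads: List[List[float]]) -> List[float]:
    """Return the element-wise mean of the per-worker gradients.

    This is the value every worker would hold after an averaging allreduce.
    Real libraries compute it with optimized network collectives; this
    sketch simply loops over in-memory lists.
    """
    n_workers = len(worker_grads)
    if n_workers == 0:
        raise ValueError("need at least one worker")
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold gradients of the same shape")
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(length)]


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """Apply one plain SGD update using the averaged gradients."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Hypothetical gradients computed by four workers on different data shards.
worker_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

avg = allreduce_average(worker_grads)
weights = sgd_step([0.0, 0.0], avg, lr=0.5)

print(avg)      # [4.0, 5.0]
print(weights)  # [-2.0, -2.5]
```

Because every worker applies the same averaged gradient, all model replicas stay in sync after each step, which is the property data-parallel training relies on.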
Multiple layers of the HPC Reference Stack are performance-tuned for Intel® architecture, offering significant advantages on 3rd Generation Intel® Xeon® Scalable Processors.
The architectural diagram of the HPC Reference Stack is shown below:
Performance gains for the HPC Reference Stack, single node, on the 3rd Generation Intel® Xeon® Scalable Processor 8380 vs. the 2nd Generation Intel® Xeon® Scalable Processor 8280:
As shown in the chart below, the 3rd Generation Intel® Xeon® Scalable Processor 8380 delivers higher performance than the 2nd Generation Intel® Xeon® Scalable Processor 8280.
Performance of the HPC Reference Stack, single node, on the 3rd Generation Intel® Xeon® Scalable Processor 8380, container vs. bare metal:
As shown in the graph below, running the stack in Docker incurs nearly zero performance penalty compared to bare metal. Beyond delivering near bare-metal performance, containers also make it easier to scale the solution for HPC use cases.
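For readers who want to reproduce comparisons like the two above on their own systems, the following sketch shows one common way to express them: generation-over-generation speedup and container overhead relative to bare metal, both derived from wall-clock times. The helper names and all timing values are placeholders invented for illustration. They are not the measurements behind the charts.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the baseline run."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / new_seconds


def overhead_percent(bare_metal_seconds: float, container_seconds: float) -> float:
    """Extra run time of the containerized run, as a percentage of bare metal."""
    if bare_metal_seconds <= 0 or container_seconds <= 0:
        raise ValueError("run times must be positive")
    return (container_seconds - bare_metal_seconds) / bare_metal_seconds * 100.0


# Placeholder wall-clock times in seconds (NOT measured results).
older_platform = 100.0
newer_platform = 80.0
bare_metal = 100.0
container = 101.0

gen_speedup = speedup(older_platform, newer_platform)
container_overhead = overhead_percent(bare_metal, container)

print(f"speedup: {gen_speedup:.2f}x")                   # speedup: 1.25x
print(f"container overhead: {container_overhead:.1f}%")  # container overhead: 1.0%
```

An overhead close to 0% corresponds to the "nearly zero performance penalty" case described above. A negative value would mean the containerized run finished faster than the bare-metal run.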
2nd Generation Intel® Xeon® Scalable Platform - Tested by Intel as of 03/22/2021. 2 socket Intel® Xeon® Platinum 8280L Processor (2.7GHz, 28 cores), HT On, Turbo On, Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.02.01.0013.121520200651 (ucode: 0x5003003), CentOS Linux 8 (Core) kernel 4.18.0-240.10.1.el8_3.x86_64, High Performance Computing Reference Stack v0.3.0-m3, Intel® oneAPI 2021.1.1, Quantum ESPRESSO v6.6 (https://github.com/QEF/q-e/releases/tag/qe-6.6), AUSURF112 benchmark (https://github.com/QEF/benchmarks/tree/master/AUSURF112 commit:058041d)
3rd Generation Intel® Xeon® Scalable Platform - Tested by Intel as of 03/22/2021. 2 socket Intel® Xeon® Platinum 8380 Processor (2.3GHz, 40 cores), HT On, Turbo On, Total Memory 512 GB (16 slots/ 32GB/ 3200 MHz), BIOS: SE5C6200.86B.3021.D40.2103160200 (ucode: 0x8d05a260), CentOS Linux 8 (Core) kernel 4.18.0-240.10.1.el8_3.x86_64, High Performance Computing Reference Stack v0.3.0-m3, Intel® oneAPI 2021.1.1, Quantum ESPRESSO v6.6 (https://github.com/QEF/q-e/releases/tag/qe-6.6), AUSURF112 benchmark (https://github.com/QEF/benchmarks/tree/master/AUSURF112 commit:058041d)
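As context for the two test configurations above, the sketch below derives a few aggregate hardware figures from the listed values: total cores, total memory, and a rough theoretical peak memory bandwidth. It rests on assumptions that the configuration lines do not state: the core counts are per socket, each populated DIMM slot sits on its own memory channel, and each transfer moves 8 bytes. The `NodeConfig` class is a hypothetical helper, and the bandwidth values are back-of-the-envelope estimates, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    name: str
    sockets: int
    cores_per_socket: int  # assumption: the listed core count is per socket
    dimm_slots: int
    dimm_gb: int
    mem_mts: int           # memory speed in mega-transfers per second

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def total_memory_gb(self) -> int:
        return self.dimm_slots * self.dimm_gb

    @property
    def peak_bandwidth_gbs(self) -> float:
        # Assumption: one DIMM per channel and 8 bytes moved per transfer.
        return self.dimm_slots * self.mem_mts * 8 / 1000


# Values copied from the two configuration lines above.
gen2 = NodeConfig("Xeon Platinum 8280L", sockets=2, cores_per_socket=28,
                  dimm_slots=12, dimm_gb=32, mem_mts=2933)
gen3 = NodeConfig("Xeon Platinum 8380", sockets=2, cores_per_socket=40,
                  dimm_slots=16, dimm_gb=32, mem_mts=3200)

for node in (gen2, gen3):
    print(f"{node.name}: {node.total_cores} cores, "
          f"{node.total_memory_gb} GB, "
          f"~{node.peak_bandwidth_gbs:.1f} GB/s theoretical peak")

print(f"core ratio: {gen3.total_cores / gen2.total_cores:.2f}x")
print(f"bandwidth ratio: {gen3.peak_bandwidth_gbs / gen2.peak_bandwidth_gbs:.2f}x")
```

The computed memory totals (384 GB and 512 GB) match the totals stated in the configurations, which is a quick consistency check on the slot counts and DIMM sizes.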
Intel is dedicated to ensuring that popular frameworks and topologies run best on Intel® architecture, giving you a choice of the right solution for your needs. HPCRS supports both AI and HPC workloads, and we plan to continue performance optimizations for upcoming processor generations.
Please visit the Intel® Developer Zone page to learn more and download the HPC Reference Stack Solutions. Please provide your feedback and contribute to the project. As always, we welcome ideas for further enhancements through the stacks mailing list.
Notices and Disclaimers
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details.
No product or component can be absolutely secure.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.