Executive Summary
Shanghai Jiao Tong University is one of the most prestigious universities in China. The university’s High Performance Computing (HPC) center installed its last supercomputer in 2013, and the resource now lacks the capacity to effectively support the ongoing work of computational researchers among the university’s 60,000 students and 6,000 faculty. The university turned to Inspur* to build its next-generation cluster on 2nd Generation Intel® Xeon® Scalable processors and Cornelis Networks products.
Challenge
Established in 1896, Shanghai Jiao Tong University is one of the oldest universities in China. Its 28 departments and 15 hospitals educate 30,000 undergraduate and 30,000 graduate students in a wide range of disciplines. Many of those departments require supercomputing resources for discovery and insight in materials, astrophysics, aeronautics, computational genomics, and other traditional sciences. Over the last few years, research has expanded in new areas, including big data and machine learning.
The university’s existing HPC resource, named π, was built in 2013. It is a heterogeneous, 260-teraFLOPS cluster built on Intel® Xeon® E5 processors and NVIDIA* GPUs with an InfiniBand* Architecture interconnect.
“Things have changed a lot over the last six years,” stated Dr. James Lin, Vice Director of the HPC Center. “As research at the university has addressed ever more complex and deeper problems and included new fields in machine learning and big data, more students need computing cycles. The queues on π for researchers’ jobs have gotten longer and longer, delaying important research work.”
Beyond π’s lack of capacity for current work, researchers want to take advantage of more scalable codes that can run their jobs faster across more processor cores. In 2018, the university contacted OEM Inspur to build a new, approximately two-petaFLOPS homogeneous system named π 2.0.
Solution
“We support research that runs commercial applications, open source codes for traditional CFD modeling and other science, and in-house high-scalability codes,” explained Stephen Wang, head of Technical Support. “We provide researchers with help in optimizing and porting their codes for scalability on parallel systems.”
π 2.0 will be a 658-node system of two-socket Inspur servers running 2nd Gen Intel® Xeon® Gold 6248 processors with 20 cores each, for a total of 26,320 compute cores and around two petaFLOPS of peak performance. It will be the third-largest supercomputer among China’s universities. The compute nodes will be connected by an Omni-Path Architecture (OPA) fabric from Cornelis Networks and supported by a scalable Lustre* parallel file system using Intel® SSD Data Center series NVMe* drives.
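As a rough check on those figures (the 2.5 GHz base frequency and 32 double-precision FLOP per cycle per core under AVX-512 are assumptions about the Gold 6248, not numbers from the article):

$$658 \text{ nodes} \times 2 \text{ sockets} \times 20 \text{ cores} = 26{,}320 \text{ cores}$$

$$26{,}320 \text{ cores} \times 2.5\,\text{GHz} \times 32\,\tfrac{\text{FLOP}}{\text{cycle}} \approx 2.1 \text{ petaFLOPS peak (double precision)}$$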
“Most supercomputers on the Top500 list are built on Intel® architecture (IA),” added James. “And we have a lot of experience with Intel® architecture, including modernizing codes from GPUs to IA. So, we chose next-generation Intel® Xeon® Scalable processors for our new cluster.”
Selecting Cornelis Networks for the interconnect was a little more involved.
“We visited the two leading HPC centers in Japan: the Joint Center for Advanced High Performance Computing (JCAHPC), run jointly by the University of Tokyo and the University of Tsukuba, and the Global Scientific Information and Computing Center (GSIC) at the Tokyo Institute of Technology,” explained James. “JCAHPC hosts Oakforest-PACS, a very large supercomputer built with Cornelis Networks products and the largest Cornelis Networks deployment in the world. TSUBAME3.0 at GSIC is also a very large cluster that uses Cornelis Networks products. We chose Cornelis Networks based on our research with those centers and on visits to other Cornelis Networks customers in China.”
Power was a critical concern for the HPC center. With 26,320 cores, π 2.0 will be seven to eight times larger than π.
“We are required to support a power usage effectiveness (PUE) of 1.3,” commented Stephen. “With the more efficient technology of the latest Intel Xeon Scalable processors, π 2.0 power demand will be only between two and three times that of π even though the system is nearly eight times larger.”
A key area of concern was the Lustre file system. On π, with an increasing number of computational genomics jobs (as many as 1,000 at a time), Lustre was becoming a bottleneck because genomics workloads run as many small jobs. π 2.0’s Lustre file system will include Intel SSD Data Center series drives to accelerate I/O across the storage cluster.
Results
While the system is still under construction, researchers are well aware of the new capacity and technologies that will be available on π 2.0. Stephen’s technical support department already fields questions from users about scaling their codes.
“We are focused on developing scalable codes,” said Stephen, “offering help with methodologies such as OpenMP* and MPI. We also have interns who will actually help researchers port their codes. Since we have experience with modernizing GPU codes, we can help them target their machine learning applications for the 2nd Gen Intel® Xeon® Scalable processors with features such as Vector Neural Network Instructions (VNNI).”
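As an illustration of the kind of MPI scaling work the support team describes, the sketch below distributes a simple reduction across ranks. It assumes the mpi4py Python bindings, which the article does not mention; a real research code would use whatever language and MPI implementation it is already written in.

```python
# Minimal MPI scaling sketch (hypothetical example; mpi4py is an assumption).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Split a (made-up) global index range evenly across ranks.
n_global = 10_000_000
counts = [n_global // size + (1 if r < n_global % size else 0) for r in range(size)]
start = sum(counts[:rank])
local = np.arange(start, start + counts[rank], dtype=np.float64)

# Each rank computes a partial sum, then all ranks combine them.
local_sum = local.sum()
total = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"ranks={size}  global sum={total:.6e}")
```

A job like this would be launched across nodes with the cluster’s MPI launcher or scheduler, for example `mpirun -np 40 python scale_sketch.py` (the script name and rank count are illustrative).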
The new supercomputer’s first customers will be the biggest users at the university, running their in-house high-scalability codes optimized for 2nd Gen Intel® Xeon® Scalable processors. Other early projects will include machine learning jobs.
“Users are very excited about getting access to the system,” added James.
The new π 2.0 supercomputer at Shanghai Jiao Tong University will support research for commercial applications, open source codes for traditional CFD modeling and other science, and in-house high-scalability codes.
Solution Summary
- Inspur-built system with 658 two-socket nodes of 2nd Gen Intel® Xeon® Gold 6248 processors
- 26,320 total compute cores (52,640 threads)
- Intel® SSD DC Series for NVMe* for fast-response Lustre parallel file system
- Cornelis Networks for fast communications across compute nodes
- Frameworks for machine learning, including TensorFlow* and PyTorch*, using Intel® Optimizations for TensorFlow and the Intel® Distribution for Python* (see the sketch below)
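As a small, hedged illustration of that framework stack, the sketch below checks that a oneDNN-enabled TensorFlow build is usable on a CPU node; the environment variable and the matmul timing are general TensorFlow conventions rather than details from the article.

```python
# Hypothetical sanity check for an Intel-optimized (oneDNN) TensorFlow install.
import os
os.environ.setdefault("TF_ENABLE_ONEDNN_OPTS", "1")  # must be set before importing TensorFlow

import time
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("CPU devices:", tf.config.list_physical_devices("CPU"))

# Time a single large matmul as a rough check that the CPU backend works;
# real benchmarking would use representative models with proper warm-up.
a = tf.random.normal([2048, 2048])
b = tf.random.normal([2048, 2048])
t0 = time.time()
c = tf.linalg.matmul(a, b)
_ = c.numpy()  # force execution in eager mode
print(f"2048x2048 matmul: {time.time() - t0:.3f} s")
```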