CMCC Speeds Climate Change Modeling

CMCC improves its supercomputing architecture with the Intel® Xeon® CPU Max series with HBM, complementing its existing clusters.

At a glance:

  • The Euro-Mediterranean Centre on Climate Change (CMCC) is a non-profit international research center that collaborates with experienced scientists, economists, and technicians to provide full analyses of climate impacts on socio-economic systems.

  • CMCC needed a new supercomputing cluster to keep up with the demands of its research into climate change. Intel ran performance tests of CMCC’s Nucleus for European Modelling of the Ocean (NEMO) workloads and demonstrated that the Intel® Xeon® CPU Max 9480 processor with HBM would deliver up to 3.47x the performance of CMCC’s existing Juno cluster.1

CMCC needed a new supercomputing cluster to keep up with the demands of its research into climate change. Intel ran performance tests of CMCC’s ocean modeling workloads and demonstrated that the Intel® Xeon® CPU Max 9480 processor with high-bandwidth memory (HBM) would deliver up to 3.47x the performance of CMCC’s existing Juno cluster.1 CMCC is now deploying a new cluster, Cassandra, based on this processor.

Challenge

Speed up time to results for complex climate modeling workloads, including Nucleus for European Modelling of the Ocean (NEMO). Implement an architecture that improves performance for memory-bound applications. Improve modeling resolution to 7 km. Cut electricity costs, which reached €600,000 in 2023 for the data center housing the old Zeus cluster.

Solution

CMCC is implementing a new cluster called Cassandra, based on Intel Xeon CPU Max 9480 processors with high-bandwidth memory (HBM). The Lenovo servers use direct liquid cooling, which is more energy-efficient than the air cooling used for Zeus.

Results

Running single-node tests using CMCC’s NEMO workloads, the Intel Xeon CPU Max 9480 processor with high-bandwidth memory (HBM) delivered 3.36x and 3.47x the performance of CMCC’s newest existing cluster, Juno.1

The Intel Xeon CPU Max 9480 processor with HBM delivered 2.84x the performance of the Juno architecture on a 32-node test of the larger NEMO GLOB16 workload.1

Predicting Climate Change

“Global warming will increase the likelihood of extreme weather events, droughts, and floods,” said Giovanni Aloisio, strategic advisor at the Euro-Mediterranean Centre on Climate Change (CMCC) and former director of the CMCC Supercomputing Center. “We are studying how to project the temperature increase in 50 to 100 years to inform our governments.”

The simulations produced by CMCC, along with work from other international climate science institutions, are used by the Intergovernmental Panel on Climate Change (IPCC), which publishes periodic reports with advice on mitigation and adaptation measures for governments to consider.

CMCC has eleven research divisions, which are organized into three institutes: the Institute for Earth System Prediction (IESP), the Institute for Climate Resilience (ICR), and the European Institute on Economics and the Environment (EIEE).

The researchers at these institutes share the supercomputing center. Today, they use two generations of supercomputers (see Table 1): Zeus (established in 2019) and Juno (2022). The plan was for Zeus to be decommissioned once Juno was available, but migrating the models and their huge volumes of data between the machines and their locations proved a major challenge.

Table 1. CMCC’s existing supercomputing architectures of Zeus and Juno compared with the new Cassandra architecture.

As a result, Zeus is still being used by 275 researchers. “Modeling the climate system is a complex problem,” said Aloisio. “There are many interacting processes to model and a great range of timescales and geographic scales to analyze. It requires sophisticated mathematics and computational resources. We also need to manage large volumes of data produced by the simulation.”

The 1-year Mediterranean simulation, for example, produces 288 GB of output.

CMCC’s research work includes using coupled models, which combine models from different domains. For example, the terrestrial vegetation model could be coupled with the atmospheric model, which is coupled with the sea ice and oceanic model.

One of the applications CMCC runs is Nucleus for European Modelling of the Ocean, better known as NEMO. It has three engines for modeling ocean thermodynamics, sea ice, and biogeochemical processes.

“NEMO, like many climate models, is an example of memory-bound code,” said Francesca Mele, who leads the NEMO group at the CMCC Advanced Computing Division. “That means it spends most of its execution time accessing memory to do the computation.”

The time it takes for models to run varies from minutes to days, depending on the configuration of the simulation. Increasing the memory bandwidth would allow CMCC to get results faster and enable more research to be conducted using the supercomputing resources.
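To make “memory bound” concrete, here is a minimal sketch (illustrative only; it is not CMCC or NEMO code) of the kind of loop Mele describes: a STREAM-style triad that performs only two floating-point operations for every 24 bytes it moves, so its runtime is set by memory bandwidth rather than by core clock speed.

/* Illustrative memory-bandwidth-bound kernel (not CMCC code).
 * Each iteration does one multiply and one add but moves 24 bytes
 * (read b[i], read c[i], write a[i]), so memory traffic, not arithmetic,
 * dominates the runtime. */
#include <stdlib.h>

static void triad(double *a, const double *b, const double *c,
                  double alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + alpha * c[i];   /* 2 FLOPs per 24 bytes of traffic */
}

int main(void)
{
    size_t n = (size_t)1 << 27;       /* ~1 GiB per array: far larger than any cache */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) return 1;
    for (size_t i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }
    triad(a, b, c, 0.5, n);           /* time scales with memory bandwidth */
    free(a); free(b); free(c);
    return 0;
}

For a loop like this, faster memory such as HBM can shorten the runtime roughly in proportion to the extra bandwidth, with no code changes.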

Additionally, CMCC wanted to be able to improve the accuracy of its modeling. “We need to go to a higher resolution,” said Aloisio. “If we use a resolution of 1 degree of the Earth, that’s about 100 km. We want to increase the resolution to 1/16 of a degree, which is about 7 km. When we increase the resolution, the complexity of the model increases, and we need more computation resources.”
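The arithmetic behind that quote also shows why higher resolution is so expensive. The figures below are approximations, not taken from the article, and assume roughly 111 km per degree of latitude.

/* Back-of-the-envelope check of the resolutions Aloisio mentions
 * (illustrative approximation, assuming ~111 km per degree of latitude). */
#include <stdio.h>

int main(void)
{
    const double km_per_degree = 111.0;             /* approximate value at the equator */

    double coarse = 1.0 * km_per_degree;            /* 1 degree    -> ~111 km */
    double fine   = (1.0 / 16.0) * km_per_degree;   /* 1/16 degree -> ~7 km   */

    /* Refining the grid by 16x in each horizontal direction gives
     * 16 x 16 = 256x more cells, before even counting the shorter
     * time step that the finer grid requires. */
    double cell_factor = 16.0 * 16.0;

    printf("1 degree    ~ %.0f km\n", coarse);
    printf("1/16 degree ~ %.1f km\n", fine);
    printf("horizontal cells increase by ~%.0fx\n", cell_factor);
    return 0;
}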

For many models, the time to results depends heavily on the number of cores available for processing, so increasing the core count shortens run times. CMCC was also seeing demand grow, with more users requesting more core hours for their applications.

CMCC was looking for a new supercomputer architecture that would enable it to improve the speed and accuracy of its climate predictions.

Solution Details

CMCC is implementing a new cluster, Cassandra (see Table 1), based on Intel Xeon CPU Max. These are the only x86 CPUs with high-bandwidth memory (HBM). HBM can be used without code changes for workloads of up to 64 GB or for caching DDR5 memory. HBM can also be combined with DDR5 memory for workloads that require large memory capacity. In that case, code changes may be needed to optimize performance.
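As an illustration of the kind of code change flat mode can require, the sketch below places a single bandwidth-critical array in HBM through the open source memkind library’s hbwmalloc interface, while everything else stays in DDR5. This is a generic example, not CMCC code; it assumes memkind is installed and that HBM and DDR5 are exposed as separate NUMA domains.

/* Generic sketch of explicit HBM placement in flat mode (not CMCC code).
 * Requires the memkind library; build with: gcc hbm_sketch.c -lmemkind */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main(void)
{
    size_t n = 100 * 1000 * 1000;   /* ~800 MB bandwidth-critical array */
    int have_hbm = (hbw_check_available() == 0);
    double *field;

    if (have_hbm)
        field = hbw_malloc(n * sizeof *field);   /* allocate from HBM */
    else
        field = malloc(n * sizeof *field);       /* fall back to DDR  */
    if (!field) return 1;

    for (size_t i = 0; i < n; i++)
        field[i] = 0.0;

    printf("array placed in %s\n", have_hbm ? "HBM" : "DDR");

    if (have_hbm) hbw_free(field);
    else free(field);
    return 0;
}

A simpler, code-free alternative in flat mode is to bind a whole job to the HBM NUMA nodes with numactl, at the cost of being limited to the 64 GB of HBM per socket.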

“HBM will enable us to increase memory use in our models,” said Aloisio. “We will start by using the cache mode without any code changes, and we will look at testing the HBM-only mode, which also works without code changes.”

The new cluster, which will replace Zeus, uses processors with a higher core count of 56, delivering 20,160 cores across the cluster. The memory per node will increase from 96 GB on Zeus to 1024 GB on Cassandra. “Having more memory per node, well balanced with the CPU speed, enables us to increase the resolution of our models,” said Mele.

The Juno cluster has a limited number of GPUs, which are continuously at full utilization. “We’re investigating the use of machine learning for our predictive model,” said Aloisio. “In the future, we expect to more fully exploit the accelerators built into the Intel Xeon processors, such as Intel® Advanced Vector Extensions 512 and Intel® Advanced Matrix Extensions.”

The new cluster includes an investment in 13.6 petabytes of usable storage capacity to accommodate the data generated by the simulations.

Using a new 10-gigabit internet connection, CMCC can now move data quickly from the old data center, where Zeus is located, to the data center where Juno is and Cassandra will be. The transfer rate has increased from about 5 TB per day to 55 TB per day. Zeus will be decommissioned, and the old data center will be closed when Cassandra is operational. Zeus’s users will run their models on Cassandra. Juno will continue to operate.

Improving Sustainability

The new cluster is based on Lenovo liquid-cooled servers, which are more energy-efficient than air-cooled servers. These are manufactured in Hungary, which reduces transportation emissions and delivery time.

“Considering the mission of CMCC, sustainability is hugely important to us,” said Osvaldo Marra, director of the CMCC Supercomputing Center. “In 2023, our electricity costs for the Zeus data center were €600,000, and €450,000 for the new data center where Juno is based. Cassandra will be water-cooled, so we will save money in our operations compared to Zeus.”

Aloisio added: “Having the system manufactured in Europe is also important for us. We were impressed when we visited the facility in Budapest.”

Intel is a Strategic Ally

Intel worked closely with CMCC on running tests using CMCC’s own models. “Intel was very helpful in benchmarking,” said Aloisio. “The team was able to confirm that NEMO is memory bound and demonstrate the results of using the Intel Xeon CPU Max 9480 processor with DDR5 memory. In the past, we’ve only asked for standard benchmarks, but this time, we needed to have benchmarks based on NEMO, a real use case. It gave us a real indication of the performance we could achieve using the processor.”

CMCC is also working with Intel to explore quantum computing for climate modeling, and Aloisio is in the process of setting up a quantum computing group at CMCC.

“We don’t want to see Intel as just a vendor, but more as a partner for developing our performance,” said Aloisio. “It’s a strategic partnership for us.”

Test Results

Intel ran tests using NEMO workloads. These compared the performance of the Intel® Xeon® Platinum 8360Y processors used in the newer existing cluster, Juno, with the Intel Xeon CPU Max 9480 processor with high-bandwidth memory (HBM) and the Intel® Xeon® Platinum 8480+ processor.

While the Intel Xeon Platinum 8480+ processor delivered 1.61x and 1.77x the performance of the Juno architecture, the Intel Xeon CPU Max 9480 processor with HBM delivered 3.36x and 3.47x in the single-node tests conducted (see Figure 1). These tests used two generic NEMO workloads.

Figure 1. Comparing the single-node performance of three Intel® Xeon® processors for CMCC’s NEMO v4.2 workloads.

Intel also tested the larger NEMO GLOB16 model on a 32-node cluster, comparing the same processors. In this test, the Intel Xeon CPU Max 9480 processor with HBM delivered 2.84x the performance of the Juno architecture.1

CMCC selected the Intel Xeon CPU Max 9480 processor with HBM for Cassandra.

Table 2. Test results for NEMO GLOB16 on a 32-node cluster.1

Spotlight on CMCC

The Euro-Mediterranean Centre on Climate Change (CMCC) is a non-profit international research center that collaborates with experienced scientists, economists, and technicians to provide full analyses of climate impacts on socio-economic systems. It is Italy’s national focal point for the Intergovernmental Panel on Climate Change (IPCC). CMCC’s consortium members and institutional partners include six universities, the National Institute of Geophysics and Volcanology, and the American non-profit Resources for the Future.

CMCC’s mission is to investigate and model our climate system and its interactions with society to provide reliable, rigorous, and timely scientific results. These, in turn, aim to stimulate sustainable growth, protect the environment, and develop science-driven adaptation and mitigation policies for the changing climate.

Solution Summary

  • Intel® Xeon® CPU Max 9480 processor. The new Cassandra cluster comprises 360 processors with high-bandwidth memory (HBM) for bandwidth-constrained workloads.
  • Lenovo ThinkSystem SD650 V3 Neptune DWC servers. The Lenovo servers use direct water cooling to enable CMCC to achieve high performance with a lower cooling cost.

 
