Boost HPC Performance with Amazon EC2 m6i Instances Featuring 3rd Gen Intel® Xeon® Scalable Processors

STREAM

  • Achieve up to 53% higher STREAM Triad throughput with 8-vCPU m6i instances vs. m5 instances.

  • Achieve up to 38% higher STREAM Triad throughput with 16-vCPU m6i instances vs. m5 instances.

Amazon EC2 m6i Instances Offered up to 53% Better STREAM Triad Performance Than m5 Instances with 2nd Gen Intel® Xeon® Scalable Processors

The problems facing businesses today are more complex than ever, from shifting competitive landscapes to supply chain challenges and beyond. Technology solutions cannot solve all of those issues, but with an increasing amount of data to work with, they can certainly help. The more complex the computational problem, however, the more resources are necessary to solve it. One critical resource is memory: because many applications cache data in memory to improve performance, fast memory with high throughput is essential. If your organization has chosen to run its high-performance computing (HPC) workloads in the cloud, how do you choose the right instance to maximize performance?

STREAM, a benchmark widely used in HPC, measures sustainable memory bandwidth; its Triad test reports throughput in MB/s. Intel ran this benchmark on two sets of Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances:

  • m6i instances with 3rd Gen Intel® Xeon® Scalable processors
  • m5 instances with 2nd Gen Intel® Xeon® Scalable processors

To show the performance differences that organizations with different needs might see, Intel tested both instance types with 8 vCPUs and then again with 16 vCPUs. The m6i instances delivered significantly higher throughput at both sizes, outperforming the m5 instances by up to 53% at 8 vCPUs and up to 38% at 16 vCPUs. For companies weighing an investment in newer EC2 instances, these results indicate that m6i instances are the stronger choice for workloads that depend on memory bandwidth.
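For readers unfamiliar with the benchmark, the Triad kernel in STREAM computes a[i] = b[i] + scalar * c[i] over large arrays and reports how quickly the bytes involved move through memory. The following is a minimal Python sketch of that idea, not the STREAM source and not what Intel ran (the tests used the compiled benchmark). The function names and the small array size are illustrative assumptions, and timings from pure Python say nothing about real memory bandwidth.

```python
import time


def triad(a, b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_mb_per_s(n_elements, seconds, bytes_per_element=8):
    """Throughput in MB/s (1 MB = 10**6 bytes).

    Three arrays are counted per element: two read (b, c) and one written (a).
    """
    bytes_moved = 3 * bytes_per_element * n_elements
    return bytes_moved / seconds / 1e6


if __name__ == "__main__":
    n = 100_000  # illustrative size only; far smaller than a real STREAM run
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n

    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start

    # Pure Python is dominated by interpreter overhead, so this number is
    # not comparable with results from the compiled benchmark.
    print(f"Triad: {triad_mb_per_s(n, elapsed):.1f} MB/s")
```

The key point is the accounting in `triad_mb_per_s`: the reported figure is bytes moved divided by elapsed time, so a higher number means the instance sustains more memory traffic per second.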

Higher STREAM Triad Throughput for Smaller Instances

Figure 1 shows the results of STREAM Triad performance testing on 8-vCPU instances. At this smaller instance size, suited to companies with more modest workloads, the m6i instance enabled by 3rd Gen Intel® Xeon® Scalable processors offered up to 53% higher STREAM Triad throughput than the m5 instance.

Figure 1. Relative STREAM Triad performance of the 8-vCPU m6i instance vs. the 8-vCPU m5 instance. Higher numbers are better.

Higher STREAM Triad Throughput for Medium-Size Instances

Organizations that require more processing power may select medium-size instances, such as those with 16 vCPUs. At this size, the m6i instance with new 3rd Gen Intel® Xeon® Scalable processors offered up to 38% higher STREAM Triad throughput than the m5 instance with previous-generation processors (see Figure 2).

Figure 2. Relative STREAM Triad performance of the 16-vCPU m6i instance vs. the 16-vCPU m5 instance. Higher numbers are better.
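Because both figures report relative rather than absolute throughput, it can help to translate the "X% higher" statements into ratios against the m5 baseline. The sketch below shows that arithmetic; the helper names are hypothetical, and the only data used are the 53% and 38% figures stated in this document.

```python
def uplift_to_ratio(percent):
    """Convert an 'X% higher' claim into a throughput ratio (new / baseline)."""
    return 1.0 + percent / 100.0


def percent_higher(new_value, baseline_value):
    """Percentage by which new_value exceeds baseline_value."""
    return (new_value / baseline_value - 1.0) * 100.0


if __name__ == "__main__":
    # Uplift figures reported in this document (m6i vs. m5).
    for vcpus, uplift in ((8, 53), (16, 38)):
        ratio = uplift_to_ratio(uplift)
        print(f"{vcpus} vCPUs: m6i = {ratio:.2f}x the m5 baseline")
```

The second helper runs the calculation in the other direction, which is how a relative result would be derived from two measured throughput values.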

Conclusion

HPC workloads demand substantial compute power and memory bandwidth, and with a multitude of options in the cloud, it can be difficult for organizations to select instances that meet their needs. These STREAM Triad benchmarking results indicate that companies could see higher memory throughput by selecting Amazon EC2 m6i instances enabled by 3rd Gen Intel® Xeon® Scalable processors instead of m5 instances with older 2nd Gen Intel® Xeon® Scalable processors.

Learn More

To get started running your HPC workloads on Amazon EC2 m6i instances featuring 3rd Gen Intel® Xeon® Scalable processors, go to https://aws.amazon.com/ec2/instance-types/m6i/.

Testing done by Intel in Jan. 2022. All configurations used Ubuntu 20.04.3 LTS with kernel 5.11.0-1022-aws in AWS us-west-2, with STREAM v5.10 and the ICC 2021.2 compiler, and with OMP_NUM_THREADS set to the number of vCPUs. Compiler flags: -O3 -qopt-streaming-stores=always -qopt-zmm-usage=high -xCORE-AVX512 -qopenmp -mcmodel=large -DSTREAM_ARRAY_SIZE=268435456. m5.2xlarge: Intel Xeon Platinum 8259CL, 8 cores, 32 GB RAM, up to 10 Gbps network BW, up to 4.75 Gbps storage BW; m6i.2xlarge: Intel Xeon Platinum 8375C, 8 cores, 32 GB RAM, up to 10 Gbps network BW, up to 12.5 Gbps storage BW; m5.4xlarge: Intel Xeon Platinum 8259CL, 16 cores, 64 GB RAM, up to 10 Gbps network BW, up to 4.75 Gbps storage BW; m6i.4xlarge: Intel Xeon Platinum 8375C, 16 cores, 64 GB RAM, up to 10 Gbps network BW, up to 12.5 Gbps storage BW.
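As a rough check on the array size in the configuration above, the sketch below estimates the memory the benchmark arrays occupy. It assumes STREAM's default 8-byte (double-precision) element type and its three working arrays; the function name is hypothetical.

```python
STREAM_ARRAY_SIZE = 268_435_456  # from -DSTREAM_ARRAY_SIZE in the configuration
BYTES_PER_ELEMENT = 8            # assumption: default double-precision elements
NUM_ARRAYS = 3                   # STREAM works on three arrays (a, b, c)


def footprint_gib(n_elements, bytes_per_element=BYTES_PER_ELEMENT,
                  arrays=NUM_ARRAYS):
    """Memory occupied by the benchmark arrays, in GiB (2**30 bytes)."""
    return n_elements * bytes_per_element * arrays / 2**30


if __name__ == "__main__":
    per_array = footprint_gib(STREAM_ARRAY_SIZE, arrays=1)
    total = footprint_gib(STREAM_ARRAY_SIZE)
    print(f"Per array: {per_array:.1f} GiB, all arrays: {total:.1f} GiB")
```

Under these assumptions each array is 2 GiB and the three together are 6 GiB, which fits within the 32 GB and 64 GB instances tested and is far larger than any processor cache, so the test exercises main memory. Statically allocated arrays of this size are also a likely reason for the -mcmodel=large flag.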