Choose Amazon EC2 m6i Instances Featuring 3rd Gen Intel® Xeon® Scalable Processors to Increase HPC Performance

STREAM

  • See up to 79% higher STREAM Triad throughput with 8-vCPU m6i instances vs. m6a instances.

  • See up to 2.75x the STREAM Triad throughput with 16-vCPU m6i instances vs. m6a instances.

Get Up to 2.75 Times the STREAM Triad Performance with Amazon EC2 m6i Instances Compared to m6a Instances with AMD EPYC Processors

As the global market continues to grow in complexity, more and more organizations are collecting and analyzing data to inform their business decisions. The right data, analyzed correctly, can help businesses find and resolve issues, plan for the future, and grow their customer bases. Analyzing large datasets requires a great deal of computing power, however, and for high-performance computing (HPC) in particular, it’s critical to choose the right cloud instances. Memory is an especially pressing concern: high speed and throughput are vital for applications that cache data in memory.

To help companies find instances with the memory throughput they need, Intel tested the sustained memory throughput of two sets of Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances:

  • m6i instances with 3rd Gen Intel® Xeon® Scalable processors.
  • m6a instances with AMD EPYC processors.

Using the STREAM Triad benchmark, which records the sustained memory throughput of each instance in MB/s, Intel tested both instance families at 8 vCPUs and 16 vCPUs. Delivering up to 2.75x the memory throughput, the m6i instances significantly outperformed the m6a options, indicating which family is better suited to memory-intensive HPC workloads.
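
As a point of reference, the Triad kernel at the heart of the benchmark is a simple scaled vector add. The sketch below is an illustrative NumPy version, included here only for readability; the published numbers came from the official STREAM v5.10 C benchmark, compiled and run as described in the configuration details at the end of this document.

# Illustrative sketch only: the Triad kernel that STREAM times is
# a[i] = b[i] + scalar * c[i] over large double-precision arrays.
# The published results came from the official STREAM v5.10 C benchmark
# compiled with ICC and run with OMP_NUM_THREADS equal to the number of
# vCPUs; this single-threaded NumPy version just shows the operation and
# how MB/s is derived from bytes moved.
import time
import numpy as np

N = 268_435_456      # matches -DSTREAM_ARRAY_SIZE=268435456; ~2 GiB per array,
                     # so ~6 GiB total -- reduce N on smaller machines
scalar = 3.0

a = np.zeros(N)
b = np.full(N, 2.0)
c = np.full(N, 1.0)

start = time.perf_counter()
np.multiply(c, scalar, out=a)   # a = scalar * c
np.add(b, a, out=a)             # a = b + scalar * c  (the Triad operation)
elapsed = time.perf_counter() - start

# Triad touches three 8-byte values per element: read b, read c, write a.
bytes_moved = 3 * 8 * N
print(f"Approximate Triad throughput: {bytes_moved / elapsed / 1e6:,.0f} MB/s")

Note that NumPy makes two passes over the arrays here, so the printed figure is only a rough single-threaded indication; the real STREAM benchmark fuses the operation into one OpenMP loop.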

Increased STREAM Triad Throughput for Smaller Instances

As Figure 1 shows, the 8-vCPU m6i instance, enabled by 3rd Gen Intel® Xeon® Scalable processors, outperformed its AMD processor-enabled counterpart by 79%. If your workloads are modestly sized, these 8-vCPU m6i instances are a great option for maximizing performance.

Figure 1. Relative STREAM Triad performance of the 8-vCPU m6i instance vs. the 8-vCPU m6a instance. Higher numbers are better.

Increased STREAM Triad Throughput for Medium-Size Instances

For larger workloads, Intel tested the same instance types at the 16-vCPU size. Again, we see that the m6i instance, enabled by 3rd Gen Intel® Xeon® Scalable processors, performed significantly better than the m6a instance featuring AMD processors. This time, though, the memory throughput of the m6i instance was 2.75 times that of the m6a instance (see Figure 2).

Figure 2. Relative STREAM Triad performance of the 16-vCPU m6i instance vs. the 16-vCPU m6a instance. Higher numbers are better.

Conclusion

For workloads that can benefit from high levels of sustained memory throughput, such as HPC workloads, companies must invest in cloud instances that are up to the task. This testing makes it clear that the 3rd Gen Intel® Xeon® Scalable processor-backed m6i instances from AWS offer superior memory bandwidth performance compared to m6a instances with AMD processors. Especially if your organization faces continued growth in workload sizes, consider the AWS EC2 m6i instances when seeking a cloud solution for your HPC workloads.

Learn More

To get started running your HPC workloads on Amazon EC2 m6i instances featuring 3rd Gen Intel® Xeon® Scalable processors, go to https://aws.amazon.com/ec2/instance-types/m6i/.
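
If you would rather launch instances from code than from the AWS console, the following boto3 sketch shows one way to start an m6i.2xlarge instance. The AMI ID and key pair name are placeholders to replace with your own values; this snippet is an illustration and was not part of Intel’s testing.

# Minimal sketch: launch an m6i.2xlarge instance with boto3 (the AWS SDK for Python).
# The AMI ID and key pair name below are placeholders; substitute your own values.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # region used in Intel's testing

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # e.g., an Ubuntu 20.04 LTS AMI for your region
    InstanceType="m6i.2xlarge",        # 8 vCPUs; use m6i.4xlarge for 16 vCPUs
    MinCount=1,
    MaxCount=1,
    KeyName="your-key-pair",           # placeholder key pair name
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}")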

Testing done by Intel in Jan. 2022. All configurations used Ubuntu 20.04.3 LTS kernel 5.11.0-1022-aws on AWS us-west-2, with STREAM v5.10, the ICC 2021.2 compiler, and OMP_NUM_THREADS set equal to the number of vCPUs. AMD configs used compiler flags: -mcmodel medium -shared-intel -O3 -march=core-avx2 -DSTREAM_ARRAY_SIZE=268435456 -DNTIMES=100 -DOFFSET=0 -qopenmp -qopt-streaming-stores always -qopt-zmm-usage=high. Intel configs used compiler flags: -O3 -qopt-streaming-stores=always -qopt-zmm-usage=high -xCORE-AVX512 -qopenmp -mcmodel=large -DSTREAM_ARRAY_SIZE=268435456. m6a.2xlarge: AMD EPYC 7R13, 8 cores, 32 GB RAM, up to 12.5 Gbps network BW, up to 6.6 Gbps storage BW; m6i.2xlarge: Intel Xeon Platinum 8375C, 8 cores, 32 GB RAM, up to 10 Gbps network BW, up to 12.5 Gbps storage BW; m6a.4xlarge: AMD EPYC 7R13, 16 cores, 64 GB RAM, up to 12.5 Gbps network BW, up to 6.6 Gbps storage BW; m6i.4xlarge: Intel Xeon Platinum 8375C, 16 cores, 64 GB RAM, up to 10 Gbps network BW, up to 12.5 Gbps storage BW.