New LAMMPS* Release Shows Improved Performance¹
An optimized version of the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS*) code for the new Intel® Xeon Phi™ processor and the Intel® Xeon Phi™ coprocessor is now available as part of the current LAMMPS download. LAMMPS supports simulation of solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It runs on single processors or in parallel, using message passing with a spatial decomposition of the simulation domain, and is designed to be easy to modify or extend with new functionality.
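As a rough illustration of spatial decomposition, the sketch below divides a rectangular simulation box into a 3-D grid of sub-boxes, one per MPI rank, and determines which rank owns a given point. The `ProcGrid` struct and `owner_rank` function are hypothetical and are not LAMMPS APIs. A real code also exchanges boundary ("ghost") atoms with neighboring ranks over MPI and handles periodic wrapping. Both are omitted here.

```cpp
#include <array>
#include <cassert>

// Hypothetical description of a 3-D grid of MPI ranks covering a rectangular box.
struct ProcGrid {
    std::array<int, 3> dims;   // number of ranks along x, y, z
    std::array<double, 3> lo;  // lower corner of the global box
    std::array<double, 3> hi;  // upper corner of the global box
};

// Return the rank whose sub-box contains point p (x index varies fastest).
int owner_rank(const ProcGrid& g, const std::array<double, 3>& p) {
    std::array<int, 3> c{};
    for (int d = 0; d < 3; ++d) {
        assert(g.dims[d] > 0 && g.hi[d] > g.lo[d]);
        double frac = (p[d] - g.lo[d]) / (g.hi[d] - g.lo[d]);
        int idx = static_cast<int>(frac * g.dims[d]);
        // Clamp so points on the box faces map to the first/last sub-box
        if (idx < 0) idx = 0;
        if (idx >= g.dims[d]) idx = g.dims[d] - 1;
        c[d] = idx;
    }
    return c[0] + g.dims[0] * (c[1] + g.dims[1] * c[2]);
}
```

With a 2 x 2 x 2 grid over a 10 x 10 x 10 box, for example, a point at (6, 6, 6) falls in the upper sub-box along every axis and is owned by rank 7.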
The LAMMPS optimizations are designed to provide significant performance improvements on Intel® Xeon® processors, Intel® Xeon Phi™ processors, and Intel® Xeon Phi™ coprocessors, with less energy consumed by the Intel® Xeon Phi™ processor². Current optimizations in the Intel® package for LAMMPS (USER-INTEL) address data alignment; support for mixed- and single-precision modes in addition to double precision; single instruction, multiple data (SIMD) directives that allow the compiler to vectorize routines with data dependencies; modifications that improve compiler vectorization for the Intel® Xeon Phi™ processor and coprocessor and for Intel® Advanced Vector Extensions (Intel® AVX), Intel® Advanced Vector Extensions 2 (Intel® AVX2), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512); neighbor-list padding to avoid executing vector remainder code; and offload directives to manage data allocation, transfer, and concurrent computation on the coprocessor.
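The sketch below illustrates two of these ideas on a toy 1-D pair sum. First, neighbor-list padding: the list is extended with the index of a dummy atom, assumed to sit far outside the cutoff, so its length is a multiple of the SIMD width and no remainder loop is needed. Second, mixed precision, read here as single-precision per-pair arithmetic with double-precision accumulation. This reading is an assumption. The functions, the `kSimdWidth` value, and the 1/r² "potential" are hypothetical and do not reflect the USER-INTEL source. The `#pragma omp simd` hint only takes effect when OpenMP SIMD support is enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD width: 16 floats fit in one 512-bit Intel AVX-512 register.
constexpr std::size_t kSimdWidth = 16;

// Pad a neighbor list with a dummy atom index so its length is a multiple of
// kSimdWidth. The dummy atom is assumed to lie far outside the cutoff.
std::vector<int> pad_neighbors(std::vector<int> neigh, int dummy) {
    while (neigh.size() % kSimdWidth != 0) neigh.push_back(dummy);
    return neigh;
}

// Toy 1/r^2 pair sum over 1-D coordinates in "mixed" precision:
// per-pair arithmetic in float, accumulation in double.
double pair_sum_mixed(const std::vector<float>& x, int i,
                      const std::vector<int>& neigh, float cutsq) {
    assert(neigh.size() % kSimdWidth == 0);  // padded: no remainder loop
    double sum = 0.0;
#pragma omp simd reduction(+ : sum)
    for (std::size_t k = 0; k < neigh.size(); ++k) {
        float dx = x[i] - x[neigh[k]];
        float r2 = dx * dx;
        float term = (r2 < cutsq) ? 1.0f / r2 : 0.0f;  // dummy atom adds 0
        sum += term;
    }
    return sum;
}
```

The padding entries cost a few wasted lanes per atom. The trade-off is that every loop trip processes a full vector and needs no scalar cleanup code.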
LAMMPS* - Coarse-Grain Water Simulation Example
The Intel® package for LAMMPS* (USER-INTEL) includes optimized routines that run on both Intel® Xeon® processors and Intel® Xeon Phi™ processors and coprocessors. Figure 1 shows the simulation rate on different processors for a coarse-grain water simulation in LAMMPS using the Stillinger-Weber potential. Coarse-grain models represent groups of atoms as single interaction sites, so molecular systems can be modeled at different levels of granularity. Compared with the Intel® Xeon® processor E5-2697 v4, the Intel® Xeon Phi™ processor delivers up to 1.43X the performance and up to 1.75X the performance per watt. The figure also includes results for the NVIDIA Tesla K80*.
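A performance-per-watt comparison combines the speedup with the mean power draw of each system. The configuration table below lists mean power per platform for this water simulation. The small helper below is a hypothetical sketch of that arithmetic. The example values in the note are illustrative and are not measurements from this article.

```cpp
#include <cassert>

// Relative performance per watt of a new system versus a baseline.
//   speedup    = rate_new / rate_base (e.g. ratio of timesteps per second)
//   power_base = mean power draw of the baseline system, in watts
//   power_new  = mean power draw of the new system, in watts
// (rate_new / power_new) / (rate_base / power_base)
//   = speedup * power_base / power_new
double perf_per_watt_gain(double speedup, double power_base, double power_new) {
    assert(speedup > 0.0 && power_base > 0.0 && power_new > 0.0);
    return speedup * power_base / power_new;
}
```

For example, a hypothetical system that runs 2X faster while drawing half the power of the baseline (200 W versus 400 W) achieves a 4X gain in performance per watt.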
LAMMPS* - Workloads Benefit from Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
On the Intel® Xeon Phi™ processor, Intel® Advanced Vector Extensions 512 (Intel® AVX-512) offers benefits over Intel® Advanced Vector Extensions 2 (Intel® AVX2) beyond doubling the vector width from 256 to 512 bits. These include new instructions that help molecular dynamics (MD) and other software make the best use of the processor, and full support for masking with fault suppression, which improves vector performance for complex code with nested loops or branches. Figure 2 shows the resulting improvements, up to 2.29X, for the Atomic Fluid, Protein, Liquid Crystal, Silicon, and Coarse-Grain Water benchmarks.
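The loop below is a hypothetical toy, not LAMMPS code. It shows the kind of branch inside a pair loop that masking helps with. When compiled for Intel AVX-512 with OpenMP SIMD support (for example `-xMIC-AVX512 -qopenmp-simd` with the Intel compiler, or `-march=knl -fopenmp-simd` with GCC), a compiler may implement the `if` as a per-lane mask. Lanes that fail the cutoff test are masked off rather than forcing scalar execution. Whether a given compiler actually vectorizes this loop depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a toy 1/r^2 term over squared pair distances, skipping pairs outside
// the cutoff. The branch is a candidate for per-lane masking when the loop
// is vectorized for AVX-512; in scalar builds it is an ordinary branch.
double cutoff_sum(const std::vector<double>& r2, double cutsq) {
    assert(cutsq > 0.0);
    double sum = 0.0;
#pragma omp simd reduction(+ : sum)
    for (std::size_t k = 0; k < r2.size(); ++k) {
        if (r2[k] < cutsq) {
            sum += 1.0 / r2[k];  // only in-cutoff pairs contribute
        }
    }
    return sum;
}
```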
|System Overview||Intel® Xeon® processor E5-2697 v4||Intel® Xeon Phi™ processor 7250||NVIDIA Tesla K80*|
|Platform||Super Micro* SuperServer 1028GR-TR|
|Motherboard||Wildcat Pass Motherboard, BMC 1.33.9832, FRU/SDR Package 1.09||Adams Pass Motherboard, BMC 12.951, FRU/SDR Package 1.1||Super Micro* X10DRG-H Motherboard, CSE-118GHTS-R1K66BP FRU|
|Processor Information||Dual Socket Intel® Xeon® processor E5-2697 v4 2.3 GHz, 18 Cores/Socket, 36 Cores, 72 Threads (HT and turbo on)||Intel® Xeon Phi™ processor 7250, 68 cores, 272 threads, 1400 MHz core freq. (turbo on), 1700 MHz uncore freq., MCDRAM 16 GB 7.2 GT/s||Dual Socket Intel® Xeon® processor E5-2697 v4 2.3 GHz, 18 Cores/Socket, 36 Cores, 72 Threads (HT and turbo on)|
|Memory Configuration||DDR4 128GB, 2400 MHz||DDR4 96GB 2400 MHz, MCDRAM flat memory mode||DDR4 128GB, 2400 MHz|
|BIOS Configuration||BIOS 09D10||BIOS version 2.0a|
|HDD Specs||1.0 TB SATA Western Digital* 1003FZEX-00MK2A0 System Disk||1.0 TB SATA drive Western Digital* 1003FZEX-00MK2A0 System Disk||500GB SATA Seagate* ST9500423AS System Disk|
|Operating System||Red Hat 6.7||Red Hat 6.7 (Santiago), quad cluster mode||Red Hat 7.2|
|Graphics Card||NVIDIA Tesla* K80 GPU, NVIDIA CUDA* 7.5.17 (Driver: 352.39), ECC enabled, persistence mode enabled|
|Power Consumption||448W mean power consumption for LAMMPS water simulation||378W mean power consumption for LAMMPS water simulation||608W mean power consumption for LAMMPS water simulation|
|Other Hardware / Software||Scalability tests performed on nodes with Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe* x16||Scalability tests performed on nodes with Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16||Number of MPI tasks on host varied to give best performance. CUDA MPS* used where possible|
Product and Performance Information
Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks.