Technical decision-makers know that properly configuring a server’s memory is a more nuanced procedure than just dropping in a few DDR sticks. Optimal performance comes from understanding the memory and processor specifications, then carefully configuring memory modules of the right size, speed, and slot locations. This article will explain how to configure both DRAM and Intel® Optane™ DC persistent memory for best performance.
Some Baseline Information
• Intel® Optane™ DC persistent memory is supported on second-generation Intel® Xeon® Scalable processors, specifically the Platinum and Gold versions.
• Each Intel® Xeon® Scalable processor has two integrated memory controllers. Each memory controller has three memory channels, and each channel can support up to two memory slots. This results in a maximum of 12 memory slots per CPU. Each slot is compatible with DDR4 DRAM DIMMs or Intel® Optane™ DC persistent memory modules.
• Intel® Optane™ DC persistent memory works in conjunction with DDR4 DRAM, so both types of memory are present in the system. DDR4 DRAM read latency is less than 100 nanoseconds. Intel® Optane™ DC persistent memory idle read latency is about 350 nanoseconds. For comparison, the world’s highest-performance SSD, the Intel® Optane™ SSD DC P4800X, has an idle read latency of about 10,000 nanoseconds (10 microseconds).
• Optimizing performance of a system with Intel® Optane™ DC persistent memory depends on a number of factors, most of which are under the control of the IT organization. These include Operational Mode, DRAM Ratio, Slot Configuration, Processor Core Count, and Workload Behavior.
• Customers should evaluate Intel® Optane™ DC persistent memory against their target usage model, capacity needs, and TCO. In some situations the required memory capacity is below 384 GB per CPU, and neither data persistence nor App Direct control is a priority. In such cases, populating the slots with high-volume, mainstream DRAM DIMMs may be the better overall choice.
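The slot count and latency figures in the bullets above reduce to simple arithmetic. The short Python sketch below restates them. DRAM is entered at its quoted upper bound of 100 ns, so the resulting ratios are illustrative, not measured.

```python
# Idle read latencies quoted above (ns). DRAM is taken at its stated 100 ns upper bound.
IDLE_READ_LATENCY_NS = {
    "DDR4 DRAM": 100,
    "Optane DC persistent memory": 350,
    "Optane SSD DC P4800X": 10_000,
}


def max_slots_per_cpu(controllers=2, channels_per_controller=3, slots_per_channel=2):
    """Upper bound on memory slots one Xeon Scalable socket can address."""
    return controllers * channels_per_controller * slots_per_channel


def relative_to_dram(latencies=IDLE_READ_LATENCY_NS):
    """Express each latency as a multiple of the DRAM figure."""
    base = latencies["DDR4 DRAM"]
    return {name: ns / base for name, ns in latencies.items()}


if __name__ == "__main__":
    print("slots per CPU:", max_slots_per_cpu())
    for name, factor in relative_to_dram().items():
        print(f"{name}: {factor:.1f}x DRAM latency")
```

Even at the conservative DRAM figure, persistent memory sits within a small multiple of DRAM latency, while the fastest SSD is two orders of magnitude slower.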
Operational Mode Determines Memory Access Behavior
In App Direct Mode, the application is explicitly aware of the two memory types in the system: DRAM and Intel® Optane™ DC persistent memory. Developers can use App Direct Mode to direct each read/write operation to the memory resource best suited to it. The most latency-sensitive operations can be directed to DRAM, while operations that are less latency-sensitive, involve very large data structures, or handle data that must be made persistent can be directed to the Intel® Optane™ DC persistent memory.
Memory Mode requires no persistent memory programming in the application; the application sees only a large pool of volatile memory, just as on a server today. Although the Intel® Optane™ media on the modules is non-volatile, data in Memory Mode is not persistent and is discarded on a power cycle. App Direct Mode is required to keep data persistent.
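The practical difference between the modes shows up in code. In Memory Mode an application simply allocates ordinary memory. In App Direct Mode it typically maps a file on a DAX-enabled persistent-memory filesystem and stores to it directly, usually through a library such as PMDK's libpmem, which also handles CPU cache flushing. The Python sketch below is only a stand-in for that pattern using the standard `mmap` module; the mount point `/mnt/pmem0` is hypothetical, and a temporary directory is used so the example runs on any filesystem.

```python
import mmap
import os
import tempfile


def write_record(path, payload):
    """Write payload into a file-backed shared mapping and flush it.

    On a DAX-mounted persistent-memory filesystem the mapping would address
    the Optane media directly; on an ordinary filesystem flush() (msync)
    writes the dirty page back to storage instead.
    """
    size = mmap.PAGESIZE
    if len(payload) > size:
        raise ValueError("payload larger than one page")
    with open(path, "w+b") as f:
        f.truncate(size)  # the file must be at least as large as the mapping
        with mmap.mmap(f.fileno(), size) as mapping:
            mapping[: len(payload)] = payload  # plain memory store into the mapping
            mapping.flush()  # ask the OS to make the stored bytes durable


def read_record(path, length):
    """Read the first `length` bytes back, e.g. after a restart."""
    with open(path, "rb") as f:
        return f.read(length)


if __name__ == "__main__":
    # A real App Direct target might be e.g. /mnt/pmem0/record.bin (hypothetical path).
    path = os.path.join(tempfile.mkdtemp(), "record.bin")
    write_record(path, b"order#1001")
    print(read_record(path, 10))
```

The data survives because it was written to a file-backed mapping and flushed; the same bytes held in an ordinary Memory Mode allocation would be lost on a power cycle.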
Select the Right DRAM Ratio
DRAM and Intel® Optane™ DC persistent memory work together in both App Direct and Memory modes. In Memory Mode, the DRAM in the system is not an independent memory resource; instead, the processor manages it as a data cache for the Intel® Optane™ DC persistent memory. If data requested by the processor is in the DRAM cache (a cache-hit), the response latency is identical to that of an all-DRAM system. The latency is somewhat longer when the requested data is not in the DRAM (a cache-miss) and must be retrieved from the Intel® Optane™ DC persistent memory. The goal is to maximize the number of cache-hits for best performance.
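A simple way to see why the cache-hit rate matters is a hit-rate-weighted average latency. The sketch below uses the idle latencies quoted earlier (DRAM at its 100 ns upper bound, persistent memory at about 350 ns) as defaults. It is a deliberately simplified model, not Intel's performance model: a miss is charged only the persistent-memory access, so time spent checking the DRAM cache first is ignored.

```python
def effective_read_latency_ns(hit_rate, dram_ns=100.0, pmem_ns=350.0):
    """Hit-rate-weighted average read latency in Memory Mode (simplified).

    A hit is charged one DRAM access and a miss one persistent-memory access;
    the time spent checking the DRAM cache before a miss is ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dram_ns + (1.0 - hit_rate) * pmem_ns


if __name__ == "__main__":
    for h in (1.0, 0.95, 0.8, 0.5):
        print(f"hit rate {h:.0%}: ~{effective_read_latency_ns(h):.0f} ns")
```

Each percentage point of hit rate lost moves the average a quarter of the way further per point toward the slower medium, which is why workload locality and DRAM sizing both matter.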
One way to increase the probability of a cache-hit in Memory Mode is to add more DRAM to the system. However, DRAM tends to cost more per gigabyte than Intel® Optane™ DC persistent memory, so an overly large DRAM cache reduces the overall TCO benefit. For Memory Mode operation, Intel recommends a baseline ratio of 1 GB of DRAM for every 8 GB of Intel® Optane™ DC persistent memory. Workload testing and field trials may indicate that the DRAM cache should be adjusted, but start with this recommended ratio. For applications written to use App Direct Mode, consult the software’s documentation or contact the vendor for its recommended ratio of DRAM to Intel® Optane™ DC persistent memory.
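Applying the 1:8 baseline is straightforward arithmetic. The sketch below computes the suggested DRAM cache for a given persistent-memory capacity and picks the smallest DIMM size that meets it when one DRAM DIMM sits on each of the six channels per CPU. The module capacities used (256 GB persistent memory modules, the listed DDR4 sizes) are example values only; substitute what your platform actually supports.

```python
# Intel's baseline for Memory Mode: 1 GB of DRAM per 8 GB of persistent memory.
PMEM_PER_DRAM_GB = 8

# Example DDR4 DIMM capacities (GB); substitute the sizes your platform supports.
DRAM_DIMM_SIZES_GB = (16, 32, 64, 128)


def baseline_dram_gb(pmem_gb, ratio=PMEM_PER_DRAM_GB):
    """Total DRAM cache suggested by the baseline ratio."""
    return pmem_gb / ratio


def dram_dimm_size_gb(pmem_gb, dram_dimms, sizes=DRAM_DIMM_SIZES_GB):
    """Smallest listed DIMM size that meets the baseline when spread over dram_dimms."""
    per_dimm = baseline_dram_gb(pmem_gb) / dram_dimms
    for size in sorted(sizes):
        if size >= per_dimm:
            return size
    raise ValueError("no listed DIMM size reaches the baseline ratio")


if __name__ == "__main__":
    pmem_per_cpu = 6 * 256  # example: six 256 GB persistent memory modules per CPU
    print(baseline_dram_gb(pmem_per_cpu))      # total DRAM at 1:8
    print(dram_dimm_size_gb(pmem_per_cpu, 6))  # one DRAM DIMM on each of six channels
```

For 1,536 GB of persistent memory per CPU, the baseline works out to 192 GB of DRAM, i.e. six 32 GB DIMMs. Rounding up to the next available DIMM size keeps the ratio at or above the baseline rather than below it.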
Configuring the Memory Slots
High memory bandwidth, achieved by placing memory modules in the optimal slots, is key to processor performance. Intel® Xeon® Scalable processor-based platforms generally come in four major configurations, designated by the number of memory slots on each of a memory controller’s three channels (Figure 1).
For maximum memory bandwidth in 2-2-2, 2-2-1, and 2-1-1 configurations, place a DDR4 DRAM DIMM on each memory channel (Figure 2). This ensures that whichever channel an operation is directed to, a low-latency DRAM resource is available, either for an App Direct-specified operation or for a Memory Mode cache-hit.
1-1-1 configurations will likely have lower overall bandwidth than 2-2-2, 2-2-1, or 2-1-1 configurations since one of the channels must be populated solely with Intel® Optane™ DC persistent memory instead of DRAM (Figure 3). The potential bandwidth limitations may be mitigated with tailored memory management in software using App Direct Mode, but the lower overall bandwidth is likely to have a dampening effect on performance in Memory Mode.
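The population rule described above can be sketched per memory controller: put a DRAM DIMM on every channel first, then use any remaining slots for persistent memory. The sketch assumes each controller should carry at least one persistent memory module, which is what forces a 1-1-1 platform to give up DRAM on one channel. Which channel does so here is arbitrary; follow the platform's own population guide.

```python
def populate(slots_per_channel):
    """Assign module types to one memory controller's channels.

    slots_per_channel: slots on each of the three channels, e.g. (2, 2, 1).
    Each channel gets one DDR4 DRAM DIMM first; any remaining slot gets a
    persistent memory module. With no spare slot (1-1-1), one channel must
    hold persistent memory instead of DRAM.
    """
    layout = [["DRAM"] + ["PMEM"] * (slots - 1) for slots in slots_per_channel]
    if not any("PMEM" in channel for channel in layout):
        layout[-1] = ["PMEM"]  # which channel gives up DRAM is illustrative only
    return layout


def dram_on_every_channel(layout):
    """True when every channel has a DRAM DIMM, the condition for full bandwidth."""
    return all("DRAM" in channel for channel in layout)


if __name__ == "__main__":
    for config in ("2-2-2", "2-2-1", "2-1-1", "1-1-1"):
        slots = tuple(int(n) for n in config.split("-"))
        layout = populate(slots)
        print(config, layout, "DRAM on every channel:", dram_on_every_channel(layout))
```

Only the 1-1-1 case fails the every-channel check, matching the bandwidth penalty the text attributes to that configuration.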
Consider Processor Core Count Effect on Memory Demands
Higher processor core counts enable more application threads, containers, or virtual machines to run on the server, but each active core increases the load on the memory subsystem. This is true whether or not Intel® Optane™ DC persistent memory is deployed. On Intel® Xeon® Scalable processors with more than 20 cores, a proper DRAM-to-Intel® Optane™ memory ratio and full memory bandwidth are critical to keep the cores fed with data in a timely manner. Intel strongly recommends fully populated 2-2-2, 2-2-1, or 2-1-1 configurations when pairing with processors of 20 cores or more. In addition to Intel’s guidance, seek processor recommendations from software vendors that have enabled App Direct Mode.
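The core-count and ratio guidance can be folded into a quick pre-deployment check. The hypothetical `review_config` helper below flags two conditions from the text: a processor of 20 cores or more on a platform other than 2-2-2, 2-2-1, or 2-1-1, and (for Memory Mode) DRAM below the 1:8 baseline. It uses the "20 cores or more" threshold stated in the recommendation and is a planning aid under those assumptions, not a substitute for workload testing.

```python
# Configurations the text recommends for processors with 20 or more cores.
FULL_BANDWIDTH_CONFIGS = {"2-2-2", "2-2-1", "2-1-1"}
PMEM_PER_DRAM_GB = 8  # Memory Mode baseline: 1 GB DRAM per 8 GB persistent memory


def review_config(cores, config, dram_gb, pmem_gb, memory_mode=True):
    """Return warnings for a proposed per-CPU memory plan (empty list = no concerns)."""
    warnings = []
    if cores >= 20 and config not in FULL_BANDWIDTH_CONFIGS:
        warnings.append(
            f"{cores}-core CPU on a {config} platform: prefer 2-2-2, 2-2-1 or 2-1-1"
        )
    if memory_mode and dram_gb * PMEM_PER_DRAM_GB < pmem_gb:
        warnings.append(
            f"{dram_gb} GB DRAM is below the 1:8 baseline for {pmem_gb} GB of persistent memory"
        )
    return warnings


if __name__ == "__main__":
    print(review_config(24, "2-2-2", 192, 1536))
    print(review_config(24, "1-1-1", 128, 1536))
```

The ratio check is skipped when `memory_mode=False`, since App Direct applications should follow their vendor's recommended ratio instead.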
Workload Dynamics May Affect Cache-Hit Rate
Most workloads, such as databases, application servers, or content delivery, have a degree of predictability: for each piece of data in use, a cluster of adjacent data can be anticipated. In Memory Mode, the Intel® Xeon® Scalable processor predicts the needed data and stores it in the DRAM cache. The more predictable the workload, the more efficient the caching algorithm and the higher the cache-hit rate, which results in higher application performance. Some applications are unpredictable and have little determinism, which makes accurate caching difficult. This increases the cache-miss rate and can reduce overall application performance compared to an all-DRAM configuration. Less-predictable workloads include certain OLAP benchmarks and specific configurations that are bandwidth-heavy, highly random, and non-localized, which makes any caching implementation challenging. In App Direct Mode, processor-directed caching is less of an issue, since the application, as written by the software vendor, manages memory operations and targets each read/write operation to the optimal memory type.
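The effect of access locality on hit rate can be illustrated with a toy cache simulation. The sketch below models a direct-mapped cache with 64-byte lines and no prefetching, and runs it against a working set eight times larger than the cache (echoing the 1:8 ratio). The cache geometry and traces are assumptions chosen for illustration; this is not a description of how the processor actually manages its DRAM cache.

```python
import random

CACHE_LINES = 1024                   # toy cache: 1024 lines x 64 B = 64 KiB
LINE_SIZE = 64
WORKING_SET = 8 * CACHE_LINES * LINE_SIZE  # 8x the cache size
WORD = 8                             # each access reads one 8-byte word


def hit_rate(addresses, num_lines=CACHE_LINES, line_size=LINE_SIZE):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    tags = [None] * num_lines
    hits = 0
    for addr in addresses:
        line = addr // line_size
        index = line % num_lines     # each memory line maps to exactly one cache slot
        tag = line // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag        # miss: fill the slot, evicting its previous line
    return hits / len(addresses)


def sequential_trace():
    """Predictable, localized workload: scan the working set word by word."""
    return range(0, WORKING_SET, WORD)


def random_trace(n=100_000, seed=0):
    """Unpredictable, non-localized workload: uniform random word addresses."""
    rng = random.Random(seed)
    return [rng.randrange(0, WORKING_SET, WORD) for _ in range(n)]


if __name__ == "__main__":
    print(f"sequential: {hit_rate(sequential_trace()):.3f}")
    print(f"random:     {hit_rate(random_trace()):.3f}")
```

The sequential scan hits on 7 of every 8 words (0.875), because each 64-byte line is fetched once and then reused. The random trace lands close to 1/8, since each cache slot is shared by eight lines of the working set and neighbouring data is rarely reused.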
With proper configuration and mode usage, platforms equipped with Intel® Optane™ DC persistent memory are targeted to deliver performance comparable to all-DRAM configurations of equivalent size on data-intensive workloads, with TCO savings. Intel recommends the “Best” configuration for most scenarios, to deliver the highest overall performance across the widest variety of applications (Figure 4). The “Better” configurations provide excellent bandwidth with DRAM on every channel, but have lower overall capacity and bandwidth to the Intel® Optane™ DC persistent memory modules. The “Good” configuration is best deployed where the software is enabled for App Direct Mode and exercises a great deal of control over memory, and it should be used with processors of 20 cores or fewer.