Contemporary Amperex Technology Co., Ltd. (CATL), a prominent Chinese company as well as a global leader in new energy research and development and manufacturing, has been striving to improve productivity under its evolving strategy for digital and smart manufacturing. In the implementation of this strategy, its Manufacturing Execution System (MES) based on SAP HANA, a high-performance in-memory database, plays a key role in real-time data collection and quality control on its production lines. Increasing pressure on production capacity imposed new challenges on the system’s processing latency and reliability, so CATL needed a more efficient data processing and storage solution to improve the performance of its MES infrastructure.
Thanks to its in-depth collaboration with Intel, CATL was able to tackle these challenges by introducing the latest Intel® Xeon® Scalable platform based on the 2nd Gen Intel® Xeon® Scalable processor. The platform combines a processor that boosts computing performance with Intel® Optane™ persistent memory, which features a revolutionary memory and storage architecture, near-DRAM performance, and data persistence as well as optimal cost and capacity. With it, CATL built a new foundation for its MES’ core in-memory database. This not only eliminated the performance bottlenecks the database encountered when massive volumes of temporary files were exchanged with disk drives during highly concurrent I/O operations, but also greatly shortened the time needed for a reboot or a primary-standby switch. As a result, CATL’s production lines operate at high efficiency with minimal downtime, which helps the company further expand its production capacity and strengthens its leadership in the energy and smart manufacturing sectors.
“MES serves as the ‘central nervous system’ of our production lines for smart manufacturing. Its performance improvements are definitely crucial for higher productivity and output. The introduction of Intel Optane persistent memory helped eliminate a number of performance bottlenecks for MES. It is just like building stronger ‘synapses’ in the ‘central nervous system’ of our smart manufacturing, and lays an even more solid foundation for our increasing production and business operations in the future.” - Lai Tengfei, Manager, Processes, IT Architecture and Solutions Department, Contemporary Amperex Technology Co., Ltd.
Dual Models in New Energy and Smart Manufacturing
As a leading enterprise in the new energy and smart manufacturing sectors worldwide, CATL has received wide recognition for its high-performance green transportation and clean energy solutions based on its advanced battery technology. In 2018 alone, CATL was named to the “2018 Forbes China 50 Most Innovative Companies” list and was featured in a CCTV documentary series named Da Guo Zhong Qi 2: Neng Yuan Pian (“The Pillars of a Great Power II: Energy”).
Such impressive achievements were made possible by CATL’s key strengths, including its strong R&D and production capabilities and the powerful information system it developed specifically for smart manufacturing. The company’s key cell production line is a good example. It enables automation, informationization, and intelligentization throughout the manufacturing process with a set of top-down systems and platforms including ERP, MES, process control and management, and a cell production and control network, as shown in Figure 1. MES, in particular, is closely tied to the data from over 3,000 quality and safety control factors in dozens of manufacturing processes on the cell production line. It quickly and accurately collects real-time data from the production line for control, management, and analysis. This in turn enables management capabilities over production, quality control, batches, and traceability.
Figure 1: The smart digital manufacturing system supporting CATL’s cell production lines.
To ensure MES had sufficient real-time data processing capability, CATL started introducing a variety of Online Transaction Processing (OLTP) databases, especially the in-memory database solution, several years ago. As CATL’s new energy products gained further market penetration, the rapid growth of its business put an even greater demand on production capacity. The requirement for high-performance and low-latency data processing became increasingly urgent as well. Based on its own analysis and estimates, CATL concluded that “production capacity must be increased by another 50% within a short period of time to meet the market demand.”1 Such a large increase in production capacity meant that each production session left an even narrower window for the IT system. Test data showed that CATL’s IT systems had just 3 seconds to complete all operations, including determining production flows and checking product quality, and that a window of only 100ms could be set aside for the back-end database operations.2
Enhancing performance, or reducing operational latency, was not the only challenge facing CATL’s core database. Reliability was another concern, because the system has to process a huge flood of data: 100 to 200 million new records are added to each data table and a total of 1 billion records are added to all tables every day, while more than 10 billion calculations are carried out at the same time.2 Requirements for ultra-low processing latency, massive amounts of computation, pressure on storage capacity, and the need to ensure the integrity of critical data imposed strict demands on the performance and reliability of the MES infrastructure. CATL needed more advanced data processing and storage solutions that were better matched to the nature and requirements of in-memory databases.
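The volumes and time windows quoted above can be turned into average rates with a few lines of arithmetic. The Python sketch below assumes, purely for illustration, that records arrive evenly over a 24-hour day; the source does not describe the real arrival pattern, and all function and constant names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # assumption: load is spread evenly over 24 hours


def average_rate_per_second(records_per_day: float) -> float:
    """Average insert rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


def records_per_window(records_per_day: float, window_ms: float) -> float:
    """Average number of records arriving during one window of window_ms."""
    return average_rate_per_second(records_per_day) * window_ms / 1000.0


def window_share(part_ms: float, whole_ms: float) -> float:
    """Fraction of the overall time budget taken by one part of it."""
    return part_ms / whole_ms


# Figures taken from the text.
TOTAL_RECORDS_PER_DAY = 1_000_000_000  # new records across all tables per day
SESSION_WINDOW_MS = 3_000              # window for all IT system operations
DB_WINDOW_MS = 100                     # window for back-end database operations

if __name__ == "__main__":
    rate = average_rate_per_second(TOTAL_RECORDS_PER_DAY)
    in_window = records_per_window(TOTAL_RECORDS_PER_DAY, DB_WINDOW_MS)
    share = window_share(DB_WINDOW_MS, SESSION_WINDOW_MS)
    print(f"average rate: {rate:,.0f} records/s")
    print(f"records per {DB_WINDOW_MS} ms window: {in_window:,.0f}")
    print(f"database share of the session window: {share:.1%}")
```

Under the even-arrival assumption, 1 billion records per day works out to roughly 11,600 records per second, and the database receives about 3.3% of the 3-second session window. Real production peaks would be higher than this average.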
Once these requirements and challenges were determined, CATL and Intel entered into a series of in-depth technological collaborations aimed at turning Intel’s advanced products and technologies into a viable solution. Taking into consideration both computation and storage, CATL eventually chose the Intel Xeon Scalable platform based on the 2nd Gen Intel Xeon Scalable processor with Intel Optane persistent memory. The Intel® Xeon® Platinum 8280 processor, the foundation of the platform, has a clock speed of 2.70 GHz, a 38.5 MB cache, 28 cores, 56 threads, and 3 Ultra Path Interconnect (UPI) links to provide more suitable and more powerful parallel computing capabilities for the in-memory database. The processor also supports up to 6 high-speed memory channels for communication with memory subsystems such as DRAM and Intel Optane persistent memory, enabling a breakthrough in performance for CATL’s MES based on the in-memory database.
Intel Optane Persistent Memory Breaks the MES Storage Bottleneck
There is no doubt that CATL’s new platform is a powerhouse in terms of performance and functionality, but how does it actually work? To answer this question, we must start by looking at SAP HANA, the high-performance in-memory database that CATL uses for its MES. The main bottleneck for any improvement to the performance of the in-memory database is the efficiency of conversion operations. Data is usually stored in columns in the in-memory database, but row-based storage is more efficient when the MES runs its OLTP workloads. A variety of interoperations must therefore be performed between the in-memory and on-disk databases for row/column conversion, log synchronization, and other tasks. Both the in-memory database and the on-disk databases involved must be put on “hold” while these operations are being carried out. In other words, the faster these “write temp data to disk” operations complete, the less they affect the system and the more they contribute to overall performance.
To avoid the loss of data from power outages and other incidents, CATL’s MES also writes all data from the in-memory database to the on-disk database every 5 minutes with the Save Point function. In addition, strict cell production specifications and system recovery requirements mean that the massive amounts of logs generated on production lines are also written to the on-disk database in real time.
Figure 2: Shortening the “disk write” time helps improve overall system performance.
For CATL’s MES, all these operations must be prioritized and kept consistent, so some operations inevitably conflict with each other. For example, if the system is implementing Data Merge operations, the Save Point function must wait in the queue. As such, it is difficult to improve the overall system performance. The best way to avoid this is to speed up system “disk write” times as shown in Figure 2.
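The effect of faster “disk write” times on queued operations can be illustrated with a deliberately simple model. The Python sketch below assumes that conflicting operations run strictly one after another and that each one is dominated by its write time; the class, the function, and the durations are hypothetical and are not taken from CATL’s system or from SAP HANA.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    write_seconds: float  # hypothetical write time on the baseline storage


def completion_times(ops, speedup=1.0):
    """Return the finish time of each operation when they run back to back.

    speedup is a hypothetical factor by which the storage shortens every
    write; 1.0 means the baseline device.
    """
    clock = 0.0
    finished = {}
    for op in ops:
        clock += op.write_seconds / speedup
        finished[op.name] = clock
    return finished


if __name__ == "__main__":
    # Illustrative durations only: a Data Merge is running, so Save Point waits.
    queue = [Operation("data_merge", 6.0), Operation("save_point", 3.0)]
    for factor in (1.0, 2.0):
        times = completion_times(queue, speedup=factor)
        print(f"speedup {factor}: save_point finishes at {times['save_point']} s")
```

In this model the waiting operation finishes sooner by the same factor as the write speedup, which is why shortening the write time relieves the queue without any change to how operations are prioritized.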
The SAP HANA in-memory database used by CATL’s MES is based on an all-DRAM memory architecture, while the on-disk database is usually deployed on NAND-based NVMe or SAS high-speed solid-state drives (SSDs). The storage latency of the latter is generally a thousand times higher than that of the former. So, when data is transferred between the in-memory database and the on-disk database, improvements in MES performance are held back by the I/O performance gap between DRAM and SSDs. To fill the gap, CATL needed to find a storage product with near-DRAM performance for its MES.
Intel Optane persistent memory, built with 3D XPoint™ memory media, is a revolution in memory and storage architecture, and it ticked all of the boxes for CATL. Thanks to its unique media technology, it is much faster than conventional NAND SSDs. Its advanced system memory controller, interface hardware, and software technology give it low latency, high I/O throughput, and high stability. Depending on the application scenario, it can be used in Memory Mode to provide volatile memory with larger capacity and lower cost, or in App Direct Mode to set up a memory pool with larger capacity and data persistence.
Figure 3: Access latency of different storage devices.3
As indicated by Figure 3, the access latency and other key performance indicators of Intel Optane persistent memory are close to those of DRAM, while its capacity and data persistence (non-volatility) outclass conventional DRAM. This means that when CATL’s MES uses Intel Optane persistent memory in App Direct Mode as its disk storage media, the “disk write” times needed for executing Data Merge, Save Point, and log functionality are greatly reduced, enhancing overall MES system performance.
The data persistence that Intel Optane persistent memory offers also significantly shortens MES reboot times. Before each MES update, terabytes of data must be written from the in-memory database to the disk drive. Once the update is completed, all of the data must be written from the disk back to the memory. Before the introduction of Intel Optane persistent memory, the entire process could take tens of minutes, during which production was halted. Since the introduction of Intel Optane persistent memory, the process now takes just 5-10 minutes.2 Production lines can now complete MES updates during a routine shift.
To improve system reliability, CATL also built a comprehensive active-standby switch mechanism for its MES to safeguard against production interruptions due to power outages, downtime, and other incidents. In the past, the active-standby switch could take tens of minutes. Now, with Intel Optane persistent memory in Memory Mode serving as scalable memory, the standby host has greater memory capacity and can take on more I/O-intensive workloads, which accelerates switchovers significantly.
Near Real-Life Production Testing: Significant Performance Improvement with the New Solution
To validate the performance gains the new Intel Xeon Scalable platform, especially Intel Optane persistent memory, brought to its in-memory database, CATL, with Intel’s support, tested and evaluated the performance of this new product in scenarios such as Data Merge, Save Point, and system reboots. The results provide real and effective data for CATL to deploy its MES architecture and to choose products for its ongoing business growth in the future.
To set up test scenarios closer to real-world production, CATL went to great lengths to design test processes and pressure models based on the existing production lines. For the simulated cell production test, 2 front-end application servers were deployed with 20 concurrent end devices each. A latency of 100ms was set between concurrent processes, and compound database operations such as create, drop, or alter were simulated.
Testing in different scenarios allowed both parties to fully evaluate the performance of Intel Optane persistent memory running an MES based on an in-memory database. In the Data Merge test, for example, a comparison of the normalized results, as shown in Figure 4, indicates that Intel Optane persistent memory is up to 6.2X faster than conventional SAS high-speed SSDs.4 This means MES disk write times are shortened, greatly reducing system holds due to conflicting operations.
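The shape of this pressure model can be sketched as a small load generator. The Python example below is only an illustration of the described layout (2 application servers, 20 concurrent devices each, and a 100ms pause). It interprets the 100ms figure as a delay between successive operations of one device, which is an assumption, and it replaces the database with a stand-in coroutine; none of the names come from CATL’s actual test harness.

```python
import asyncio

OPERATION_KINDS = ("create", "alter", "drop")  # compound operations named in the text


async def fake_db_operation(kind: str) -> str:
    """Stand-in for a real database call; it only yields control and echoes kind."""
    await asyncio.sleep(0)
    return kind


async def device(server_id: int, device_id: int, ops_per_device: int, delay_s: float):
    """One simulated end device issuing operations with a pause between them."""
    done = []
    for i in range(ops_per_device):
        kind = await fake_db_operation(OPERATION_KINDS[i % len(OPERATION_KINDS)])
        done.append((server_id, device_id, kind))
        await asyncio.sleep(delay_s)  # assumed meaning of the 100 ms latency
    return done


async def run_load(servers: int = 2, devices_per_server: int = 20,
                   ops_per_device: int = 3, delay_s: float = 0.1):
    """Run every device concurrently and return a flat list of completed operations."""
    tasks = [
        device(s, d, ops_per_device, delay_s)
        for s in range(servers)
        for d in range(devices_per_server)
    ]
    per_device = await asyncio.gather(*tasks)
    return [op for ops in per_device for op in ops]


if __name__ == "__main__":
    completed = asyncio.run(run_load())
    print(f"{len(completed)} operations completed by 40 simulated devices")
```

A real harness would replace fake_db_operation with calls to the database under test and record per-operation latency. The sketch only shows how the concurrency and pacing parameters fit together.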
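A speedup factor such as the 6.2X quoted here can be restated as a normalized time or as a percentage reduction. The short Python sketch below shows that conversion; the function names are hypothetical, and the only input taken from the text is the 6.2X factor itself.

```python
def normalized_time(speedup: float) -> float:
    """Time on the faster device when the baseline device takes 1.0."""
    return 1.0 / speedup


def time_reduction(speedup: float) -> float:
    """Fraction of the baseline time that is saved at the given speedup."""
    return 1.0 - normalized_time(speedup)


if __name__ == "__main__":
    factor = 6.2  # best-case Data Merge speedup reported in the text
    print(f"normalized time: {normalized_time(factor):.3f}")
    print(f"time reduction:  {time_reduction(factor):.1%}")
```

At the reported best case, a Data Merge that takes one unit of time on the SAS SSD baseline takes about 0.16 units on persistent memory, a reduction of roughly 84%.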
Figure 4: Comparison of normalized results for Data Merge performance tests.
Conclusion and Outlook
Collaboration over the latest Intel Xeon Scalable platform and the Intel Optane persistent memory, in particular, got CATL and Intel’s long-term strategic cooperation on smart manufacturing off to a good start. This innovative memory product proves that it can help CATL’s MES overcome its previous bottlenecks caused by inadequate disk drive performance and boost CATL’s production capacity through innovations in IT infrastructure.
CATL is now planning to introduce Intel Optane persistent memory to its high-performance offline analysis system as well. This should allow it to generate production reports with all relevant information from massive historical data in just a few seconds, enabling more efficient decision-making and analysis. CATL is also looking at how it can use the Intel Xeon Scalable platform with built-in AI acceleration to apply AI to defect inspection and other scenarios. Through this, CATL hopes to inject more “smart DNA” into manufacturing along with digitization and automation.
CATL’s Solution Benefits:
- Intel Optane persistent memory in App Direct Mode helps CATL’s MES increase its overall system performance by effectively reducing the “disk write” times for executing Data Merge, Save Point, and Log operations between the in-memory database and persistent memory
- In App Direct Mode, Intel Optane persistent memory leverages its data persistence to accelerate reboots of MES based on the in-memory database and minimize production line downtime
- In Memory Mode, Intel Optane persistent memory serves as scalable memory for the standby host of MES’ in-memory database, reducing switching time between MES active and standby hosts and ensuring high system reliability