UCloud Speeds Up Packet Processing Capacity by 5x1 with Intel® Technologies

Key Takeaways

  • Gaming, e-commerce, and retail industries demand a huge amount of high-frequency, small-packet transmission.

  • Cloud service providers (CSPs) able to meet this demand can capitalize on significant commercial opportunities in this sector.

  • UCloud sped up the packet processing capacity of its Net-Enhanced UHost platform by 5x1 using 2nd Generation Intel® Xeon® Scalable processors, Intel® SSDs, and 25GbE Intel® Ethernet.

Delivering cloud services to industries such as gaming, e-commerce, and retail can be demanding. These businesses typically generate a huge amount of high-frequency, small-packet traffic to meet their own customers’ expectations, and handling that traffic puts a heavy strain on network I/O.
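
To see why small-packet traffic strains network I/O, it helps to look at the packet rates involved. The Python sketch below is an illustration only, not part of UCloud's or Intel's published material; it assumes standard Ethernet framing overhead and the 25GbE link speed mentioned later in this article, and compares line-rate packet rates for minimum-size and full-size frames.

```python
# Minimal sketch: why small packets stress network I/O far more than large ones.
# Assumes standard Ethernet framing overhead (7 B preamble + 1 B start-of-frame
# delimiter + 12 B inter-frame gap = 20 B per frame on the wire) and the 25 Gbps
# line rate of the 25GbE Intel Ethernet adapters mentioned later in this article.

LINK_BPS = 25e9               # 25GbE line rate, bits per second
WIRE_OVERHEAD_BYTES = 20      # preamble + SFD + inter-frame gap

def line_rate_pps(frame_bytes: int, link_bps: float = LINK_BPS) -> float:
    """Maximum packets per second at line rate for a given Ethernet frame size."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_packet

for label, frame in [("64 B minimum-size frames", 64), ("1518 B full-size frames", 1518)]:
    print(f"{label}: {line_rate_pps(frame) / 1e6:.1f} Mpps")

# Output: ~37.2 Mpps for minimum-size frames vs. ~2.0 Mpps for full-size frames,
# i.e. roughly 18x more per-packet work for the same link bandwidth.
```

At line rate, minimum-size frames arrive roughly 18 times more often than full-size frames, so every layer that touches packets has to do roughly 18 times more work for the same bandwidth.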

One of the leading cloud computing companies in China, UCloud delivers cloud services to a number of consumer-facing businesses, providing infrastructure, platform, artificial intelligence, and big data services. The company wanted to improve packet processing capacity as part of the release of its new Net-Enhanced UHost solution, to offer a faster, more competitive product to its customer base.

"We chose Next-Generation Intel® Xeon® Scalable processors because they can provide us with high clock frequency and strong computing performance, allowing us to innovate around the needs of our customers. Our new Net-Enhanced UHost can effectively solve the peak demands of customers in the e-commerce and gaming industries."

By basing its Net-Enhanced UHost solution on 2nd Generation Intel® Xeon® Scalable processors, UCloud has been able to launch it with a packet processing capacity of up to 5 million packets per second (pps), five times that of the previous-generation UHost.1 In addition, UCloud has updated its infrastructure with Intel® SSDs delivering up to 24,000 IOPS and 25GbE Intel® Ethernet, which helps it absorb surges in traffic during peak periods.
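
For context, the sketch below relates the 5 million pps figure to the 25GbE link. The frame sizes used are illustrative assumptions rather than UCloud-published values; the point is that for small packets the packet rate, not raw bandwidth, is the binding constraint.

```python
# Minimal sketch relating the 5 million pps figure to a 25GbE link.
# The 5 Mpps and 25GbE numbers come from this article; the frame sizes below
# are illustrative assumptions, not values published by UCloud.

PACKET_RATE_PPS = 5_000_000   # Net-Enhanced UHost packet processing capacity
LINK_GBPS = 25.0              # 25GbE Intel Ethernet
WIRE_OVERHEAD_BYTES = 20      # Ethernet preamble + SFD + inter-frame gap

def bandwidth_gbps(frame_bytes: int, rate_pps: int = PACKET_RATE_PPS) -> float:
    """Bandwidth consumed on the wire by a given packet rate and frame size."""
    return rate_pps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 / 1e9

for frame in (64, 128, 256, 512):   # assumed frame sizes in bytes
    gbps = bandwidth_gbps(frame)
    print(f"{frame:>3} B frames at 5 Mpps: {gbps:5.2f} Gbps "
          f"({gbps / LINK_GBPS:.0%} of the link)")

# At small frame sizes, 5 Mpps uses only a fraction of the 25GbE link,
# so the packet rate, not raw bandwidth, is the limiting resource.
```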

More about 2nd Generation Intel® Xeon® Scalable Processors

The new 2nd Generation Intel® Xeon® Scalable processors provide the foundation for a powerful data-centric solution that creates an evolutionary leap in agility and scalability. Disruptive by design, this innovative processor sets a new level of platform convergence and capabilities across compute, storage, memory, network, and security. Enterprises and cloud and communications service providers can now drive forward their most ambitious digital initiatives with a feature-rich, highly versatile platform.

  • Up to 30x improvement in inference performance on Intel® Xeon® Platinum 9282 processor (56 cores) with Intel® Deep Learning Boost (Intel® DL Boost) for ResNet-50 (image classification workload) using Intel® Optimization for Caffe* vs. Intel® Xeon® Platinum 8180 processor at launch2
  • Up to 2x system memory capacity and support up to 36TB on 8-socket systems with Intel® Optane™ DC persistent memory3
  • Up to 2x average generational gains on 2-socket servers with new Intel® Xeon® Platinum 9200 processor4
  • Up to 1.33x average generational gains on Intel® Xeon® Gold processor5

Notices and Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document.

Performance results are based on testing as of the date set forth in the configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Intel does not control or audit third-party data. You should review this content, consult other sources, and confirm whether referenced data are accurate.

Intel® technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Intel, the Intel logo, Xeon, and Optane are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© Intel Corporation.

Product and Performance Information

1. The results were provided by UCloud and were based on its internal tests. For more information, please contact UCloud.

2. 30x inference throughput improvement on Intel® Xeon® Platinum 9282 processor with Intel® Deep Learning Boost (Intel® DL Boost): Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282 processor (56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS* 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, no data layer, synthetic data: 3x224x224, 56 instances/2 sockets, Datatype: INT8 vs. Tested by Intel as of July 11, 2017: 2S Intel® Xeon® Platinum 8180 processor CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS* Linux release 7.3.1611 (Core), Linux* kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800 GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel® C++ Compiler ver. 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

3. 2x system memory capacity determined by populating 50% of the memory channels with Intel® Optane™ DC persistent memory products whose capacity adds up to twice the total DRAM capacity. Example for an 8-socket system with 96 memory slots: 36TB capacity = 48 slots populated with 512GB modules of Intel® Optane™ DC persistent memory and 48 slots populated with 256GB DRAM DIMMs.
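
A quick check of the arithmetic in that example (an illustrative verification only):

```python
# Verify the 36TB example: 48 slots of 512GB Intel Optane DC persistent memory
# plus 48 slots of 256GB DRAM DIMMs in an 8-socket, 96-slot system.
optane_tb = 48 * 512 / 1024    # 24 TB of persistent memory
dram_tb = 48 * 256 / 1024      # 12 TB of DRAM
print(optane_tb + dram_tb, "TB total")   # 36.0 TB, with Optane at 2x the DRAM capacity
```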

4. 2x Average Generational Gains: On 2-socket servers with 2nd Gen Intel® Xeon® Platinum 9200 processor. Geomean of est SPECrate2017_int_base, est SPECrate2017_fp_base, STREAM-Triad, Intel® Distribution for LINPACK* Benchmark, server-side Java*. Platinum 92xx vs. Platinum 8180. Baseline: 1-node, 2x Intel® Xeon® Platinum 8180 processor on Wolf Pass with 384 GB (12 x 32GB 2666) total memory, ucode 0x200004D on RHEL7.6, 3.10.0-957.el7.x86_64, IC19u1, AVX512, HT on all (off Stream, LINPACK), Turbo on all (off Stream, LINPACK), result: Est int throughput=307, est fp throughput=251, STREAM-Triad=204, LINPACK=3238, server-side Java=165724, test by Intel on 1/29/2019. New configuration: 1-node, 2x Intel® Xeon® Platinum 9282 processor on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_64, IC19u1, AVX512, HT on all (off Stream, LINPACK), Turbo on all (off Stream, LINPACK), result: Est int throughput=635, est fp throughput=526, STREAM-Triad=407, LINPACK=6411, server-side Java=332913, test by Intel on 2/16/2019.

5. Up to 33% Average Generational Gains (1.33x) on Intel® Xeon® Gold processor mainstream CPUs: Geomean of est SPECrate2017_int_base, est SPECrate2017_fp_base, STREAM-Triad, Intel® Distribution for LINPACK* Benchmark, server-side Java*. Gold 5218 vs. Gold 5118. Baseline: 1-node, 2x Intel® Xeon® Gold 5118 processor on Wolf Pass with 384 GB (12 x 32GB 2666 (2400)) total memory, ucode 0x200004D on RHEL7.6, 3.10.0-957.el7.x86_64, IC18u2, AVX2, HT on all (off Stream, LINPACK), Turbo on, result: Est int throughput=119, est fp throughput=134, STREAM-Triad=148.6, LINPACK=822, server-side Java=67434, test by Intel on 11/12/2018. New configuration: 1-node, 2x Intel® Xeon® Gold 5218 processor on Wolf Pass with 384 GB (12 x 32GB 2933 (2666)) total memory, ucode 0x4000013 on RHEL7.6, 3.10.0-957.el7.x86_64, IC18u2, AVX2, HT on all (off Stream, LINPACK), Turbo on, result: Est int throughput=162, est fp throughput=172, STREAM-Triad=185, LINPACK=1088, server-side Java=98333, test by Intel on 12/7/2018.
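
The average generational gains cited in footnotes 4 and 5 are geometric means across the listed benchmarks. The short Python sketch below reproduces that calculation from the per-benchmark results above; it is an illustrative check only, not Intel's published methodology or tooling.

```python
from math import prod

def geomean_gain(baseline, new):
    """Geometric mean of the per-benchmark new/baseline ratios."""
    ratios = [n / b for b, n in zip(baseline, new)]
    return prod(ratios) ** (1 / len(ratios))

# Benchmark order: est SPECrate2017_int_base, est SPECrate2017_fp_base,
# STREAM-Triad, LINPACK, server-side Java (values taken from footnotes 4 and 5).
plat_8180 = [307, 251, 204, 3238, 165724]    # baseline, footnote 4
plat_9282 = [635, 526, 407, 6411, 332913]    # new configuration, footnote 4
gold_5118 = [119, 134, 148.6, 822, 67434]    # baseline, footnote 5
gold_5218 = [162, 172, 185, 1088, 98333]     # new configuration, footnote 5

print(f"Platinum 9282 vs. 8180: {geomean_gain(plat_8180, plat_9282):.2f}x")  # ~2.03x
print(f"Gold 5218 vs. 5118:     {geomean_gain(gold_5118, gold_5218):.2f}x")  # ~1.33x
```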