Executive Summary
Telecommunications (telco) providers face a slew of industry disruptions. The data demands of 5G, streaming media, and edge computing are increasing faster than the evolving capabilities of the telecom networks that support them. And as competition between providers intensifies, customers continue to expect more from their services.
To keep pace with data speeds and meet customer expectations, AT&T, one of the world’s largest telecommunications providers, has created a new paradigm for network equipment: a software-centric model that puts customers first.
To move control of the network from hardware to software, AT&T has switched to using disaggregated, decoupled, open components stacked to create switching or routing platforms. This contrasts with legacy equipment available from single-source providers, which can be hardware-centric and closed, and which often uses proprietary technologies.
AT&T has used an open model to develop its 100/400 Gb large Ethernet multiplexer (EMUX). The AT&T team chose Intel® Virtual RAID on CPU (Intel® VROC) for RAID 1 storage in the EMUX to enhance reliability, increase performance, and lower cost.
Intel VROC Snapshot
Intel VROC, an integrated RAID solution, delivers top-tier reliability, performance, and cost-effectiveness, making it an ideal solution for modern telco and data-centric demands.
Improved reliability: Intel VROC increases system reliability by eliminating the need for a RAID host bus adapter (HBA) card.
Increased performance: Intel VROC delivers up to 165% higher performance compared to a RAID HBA card.1
Lower cost: Intel VROC provides up to 60% cost savings compared to a RAID HBA card.2
AT&T Open Network Equipment Model
AT&T contributes specifications to the Open Compute Project (OCP) to implement its new network equipment paradigm. OCP is an organization that shares designs for data center products and best practices among industry-leading technology companies. AT&T’s contribution attracts organizations interested in expanding their market reach. By building to AT&T’s open specifications, original design manufacturers (ODMs), original equipment manufacturers (OEMs), independent software vendors (ISVs), and systems integrators (SIs) can gain access to new markets and customers as AT&T deploys its next-generation network infrastructure and services.
Figure 1. AT&T has shifted to an open architecture with common hardware and flexible software.3
As the adoption of disaggregated routers and switches increases, the volume of disaggregated equipment sold scales up. This helps to drive equipment prices down. With open, flexible solutions, AT&T and its fellow telecom providers can harness innovative technologies at lower prices and sooner than if they waited for single-source, proprietary solutions to reach the market. Figure 1 illustrates how AT&T has moved from a conventional architecture with proprietary hardware and software from single providers to an open, software-based architecture with broad ecosystem support.
AT&T 100/400 Gigabit Large EMUX
The AT&T large EMUX router is a specific example of a disaggregated equipment model designed by AT&T that benefits customers with enhanced reliability, increased performance, and lower costs. As new 400 gigabit Ethernet (GbE) networks come online, network providers must multiplex edge traffic running on 100 GbE networks to 400 GbE networks. The 100/400 Gb large EMUX addresses this need (see Figure 2).
Figure 2. EMUX router system block diagram showing the role of Intel VROC and Intel® Volume Management Device (Intel® VMD).
The AT&T team, working with other industry partners, designed the EMUX to efficiently pre-aggregate traffic from 100 GbE to 400 GbE platforms. The EMUX design aggregates multiple traffic streams from broadband, mobile, and enterprise networks into a single pipeline, thus simplifying edge sites.
AT&T guided innovation and interoperability for this router by specifying design features, including the chassis, networking, CPU, storage, motherboard, power system, security, and hardware management. The design has been released as an OCP specification. When building out the EMUX for production, AT&T specified a multi-core Intel® Xeon® processor with Intel VROC for RAID 1 storage for the EMUX platform as it is compatible with the open specifications. The Intel VROC design enhances the reliability needed to support solid-state drive (SSD) failure scenarios when mission-critical applications require instantaneous drive failover. It also increases storage performance due to the direct connection of SSDs to the CPU.
What is RAID?
RAID stands for "redundant array of independent disks." RAID is a technology used in data storage to improve data fault tolerance, redundancy, and performance by combining multiple physical hard drives into a single logical unit. In storage systems, RAID helps ensure data availability and protection against drive failures.
RAID can be configured in various ways to best meet organizations’ redundancy needs. For example, AT&T uses Intel VROC as RAID 1 storage in its 100/400 Gb large EMUX router. RAID 1 consists of an exact copy of a set of data on two or more disks, referred to as “mirroring.”
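As an illustration of the mirroring idea only, the following Python sketch models a two-drive RAID 1 volume in memory. The class and method names (`MirroredVolume`, `fail_drive`) are hypothetical and do not come from Intel VROC or any real RAID stack; the sketch simply shows why a read still succeeds after one mirrored drive is lost.

```python
class MirroredVolume:
    """Toy RAID 1 model: every block is written to all member drives."""

    def __init__(self, num_drives=2):
        if num_drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        # Each drive is modeled as a dict of block number -> bytes.
        self.drives = [dict() for _ in range(num_drives)]
        self.failed = set()

    def write(self, block, data):
        # Mirror the write to every healthy drive.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data

    def fail_drive(self, index):
        # Simulate a drive failure; its contents become unavailable.
        self.failed.add(index)
        self.drives[index] = {}

    def read(self, block):
        # Any healthy drive that holds the block can serve the read.
        for index, drive in enumerate(self.drives):
            if index not in self.failed and block in drive:
                return drive[block]
        raise IOError("block unavailable on all drives")


volume = MirroredVolume(num_drives=2)
volume.write(0, b"boot image")
volume.fail_drive(0)
surviving_copy = volume.read(0)  # served by the remaining mirror
```

A real implementation also has to rebuild the replaced drive from the surviving copy, which this sketch leaves out.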
Intel VROC Overview
Intel VROC is an integrated RAID solution on Intel Xeon processors that provides reliability, availability, and serviceability (RAS) for NVM Express (NVMe) storage. Intel VROC is enabled via Intel® Volume Management Device (Intel® VMD), an integrated hardware accelerator.
Legacy hardware RAID products traditionally isolate the storage subsystem behind a discrete adapter, called the RAID host bus adapter (HBA) card. A RAID HBA card controls RAID arrays and is an intermediary between the storage devices and the host. Use of an HBA card was ideal for slower storage technologies, but with faster NVMe storage, a fundamentally new RAID architecture is needed. The Intel VROC integrated RAID solution takes the robust functionality and enterprise quality of hardware RAID and combines it with the flexibility and upgradability of software RAID to address this need.
Figure 3. Intel VROC eliminates the need for an HBA card; fewer system components increase system reliability.
Roadmap
Intel has a comprehensive roadmap for Intel VROC, including upcoming support for capabilities like local key management through a Trusted Platform Module (TPM) for self-encrypting drives, out-of-band (OOB) management integration with AMI BIOS, NVMe secure erase through Unified Extensible Firmware Interface (UEFI), data RAID 0 and RAID 5 capability for VMware ESXi, OOB/Redfish enhancements, and user experience (UX) enhancements. Reach out to your Intel representative for further information.
Logging
Intel VROC uses the MDRAID subsystem to log RAID events to the Linux kernel log and the system log file. On most Linux distributions, these messages appear in “/var/log/messages”. Any triggered events are reported to syslog, where the user can monitor and filter them. End users can also define their own event messages using the “/etc/mdadm.conf” file. The monitoring service invokes these user-defined messages when it detects an alert and writes them to the “/tmp/vroc_alerts.log” file. Users can then develop their own method of handling these messages.
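One possible way to handle such alerts is sketched below in Python. It assumes the handler is registered through the PROGRAM directive in mdadm.conf, in which case mdadm's monitor mode runs the program with the event name, the array device, and optionally a component device as arguments. Whether a given Intel VROC deployment wires its monitoring service this way is an assumption, and the function names and log line format are hypothetical.

```python
import sys
from datetime import datetime, timezone

ALERT_LOG = "/tmp/vroc_alerts.log"  # log path named in the text above


def format_alert(event, md_device, component=None):
    """Build one log line from the arguments supplied by the monitor."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    parts = [timestamp, f"event={event}", f"array={md_device}"]
    if component:
        parts.append(f"component={component}")
    return " ".join(parts)


def handle_event(args, log_path=ALERT_LOG):
    """Append an alert line; args is EVENT, MD_DEVICE and optional COMPONENT."""
    if not 2 <= len(args) <= 3:
        print("expected: EVENT MD_DEVICE [COMPONENT]", file=sys.stderr)
        return 2
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_alert(*args) + "\n")
    return 0
```

A small wrapper script would call `handle_event(sys.argv[1:])` and exit with its return value. The same function is a natural place to forward alerts to a ticketing or paging system.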
Best-in-Class Redundant Boot Solution
To improve uptime and reduce the chance of system failure, many systems use a redundant operating system (OS) image volume by using RAID 1 with two mirrored storage devices. Because pre-boot support and functionality outside the OS are required for RAID 1, hardware RAID solutions have been the only option for these server designs until recently. With Intel VROC, this functionality can now be delivered as an embedded platform feature without additional hardware. Intel VROC UEFI drivers are integrated with platform BIOS images, meaning that RAID 1 boot volumes can be created in the pre-boot environment and managed by the Intel VROC RAID stack. This allows for a cost-effective and flexible RAID 1 boot solution that can connect directly to Intel VMD domains on the CPU or the platform controller hub (PCH) with various form factors (M.2, U.2, and E1.S).
Superior Reliability
In contrast to traditional hardware-based RAID solutions, Intel VROC uses hardware and software to provide reliable, cost-effective, high-performance protection against data loss. Enterprises can use Intel VROC to help deliver consistent performance even during a fault, meaning that enterprise workloads have a far better chance of meeting service-level agreements (SLAs) in the event of drive failures.
Intel VROC provides equal, if not better, value than a hardware RAID solution by delivering several key benefits:
- Can significantly shorten rebuild times while still providing excellent performance, measured in input/output operations per second (IOPS). This minimizes the impact on service levels for demanding application workloads, a critical factor in a business’s response to inevitable drive failures. In addition, the difference in CPU utilization between Intel VROC and HBA-based systems is minimal.
- Allows NVMe system users to avoid RAID 6 performance overhead without sacrificing data reliability by utilizing a RAID 5 with hot spare configuration.
- Protects data during power loss using a patented journaling method that closes the RAID 5 write hole completely.
- Offers flexibility for OEMs, ODMs, or end users to certify NVMe drives using self-certification tools offered by Intel.
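For readers unfamiliar with the RAID 5 write hole mentioned in the list above, the following Python sketch shows generic XOR parity: how a lost block is rebuilt from the survivors, and how a stale parity block (for example, after power loss between a data write and its parity update) yields a wrong rebuild. It illustrates the general problem only; it does not represent Intel's journaling method, and the block values are arbitrary.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe plus their parity block.
data_blocks = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data_blocks)

# Lose the second data block, then rebuild it from the survivors and parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])

# Write hole: the first block was overwritten but parity was never updated.
torn_first_block = b"\xff\xff"
corrupted = xor_blocks([torn_first_block, data_blocks[2], parity])
```

Here `rebuilt` matches the lost block, while `corrupted` does not, which is why data and parity updates must be made consistent across a power loss.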
An HBA-based RAID system acts as a single point of failure and lowers hardware mean time between failures (MTBF). Intel VROC, by contrast, does not require additional hardware components and therefore does not reduce hardware MTBF. In addition, HBA-based RAID cards have backup batteries that can significantly reduce system hardware MTBF and add complexity and cost.
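The effect of extra components on MTBF can be sketched with the standard series-system approximation, which assumes independent components with constant failure rates, so that failure rates (1/MTBF) add. The component figures below are hypothetical values chosen only for illustration; they are not measurements or vendor data.

```python
def series_mtbf(component_mtbfs_hours):
    """MTBF of a system that fails when any one component fails.

    Assumes independent components with constant failure rates, so the
    individual failure rates (1 / MTBF) simply add.
    """
    total_failure_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours)
    return 1.0 / total_failure_rate


# Hypothetical MTBF figures in hours, for illustration only.
base_platform = [1_000_000, 2_000_000]   # e.g. motherboard, power system
hba_and_battery = [500_000, 250_000]     # e.g. RAID HBA card, backup battery

without_hba = series_mtbf(base_platform)
with_hba = series_mtbf(base_platform + hba_and_battery)
```

Under these assumptions, `with_hba` is always lower than `without_hba`, because every added component in series contributes its own failure rate.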
Intel VROC in the AT&T Large EMUX
The AT&T design team used Intel VROC, integrated directly into the BIOS/UEFI firmware, for RAID 1 storage within the AT&T large EMUX router and AT&T Distributed Disaggregated Chassis (DDC) products. RAID 1 mirrors data across drives, so the system keeps identical copies and can maintain seamless operation even if one disk fails. This provides fault tolerance and safeguards data integrity. Intel VROC eliminates the need for an HBA card in the system, resulting in enhanced reliability, reduced supply-chain complexity, and lower costs in terms of initial investment and ongoing maintenance. Eliminating the HBA card improves overall reliability by minimizing potential points of failure, and it contributes to increased energy efficiency and improved thermal performance.
AT&T Industry Contributions
AT&T’s open network equipment designs have brought many benefits to the telecom industry. Open network equipment benefits a large set of suppliers and helps increase their number, which attracts new telecom equipment providers into the market. Specific open network equipment benefits include:
- Scalable architectures. Open network equipment designed by AT&T brings data center scale-out architecture principles to service provider networks. This empowers service providers to build agile, resilient, and cost-effective infrastructure that can efficiently meet the evolving demands of modern network services and applications.
- Disaggregated architectures. Disaggregated network platform architectures designed by AT&T accelerate innovation by decoupling hardware from software. With the two decoupled, service providers and enterprises can adopt the latest technology wherever it is needed and bring new products and services to market quickly.
- Open-source architectures. By establishing an open and free architecture for anyone in the industry, AT&T fosters collaboration and innovation, democratizes design access, and encourages sustainable and adaptable solutions for the ever-increasing needs of 5G, streaming media, and edge computing.
- Optimized lifecycle management. AT&T contributed the multi-updater tool to the developer community. This integrated tool automates firmware upgrades on white boxes. It can upgrade all firmware components, including BIOS, complex programmable logic device (CPLD), BMC, and SSD firmware. Intel VROC firmware is a part of BIOS firmware and is integrated into the multi-updater tool for ease of upgrading.
- Reduced costs. Telco operators can take advantage of economies of scale and benefit from available silicon products that provide a better final product cost structure.
AT&T Leads the Way with Open Networking
AT&T is a leader in designing and deploying open network equipment built on data center principles like disaggregation and open-source architecture. The company helps the telecom industry improve operational efficiencies and reduce capital expenditure by creating white-box solutions.
The AT&T EMUX router is a specific example of a disaggregated equipment model designed by AT&T. It benefits customers with enhanced reliability, increased performance, and lower costs. The router features Intel VROC, an enterprise RAID solution for NVMe SSDs directly attached to Intel Xeon processors.4 It empowers telecom equipment with faster data access, robust reliability, and lower cost, making it an excellent choice for modernizing and optimizing telecom infrastructure.
Learn How Intel VROC Can Benefit Your Product
AT&T selected Intel VROC for its AT&T EMUX router to create high-performance RAID arrays. In contrast to hardware-only RAID solutions that rely on separate controller cards, Intel VROC uses a hardware and software approach to create a cost-effective, high-performance, manageable, and scalable solution directly integrated into Intel® Xeon® processors.