Article ID: 000057368 Content Type: Maintenance & Performance Last Reviewed: 11/14/2023

What Is RAID Write Hole (RWH) Protection in Intel® Virtual RAID on CPU (Intel® VROC)?

Summary

Brief explanation of the Intel® VROC feature RAID Write Hole (RWH) protection

Description

Unable to find what the RAID Write Hole protection is used for in RAID 5 volumes.

Resolution

Intel® Virtual RAID on CPU (Intel® VROC) can protect RAID 5 data even when an unexpected power loss and RAID volume degradation occur at the same time. This double-fault condition is sometimes referred to as the RAID Write Hole (RWH). Many RAID solutions address this problem with a backup power unit. Intel® VROC instead uses a journal that preserves the partial parity and reduces the risk of data loss.

There are two available modes of Intel® VROC RAID Write Hole protection:

  1. Distributed: The RAID Write Hole journal is stored on the RAID member drives, so no additional drives are needed. This mode provides full protection against the RAID Write Hole but introduces a performance penalty for write-intensive workloads.
  2. Journaling Drive: The RAID Write Hole journal is stored on a separate journaling drive. That drive cannot be used for any other purpose. The performance penalty for write-intensive workloads depends on the performance of the journaling drive, but typically the penalty is lower compared to the distributed mode.

Considerations when enabling Intel® VROC RWH protection

  • The journaling drive needs to be at least as big as the smallest drive in the RAID volume (it will dictate the maximum size of the RAID volume).
  • For RWH protection, we recommend distributed mode (where the journaling information is stored across all RAID member drives) rather than journaling drive mode (where the journaling information is stored on a single dedicated drive).
  • Journaling is different from parity. The journal is written first, before the data is committed to the RAID 5 volume; once the write to the RAID 5 volume has completed, the journal is no longer needed.
  • The journaling drive should be at least as large as the smallest member drive in the RAID 5 volume because of endurance considerations. Journaling itself does not require that much space, but the journal is overwritten again and again, so the drive needs good endurance, which higher capacity helps provide. This is also what can make distributed journaling more effective.
  • Distributed journaling uses the Power Loss Imminent (PLI) feature of the drives, which stores the data in memory on the drive instead of in NAND, so endurance is no longer a concern for journaling.
  • The other option is to use an Intel® Optane™ SSD as a dedicated journaling device: a drive with much lower capacity but much higher endurance.
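
To illustrate why the journal is written before the data, the following is a minimal Python sketch of a partial-parity journal. It is a simplified model only: the 4-drive layout, the `disk` and `journal` dictionaries, and the `xor_strips` helper are illustrative assumptions and do not represent the actual Intel® VROC on-disk format or implementation. The sketch assumes the journal holds the XOR of the data strips that a write does not modify, so that parity can be recomputed after a crash.

```python
from functools import reduce


def xor_strips(*strips: bytes) -> bytes:
    """XOR equal-length strips together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))


# Hypothetical 4-drive RAID 5 stripe: three data strips plus one parity strip.
disk = {"d0": b"AAAA", "d1": b"BBBB", "d2": b"DDDD"}
disk["parity"] = xor_strips(disk["d0"], disk["d1"], disk["d2"])
original_d1 = disk["d1"]

# Step 1: before touching the stripe, journal the partial parity,
# i.e. the XOR of the data strips this write does NOT modify.
journal = {"partial_parity": xor_strips(disk["d1"], disk["d2"])}

# Step 2: new data reaches d0, then power fails before parity is updated.
disk["d0"] = b"CCCC"

# Step 3: on restart, the drive holding d1 is also missing (degraded volume).
del disk["d1"]

# Recovery: recompute a consistent parity from the journal and on-disk d0.
disk["parity"] = xor_strips(journal["partial_parity"], disk["d0"])

# The missing strip can now be reconstructed from the surviving drives.
rebuilt_d1 = xor_strips(disk["d0"], disk["d2"], disk["parity"])
print(rebuilt_d1 == original_d1)  # True
```

In this model the journal entry is only needed until data and parity are both on disk, which matches the point above that the journal is no longer needed once the write has completed.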

Additional information

RAID Write Hole (RWH) is a fault scenario that affects parity-based RAID. It occurs when a power failure or system crash and a drive failure (for example, a failed strip write or a complete drive failure) happen at the same time or very close to each other. Unfortunately, system crashes and disk failures are correlated events. Because write operations are not atomic across the member disks of a parity-based RAID volume, this can lead to silent data corruption or irrecoverable data. The parity of a stripe that was being written when power failed may be incorrect and inconsistent with the data strips in that stripe; data on such inconsistent stripes does not have the intended protection and, worse, can be reconstructed incorrectly (silent data errors).
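
The scenario described above can be reproduced with a small Python model of XOR parity. This is a sketch under simplifying assumptions: a 3-drive stripe with 4-byte strips, and hypothetical names (`xor_strips`, `on_disk`) chosen for illustration. It shows how a stale parity strip turns a later rebuild into silent corruption when no RWH protection is in place.

```python
from functools import reduce


def xor_strips(*strips: bytes) -> bytes:
    """XOR equal-length strips together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))


# A healthy 3-drive RAID 5 stripe: two data strips plus their parity.
d0_old, d1 = b"AAAA", b"BBBB"
parity = xor_strips(d0_old, d1)

# A write replaces d0, but power fails before the parity strip is updated.
d0_new = b"CCCC"
on_disk = {"d0": d0_new, "d1": d1, "parity": parity}  # parity is now stale

# The drive holding d1 then fails; RAID 5 rebuilds d1 from the survivors.
rebuilt_d1 = xor_strips(on_disk["d0"], on_disk["parity"])

print(rebuilt_d1 == d1)  # False: the rebuild returns wrong data
print(rebuilt_d1)        # b'@@@@' instead of b'BBBB'
```

Note that the rebuild completes without raising any error, which is why the resulting corruption is described as silent.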