The Dell PERC (PowerEdge RAID Controller) cards provide a competitive server-based fault-tolerant storage solution. NOTE that Dell often quotes a server baseline configuration without any write cache on the RAID card – this makes the baseline config appear to be an attractive low price. DO NOT buy any system without a write cache built into the RAID controller – this will be listed as battery-backed or non-volatile or flash-backed write-cache (BBWC, NVWC, FBWC). It is also important to configure your server or storage unit WITH hot-swap drives AND redundant hot-swap power supplies. You want the raid controller to be able to indicate failure of the disk with the led lights on each disk caddy – you can easily identify the failed drive and swap it out while the storage volume is online. The same applies for power supplies (PSU’s) which are a common failed component – the server needs to be able to indicate the failed PSU with led lights and server technician must be able to swap out the failed unit while the system is running. Another benefit to redundant power suppies – ability to swap out UPS units or migrate to new PDU, etc.
Back to the RAID discussion – your write cache ON-CONTROLLER is crucial to the write performance of your fault-tolerant storage system. The OS will be able to complete IO operations (IOPS) after write has completed to the raid controller cache – this allows the raid controller to continue writing to the disk while the OS returns to other non-disk tasks. If the system loses power, the flash-backed or battery-backed cache saves the unwritten data and completes the write when the power is restored to the disks.
UNFORTUNATELY there is a huge problem with MANY of the PERC / MegaRAID implementations in the field that is a BIG RISK for catastrophic DATA LOSS. The issue is with the unclear description of “Disk Write Cache” option within the Dell and other vendors RAID configuration utilities. Ironically, the “default” setting applied by many of these utilities will be incorrect introducing a risk of write corruption. The trouble is a combination of manufacturer defaults, confusing wording of the disk cache option, and lack of adequate documentation.
The PHYSICAL DISK CACHE is designed for use on consumer NON-RAID computers on cheap usually slow disks where the risk of data loss may impact only one person. In this case, an on-disk write cache (VOLATILE) speeds up performance of writes to the disk while not allowing for the cached data to be saved if power is lost. In fault-tolerant storage systems, we cannot tolerate this data loss – a RAID controller must be aware that a write operation has been successfully written to the physical disk platter (not disk-cache) before considering the write operation complete. In order to make this guarantee, any DISK CACHE must be DISABLED. This ability to prevent data loss (corruption due to power failure) is an important fault-tolerant storage capability for business information systems. When the Operating System thinks a write operation has been completed – the storage subsystem must not lose it on the way to the disk!
The big confusion is that the MegaRAID tools configure this disk cache policy under “virtual disk configuration” next to the controller “write cache” policy and the wording does not clarify whether the disk-cache is a physical-disk setting or controller-based virtual disk cache setting. To make matters worse, the MegaRAID configuration and status reports DO NOT indicate whether any given physical disk cache is currently enabled or disabled. In my opinion these are TERRIBLE UI DECISIONS on the part of the MegaRAID software utility development team! Users need to know if they’re disabling physical disk write-cache, or if they’re disabling crucial performance-improving controller-based write cache.
Moral of the story: DISABLE DISK WRITE-CACHE in your disk-cache policy under each virtual disk in your PERC / MegaRAID storage configuration. DO Enable your controller-based Write-Back cache on each virtual disk and make sure that your RAID card has healthy battery-backed or flash-backed non-volatile write cache. If your raid card has NO fault-tolerant write cache (usually listed as NO Battery for legacy reasons) – recycle it and replace it with a proper RAID card with true controller write cache. Users of your system may not be able to tolerate the poor disk performance if your controller lacks a write cache – especially with the extra write delays of parity based raid (5, 6, 50, 60, etc).
Here are some references with more authoritative sources to back the claims I make here.
- MegaRAID disk write cache policy (ibm.com) “Disk Cache Policy should always be ‘Disabled’ when creating a Virtual Drive attached a RAID controller. This is to prevent loss of data in case of a power failure.”
- Configuring RAID for Optimal Performance (PDF, intel.com) “[p. 6] Disk Cache Policy determines whether the hard-drive write cache is enabled or disabled. When Disk Cache Policy is enabled, there is a risk of losing data in the hard drive cache if a
power failure occurs. The data loss may be fatal and may require restoring the data from a backup device. It is critical to have protection against power failures.” – yes, there should be no option other than *disabled* IMHO
Resources from Dell tend to just add to the confusion – perhaps this is because they’re just re-branding the AMI/LSI MegaRAID technology as “PERC” and do not quite understand it themselves? Here are a couple examples of the confusion from Dell and their customers (including myself when I read their documentation and forums on this topic).
Hopefully this blog post will help clarify this confusing topic and result in more properly configured RAID storage solutions based on the AMI / LSI MegaRAID cards. A BIG THANKS to the Intel and IBM teams who posted helpful documentation with an answer to this common Dell PERC RAID question. Feel free to leave a comment if this helped you out or if you have something to add.