Dell PERC / MegaRAID Disk Cache Policy

The Dell PERC (PowerEdge RAID Controller) cards provide a competitive server-based fault-tolerant storage solution. NOTE that Dell often quotes a baseline server configuration without any write cache on the RAID card, which makes the baseline config look attractively cheap. DO NOT buy any system without a write cache built into the RAID controller. It will be listed as battery-backed, non-volatile, or flash-backed write cache (BBWC, NVWC, FBWC). It is also important to configure your server or storage unit WITH hot-swap drives AND redundant hot-swap power supplies. You want the RAID controller to indicate a failed disk with the LEDs on each disk caddy, so you can easily identify the failed drive and swap it out while the storage volume stays online. The same applies to power supplies (PSUs), which are a commonly failing component: the server needs to indicate the failed PSU with its LEDs, and a technician must be able to swap out the failed unit while the system is running. Another benefit of redundant power supplies is the ability to swap out UPS units or migrate to a new PDU while the server keeps running.

Back to the RAID discussion: the ON-CONTROLLER write cache is crucial to the write performance of your fault-tolerant storage system. The OS sees a write I/O as complete as soon as the data has reached the RAID controller cache, so the controller can finish writing to the disks while the OS moves on to other work. If the system loses power, the flash-backed or battery-backed cache preserves the unwritten data, and the controller completes the writes once power is restored to the disks.

UNFORTUNATELY there is a huge problem with MANY of the PERC / MegaRAID implementations in the field, and it is a BIG RISK for catastrophic DATA LOSS. The issue is the unclear description of the “Disk Write Cache” option in the RAID configuration utilities from Dell and other vendors. Ironically, the “default” setting applied by many of these utilities is the wrong one and introduces a risk of write corruption. The trouble is a combination of manufacturer defaults, confusing wording of the disk cache option, and a lack of adequate documentation.

The PHYSICAL DISK CACHE is designed for consumer NON-RAID computers with cheap, usually slow disks, where data loss is likely to affect only one person. There, a VOLATILE on-disk write cache speeds up writes, but the cached data is lost if power fails. In fault-tolerant storage systems we cannot tolerate this data loss: the RAID controller must know that a write has actually reached the physical disk platter (not the disk cache) before it considers the write complete. To make this guarantee, any DISK CACHE must be DISABLED. This ability to prevent data loss (corruption due to power failure) is an important fault-tolerant storage capability for business information systems. When the operating system thinks a write operation has been completed, the storage subsystem must not lose it on the way to the disk!
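To make the failure mode concrete, here is a toy model of the write path described above. It is only a sketch under simplifying assumptions: the `Cache` class, the `simulate` function, and the three-block workload are all made up for illustration and do not reflect how any real controller or drive firmware is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Hypothetical write cache that either survives a power loss or does not."""
    non_volatile: bool
    pending: list = field(default_factory=list)

    def power_loss(self):
        # A volatile cache forgets everything it had not yet written out.
        if not self.non_volatile:
            self.pending.clear()


def simulate(disk_cache_enabled: bool) -> set:
    """Return the acknowledged writes that are missing after a power failure."""
    controller = Cache(non_volatile=True)  # battery/flash-backed controller cache
    disk = Cache(non_volatile=False)       # on-disk write cache (volatile)
    platter = set()
    acknowledged = []

    # The OS issues three writes; each is acknowledged once it is in the
    # controller cache (write-back behaviour).
    for block in ["A", "B", "C"]:
        controller.pending.append(block)
        acknowledged.append(block)

    # The controller flushes A and B to the disk before power fails.
    for block in controller.pending[:2]:
        if disk_cache_enabled:
            disk.pending.append(block)  # disk says "done", data is only in its RAM
        else:
            platter.add(block)          # data really is on the platter
    # The controller drops what it believes is safely on disk.
    controller.pending = controller.pending[2:]

    # Power fails, then comes back.
    controller.power_loss()
    disk.power_loss()
    platter.update(controller.pending)  # controller replays its preserved cache

    return set(acknowledged) - platter
```

In this model `simulate(disk_cache_enabled=False)` loses nothing, while `simulate(disk_cache_enabled=True)` loses blocks A and B: the controller had already discarded them because the disk claimed they were written.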

The big confusion is that the MegaRAID tools place this disk cache policy under “virtual disk configuration”, right next to the controller “write cache” policy, and the wording does not make clear whether the disk cache is a physical-disk setting or a controller-based virtual disk cache setting. To make matters worse, the MegaRAID configuration and status reports DO NOT indicate whether the cache on any given physical disk is currently enabled or disabled. In my opinion these are TERRIBLE UI DECISIONS on the part of the MegaRAID software utility development team! Users need to know whether they’re disabling the physical disk write cache or the crucial, performance-improving controller-based write cache.

Moral of the story: DISABLE DISK WRITE-CACHE in the disk cache policy of each virtual disk in your PERC / MegaRAID storage configuration. DO enable the controller-based Write-Back cache on each virtual disk, and make sure that your RAID card has a healthy battery-backed or flash-backed non-volatile write cache. If your RAID card has NO fault-tolerant write cache (usually listed as NO Battery for legacy reasons), recycle it and replace it with a proper RAID card with a true controller write cache. Users of your system may not be able to tolerate the poor disk performance if your controller lacks a write cache, especially with the extra write delays of parity-based RAID (5, 6, 50, 60, etc).
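The two rules above can be checked mechanically against a saved virtual-disk report. The following is a minimal sketch, not a supported tool: `audit_ldinfo` is a hypothetical helper, and the field names `Virtual Drive`, `Current Cache Policy`, and `Disk Cache Policy` are my assumption about what MegaCli-style `-LDInfo` text looks like, so adjust them to whatever your utility actually prints.

```python
import re


def audit_ldinfo(text: str) -> list:
    """Return one warning per rule violation found in LDInfo-style text."""
    warnings = []
    current = None  # id of the virtual drive we are currently reading
    for line in text.splitlines():
        header = re.match(r"\s*Virtual (?:Drive|Disk):\s*(\d+)", line)
        if header:
            current = header.group(1)
            continue
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Disk Cache Policy" and value != "Disabled":
            # "Default"/"Unchanged" counts as unsafe: the real state is unknown.
            warnings.append(f"VD {current}: disk cache policy is '{value}', want 'Disabled'")
        elif key == "Current Cache Policy" and not value.startswith("WriteBack"):
            warnings.append(f"VD {current}: controller cache is not WriteBack ({value})")
    return warnings


# Assumed sample report, written by hand for illustration only.
SAMPLE = """\
Virtual Drive: 0 (Target Id: 0)
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Default
Virtual Drive: 1 (Target Id: 1)
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy   : Disabled
"""

for warning in audit_ldinfo(SAMPLE):
    print(warning)
```

On the sample text this flags virtual drive 0 for its disk cache policy and virtual drive 1 for running its controller cache in write-through mode. Treating anything other than an explicit “Disabled” as a warning is a deliberate choice: a setting you cannot verify is not a safe setting.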

Here are some references from more authoritative sources to back up the claims I make here.

  • MegaRAID disk write cache policy: “Disk Cache Policy should always be ‘Disabled’ when creating a Virtual Drive attached a RAID controller. This is to prevent loss of data in case of a power failure.”
  • Configuring RAID for Optimal Performance (PDF, p. 6): “Disk Cache Policy determines whether the hard-drive write cache is enabled or disabled. When Disk Cache Policy is enabled, there is a risk of losing data in the hard drive cache if a power failure occurs. The data loss may be fatal and may require restoring the data from a backup device. It is critical to have protection against power failures.” Yes, there should be no option other than *disabled* IMHO.

Resources from Dell tend to just add to the confusion. Perhaps this is because they’re just re-branding the AMI/LSI MegaRAID technology as “PERC” and do not quite understand it themselves? Here are a couple of examples of the confusion from Dell and their customers (including myself when I read their documentation and forums on this topic).

Hopefully this blog post will help clarify this confusing topic and result in more properly configured RAID storage solutions based on the AMI / LSI MegaRAID cards. A BIG THANKS to the Intel and IBM teams who posted helpful documentation with an answer to this common Dell PERC RAID question. Feel free to leave a comment if this helped you out or if you have something to add.


About notesbytom

Keeping technology notes to free up my mind to solve new problems rather than figuring out the same ones repeatedly :-).
This entry was posted in System Administration.

5 Responses to Dell PERC / MegaRAID Disk Cache Policy

  1. WynX says:

    Thanks for this article, just what I was looking for when trying to ensure proper configuration for a Dell PERC controller. One option that is still not clear to me is the following confusing status on my virtual drives:

    Disk Cache Policy : Default

    I use megacli on linux to interact with the controller and get this info:
    megacli -LDInfo -LAll -aAll
    or more compact
    megacli -LDGetProp -DskCache -LAll -aAll

    Of course the status ‘Enabled’ or ‘Disabled’ is clear, but I can’t find what the ‘Default’ behavior is (is it enabled or disabled)?

    I can disable the Disk Cache using: megacli -LDSetProp DisDskCache -LAll -aAll
    Additionally I wouldn’t know how to return the setting to ‘Default’.

    • notesbytom says:

      Hello @WynX, I believe the “Default” disk cache policy accepts whatever setting the drive firmware uses when the disk powers on (unmodified by the RAID controller). This means that some disks might have on-disk write cache enabled (risking data loss), and some disks may have on-disk write cache disabled (safe for use with a RAID system). The RAID controller is not smart enough to query the current setting of each disk's firmware, but it is capable of applying a disk cache setting to each disk managed by the controller. That’s why it is important to apply this setting in the RAID controller directly, so the on-disk write cache is in a known state (the safe setting would be “Disabled”). Default is the “unknown” disk-firmware-default setting (also known as “Unchanged”).

  2. art says:

    Do some near-line drives now have non-volatile write caches? In this case after a power fail they complete the write?

    • notesbytom says:

      Hello @art, I’m not sure about non-volatile “write” cache on near-line drives. Please provide some reference if you have detail on this type of product. I believe many “hybrid” drives will cache Read operations in on-drive flash to speed up subsequent reads without hitting the disk again. This is Read-Only cache which has no impact on write-back safety during system crashes.

      • Art says:

        Current SAS marketing literature from both Seagate and HGST mentions non-volatile write caches implemented in unusual ways. Seagate uses back EMF from the drive spinning down to transfer write cache to NAND. I think HGST buffers the write cache in a special area on disk.

        However it’s unclear which specific models of SAS drives would have this. And if you are under contract with Dell and a drive fails they will send you whatever. So it seems too dangerous to use.

        Which begs the question- why does Dell have it turned on by default in the latest OpenManage when configuring RAID6 virtual disks on a PERC H830?
