RAID

What is the problem with RAID5?

Basically, if you want to keep your data, don't use RAID5.

Author: Art Kagel

OK, I will deal with why RAID10 (and by extension RAID1) is not as much at risk. Remember that the problem is not the partial media failure itself trashing data on the one drive that is progressively failing over time. The problem arises when good data is read from another stripe member (call it drive #1) in a stripe that also contains damaged data on the failing drive (call it drive #2). The block from drive #1 is modified, which causes a read of the remaining stripe members and a recalculation of parity, after which the modified block is written back to drive #1 and the new parity to that stripe's parity drive (perhaps drive #3). Since the data from drive #2 was not needed by the application (here IDS), the fact that it was damaged was never detected, so the new parity was calculated from garbage. The result is a parity block that can ONLY accurately recreate the already-trashed block on drive #2.

Now suppose that drive #4 suffers a catastrophic failure and has to be replaced. Meanwhile the damaged drive has continued to deteriorate and now returns a different pattern of bits than was used to calculate the bad parity block on drive #3. When the missing block on the drive #4 replacement is reconstructed, it too becomes garbage: two disk blocks are now unusable, and the damage has propagated.

That covers what happens if the bad block is NOT read directly and detected by IDS. If it is, IDS will mark the chunk OFFLINE and refuse to use it until you repair the damage. The only ways to do that are to restore from backup, or to remove the partially damaged drive and rebuild it from the parity as if it had completely failed. HOWEVER, if all, or even several, of the drives in that array are from the same manufacturing lot, or are even of similar age, there is a good chance that the previous problem has already trashed the parity of other blocks, so you may end up reconstructing a new drive that has more bad data blocks than the one it replaces.
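
To make the propagation mechanism concrete, here is a minimal sketch in Python of the scenario above, assuming four data drives plus one parity drive and simple XOR parity. The block contents and drive numbering are invented for illustration; this is not how any particular controller implements RAID5.

  # Hypothetical illustration of the RAID5 scenario described above.
  def xor_blocks(blocks):
      """XOR equal-length byte blocks together (RAID5-style parity)."""
      result = bytearray(len(blocks[0]))
      for block in blocks:
          for i, b in enumerate(block):
              result[i] ^= b
      return bytes(result)

  # One stripe: drives 1-4 hold data, a fifth drive holds this stripe's parity.
  drive1, drive2, drive3, drive4 = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
  parity = xor_blocks([drive1, drive2, drive3, drive4])       # good parity

  # Drive 2's block silently rots; nothing reads it, so nothing notices.
  drive2_read = b"B?B?"                                       # garbage from the failing drive

  # A write modifies drive 1's block: the controller reads the other members
  # (getting garbage from drive 2) and recomputes parity from that garbage.
  drive1 = b"AZAA"
  parity = xor_blocks([drive1, drive2_read, drive3, drive4])  # parity is now poisoned

  # Later, drive 4 dies outright.  The failing drive 2 now returns yet another
  # bit pattern, different from the one that was folded into the parity.
  drive2_read_later = b"B??B"
  rebuilt_drive4 = xor_blocks([drive1, drive2_read_later, drive3, parity])

  print(rebuilt_drive4 == b"DDDD")   # False: the rebuilt block is garbage too

The rebuilt copy of drive 4 is correct only if every surviving member returns exactly the bits that were used when the parity was written, which is precisely what a progressively failing drive does not guarantee.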

With RAID10, each drive in each mirrored pair is written independently. If a block on drive 1a is trashed, the data on drive 1b (its mirror) is fine. If the bad data is read from the failing drive (say 1a), the engine will recognize it and mark the chunk down. All you have to do is remove drive 1a and mark the chunk back online, rebuilding the mirror online. No problems, and there is less chance of other damaged blocks on the one remaining mirror than on any of the 5 or more drives in a RAID5 stripe.

If the data is NOT read from 1a but from 1b and then modified, it will be rewritten to BOTH drives, improving the chances that it will be correctly readable from the failing 1a next time, simply because the flux changes will have been renewed. If the platter is too far gone, we are just back to the possibility that the bad block will be read and flagged by IDS later. In no case can the data on 2a/2b, 3a/3b, 4a/4b, etc. be damaged.

Yes, if we were talking about ANY old data file on RAID10 the damage might propagate, but since IDS has its own methods for detecting bad data reads this probability is vanishingly small: to escape detection, the damage to the block would have to leave the first 28 bytes and the last 4 to 1020 bytes intact, i.e. not touch the page header, page trailer, or the slot table.
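
As a rough sketch of why a structural check catches most of this, here is a small Python example. It is purely illustrative, not the actual IDS page format or engine code: each page carries a stamp in its header and a matching stamp in its trailer, a page whose stamps disagree is treated as damaged, and a mirrored read falls back to the other copy.

  # Illustrative sketch only -- not the real IDS page layout or engine logic.
  import struct

  PAGE_SIZE = 2048
  STAMP = 0x1234ABCD

  def make_page(payload, stamp=STAMP):
      """Build a page: 4-byte header stamp + payload + 4-byte trailer stamp."""
      body = payload.ljust(PAGE_SIZE - 8, b"\x00")
      return struct.pack(">I", stamp) + body + struct.pack(">I", stamp)

  def page_looks_valid(page):
      """The sanity check: header stamp and trailer stamp must agree."""
      head = struct.unpack(">I", page[:4])[0]
      tail = struct.unpack(">I", page[-4:])[0]
      return head == tail

  def mirrored_read(primary, mirror):
      """Read the primary copy; if it fails the page check, use the mirror."""
      if page_looks_valid(primary):
          return primary
      print("primary copy failed the page check -- marking chunk down, using mirror")
      if not page_looks_valid(mirror):
          raise IOError("both copies damaged")
      return mirror

  good = make_page(b"row data ...")

  damaged = bytearray(good)
  damaged[-1] ^= 0xFF                          # corruption touching the trailer stamp: caught
  print(page_looks_valid(bytes(damaged)))      # False

  silent = bytearray(good)
  silent[100] ^= 0xFF                          # corruption confined to the payload: slips past
  print(page_looks_valid(bytes(silent)))       # True

  data = mirrored_read(bytes(damaged), good)   # falls back to the good mirror copy

The last two cases mirror the point above: corruption that avoids the page header, trailer, and slot table can slip past a structural check, but that hit pattern is unlikely, which is why the propagation risk on RAID10 under IDS is described as vanishingly small.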

Hardware, OS or Informix Mirroring

Hardware mirroring is usually best, as it's faster. Next comes operating system mirroring, and last comes IDS mirroring. IDS mirroring contains a little bit of logic about how to handle chunks that are down, but with any luck the operating system or hardware mirroring will never let IDS see that situation anyway. If you are not using Informix mirroring, turn it off in the ONCONFIG file [MIRRORING 0]; you can always turn it back on at a later date.
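
For reference, the mirroring-related entries in a stock onconfig typically look something like the excerpt below. Exact names and defaults vary by IDS version (many releases spell the parameter MIRROR rather than MIRRORING), so check the onconfig.std shipped with your engine before editing.

  # Mirroring-related onconfig entries (names and defaults vary by version)
  MIRROR          0        # 0 = IDS mirroring disabled, 1 = enabled
  MIRRORPATH               # path to the mirror chunk for the root dbspace
  MIRROROFFSET    0        # offset into the mirror device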

Jonathan Leffler explains

I simply haven't yet heard a convincing explanation of why our software can do it better 
than the O/S can. 

It does, of course, depend on the mirroring support from the O/S. In particular, on a 
multi-CPU machine where the O/S I/Os are handled by a single CPU under the native 
mirroring system, DSA could have an advantage if it has multiple threads handling the writes 
in parallel. But I'm not convinced that O/S mirroring is that bad. It also depends on the 
intelligence (or otherwise) of the disk controllers. Etc. 

Unless the O/S has screwed up badly, I don't think that the Informix mirroring provides 
much (if any) advantage. 

I don't have any concrete evidence either way, and it is very difficult to determine 
experimentally. I know that there were plans at one time to spend a day or two 
assessing the effect of LVMs (logical volume managers) on the performance of OnLine. I 
also know that it didn't happen -- I hope it was in part because I pointed out to the 
person who was asked to do the test that controlling the parameters of the test was going 
to be difficult, and was going to need considerably more than a day or two simply to work 
out what to test and how, independently of the time taken to create and load suitable data 
sets (mainly large ones) under multiple different configurations with differing amounts of 
RAID-ness, different numbers of controllers, different places where the mirroring occurs,  
different numbers of CPUs, different numbers of AIO threads, striping, etc. 

So, yes, I think maybe you are being led astray by listening to Informix marketing talk. 

One of the claimed advantages for DSA disk handling is that it can selectively read from 
either the primary or the secondary of a mirrored pair -- so can the O/S mirrored systems, 
and here is evidence that at least one does precisely that: 

From: johnbr@atl.hp.com (John Bria)

HP's Logical Volume Manager will allow you to "stripe" non-array disks by creating 
extents on a rotating basis across multiple drives. This may or may not be advantageous 
as you develop a fragmentation strategy. 

If you use HP-UX mirroring, reads will be routed to the mirror copy if the primary is 
busy. Under heavy disk loads, this is very advantageous. 
