How to recover from a RAID1 disk failure


THIS DOCUMENT IS CURRENTLY IN DRAFT FORM. USE AT YOUR OWN RISK.

This procedure applies to e-smith server and gateway versions 4.1 and 4.1.x.

If you lose one of your RAID1 disks, this is how to repair it. Let's begin with the assumption that you had a RAID1 e-smith system based on two IDE disks, /dev/hda and /dev/hdc, and that /dev/hda has failed and needs to be replaced.

  1. At a convenient time, shut down your e-smith server and replace the faulty disk. The new disk should have the same geometry as both the old disk and the current working disk.

  2. Boot the e-smith server.

  3. Switch to a login prompt (press Alt+F2 if you are viewing the console) and log in as root.

  4. Partition the new disk. It should be partitioned exactly the same as the other disk. Use the following command to determine the current partition details for the working disk /dev/hdc:

    
    	fdisk -l /dev/hdc
    

    You should see details similar to:

    	
    	Disk /dev/hdc: 64 heads, 63 sectors, 1015 cylinders
    	Units = cylinders of 4032 * 512 bytes
    
    	   Device Boot    Start       End    Blocks   Id  System
    	/dev/hdc1   *         1       131    264064+  fd  Linux raid autodetect
    	/dev/hdc2           132      1015   1782144    5  Extended
    	/dev/hdc5           132       137     12064+  fd  Linux raid autodetect
    	/dev/hdc6           138      1015   1770016+  fd  Linux raid autodetect
    

    Set up identical partitions on /dev/hda using the command:

    
    	fdisk /dev/hda
    

    Use the fdisk -l command to double-check that the partitions are exactly the same as those on the working disk, /dev/hdc.

  5. Determine which partitions have been mirrored. Look at the file /proc/mdstat, where you should see something like this (note that this file is from a working system and not one that has failed):

    
    	# cat /proc/mdstat
    	Personalities : [raid1] 
    	read_ahead 1024 sectors
    	md2 : active raid1 hdc1[1] hda1[0] 264000 blocks [2/2] [UU]
    	md0 : active raid1 hdc5[1] hda5[0] 11968 blocks [2/2] [UU]
    	md1 : active raid1 hdc6[2] hda6[0] 1769920 blocks [2/2] [UU]
    	unused devices: <none>
    

    This file indicates that you have three "meta-devices" that are mirrored:

    • md0 - using hdc5 and hda5

    • md1 - using hdc6 and hda6

    • md2 - using hdc1 and hda1

  6. Re-attach the partitions from the new disk to the RAID devices:

    
    	/sbin/raidhotadd /dev/md0 /dev/hda5
    	/sbin/raidhotadd /dev/md1 /dev/hda6
    	/sbin/raidhotadd /dev/md2 /dev/hda1
    
  7. You can see the progress of the RAID resynchronization by examining /proc/mdstat. The following example output shows that both /dev/md0 and /dev/md2 are fully synchronized and /dev/md1 is 58% synchronized.

    
    	# cat /proc/mdstat
    	Personalities : [raid1] 
    	read_ahead 1024 sectors
    	md2 : active raid1 hdc1[1] hda1[0] 264000 blocks [2/2] [UU]
    	md0 : active raid1 hdc5[1] hda5[0] 11968 blocks [2/2] [UU]
    	md1 : active raid1 hdc6[2] hda6[0] 1769920 blocks [2/1] [U_] recovery=58% finish=2.6min
    	unused devices: <none>
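
Steps 5 to 7 rely on reading /proc/mdstat by eye. As a rough sketch, the same check could be scripted as shown here. The function names md_degraded and md_all_synced are illustrative and not part of e-smith, and the awk pattern assumes the one-line-per-array layout shown in the examples above; other kernels may format /proc/mdstat differently.

```shell
# Print the name of every md array whose status field (for example [U_])
# shows a missing or still-rebuilding member.
# Reads mdstat-format text on standard input.
# Assumption: each array is described on a single line, as in this document.
md_degraded() {
    awk '$1 ~ /^md[0-9]+$/ && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^\[[U_]+\]$/ && $i ~ /_/)
                print $1
    }'
}

# Succeed only when no array is reported as degraded.
# Reads mdstat-format text on standard input.
md_all_synced() {
    [ -z "$(md_degraded)" ]
}

# Possible usage on the server (not run here):
#   md_degraded < /proc/mdstat
#   until md_all_synced < /proc/mdstat; do sleep 30; done
```

Given the recovery output from step 7, md_degraded would print md1, and it would print nothing once every array shows [UU].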
    

For more information about using RAID with Linux, see the Software RAID HOWTO from the Linux Documentation Project at http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html. When reading that document, remember that the e-smith server and gateway supports only RAID1 (disk mirroring).
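
The double check at the end of step 4 can also be done mechanically rather than by eye. The following is a minimal sketch, assuming the fdisk -l listings of both disks have been saved to files; the function names and the /tmp file names are hypothetical. It removes the device name from each listing and compares what remains, so a difference in geometry, boot flag, partition boundaries or partition type is reported as a mismatch.

```shell
# Strip the given device name from `fdisk -l` output read on standard input,
# so that listings from two different disks can be compared directly.
normalize_layout() {
    sed "s|$1||g"
}

# Compare two saved `fdisk -l` listings.
# Arguments: reference device, its listing file, new device, its listing file.
# Succeeds only if the listings are identical apart from the device name.
same_layout() {
    ref=$(normalize_layout "$1" < "$2")
    new=$(normalize_layout "$3" < "$4")
    [ "$ref" = "$new" ]
}

# Possible usage on the server (not run here; file names are examples only):
#   fdisk -l /dev/hdc > /tmp/hdc.txt
#   fdisk -l /dev/hda > /tmp/hda.txt
#   same_layout /dev/hdc /tmp/hdc.txt /dev/hda /tmp/hda.txt && echo "layouts match"
```

If the function reports a mismatch, comparing the two saved listings side by side shows which partition needs correcting before continuing with step 5.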