Replacing a failed drive in a Linux software RAID (mdraid)

In this post we will see how to rebuild a software RAID array in Linux after replacing a failed drive.

The failure was detected through a SMART error.

smartctl diagnosis:

Source   
[root@simba ~]# smartctl -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-279.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 330223
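
As a side note, this kind of warning can be delivered automatically by smartd (part of smartmontools); a minimal sketch of /etc/smartd.conf, assuming you want every disk monitored and failures mailed to root:

# /etc/smartd.conf -- monitor all detected disks and mail root on SMART problems
DEVICESCAN -a -m root

Then enable the daemon (on this el6 machine: service smartd start; chkconfig smartd on).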

The sda drive is in pre-fail with a rather alarming message, so we must remove it from the RAID arrays in order to replace it.

We can check the current state of the arrays:

Source   
[root@simba ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
511988 blocks super 1.0 [2/2] [UU]
md3 : active raid1 sdb5[1] sda5[0]
1924327292 blocks super 1.1 [2/2] [UU]
bitmap: 1/15 pages [4KB], 65536KB chunk
md2 : active raid1 sdb3[1] sda3[0]
8190968 blocks super 1.1 [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
20478908 blocks super 1.1 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>

Everything is fine as long as [UU] appears; if you only see one U, a member has failed or the array is degraded. In this case the RAID still looks healthy, but the drive is in pre-fail; the two things are not mutually exclusive.
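
For more detail on any single array, mdadm itself can report the state of each member; a quick check (md1 used here just as an example):

mdadm --detail /dev/md1

In a healthy mirror the State line should read clean or active and both members should appear as active sync.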

Before continuing, we verify that GRUB is installed on both drives:

Source   
grub
grub> find /grub/stage1
(hd0,0)
(hd1,0)

It is on both sda and sdb, so we can continue. One of the later steps is to restart the machine, and if GRUB were not installed on both drives we could find ourselves with a machine that does not boot (which is always an unpleasant surprise).
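
If in doubt, the boot sector of each disk can also be inspected directly; a quick sketch (file should mention GRUB if stage1 is present in the MBR):

# read the first sector of each disk and identify it
dd if=/dev/sda bs=512 count=1 2>/dev/null | file -
dd if=/dev/sdb bs=512 count=1 2>/dev/null | file -

If GRUB were missing from one of them, it can be installed from the grub shell as shown at the end of this post.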

The disk is being used by md0, md1, md2 and md3. We mark each of its partitions as failed so the arrays stop using the failing drive and it can then be removed:

Source   
mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda2
mdadm --manage /dev/md2 --fail /dev/sda3
mdadm --manage /dev/md3 --fail /dev/sda5

The partitions are now marked as failed:

Source   
[root@simba ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
511988 blocks super 1.0 [2/1] [_U]
md3 : active raid1 sdb5[1] sda5[0](F)
1924327292 blocks super 1.1 [2/1] [_U]
bitmap: 1/15 pages [4KB], 65536KB chunk
md2 : active raid1 sdb3[1] sda3[0](F)
8190968 blocks super 1.1 [2/1] [_U]
md1 : active raid1 sdb2[1] sda2[0](F)
20478908 blocks super 1.1 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>

We permanently remove the drive's partitions from the arrays:

Source   
mdadm --manage /dev/md0 --remove /dev/sda1
mdadm --manage /dev/md1 --remove /dev/sda2
mdadm --manage /dev/md2 --remove /dev/sda3
mdadm --manage /dev/md3 --remove /dev/sda5
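
As a side note, mdadm accepts both operations in a single invocation, so each fail/remove pair could be collapsed into one command, for example for md0:

mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1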

sda is now excluded from all the arrays:

Source   
[root@simba ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1]
511988 blocks super 1.0 [2/1] [_U]
md3 : active raid1 sdb5[1]
1924327292 blocks super 1.1 [2/1] [_U]
bitmap: 4/15 pages [16KB], 65536KB chunk
md2 : active raid1 sdb3[1]
8190968 blocks super 1.1 [2/1] [_U]
md1 : active raid1 sdb2[1]
20478908 blocks super 1.1 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>

We shut the machine down, replace the sda drive, and boot it again.

Now sdb is the disk holding the data and the new sda is completely empty. We copy the partition table from sdb to sda (being very careful not to swap the two) with:

Source   
[root@simba ~]# sfdisk -d /dev/sdb | sfdisk -f /dev/sda
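
Before running the copy it is worth confirming that the replacement disk is at least as large as sdb, for example:

# sizes in bytes; /dev/sda must be at least as large as /dev/sdb
blockdev --getsize64 /dev/sda
blockdev --getsize64 /dev/sdb

Note that sfdisk only handles MBR partition tables; on a GPT-partitioned system sgdisk would be the tool to replicate the layout instead.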

We add the new partitions back to the arrays:

Source   
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3
mdadm --manage /dev/md3 --add /dev/sda5

We can now watch the rebuild progress:

Source   
[root@simba ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
511988 blocks super 1.0 [2/2] [UU]
md3 : active raid1 sda5[2] sdb5[1]
1924327292 blocks super 1.1 [2/1] [_U]
resync=DELAYED
bitmap: 7/15 pages [28KB], 65536KB chunk
md2 : active raid1 sda3[2] sdb3[1]
8190968 blocks super 1.1 [2/1] [_U]
resync=DELAYED
md1 : active raid1 sda2[2] sdb2[1]
20478908 blocks super 1.1 [2/1] [_U]
[=>...................]  recovery =  5.8% (1198848/20478908) finish=4.0min speed=79923K/sec
bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
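
While the rebuild runs it can be followed live with watch, and the kernel's resync throttling can be tuned if it is too slow (or too aggressive on a busy server); a sketch, values in KiB/s:

watch -n 5 cat /proc/mdstat
sysctl dev.raid.speed_limit_min             # floor for resync speed (default 1000)
sysctl -w dev.raid.speed_limit_max=200000   # ceiling for resync speed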

Once the resync has finished, we verify again that GRUB is installed on both drives:

Source   
grub
grub> find /grub/stage1
(hd0,0)
(hd1,0)

In this case it was already in place, but it can be reinstalled if necessary, just in case (here (hd0,0) corresponds to sda1):

Source   
[root@simba ~]# grub
Probing devices to guess BIOS drives. This may take a long time.
GNU GRUB  version 0.97  (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported.  For the first word, TAB
lists possible command completions.  Anywhere else TAB lists the possible
completions of a device/filename.]
grub> find /grub/stage1
find /grub/stage1
(hd0,0)
(hd1,0)
grub> device (hd0) /dev/sda
device (hd0) /dev/sda
grub> root (hd0,0)
root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd0)"...  27 sectors are embedded.
succeeded
Running "install /grub/stage1 (hd0) (hd0)1+27 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit
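
This machine uses the legacy GRUB 0.97 shell shown above; on newer distributions that ship GRUB 2 the equivalent reinstallation on the new disk is a single command (the name varies by distribution):

grub2-install /dev/sda    # RHEL / CentOS 7 and later
grub-install /dev/sda     # Debian / Ubuntu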

And that's it. I hope this helps.
