Monday, January 16, 2017

An Unfortunate Series Of Events

First I got an email telling me that a fail event had been detected on
my RAID device. Then, 12 minutes later, a second email arrived, this time
from SMART monitoring, complaining that one of my hard drives could not
be opened. There is a difference between a device that failed to open
and a device that has actually failed.
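
That difference also shows up in smartctl's exit status. As I read the smartctl man page, the exit status is a bit mask in which bit 1 (value 2) means the device could not be opened and bit 3 (value 8) means SMART reports the disk as failing. The helper below is only a sketch built on that reading, and the function name is my own invention:

```shell
# Hypothetical helper: decode two bits of a smartctl exit status.
# Assumption (from the smartctl man page, EXIT STATUS section):
#   value 2 = device open failed, value 8 = SMART status "DISK FAILING".
explain_smartctl_status() {
    status=$1
    if [ $((status & 2)) -ne 0 ]; then
        echo "device could not be opened"
    fi
    if [ $((status & 8)) -ne 0 ]; then
        echo "SMART reports the disk as failing"
    fi
}
```

It would be called right after a smartctl run, for example `explain_smartctl_status $?`.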

So I checked both reports from the terminal. First, the SMART
monitoring report.

#smartctl -a /dev/questionable-device

I verified the status of the RAID array.

#mdadm --detail /dev/md0

I have a degraded array: one device is marked faulty and has been
removed. The device is /dev/sdd. This is the 1TB hard drive, the oldest
of the 4 I'm using.
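
The same degraded state can be read from /proc/mdstat, where a field such as [3/2] means 3 devices are expected and only 2 are active. Here is a rough sketch of a scripted check, assuming that field format. The function name is hypothetical and it reads mdstat-style text on standard input:

```shell
# Hypothetical helper: exit 0 if any [expected/active] field in
# mdstat-style input shows fewer active devices than expected.
array_degraded() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) {
                # strip the brackets, then split "expected/active"
                split(substr($i, 2, length($i) - 2), n, "/")
                if (n[2] + 0 < n[1] + 0) found = 1
            }
    } END { exit (found ? 0 : 1) }'
}
```

Usage would be something like `array_degraded < /proc/mdstat && echo "array is degraded"`.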

I opened the black box. I pulled the SATA cables and examined them for
anything out of the ordinary. I put them back again. Then I opened the
terminal to check whether I still had access to the hard drives.

#smartctl -a /dev/all-the-drives

I added the faulty drive back to the RAID array with:

#mdadm --manage --add /dev/md0 /dev/faulty-drive

I checked the status of the RAID array.

#cat /proc/mdstat

donato@desktop:~$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 sdd[3] sdb1[0] sdc1[1]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2]
      [=====>...............]  recovery = 26.4% (257972512/976630272) finish=104.2min speed=114919K/sec
      bitmap: 6/8 pages [24KB], 65536KB chunk

unused devices: <none>

I guess this time it's for real. The RAID array is rebuilding itself.
Back to my monitoring station then.
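
For watching the rebuild without rereading the whole file, a small filter could pull out just the percentage. This is a sketch that assumes the progress line looks like the one above ("recovery = 26.4%", or "resync" in its place). The function name is made up for illustration:

```shell
# Hypothetical helper: print the recovery/resync percentage found in
# mdstat-style text on stdin; prints nothing if no rebuild is running.
rebuild_percent() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*/\2/p'
}
```

Something like `rebuild_percent < /proc/mdstat` would then print only the number.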