ZFS on FreeBSD – Disk Failures

This morning I’m looking into how ZFS on FreeBSD reacts to disk failure, using the setup documented in the previous two posts (3 x 2TB HDDs in raidz). The first task was to shut down the machine, pull one of the drives, then restart and see what happened:

# zpool status 
  pool: DATA
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	DATA        DEGRADED     0     0     0
	  raidz1    DEGRADED     0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    UNAVAIL      0     0     0  cannot open
	    ada2    ONLINE       0     0     0

errors: No known data errors
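
With more disks than this, picking the failed device out of the table by eye gets tedious. Here is a minimal sketch of a filter that reads `zpool status` output on stdin and prints only the entries whose STATE is not ONLINE. The function name `unhealthy_devices` is made up for this post, and it assumes the config table layout shown above (a NAME/STATE header row, ended by a blank line).

```shell
#!/bin/sh
# Print "name state" for every pool, vdev or disk that is not ONLINE.
# Reads `zpool status` output on stdin. Hypothetical helper, not part of ZFS.
unhealthy_devices() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the table.
        in_config && NF == 0          { in_config = 0; next }
        # Inside the table, report anything that is not ONLINE.
        in_config && $2 != "ONLINE"   { print $1, $2 }
    '
}
```

On the machine itself it would be used as `zpool status DATA | unhealthy_devices`, which for the output above should list DATA, raidz1 and the missing ada2.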

All the data seems to be intact and available, so now I’ll shut down, push the disk back in and check the status again after the reboot.

  pool: DATA
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 19 07:58:20 2011
config:

	NAME        STATE     READ WRITE CKSUM
	DATA        ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     5  22K resilvered
	    ada3    ONLINE       0     0     0

errors: No known data errors

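So it looks as if it all worked: the pool is back ONLINE and ada2 has been resilvered. The checksum errors counted against ada2 are still recorded, though, so we need to clear them as follows: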

# zpool clear DATA
# zpool status
  pool: DATA
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 19 07:58:20 2011
config:

	NAME        STATE     READ WRITE CKSUM
	DATA        ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0  22K resilvered
	    ada3    ONLINE       0     0     0

errors: No known data errors
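
The CKSUM count of 5 against ada2 earlier is the kind of thing that is easy to miss when the pool itself reports ONLINE. Below is a rough sketch of a filter that prints any entry with a non-zero READ, WRITE or CKSUM counter, so you can tell at a glance whether there is anything to investigate before running `zpool clear`. The name `devices_with_errors` is my own, and it assumes the five-column table layout shown in this post.

```shell
#!/bin/sh
# Print entries from the `zpool status` config table that have a non-zero
# READ, WRITE or CKSUM counter. Reads the status output on stdin.
# Hypothetical helper, not part of ZFS.
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        in_config && NF == 0          { in_config = 0; next }  # blank line ends it
        # Columns 3-5 are READ, WRITE and CKSUM.
        in_config && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

Usage would be `zpool status DATA | devices_with_errors`. No output means every counter is zero.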

Now I’m going to run a scrub to check for errors:

# zpool scrub DATA
# zpool status
  pool: DATA
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Wed Oct 19 08:07:37 2011
config:

	NAME        STATE     READ WRITE CKSUM
	DATA        ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0

errors: No known data errors
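
On this nearly empty pool the scrub finished almost instantly, but on a full pool it can run for hours and `zpool scrub` returns straight away. The following is only a sketch of how one might wait for it from a script: it starts the scrub and polls `zpool status` until the text "scrub in progress" no longer appears. The function name and the `ZPOOL` and `SCRUB_POLL_SECONDS` variables are made up for this example, and the string it matches is an assumption about what the status line says while a scrub is running.

```shell
#!/bin/sh
# Start a scrub on the given pool and block until `zpool status` stops
# reporting it as running, then print the final status.
# ZPOOL and SCRUB_POLL_SECONDS are hypothetical overrides for this sketch.
scrub_and_wait() {
    pool=$1
    ${ZPOOL:-zpool} scrub "$pool" || return 1
    # Assumption: the status output contains "scrub in progress" while running.
    while ${ZPOOL:-zpool} status "$pool" | grep -q 'scrub in progress'; do
        sleep "${SCRUB_POLL_SECONDS:-10}"
    done
    ${ZPOOL:-zpool} status "$pool"
}
```

It would be called as `scrub_and_wait DATA`.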

That seemed to work OK, so the next task is to take a disk out of the array from the command line and then put it all back again.

# zpool offline DATA ada2
# zpool status
  pool: DATA
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
 scrub: scrub completed after 0h0m with 0 errors on Wed Oct 19 08:07:37 2011
config:

	NAME        STATE     READ WRITE CKSUM
	DATA        DEGRADED     0     0     0
	  raidz1    DEGRADED     0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    OFFLINE      0     0     0
	    ada3    ONLINE       0     0     0

errors: No known data errors

# zpool online DATA ada2
# zpool status
  pool: DATA
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 19 09:17:44 2011
config:

	NAME        STATE     READ WRITE CKSUM
	DATA        ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    ada1    ONLINE       0     0     0
	    ada2    ONLINE       0     0     0  512 resilvered
	    ada3    ONLINE       0     0     0

errors: No known data errors