This morning I’m looking into how ZFS on FreeBSD reacts to disk failure, using the setup documented in the previous two posts (3 x 2TB HDDs in raidz). The first task was to shut down the machine and pull one of the drives, then restart the machine and see what happened…
# zpool status
  pool: DATA
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            ada1    ONLINE       0     0     0
            ada2    UNAVAIL      0     0     0  cannot open
            ada2    ONLINE       0     0     0

errors: No known data errors
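A degraded pool like this can also be spotted from a script rather than by reading the full status by eye. The following is a minimal sketch, not something from the original setup: the helper name `unhealthy_pools` is made up for illustration, and it assumes the tab-separated output of `zpool list -H -o name,health`.

```sh
#!/bin/sh
# Hypothetical helper: print every pool whose health is not ONLINE.
# Reads "name<TAB>health" lines on stdin, which is the layout that
# `zpool list -H -o name,health` is assumed to produce.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# On a live system (assumes zpool is in PATH):
#   zpool list -H -o name,health | unhealthy_pools
```

With one drive pulled as above, this would be expected to report the DATA pool as DEGRADED, and to print nothing once every pool is healthy again.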
All the data seems to be intact and available, so now I’ll shut down and push the disk back in.
  pool: DATA
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 19 07:58:20 2011
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     5  22K resilvered
            ada3    ONLINE       0     0     0

errors: No known data errors
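The interesting part of that output is the non-zero CKSUM counter on ada2. As a rough sketch, the config table can be filtered for any device with non-zero counters. The function name `devices_with_errors` is hypothetical, and the parsing assumes plain integer counters as shown here (it would skip abbreviated values such as `1.2K`).

```sh
#!/bin/sh
# Hypothetical helper: list devices whose READ/WRITE/CKSUM counters are
# not all zero. Reads `zpool status` text on stdin.
devices_with_errors() {
    awk '
        # Config rows look like: NAME STATE READ WRITE CKSUM [notes...]
        # Only rows with three integer counters are considered.
        NF >= 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0)
                print $1 " read=" $3 " write=" $4 " cksum=" $5
        }
    '
}

# On a live system (assumes zpool is in PATH):
#   zpool status DATA | devices_with_errors
```

Fed the status above, it should single out ada2 with its 5 checksum errors and skip the healthy rows.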
So it looks as if it all worked, but ada2 is still showing 5 checksum errors, so we need to clear them as follows:
# zpool clear DATA
# zpool status
  pool: DATA
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 19 07:58:20 2011
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0  22K resilvered
            ada3    ONLINE       0     0     0

errors: No known data errors
Now I’m going to do a ‘scrub’ to check for errors.
# zpool scrub DATA
# zpool status
  pool: DATA
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Wed Oct 19 08:07:37 2011
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0

errors: No known data errors
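On this nearly empty pool the scrub finished almost instantly, but on a full pool it can run for hours, so a script may want to wait for it. Here is a minimal sketch under stated assumptions: both function names are hypothetical, and the check relies on the status text containing the phrase "scrub in progress" while a scrub is running, which is an assumption about the output wording on your ZFS version.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the status text on stdin
# reports a running scrub. Assumes the phrase "scrub in progress".
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Hypothetical wrapper: start a scrub on pool $1, poll every 10 seconds
# until it is no longer reported as running, then print the final status.
scrub_and_wait() {
    zpool scrub "$1" || return 1
    while zpool status "$1" | scrub_in_progress; do
        sleep 10
    done
    zpool status "$1"
}

# Usage on a live system:
#   scrub_and_wait DATA
```

The polling interval is arbitrary; something longer would be kinder on a large pool.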
That seemed to work OK, so the next task is to remove a disk from the array from the command line and then put it all back again.
# zpool offline DATA ada2
# zpool status
  pool: DATA
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: scrub completed after 0h0m with 0 errors on Wed Oct 19 08:07:37 2011
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            ada1    ONLINE       0     0     0
            ada2    OFFLINE      0     0     0
            ada3    ONLINE       0     0     0

errors: No known data errors
# zpool online DATA ada2
# zpool status
  pool: DATA
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Oct 19 09:17:44 2011
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0  512 resilvered
            ada3    ONLINE       0     0     0

errors: No known data errors
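The offline/online round trip above could be wrapped up so that the device state is checked in between. This is only a sketch: `device_state` and `offline_online_cycle` are hypothetical names, and the state is read by matching the device name in the first column of the `zpool status` config table.

```sh
#!/bin/sh
# Hypothetical helper: print the STATE column for device $1.
# Reads `zpool status` text on stdin and stops at the first match.
device_state() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# Hypothetical wrapper: take device $2 in pool $1 offline, report its
# state, then bring it back online.
offline_online_cycle() {
    pool=$1
    dev=$2
    zpool offline "$pool" "$dev" || return 1
    state=$(zpool status "$pool" | device_state "$dev")
    echo "$dev is now $state"
    zpool online "$pool" "$dev"
}

# Usage on a live system:
#   offline_online_cycle DATA ada2
```

Matching on the first column alone would be ambiguous if two rows share a name, as happened with ada2 after the drive was physically pulled, so treat the result as a hint rather than a definitive answer.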