So, for a while now, I’ve been experiencing crappy performance of a Western Digital Green drive (WD15EARS) I have an a zfs mirror storing my time machine backups (using OpenIndiana and Netatalk).
Yesterday, the drive started reporting errors. Unfortunately, the system hung – that’s not so cool – ZFS is supposed to keep working when a drive fails… Aside from that, when I rebooted, the system automatically started a scrub to verify data integrity, and after about 10 minutes:
pool: rpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Mar 10 10:19:42 2011
1.68G scanned out of 1.14T at 107M/s, 3h5m to go
146K resilvered, 0.14% done
config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c8t1d0s0 DEGRADED 0 0 24 too many errors (resilvering)
c8t0d0s0 ONLINE 0 0 0
cache
c12d0s0 ONLINE 0 0 0
errors: No known data errors
Check it out. It’s found 24 errors on the Western Digital Drive, but so far no data errors have been found, because they were correct on the other drive.
That’s obvious, right? But what other operating systems can tell the difference between the right and wrong data when they’re both there??? Most raid systems only detect a total drive failure, but don’t deal with incorrect data coming off the drive !!
Sure backing up to a network (Time Machine’s sparse image stuff) is *way* slower than a directly connected firewire drive, but in my opinion, it’s well worth doing it this way for the data integrity that you don’t get on a single USB or Firewire drive.
Thank you ZFS for keeping my data safe. B*gger off Western Digital for making crappy drives. I’m off to get a replacement today… what will it be? Samsung or Seagate?