Matt Connolly's Blog

my brain dumps here…

Tag Archives: backup

Building netatalk in SmartOS

I’m looking at switching my home backup server from OpenIndiana to SmartOS. (There are a few reasons, and that’s another post.)

One of the main functions of my box is to be a Time Machine backup for my Macs (my laptop and my wife’s iMac). I found this excellent post about building netatalk 3.0.1 in SmartOS, but it skipped a few of the dependencies, and it applied the patch after configure, which means that if you reconfigure netatalk you need to reapply the patch.

Based on that article, I came up with a patch for netatalk, and here’s a gist of it: https://gist.github.com/mattconnolly/5230461

Prerequisites:

SmartOS already has most of the useful bits installed, but these are the ones I needed to install to allow netatalk to build:

$ sudo pkgin install gcc47 gmake libgcrypt

Build netatalk:

Download the latest stable netatalk. The netatalk home page has a handy link on the left.

$ cd netatalk-3.0.2
$ curl 'https://gist.github.com/mattconnolly/5230461/raw/27c02a276e7c2ec851766025a706b24e8e3db377/netatalk-3.0.2-smartos.patch' > netatalk-smartos.patch
$ patch -p1 < netatalk-smartos.patch
$ ./configure --with-bdb=/opt/local --with-init-style=solaris --with-init-dir=/var/svc/manifest/network/ --prefix=/opt/local
$ make
$ sudo make install

With the prefix of ‘/opt/local’, netatalk’s configuration file will be at ‘/opt/local/etc/afp.conf’.
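For reference, here’s a minimal sketch of what a Time Machine share can look like in that file, and how to enable the service afterwards. The share name, path, manifest filename and service name below are assumptions; adjust them to suit your own setup.

$ # write a minimal afp.conf (share name and path are assumptions)
$ sudo tee /opt/local/etc/afp.conf <<'EOF'
[Global]
; defaults are fine for a small home server

[Time Machine]
path = /MacBackup/TimeMachine
time machine = yes
EOF
$ # import the SMF manifest installed above and enable the service
$ # (manifest filename and service name are assumptions)
$ sudo svccfg import /var/svc/manifest/network/netatalk.xml
$ sudo svcadm enable netatalk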

Enjoy.

[UPDATE]

There is a very recent commit in the netatalk source for an `init-dir` option to configure, which means that in the future this patch won’t be necessary: adding `--with-init-dir=/var/svc/manifest/network/` will do the job. Thanks HAT!

[UPDATE 2]

Netatalk 3.0.3 was just released, which includes the `--init-dir` option, so the patch is no longer necessary. Code above is updated.


A Ruby gem to backup and migrate data from a Rails app

Ever wanted to quickly duplicate your database in a rails app so you can do some development work on real data, but not risk breaking it?

Background:

Rails has some very good strategies for managing your schema with rake tasks such as db:migrate, db:schema:load, etc. Switching between production and development environments lets you use different databases.
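For example, the same schema tasks can be pointed at a particular environment:

$ # run the migrations against a specific environment
$ rake db:migrate RAILS_ENV=production
$ # load the schema into the (empty) development database
$ rake db:schema:load RAILS_ENV=development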

Great, but how do you get the data from one environment to another?

With MySQL you can duplicate a database (easy with phpMyAdmin), and with Sqlite3 you can simply copy the database.db file.
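From the command line that duplication looks something like this (the database names and file paths here are just examples):

$ # MySQL: dump one database and load it into another
$ mysqldump -u root -p myapp_production > myapp_production.sql
$ mysqladmin -u root -p create myapp_copy
$ mysql -u root -p myapp_copy < myapp_production.sql

$ # SQLite3: the whole database is a single file
$ cp db/production.sqlite3 db/development.sqlite3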

But what if I want to go from MySQL to Sqlite3 just for development work?

There are some database differences, and there’s a rake task that’s already managing the database structure. I need something to move the data as well.

In my particular case, I’m using Redmine as an issue tracker, hacking on it, and also evaluating the ChiliProject fork. So I’d like to be able to move all my data (schema, data and files) easily from one to the other.

The Rails-Backup-Migrate Gem

So, I put together a gem to do this. It’s inspired by Tobias Lütke’s backup.rake file he posted a good 5 years ago: http://blog.leetsoft.com/2006/5/29/easy-migration-between-databases

It appears to be for an early version of Rails. It also had some issues with tables belonging to “many-to-many” relationships, and was slow because it instantiated every single ActiveRecord object as it went.

I’ve sped the process up by skipping the ActiveRecord instantiation step, and updated the table definitions for Rails 2.3 (which Redmine runs on); it appears to work for Rails 3.0 as well.

This gem provides the following rake tasks to a rails app:

rake site:backup[backup_file]                          # Backup everything: schema, database to yml, and all files in 'files' directory.
rake site:backup:db[backup_file]                       # Dump schema and entire db in YML files to a backup file.
rake site:backup:files[backup_file]                    # Archive all files in the `files` directory into a backup file.
rake site:backup:schema[backup_file]                   # Dump schema to a backup file.
rake site:restore[backup_file]                         # Erase and reload db schema and data from backup files, and restore all files in the 'files' directory.
rake site:restore:db[backup_file]                      # Erase and reload entire db schema and data from backup file.
rake site:restore:files[backup_file]                   # Restore all files in the 'files' directory.
rake site:restore:schema[backup_file]                  # Erase and reload db schema from backup file.

The default backup_file is "site-backup.tgz" and is relative to the rails app root directory.
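To back up to a specific file instead, pass its path as the task argument (the path below is just an example). Note that zsh users need to quote the brackets:

$ rake site:backup[/tmp/redmine-backup.tgz]
$ # in zsh, square brackets must be quoted or escaped
$ rake 'site:backup[/tmp/redmine-backup.tgz]'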

Installing:

Installing manually:

$ gem install rails-backup-migrate

Installing with Bundler:

Add the following to your Gemfile:

gem 'rails-backup-migrate'

And then

$ bundle install

Using rails-backup-migrate:

To use the gem in your rails app, simply add the following line to the end of your Rakefile:

require 'rails-backup-migrate'

This will load the above tasks into your rails app’s task list. If you’re using Bundler, make sure the gem is in your Gemfile (as shown above) so that the require can find it.

$ # backup everything into the default file 'site-backup.tgz'
$ rake site:backup
$ # now, restore into another instance (eg: clone/checkout another copy of your project)
$ cd /path/to/another/instance/of/my/rails/app
$ rake site:restore[/path/to/original/instance/of/my/rails/app/site-backup.tgz]

Done.

The source code is available on github: https://github.com/mattconnolly/rails-backup-migrate

This is the first gem I’ve made and published, and I hope it makes life a little bit easier for someone!

Rails data backups, independent of database

With my continuing interest in Redmine / ChiliProject, I’m really wanting a way of backing up my data that is database independent. I did a bunch of searching around, and the best solution I found was here. However, it appears a little outdated, and didn’t work in Rails 2.3 for a few reasons:

  • There isn’t always a 1:1 mapping between tables and models. (Example: many-many relationships create extra tables).
  • The excluded tables list was outdated (probably from an earlier Rails)[1]

There were also some comments about the export being quite slow. This could have been because every record of every table was loaded up in an ActiveRecord model instance before being dumped to the YML files.

I addressed these issues and made a new version. Can’t attach a file to this blog, so it’s on pastie for now: http://pastie.org/1877530
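The core of the speed-up is reading rows straight from the database connection instead of instantiating a model object per record. Roughly, the idea looks like this (a sketch only, not the exact code in the pastie; the ‘issues’ table name is just an example):

$ # dump a table's rows to YAML without going through ActiveRecord models
$ ruby script/runner "rows = ActiveRecord::Base.connection.select_all('SELECT * FROM issues')
> File.open('issues.yml', 'w') { |f| f.write(rows.to_yaml) }"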

I’ve tested this getting data out of mysql and into sqlite3 and that works just fine. This is really handy for getting the data into a test environment quickly without having to set up more mysql databases, users, etc.

[1]: I’m testing this for Rails 2.3.11 as Redmine and ChiliProject are Rails 2.3.x apps.

If this saves you some time, or could be improved, please let me know.

ZFS = Data integrity

So, for a while now, I’ve been experiencing crappy performance from a Western Digital Green drive (WD15EARS) I have in a ZFS mirror storing my Time Machine backups (using OpenIndiana and Netatalk).

Yesterday, the drive started reporting errors. Unfortunately, the system hung; that’s not so cool, since ZFS is supposed to keep working when a drive fails. Aside from that, when I rebooted, the system automatically started a scrub to verify data integrity, and after about 10 minutes:

  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Mar 10 10:19:42 2011
    1.68G scanned out of 1.14T at 107M/s, 3h5m to go
    146K resilvered, 0.14% done
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c8t1d0s0  DEGRADED     0     0    24  too many errors  (resilvering)
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors

Check it out. It’s found 24 checksum errors on the Western Digital drive, but so far no data errors, because the corresponding blocks were intact on the other drive.

That’s obvious, right? But how many other storage systems can tell the difference between right and wrong data when both copies are present? Most RAID systems only detect a total drive failure; they don’t deal with incorrect data coming off the drive.

Sure, backing up over the network (Time Machine’s sparse image stuff) is *way* slower than a directly connected FireWire drive, but in my opinion it’s well worth doing it this way for the data integrity that you don’t get on a single USB or FireWire drive.

Thank you ZFS for keeping my data safe. B*gger off Western Digital for making crappy drives. I’m off to get a replacement today… what will it be? Samsung or Seagate?
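For the record, once the replacement disk is in the same slot, swapping it into the mirror should just be a zpool replace of the failing device from the status output above (this assumes the new disk shows up at the same device path):

$ # replace the failing half of the mirror with the new disk in the same slot
$ pfexec zpool replace rpool c8t1d0s0
$ # then keep an eye on the resilver
$ zpool status rpool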

ZFS saved my Time Machine backup

For a while now, I’ve been using Time Machine to back up to an AFP share hosted by netatalk on a low-powered OpenIndiana home server.

Last night, Time Machine stopped, with an error message: “Time Machine completed a verification of your backups. To improve reliability, Time Machine must create a new backup for you.”

Periodically I create ZFS snapshots of the volume containing my Time Machine backup. I haven’t enabled any automatic snapshots yet (like OpenIndiana/Solaris’s Time Slider service), so I just do it manually every now and then.
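Taking one of those snapshots is a one-liner; the snapshot that gets rolled back below was created along these lines (the date-stamped name is just my convention):

# zfs snapshot rpool/MacBackup/TimeMachine@20100130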

So, I shut down netatalk, rolled back the snapshot, checked the netatalk database, restarted netatalk, and was back in business.

# /etc/init.d/netatalk stop
# zfs rollback rpool/MacBackup/TimeMachine@20100130
# /usr/local/bin/dbd -r /MacBackup/TimeMachine
# /etc/init.d/netatalk start

I lost only a day or two’s worth of incremental backups, which was much more palatable than having to do another complete backup of >250GB.

ZFS is certainly proving to be useful, even in a low powered home backup scenario.

Western Digital Green Lemon

I have an OpenSolaris backup machine with 2 x 1.5 TB drives mirrored. One is a Samsung Silencer, the other is a Western Digital Green drive. The Silencer is, ironically, the noisier of the two, but it way outperforms the WD drive.

I’ve done some failure tests on the mirror by unplugging one drive while copying files to/from the backup server from my laptop.

First, I was copying from the server onto a single FireWire drive, writing at a solid 30MB/s. I disconnected the Samsung drive while it was running and the file copy proceeded without fault at about 25MB/s off the single WD drive.

`zpool status` showed the drive was UNAVAIL and that the pool would continue to work in a degraded state. When I reconnected the drive, `cfgadm` showed it as connected but unconfigured. When I reconfigured the Samsung drive, the pool automatically resilvered the missing data (which wasn’t much, since I had only been reading over the network) in a matter of seconds.
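Reconfiguring the returning drive is just a cfgadm call against its SATA port (the port id here matches the later output, but treat it as an assumption for your own hardware):

$ # bring the reconnected Samsung's port back into use
$ pfexec cfgadm -c configure sata1/1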

Failure test #2 was to remove the WD drive. I copied data to the server from the laptop, and progress was intermittent: bursts of 30MB/s, then nothing for quite a few seconds, and so on. I disconnected the WD drive, and hey presto, the transfer rate instantly jumped up to a solid 20MB/s. This Samsung drive definitely writes a whole stack faster than the WD drive. (A mirror writes only as fast as its slowest drive.)

And here’s the lemon part. When I reconnected the WD drive, it showed up as disconnected, whereas the Samsung had come back as connected but unconfigured. To my frustration, I couldn’t reconnect the drive:

$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0                        sata-port    disconnected unconfigured failed
$ cfgadm -c connect sata1/0
cfgadm: Insufficient condition

I did a bit of searching and found this page: SolarisZfsReplaceDrive. The trick is to use the -f force option:

$ pfexec cfgadm -f -c connect sata1/0
Activate the port: /devices/pci@0,0/pci8086,4f4d@1f,2:0
This operation will enable activity on the SATA port
Continue (yes/no)? yes
$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0                        disk         connected    unconfigured unknown
sata1/1::dsk/c8t1d0            disk         connected    configured   ok

So now OpenSolaris sees the drive as connected; let’s configure it, and zpool should see it straight away…

$ pfexec cfgadm -c configure sata1/0
$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0::dsk/c8t0d0            disk         connected    configured   ok
sata1/1::dsk/c8t1d0            disk         connected    configured   ok
$ zpool status -x
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 465h28m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0s0  ONLINE       0 1.14K     0  544K resilvered
            c8t1d0s0  ONLINE       0     0     0

Oh man… I have to resilver the whole drive. Why!!??! The other drive remembered it was part of the pool and intelligently went about resilvering only the differences. This drive looks like it has to resilver the whole damn thing.

After a while:

$ zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h23m, 5.05% done, 7h20m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0  12.3G resilvered
            c8t1d0s0  ONLINE       0     0     0

And here’s another interesting bit… the performance of the WD drive (c8t0d0) on my machine is really poor:

$ iostat -x 5

                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   61.2    0.0 1056.2  0.0  9.1    0.0  148.1   0 100 c8t0d0
   79.0    0.0  978.7    0.0  0.0  0.0    0.0    0.6   0   3 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   72.0    0.0  178.8  0.0  7.2    0.0   99.6   0 100 c8t0d0
  111.8    0.0  361.3    0.0  0.0  0.0    0.0    0.3   0   1 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   51.6    0.0  120.4  0.0  7.5    0.0  145.9   0 100 c8t0d0
   79.4    0.0  143.7    0.0  0.0  0.0    0.0    0.2   0   1 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   62.2    0.0 1968.5  0.0  8.3    0.0  133.7   0 100 c8t0d0
   81.8    0.0 2616.7    0.0  0.0  0.3    0.0    3.2   0   8 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   34.6    0.0 1880.2  0.0  7.1    0.0  204.9   0  79 c8t0d0
   28.4   11.6 1413.5   41.7  0.0  0.1    0.0    3.1   0   7 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0

Check it out: 100% busy use of the drive, and it’s writing less than 2MB/s. Compare that to the %b busy figure for the Samsung (on c8t1d0) reading the same amount of data. And check out the average service time (asvc_t): that’s as bad as a CD-ROM! Yikes.

It doesn’t reconnect to the system cleanly, its service time is way slow, and its write performance stinks. This WD drive is a total lemon!

My first real Time Machine backup on a ZFS mirror

So following my last post about the impact of compression on ZFS, I’ve created a ZFS file system with Compression ON and am sharing it via Netatalk to my MacBook Pro.
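Creating that filesystem is basically one command (the pool and dataset names are assumptions, chosen to match the rest of these posts):

$ pfexec zfs create -o compression=on rpool/MacBackup/TimeMachine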

I connected the Mac via gigabit ethernet for the original backup, and it backed up 629252 items (193.0 GB) in 7 hours, 23 minutes, 4.000 seconds, according to the backup log. That’s an average of 7.4MB/sec. Nowhere near the maximum transfer rates that I’ve seen to the ZFS share, but acceptable nonetheless.

`zfs list` reports that the compression ratio is 1.11x. I would have expected more, but oh well.
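If you want to check it yourself, the figure is the dataset’s compressratio property (the dataset name is an assumption):

$ zfs get compressratio rpool/MacBackup/TimeMachine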

And now my incremental backups are also working well over the wireless connection. Excellent.

ZFS performance networked from a Mac

Before I go ahead and do a full Time Machine backup to my OpenSolaris machine with a ZFS mirror, I thought I’d test what performance hit there might be when using compression. I also figured I’d test the impact of changing the recordsize. Optimising this for the data record size seems to be best practice for databases, and since Time Machine stores data in a Mac disk image (sparse bundle), it probably writes data in 4k chunks matching the allocation size of the HFS filesystem inside the disk image.
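The test filesystems were created with settings along these lines (a sketch; the pool and dataset names are assumptions):

$ # one dataset per combination, e.g. 4K records with the default (lzjb) compression
$ pfexec zfs create -o recordsize=4K -o compression=on rpool/MacBackup/test-4k-lzjb
$ # and a gzip variant
$ pfexec zfs create -o recordsize=4K -o compression=gzip rpool/MacBackup/test-4k-gzip
$ zfs get recordsize,compression rpool/MacBackup/test-4k-lzjb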

There were three copy tasks done:

  1. Copy a single large video file (1.57GB) to the Netatalk AFP share,
  2. Copy a single large video file (1.57GB) to a locally (mac) mounted disk image stored on the Netatalk AFP share,
  3. Copy a folder with 2752 files (117.3MB) to a locally (mac) mounted disk image stored on the Netatalk AFP share.

Here’s the results:

ZFS recordsize, compression   To Netatalk AFP share     To Disk Image on AFP share   To Disk Image on AFP share
                              (1 video file, 1.57GB)    (1 video file, 1.57GB)       (2752 files, 117.3MB)
128k, off                     0m29.826s  (53.9MB/s)     2m5.889s   (12.7MB/s)        1m45.809s  (1.1MB/s)
128k, on                      0m52.179s  (30.9MB/s)     1m36.084s  (16.7MB/s)        1m34.367s  (1.24MB/s)
128k, gzip                    0m31.290s  (51.4MB/s)     2m32.485s  (10.5MB/s)        2m29.141s  (0.79MB/s)
4k, off                       0m27.131s  (59.3MB/s)     2m16.138s  (11.8MB/s)        2m47.718s  (0.70MB/s)
4k, on                        0m25.651s  (62.7MB/s)     1m59.459s  (13.5MB/s)        1m41.551s  (1.2MB/s)
4k, gzip                      0m30.348s  (53.0MB/s)     5m16.195s  (5.08MB/s)        4m48.378s  (0.41MB/s)

I think there was something else happening on the server during the 128k compression=on test, impacting its data rate.

Conclusion:

The clear winner is the default (lzjb) compression turned on, with the default record size. It must be that even my low-powered Atom processor can compress the data faster than it can be written to disk, resulting in less bandwidth to disk and therefore increasing performance at the same time as saving space. Well done ZFS!

Snow Leopard Time Machine is oh so slow

I erased my Time Machine drive because it couldn’t combine the old backup data with the new HD. No problem. Start a new backup = SLOW. Here’s the console log:

29/08/09 7:15:01 PM com.apple.backupd7253 Starting standard backup
29/08/09 7:15:09 PM com.apple.backupd7253 Backing up to: /Volumes/Mattski500/Backups.backupdb
29/08/09 7:15:10 PM com.apple.backupd7253 Detected system migration from: /Volumes/Mattski500/Backups.backupdb/MattBook/2009-08-29-024306/MacBookPro
29/08/09 7:15:21 PM com.apple.backupd7253 Backup content size: 179.1 GB excluded items size: 9.5 GB for volume MacBookPro
29/08/09 7:15:21 PM com.apple.backupd7253 No pre-backup thinning needed: 204.79 GB requested (including padding), 212.36 GB available
29/08/09 8:15:11 PM com.apple.backupd7253 Copied 10.2 GB of 169.6 GB, 7804 of 1502626 items
29/08/09 9:15:11 PM com.apple.backupd7253 Copied 12.3 GB of 169.6 GB, 15888 of 1502626 items
29/08/09 10:15:12 PM com.apple.backupd7253 Copied 16.0 GB of 169.6 GB, 16700 of 1502626 items
29/08/09 11:15:12 PM com.apple.backupd7253 Copied 17.4 GB of 169.6 GB, 26237 of 1502626 items
30/08/09 12:15:12 AM com.apple.backupd7253 Copied 18.7 GB of 169.6 GB, 71519 of 1502626 items
30/08/09 1:15:13 AM com.apple.backupd7253 Copied 23.4 GB of 169.6 GB, 99607 of 1502626 items
30/08/09 2:15:14 AM com.apple.backupd7253 Copied 30.3 GB of 169.6 GB, 112358 of 1502626 items
30/08/09 3:15:14 AM com.apple.backupd7253 Copied 36.1 GB of 169.6 GB, 125770 of 1502626 items
30/08/09 4:15:15 AM com.apple.backupd7253 Copied 38.2 GB of 169.6 GB, 160636 of 1502626 items
30/08/09 5:15:16 AM com.apple.backupd7253 Copied 43.2 GB of 169.6 GB, 175095 of 1502626 items
30/08/09 6:15:17 AM com.apple.backupd7253 Copied 45.2 GB of 169.6 GB, 189965 of 1502626 items
30/08/09 7:15:17 AM com.apple.backupd7253 Copied 46.7 GB of 169.6 GB, 213777 of 1502626 items
30/08/09 8:15:18 AM com.apple.backupd7253 Copied 47.1 GB of 169.6 GB, 288288 of 1502626 items
30/08/09 9:15:18 AM com.apple.backupd7253 Copied 48.5 GB of 169.6 GB, 336746 of 1502626 items
30/08/09 10:15:19 AM com.apple.backupd7253 Copied 50.4 GB of 169.6 GB, 352332 of 1502626 items
30/08/09 11:15:20 AM com.apple.backupd7253 Copied 57.4 GB of 169.6 GB, 363639 of 1502626 items
30/08/09 12:15:20 PM com.apple.backupd7253 Copied 58.5 GB of 169.6 GB, 386480 of 1502626 items
30/08/09 1:15:21 PM com.apple.backupd7253 Copied 59.8 GB of 169.6 GB, 412356 of 1502626 items
30/08/09 2:15:22 PM com.apple.backupd7253 Copied 61.0 GB of 169.6 GB, 435309 of 1502626 items
30/08/09 3:15:23 PM com.apple.backupd7253 Copied 62.2 GB of 169.6 GB, 464146 of 1502626 items
30/08/09 4:15:23 PM com.apple.backupd7253 Copied 65.9 GB of 169.6 GB, 485498 of 1502626 items
30/08/09 5:15:23 PM com.apple.backupd7253 Copied 82.4 GB of 169.6 GB, 489045 of 1502626 items
30/08/09 6:15:25 PM com.apple.backupd7253 Copied 100.3 GB of 169.6 GB, 491058 of 1502626 items
30/08/09 7:15:26 PM com.apple.backupd7253 Copied 110.3 GB of 169.6 GB, 493059 of 1502626 items
30/08/09 8:15:26 PM com.apple.backupd7253 Copied 119.9 GB of 169.6 GB, 493059 of 1502626 items
30/08/09 9:15:27 PM com.apple.backupd7253 Copied 135.1 GB of 169.6 GB, 493059 of 1502626 items
30/08/09 10:15:28 PM com.apple.backupd7253 Copied 148.6 GB of 169.6 GB, 493059 of 1502626 items
30/08/09 11:15:28 PM com.apple.backupd7253 Copied 152.6 GB of 169.6 GB, 509372 of 1502626 items
31/08/09 12:15:29 AM com.apple.backupd7253 Copied 153.3 GB of 169.6 GB, 534726 of 1502626 items
31/08/09 1:15:29 AM com.apple.backupd7253 Copied 153.7 GB of 169.6 GB, 550336 of 1502626 items
31/08/09 2:15:29 AM com.apple.backupd7253 Copied 155.3 GB of 169.6 GB, 562214 of 1502626 items
31/08/09 3:15:30 AM com.apple.backupd7253 Copied 156.4 GB of 169.6 GB, 573554 of 1502626 items
31/08/09 4:15:30 AM com.apple.backupd7253 Copied 157.5 GB of 169.6 GB, 584987 of 1502626 items
31/08/09 5:15:31 AM com.apple.backupd7253 Copied 157.8 GB of 169.6 GB, 594213 of 1502626 items
31/08/09 5:26:04 AM com.apple.backupd7253 CoreEndianFlipData: error -4940 returned for rsrc type FREF (id 133, length 7, native = no)
31/08/09 5:26:04 AM com.apple.backupd7253 CoreEndianFlipData: error -4940 returned for rsrc type FREF (id 133, length 7, native = no)
31/08/09 5:28:49 AM com.apple.backupd7253 CoreEndianFlipData: error -4940 returned for rsrc type FREF (id 129, length 7, native = no)
31/08/09 5:28:52 AM com.apple.backupd7253 CoreEndianFlipData: error -4940 returned for rsrc type FREF (id 129, length 7, native = no)
31/08/09 6:15:32 AM com.apple.backupd7253 Copied 158.9 GB of 169.6 GB, 613142 of 1502626 items

48 hours and it’s still not finished, although it’s close now.

My hard drive is attached by FireWire, and at a modest transfer rate of 10MB/s (36GB/hr) this should have taken a respectable 5 hours. I know my FireWire drive can in fact sustain much higher than 10MB/s, so it baffles me why it’s averaging less than 1MB/sec. Grrrrr.