Matt Connolly's Blog

my brain dumps here…

Category Archives: OpenSolaris

Building netatalk in SmartOS

I’m looking at switching my home backup server from OpenIndiana to SmartOS. (There are a few reasons, but that’s another post.)

One of the main functions of my box is to be a Time Machine backup for my Macs (my laptop and my wife’s iMac). I found this excellent post about building netatalk 3.0.1 in SmartOS, but it skipped a few of the dependencies, and it applied the patch after configure, which means that if you reconfigure netatalk, you need to reapply the patch.

Based on that article, I came up with a patch for netatalk, and here’s a gist of it: https://gist.github.com/mattconnolly/5230461

Prerequisites:

SmartOS already has most of the useful bits installed, but these are the ones I needed to install to allow netatalk to build:

$ sudo pkgin install gcc47 gmake libgcrypt

Build netatalk:

Download the latest stable netatalk. The netatalk home page has a handy link on the left.

$ cd netatalk-3.0.2
$ curl 'https://gist.github.com/mattconnolly/5230461/raw/27c02a276e7c2ec851766025a706b24e8e3db377/netatalk-3.0.2-smartos.patch' > netatalk-smartos.patch
$ patch -p1 < netatalk-smartos.patch
$ ./configure --with-bdb=/opt/local --with-init-style=solaris --with-init-dir=/var/svc/manifest/network/ --prefix=/opt/local
$ make
$ sudo make install

With the prefix of ‘/opt/local’, netatalk’s configuration file will be at ‘/opt/local/etc/afp.conf’.
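For a Time Machine share, a minimal `afp.conf` might look like the sketch below. The share name `TimeMachine` and the path `/tank/timemachine` are placeholders invented for illustration, not values from this post; `time machine = yes` is the netatalk 3 volume option that advertises a share as a Time Machine target. The function only writes the sample to a path you choose, so you can review it before copying it into place.

```shell
# Write a sample netatalk 3 afp.conf to the given path
# (default: ./afp.conf.sample). The share name and path in the
# sample are placeholders; adjust them for your own pool.
write_sample_afp_conf() {
    out="${1:-./afp.conf.sample}"
    cat > "$out" <<'EOF'
[Global]
; global options go here; the defaults are fine to start with

[TimeMachine]
path = /tank/timemachine
time machine = yes
EOF
}
```

Run `write_sample_afp_conf /tmp/afp.conf.sample`, edit the result, and then copy it to ‘/opt/local/etc/afp.conf’.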

Enjoy.

[UPDATE]

There is a very recent commit in the netatalk source for an `init-dir` option to configure which means that in the future this patch won’t be necessary, and adding `--with-init-dir=/var/svc/manifest/network/` will do the job. Thanks HAT!

[UPDATE 2]

Netatalk 3.0.3 was just released, which includes the `init-dir` option, so the patch is no longer necessary. Code above is updated.

Goodbye OpenSolaris, Hello OpenIndiana

After the demise of OpenSolaris, no thanks to Oracle, there’s finally a community fork available: OpenIndiana. I did the upgrade from OpenSolaris, following the instructions here, and it all seemed pretty straightforward. There were a few things that I’d installed (e.g. WordPress) which had dependencies on the older OpenSolaris packages, but apart from those few things, it appears that everything has moved over to the new OpenIndiana package server nicely.

Netatalk (for my Time Machine backup) still runs perfectly.

It certainly will be interesting to see what comes from the community fork!

Cheap SATA pci card for my OpenSolaris server

So I just bought a super cheap (about $15) 4-port SATA PCI card for my OpenSolaris Time Machine server so I can add more drives in the future.

The card turns out to be a Silicon Image 3114 RAID card, which is, apparently, pretty common and popular. However, OpenSolaris doesn’t have a driver for it. There is a bug entry requesting one, but it’s not yet fixed.

Reading the bug, there appears to be a workaround, but this didn’t work for me:

# update_drv -a -i ' "pci1095,3114" ' ata
devfsadm: driver failed to attach: ata
Warning: Driver (ata) successfully added to system but failed to attach

However, a little bit more exploration revealed that my motherboard SATA controller uses the “ahci” driver, not the “ata” driver. Let’s give that a go:

# update_drv -a -i ' "pci1095,3114" ' ahci

Success. Well, at least no error messages… now time to add hard disks!

EDIT:

The above is totally unnecessary. All I needed to do was flash the card with the latest BIOS, using the regular IDE BIOS (no RAID, which isn’t needed thanks to ZFS), and then OpenSolaris automatically recognised the card as a PCI IDE card. That’s good because it works, but it’s not quite as fully equipped as a SATA interface, which would support hot swapping, for example.

Western Digital Green Lemon

I have an OpenSolaris backup machine with 2 x 1.5 TB drives mirrored. One is a Samsung Silencer, the other is a Western Digital Green drive. The silencer is, ironically, the noisier of the two, but way outperforms the WD drive.

I’ve done some failure tests on the mirror by unplugging one drive while copying files to/from the backup server from my laptop.

First, I was copying from the server onto a single FW drive, writing at a solid 30MB/s. I disconnected the Samsung drive while it was running and the file copy proceeded without fault at about 25MB/s from the single WD drive.

`zpool status` showed the drive was UNAVAIL and that the pool would continue to work in a degraded state. When I reconnected the drive, `cfgadm` showed it as connected but unconfigured. When I reconfigured the Samsung drive, the pool automatically resilvered any missing data in a matter of seconds. (There wasn’t much, because I was only reading from the server.)

Failure test #2 was to remove the WD drive. I copied data to the server from the laptop, and the progress was intermittent: bursts of 30MB/s, then nothing for quite a few seconds, and so on. I disconnected the WD drive, and hey presto, the transfer rate instantly jumped up to a solid 20MB/s. This Samsung drive definitely writes a whole stack faster than the WD drive. (A mirror writes only as fast as its slowest drive.)

And here’s the lemon part. When I reconnected the WD drive, it showed up as disconnected, whereas the Samsung had come back as connected but unconfigured. To my frustration, I couldn’t reconnect the drive:

$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0                        sata-port    disconnected unconfigured failed
$ cfgadm -c connect sata1/0
cfgadm: Insufficient condition

I did a bit of searching and found this page: SolarisZfsReplaceDrive: use the -f force option:

$ pfexec cfgadm -f -c connect sata1/0
Activate the port: /devices/pci@0,0/pci8086,4f4d@1f,2:0
This operation will enable activity on the SATA port
Continue (yes/no)? yes
$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0                        disk         connected    unconfigured unknown
sata1/1::dsk/c8t1d0            disk         connected    configured   ok
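To spot ports in this state without reading the table by eye, a small filter over the `cfgadm` output can help. This is only a sketch: the function name `unconfigured_ports` is made up here, and it assumes the five-column layout shown above, where the fourth column is the Occupant state.

```shell
# List attachment points from `cfgadm` output (read on stdin) whose
# Occupant column (column 4) is anything other than "configured".
# The header line and short or blank lines are skipped.
unconfigured_ports() {
    awk '$1 != "Ap_Id" && NF >= 5 && $4 != "configured" { print $1 }'
}

# Example, on the server:
#   cfgadm | unconfigured_ports
```

Each name it prints is a candidate for `cfgadm -c configure`, as in the next step.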

So, now that OpenSolaris sees the drive as connected, let’s configure it, and zpool should see it straight away…

$ pfexec cfgadm -c configure sata1/0
$ cfgadm
Ap_Id                          Type         Receptacle   Occupant     Condition
sata1/0::dsk/c8t0d0            disk         connected    configured   ok
sata1/1::dsk/c8t1d0            disk         connected    configured   ok
$ zpool status -x
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 0.00% done, 465h28m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0s0  ONLINE       0 1.14K     0  544K resilvered
            c8t1d0s0  ONLINE       0     0     0

Oh man… I have to resilver the whole drive. Why!? The other drive remembered it was a part of the pool and intelligently went about resilvering the differences. This drive looks like it wants to resilver the whole damn thing.

After a while:

$ zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h23m, 5.05% done, 7h20m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0  12.3G resilvered
            c8t1d0s0  ONLINE       0     0     0

And here’s another interesting bit… the performance of the WD drive (c8t0d0) on my machine is really poor:

$ iostat -x 5

                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   61.2    0.0 1056.2  0.0  9.1    0.0  148.1   0 100 c8t0d0
   79.0    0.0  978.7    0.0  0.0  0.0    0.0    0.6   0   3 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   72.0    0.0  178.8  0.0  7.2    0.0   99.6   0 100 c8t0d0
  111.8    0.0  361.3    0.0  0.0  0.0    0.0    0.3   0   1 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   51.6    0.0  120.4  0.0  7.5    0.0  145.9   0 100 c8t0d0
   79.4    0.0  143.7    0.0  0.0  0.0    0.0    0.2   0   1 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   62.2    0.0 1968.5  0.0  8.3    0.0  133.7   0 100 c8t0d0
   81.8    0.0 2616.7    0.0  0.0  0.3    0.0    3.2   0   8 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   34.6    0.0 1880.2  0.0  7.1    0.0  204.9   0  79 c8t0d0
   28.4   11.6 1413.5   41.7  0.0  0.1    0.0    3.1   0   7 c8t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c9t0d0

Check it out. The drive is 100% busy, and it’s writing less than 2MB/s. Compare that to the %b (busy) figure for the Samsung (on c8t1d0) when reading the same amount of data. And check out the average service time (asvc_t): that’s as bad as a CD-ROM! Yikes.

It doesn’t reconnect to the system cleanly, its service time is way slow, and its write performance stinks. This WD drive is a total lemon!
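To put a single number on it, the samples can be averaged per device. This is a rough sketch, assuming the `iostat -x` column order shown above (kw/s in column 4, asvc_t in column 8, device name in column 11); the function name `iostat_avg` is made up for illustration.

```shell
# Average kw/s (column 4) and asvc_t (column 8) for one device across
# all samples in `iostat -x` output read on stdin. The device name is
# column 11; header lines never match it, so they are ignored.
iostat_avg() {
    awk -v dev="$1" '
        $11 == dev { kw += $4; svc += $8; n++ }
        END { if (n) printf "%s kw/s=%.1f asvc_t=%.1f\n", dev, kw / n, svc / n }
    '
}

# Example, on the server (5-second interval, 12 samples):
#   iostat -x 5 12 | iostat_avg c8t0d0
```

Fed the five c8t0d0 samples above, it reports an average of about 1040.8 kw/s (roughly 1MB/s) with an average asvc_t of 146.4ms. Note that on a live run the first sample from `iostat` reports averages since boot, so you may want to discard it.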

ZFS performance networked from a Mac

Before I go ahead and do a full Time Machine backup to my OpenSolaris machine with a ZFS mirror, I thought I’d test what performance hit there might be when using compression. I also figured that I’d test the impact of changing the recordsize too. Matching this to the data record size seems to be best practice for databases, and since Time Machine stores data in a Mac disk image (SparseBundle), it probably writes data in 4k chunks matching the allocation size of the HFS filesystem in the disk image.
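The combinations in the results table can be set with `zfs set`. The sketch below only prints the commands rather than running them, and the dataset name `tank/backup` is a placeholder, not the dataset used in these tests.

```shell
# Print (not run) the zfs commands for every recordsize/compression
# pair. "tank/backup" is a placeholder dataset name.
print_zfs_test_matrix() {
    dataset="${1:-tank/backup}"
    for rs in 128k 4k; do
        for comp in off on gzip; do
            echo "zfs set recordsize=$rs $dataset && zfs set compression=$comp $dataset"
        done
    done
}
```

Both properties only affect data written after the change, so each test run needs a freshly copied file or disk image to measure the new setting.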

There were three copy tasks done:

  1. Copy a single large video file (1.57GB) to the Netatalk AFP share,
  2. Copy a single large video file (1.57GB) to a locally (mac) mounted disk image stored on the Netatalk AFP share,
  3. Copy a folder with 2752 files (117.3MB) to a locally (mac) mounted disk image stored on the Netatalk AFP share.

Here are the results:

| ZFS recordsize, compression | To Netatalk AFP share: 1 video file, 1.57GB | To disk image stored on AFP share: 1 video file, 1.57GB | To disk image stored on AFP share: 2752 files, 117.3MB |
| --- | --- | --- | --- |
| 128k, off | 0m29.826s (53.9MB/s) | 2m5.889s (12.7MB/s) | 1m45.809s (1.1MB/s) |
| 128k, on | 0m52.179s (30.9MB/s) | 1m36.084s (16.7MB/s) | 1m34.367s (1.24MB/s) |
| 128k, gzip | 0m31.290s (51.4MB/s) | 2m32.485s (10.5MB/s) | 2m29.141s (0.79MB/s) |
| 4k, off | 0m27.131s (59.3MB/s) | 2m16.138s (11.8MB/s) | 2m47.718s (0.70MB/s) |
| 4k, on | 0m25.651s (62.7MB/s) | 1m59.459s (13.5MB/s) | 1m41.551s (1.2MB/s) |
| 4k, gzip | 0m30.348s (53.0MB/s) | 5m16.195s (5.08MB/s) | 4m48.378s (0.41MB/s) |

I think there was something else happening on the server for the 128k compression=on test, impacting on its data rate.
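The MB/s figures are just size divided by elapsed time. A minimal sketch of that arithmetic follows; the helper names `to_seconds` and `mb_per_sec` are made up here, and it assumes 1.57GB means 1.57 × 1024 = 1607.68MB, which reproduces the 53.9MB/s in the first cell of the table.

```shell
# Convert a `time`-style duration such as "2m5.889s" to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, part, "m")        # part[1] is minutes, part[2] is seconds plus a trailing s
        sub(/s$/, "", part[2])     # drop the trailing s
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# Throughput in MB/s for a size in MB and a `time`-style duration.
mb_per_sec() {
    awk -v mb="$1" -v secs="$(to_seconds "$2")" 'BEGIN { printf "%.1f\n", mb / secs }'
}
```

For example, `mb_per_sec 1607.68 0m29.826s` prints 53.9.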

Conclusion:

The clear winner is the default compression (compression=on) with the default record size (128k). It must be that even my low-powered Atom processor machine can compress the data faster than it can be written to disk, resulting in less bandwidth to disk and therefore increasing performance at the same time as saving space. Well done ZFS!

Mac File sharing from OpenSolaris

I’ve just played around with 3 different ways of sharing files from OpenSolaris to my Mac:

  1. Using ZFS built in SMB sharing
  2. Using ZFS built in iSCSI sharing (with globalSAN iSCSI initiator for mac)
  3. Using AFP from netatalk 2.1

Using ZFS built in SMB sharing

This is by far the easiest: it requires no special software on either machine beyond the OS itself.

Using ZFS built in iSCSI sharing

Setting up the iSCSI share is just as easy as the SMB one; however, the Mac doesn’t have an iSCSI client built in. You need to download and install the globalSAN iSCSI initiator for Mac.

This method should be good for Time Machine because the iSCSI device appears as a real hard drive, which you then format as Mac OS Extended, and Time Machine’s funky little linked files and things should all work perfectly. No need to worry about users and accounts on the server, etc. In theory, this should deliver the best results, but it’s actually the worst performing of the lot.

Using AFP from netatalk 2.1

A little bit of work is required to install Netatalk 2.1. See my previous post and thanks again to the original posters where I learned how to do this.

This one should also be a very good candidate since it appears as a Mac server on the network and you should be able to access this shared Time Machine directly from the OS X install disc – an important consideration if the objective is to use it as a Time Machine backup (which it is for me).

Additionally, this one proved to have the best performance:

Performance

I tested copying 3GB of data to each of the above shares and then reading it back again. Here are the results:

Writing 3GB over gigabit ethernet:

iSCSI: 44m01s – 1.185MB/s

SMB share: 4m27s – 11.73MB/s

AFP share: 2m49s – 18.52MB/s

Reading 3GB over gigabit ethernet:

iSCSI: 4m36s – 11.34MB/s

SMB share: 1m45s – 29.81MB/s

AFP share: 1m16s – 41.19MB/s

The iSCSI was by far the slowest. When writing, it was ten times slower than SMB. Yikes! I have no idea if that’s due to the globalSAN initiator software or something else. Whatever the case, it’s not acceptable for now.

And Netatalk comes up trumps. Congratulations to everyone involved in Netatalk – great work indeed!

OpenSolaris screen sharing with Mac

I found two ways of connecting my Mac to my OpenSolaris box remotely:

1. Running a gnome-session over SSH.

$ ssh -X username@opensolaris.local gnome-session

And up pops an X11 app on the Mac and you can see the desktop. It’s slow and clunky, but it works.

2. Using Mac Snow Leopard’s built in Screen Sharing client.

This requires more configuration on the OpenSolaris side: apparently Mac OS will only connect to a server that requires authentication using the method it expects. This article showed me how to do it: Share your OpenSolaris 2008.11 screen to Mac Os X.

The second method is much prettier, doesn’t have windows that disappear under the Gnome application bar at the top, neatly puts everything in a window to the remote machine, and to boot it is actually way faster too.

Oh, and here’s a trick. The Screen Sharing application, for some reason, doesn’t give you a nice interface to connect to a remote machine manually (as opposed to clicking the Share button in a Finder window). This works instead: open your favourite web browser and type vnc://opensolaris.local/ to launch Screen Sharing to the machine “opensolaris.local” (it also works with IP addresses).
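The same URL can also be opened from Terminal with the Mac’s `open` command, which hands URLs to whichever application handles the scheme. A tiny sketch, where the helper name `vnc_url` is made up for illustration:

```shell
# Build the vnc:// URL for a host name or IP address.
vnc_url() {
    printf 'vnc://%s/\n' "$1"
}

# On the Mac:
#   open "$(vnc_url opensolaris.local)"
```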

Enjoy.

OpenSolaris + TimeMachine backup + network discovery

I found several tutorials on blogs about how to build “netatalk” on OpenSolaris to enable file sharing over the AFP protocol, which works better for Time Machine backups on the Mac. Here are a few:

As far as making the AFP service discoverable, most people point to this article:

However, I found a better way to make the service discoverable, and that is by using the avahi-bridge. You’ll need to make sure that you have both multicast DNS and the avahi bridge running.

1. Install netatalk according to the above instructions.

2. Install multicast DNS and the avahi bridge. (I’m using the snv_134 development branch; note that the package names have changed since the 2009.06 release.)

# pkg install system/network/avahi
# pkg install service/network/dns/mdns

3. Enable both services (why is one called mdns and the next multicast?)

# svcadm enable network/dns/multicast:default
# svcadm enable system/avahi-bridge-dsd:default

4. Set up a service XML file for avahi-bridge:

# cat > /etc/avahi/services/afp.service <<'EOF'
<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
<name replace-wildcards="yes">%h</name>
<service>
<type>_device-info._tcp</type>
<port>548</port>
<txt-record>model=RackMac</txt-record>
</service>
<service>
<type>_afpovertcp._tcp</type>
<port>548</port>
</service>
</service-group>
EOF

If you’re interested in advertising other services so that they are discoverable by the Mac, this is a great article (it’s for linux, but the avahi side of things is the same using avahi-bridge): Benjamin Sherman » Advertising Linux Services via Avahi/Bonjour.
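As a rough sketch of how that generalises, the function below prints a service file for any single service type and port, following the same layout as the AFP file above. The function name `avahi_service_xml` and the SSH example are illustrative assumptions, not something from the linked article.

```shell
# Print an avahi service definition for one service type and port,
# using the same layout as the afp.service file.
avahi_service_xml() {
    svc_type="$1"
    svc_port="$2"
    cat <<EOF
<?xml version="1.0" standalone='no'?>
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
<name replace-wildcards="yes">%h</name>
<service>
<type>$svc_type</type>
<port>$svc_port</port>
</service>
</service-group>
EOF
}

# Example (as root): advertise SSH on port 22
#   avahi_service_xml _ssh._tcp 22 > /etc/avahi/services/ssh.service
```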

Now I have a server in my Finder sidebar just like any other Mac server.