Matt Connolly's Blog

my brain dumps here…

Tag Archives: OpenIndiana

Building netatalk in SmartOS

I’m looking at switching my home backup server from OpenIndiana to SmartOS. (There are a few reasons, and that’s another post.)

One of the main functions of my box is to be a Time Machine backup for my Macs (my laptop and my wife’s iMac). I found this excellent post about building netatalk 3.0.1 in SmartOS, but it skipped a few of the dependencies, and it applied the patch after configure, which means that every time you reconfigure netatalk you need to reapply the patch.

Based on that article, I came up with a patch for netatalk, and here’s a gist of it: https://gist.github.com/mattconnolly/5230461

Prerequisites:

SmartOS already has most of the useful bits installed, but these are the ones I needed to install to allow netatalk to build:

$ sudo pkgin install gcc47 gmake libgcrypt

Build netatalk:

Download the latest stable netatalk. The netatalk home page has a handy link on the left.

$ cd netatalk-3.0.2
$ curl 'https://gist.github.com/mattconnolly/5230461/raw/27c02a276e7c2ec851766025a706b24e8e3db377/netatalk-3.0.2-smartos.patch' > netatalk-smartos.patch
$ patch -p1 < netatalk-smartos.patch
$ ./configure --with-bdb=/opt/local --with-init-style=solaris --with-init-dir=/var/svc/manifest/network/ --prefix=/opt/local
$ make
$ sudo make install

With the prefix of ‘/opt/local’, netatalk’s configuration file will be at ‘/opt/local/etc/afp.conf’.
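
Here’s a minimal afp.conf for a Time Machine volume to start from; the share name and path below are placeholders for whatever dataset you’re actually using:

[Global]
; optional: present the server as a Time Capsule-like device
mimic model = TimeCapsule6,106

[Time Machine]
; placeholder path: point this at your backup dataset's mountpoint
path = /tank/timemachine
time machine = yes

Then get the SMF service going. The manifest filename and service name come from the netatalk source, so check yours with `svcs` if these don’t match:

$ sudo svccfg import /var/svc/manifest/network/netatalk.xml
$ sudo svcadm enable netatalk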

Enjoy.

[UPDATE]

There is a very recent commit in the netatalk source adding an `init-dir` option to configure, which means that in the future this patch won’t be necessary: adding `--with-init-dir=/var/svc/manifest/network/` will do the job. Thanks HAT!

[UPDATE 2]

Netatalk 3.0.3 was just released, which includes the `--with-init-dir` option, so the patch is no longer necessary. The code above is updated.

Comparing Amazon EC2 to Joyent SmartOS

Recently, I’ve been using Amazon Web Services (EC2, especially) quite a bit more at work. At home, I still use OpenIndiana, so I’ve been really interested in comparing Joyent’s offerings against Amazon’s first hand. In particular, the tasks I run in Amazon’s cloud always feel CPU bound, so I’ve decided to do a comparison of just CPU performance, giving some context to Amazon’s jargon ECU (Elastic Compute Unit) by comparing it with a Joyent SmartOS instance, as well as my MacBook Pro, iMac and OpenIndiana server.

So I spun up a Joyent Extra Small SmartOS instance, and Amazon EC2 Linux Micro and Small instances.

Joyent startup is impressive. The workflow is simple and easy to understand. I chose the smartosplus64 machine just because it was near the top of the list.

Amazon startup is about what I’ve learned to expect: many more pages of settings later, we’re up and running.

Installing Ruby 1.9.3 with RVM

Ubuntu Linux has fantastic community support, and many packages just work out of the box. Following the RVM instructions, it was easy to get it installed.
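
Once RVM itself is in place, getting the interpreter is just:

$ rvm install 1.9.3
$ rvm use 1.9.3 --default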

SmartOS, like OpenIndiana, often requires a bit more work.

I made this patch to get Ruby to compile: https://gist.github.com/4104287
Thanks to this article: http://www.hiawatha-webserver.org/forum/topic/1177
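
RVM can apply the patch as part of the build, so you don’t have to hand-patch the source tree. Assuming you saved the gist as ruby-smartos.patch (name it whatever you like), something like:

$ rvm install 1.9.3 --patch ./ruby-smartos.patch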

A Simple Benchmark

Here’s a really quick Ruby benchmark that sorts 5 million random numbers in a single thread (bmbm runs a rehearsal pass first, so the timed runs are less affected by warm-up):

require 'benchmark'

# Five million random floats; each report sorts a fresh copy (array.dup)
# so the two runs are comparable.
array = (1..5000000).map { rand }
Benchmark.bmbm do |x|
  x.report("sort!") { array.dup.sort! }  # in-place sort
  x.report("sort") { array.dup.sort }    # sort returning a new array
end

I also tested my MacBook Pro, my iMac and my Xeon E3 OpenIndiana server to get some perspective.

Here are the results:

Machine                                    Benchmark (sec)
MacBook Pro 2.66GHz Core i7 (2010)                   86.99
iMac 24″ 2.5GHz Core i5 (2012)                       19.30
Xeon E3-1230 3.2GHz OpenIndiana server               35.57
Joyent EXTRA SMALL SmartOS 64-bit                    55.10
Amazon MICRO Ubuntu 64-bit                          361.42
Amazon SMALL Ubuntu 64-bit                          123.69

Snap. Amazon is *SLOW*! And the iMac is the surprise winner!

And so what is this Elastic Compute Unit (ECU) jargon that Amazon have created? Since the Amazon Small instance is rated at 1 ECU, we can reverse-measure the others into compute units. And by converting their hourly price to a monthly price (* 24 hours * 365.25 days / 12 months, about 730.5 hours), we can also determine the price per ECU:

Machine                                    Benchmark (sec)   $/hour    ECUs   $/month/ECU
MacBook Pro 2.66GHz Core i7 (2010)                   86.99             1.422
iMac 24″ 2.5GHz Core i5 (2012)                       19.30             6.409
Xeon E3-1230 3.2GHz OpenIndiana server               35.57             3.477
Joyent EXTRA SMALL SmartOS 64-bit                    55.10    $0.03    2.245    $9.76
Amazon MICRO Ubuntu 64-bit                          361.42    $0.02    0.342   $42.69
Amazon SMALL Ubuntu 64-bit                          123.69    $0.07    1.000   $47.48

Snap. Amazon is *EXPENSIVE*!
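
The arithmetic behind those last three columns, as a quick Ruby sketch (numbers straight from the table; the Amazon Small instance is the 1 ECU baseline):

# ECUs = how many Amazon Smalls' worth of work a machine does in the same time.
SMALL_TIME = 123.69                  # Amazon Small benchmark time, seconds
HOURS_PER_MONTH = 24 * 365.25 / 12   # ~730.5 hours

def ecus(seconds)
  SMALL_TIME / seconds
end

def dollars_per_month_per_ecu(seconds, hourly_price)
  hourly_price * HOURS_PER_MONTH / ecus(seconds)
end

puts ecus(55.10).round(3)                             # Joyent Extra Small => 2.245
puts dollars_per_month_per_ecu(55.10, 0.03).round(2)  # => 9.76
puts ecus(361.42).round(3)                            # Amazon Micro => 0.342
puts dollars_per_month_per_ecu(361.42, 0.02).round(2) # => 42.69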

My laptop with 4 threads could do the CPU work of 5.7 Small Amazon EC2 instances, worth $270/month. And my Xeon box with 8 threads could do the work of 27.8 Small instances, worth $1320/month (I built the whole machine for $1200!). Mind you, these comparisons are on the native operating system, but if you’re running a machine in-house this is an option, so it might be worth considering.

I’ve read that comparing SmartOS to Linux in a virtual machine isn’t a fair comparison because you’re not comparing apples with apples; one is operating-system-level virtualisation (Solaris Zones), the other is a full virtual machine (Xen hypervisor). Well, tough. All I need to do is install tools and my code and get work done. And if I can do that faster, then that is a fair comparison.

Conclusion

Joyent CPU comes in more than 4 times cheaper than Amazon EC2.

Amazon need to lift their game in terms of CPU performance. They offer a great service that obviously extends far beyond a simple CPU benchmark. But when you can get the same work done on Joyent significantly faster for a comparable price, you’ll get far more mileage per instance, which is ultimately going to save dollars.


EDIT: 19/11/12: Joyent’s machine is called “Extra Small”, not Micro as I originally had it.

OpenIndiana – running openvpn as a service

Here’s a gist for the XML manifest to run openvpn as a service:

https://gist.github.com/2484917

It expects that there is an openvpn config file at /etc/openvpn/config, which you’ll need to populate with your settings, certificates, etc.
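
With the config in place, import the manifest and enable the service. The manifest path here is wherever you saved the gist’s XML, and the service name depends on what the manifest declares, so adjust to suit:

# svccfg import /var/svc/manifest/network/openvpn.xml
# svcadm enable openvpn
# svcs openvpn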

If you configure it to use a tap interface, then Bonjour advertising will work over the link, which is great if you want Time Machine or other Bonjour services to work to an OpenIndiana server from a Mac connecting from anywhere with OpenVPN.

Passenger apache module for OpenIndiana

I did a bit of hunting and made some patches to the ‘passenger’ gem so that its Apache module would compile on OpenIndiana. The changes are in my GitHub fork:

https://github.com/mattconnolly/passenger
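
If you want to give it a try, the Apache module installer can be run straight from a checkout of the fork. This is an untested sketch, but upstream passenger ships the installer in bin/:

$ git clone https://github.com/mattconnolly/passenger.git
$ cd passenger
$ ./bin/passenger-install-apache2-module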

And I just noticed that one of the fixes was in a patch in Joyent’s SmartOS instructions for using passenger.

I tested this also on a VM guest installation of Solaris 11 Express, and it worked too. I’d be interested to hear if it works for others on OpenIndiana, Solaris or SmartOS.

So with updates to RVM, the latest version of Ruby, and this patched version of passenger, I’m finally good to go to deploy Rails apps on OpenIndiana. Woot!

ZFS = Data integrity

So, for a while now, I’ve been experiencing crappy performance from a Western Digital Green drive (WD15EARS) I have in a ZFS mirror storing my Time Machine backups (using OpenIndiana and netatalk).

Yesterday, the drive started reporting errors. Unfortunately, the system hung, which is not so cool: ZFS is supposed to keep working when a drive fails… Aside from that, when I rebooted, the system automatically started resilvering to restore data integrity, and after about 10 minutes:

  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Mar 10 10:19:42 2011
    1.68G scanned out of 1.14T at 107M/s, 3h5m to go
    146K resilvered, 0.14% done
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c8t1d0s0  DEGRADED     0     0    24  too many errors  (resilvering)
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors

Check it out: it’s found 24 checksum errors on the Western Digital drive, but so far no data errors, because the copies on the other drive were correct.

That’s obvious, right? But what other operating system can tell the difference between right and wrong data when both copies are there? Most RAID systems only detect a total drive failure; they don’t deal with incorrect data coming off the drive!
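
Incidentally, you don’t have to wait for a drive to act up to get this check: a scrub can be kicked off by hand at any time, and it verifies every block against its checksum.

# zpool scrub rpool
# zpool status rpool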

Sure, backing up over the network (Time Machine’s sparse image stuff) is *way* slower than a directly connected FireWire drive, but in my opinion it’s well worth doing it this way for the data integrity that you don’t get on a single USB or FireWire drive.

Thank you, ZFS, for keeping my data safe. B*gger off, Western Digital, for making crappy drives. I’m off to get a replacement today… what will it be? Samsung or Seagate?

Building Netatalk from source on OpenIndiana

On my OpenIndiana backup server, I’ve built quite a few packages from source over the last year. Today I went to try building netatalk. It required a few more things to get it going:

# pkg install developer/build/libtool
# pkg install developer/build/automake-110

Now, for some reason, the automake package installs `aclocal-1.10` but not `aclocal`, which the netatalk bootstrap script looks for, so I symlinked the versioned tools into place:

# ln -s /usr/bin/aclocal-1.10 /usr/bin/aclocal
# ln -s /usr/bin/automake-1.10 /usr/bin/automake

Then I can proceed with the normal build process:
$ ./bootstrap
$ ./configure
$ make
$ sudo make install
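
A quick sanity check that the install worked; netatalk’s default prefix is /usr/local, so that’s where afpd ends up:

$ /usr/local/sbin/afpd -v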

ZFS for Mac Coming soon…

A little birdy told me that there might be a new version of ZFS ported to Mac OS X coming up soon…

It seems the guys at Ten’s Complement are working on a port of ZFS at a much more recent version than what was left behind by Apple and forked as a Google Code project: http://code.google.com/p/maczfs/

On my Mac, I have installed the MacZFS build which can be found at the Google Code project. (I don’t have any ZFS volumes; it’s installed because I wanted to know what version it was up to.)

bash-3.2# uname -prs
Darwin 10.6.0 i386
bash-3.2# zpool upgrade
This system is currently running ZFS pool version 8.

All pools are formatted using this version.

My backup server at home is running OpenIndiana oi-148:

root@vault:~# uname -prs
SunOS 5.11 i386
root@vault:~# zpool upgrade
This system is currently running ZFS pool version 28.

All pools are formatted using this version.

Pretty exciting that we can get the same zpool version as the latest OpenIndiana… think of the backup/restore possibilities of sending a snapshot over to a remote machine.
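
For example, with compatible pool versions on both ends, a snapshot could be pushed from the Mac to my OpenIndiana box with something along these lines (the dataset names here are made up):

# zfs snapshot tank/Stuff@20110315
# zfs send tank/Stuff@20110315 | ssh vault zfs receive rpool/MacBackup/Stuff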

ZFS – dud hard drive slowing whole system

I have a low-power server running OpenIndiana oi-148. It has 4GB RAM and three drives in it, like so:

matt@vault:~$ zpool status
  pool: rpool
 state: ONLINE
 scan: resilvered 588M in 0h3m with 0 errors on Fri Jan  7 07:38:06 2011
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
        cache
          c12d0s0     ONLINE       0     0     0

errors: No known data errors

I’m running netatalk file sharing for the Macs, and using it as a Time Machine backup server for my Mac laptop.

When files are copying to the server, I often see periods of a minute or so where network traffic stops. I’m convinced there’s a bottleneck on the storage side of things, because when this happens I can still ping the machine, and if I have an ssh window open, I can still see the output from a `top` command running smoothly. However, if I try to do anything that touches disk (e.g. `ls`), that command stalls. When it comes good, everything comes good: file copies across the network continue, etc.

If I have an ssh terminal session open and run `iostat -xn 5`, I see something like this:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   36.0  153.6 4608.0  1.2  0.3   31.9    9.3  16  18 c12d0
    0.0  113.4    0.0 7446.7  0.8  0.1    7.0    0.5  15   5 c8t0d0
    0.2  106.4    4.1 7427.8  4.0  0.1   37.8    1.4  93  14 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   73.2   25.7 9243.0  2.3  0.7   31.6    9.8  34  37 c12d0
    0.0  226.6    0.0 24860.5  1.6  0.2    7.0    0.9  25  19 c8t0d0
    0.2  127.6    3.4 12377.6  3.8  0.3   29.7    2.2  91  27 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   44.2    0.0 5657.6  1.4  0.4   31.7    9.0  19  20 c12d0
    0.2   76.0    4.8 9420.8  1.1  0.1   14.2    1.7  12  13 c8t0d0
    0.0   16.6    0.0 2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0   25.6  0.0  0.0    0.3    2.3   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   11.0    0.0 1365.6  9.0  1.0  818.1   90.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1    0.0  0.0  0.0    0.1   25.4   0   1 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.6    0.0 2182.4  9.0  1.0  511.3   56.8 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   16.6    0.0 2058.4  9.0  1.0  542.1   60.2 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   15.8    0.0 1959.2  9.0  1.0  569.6   63.3 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1    0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   17.4    0.0 2157.6  9.0  1.0  517.2   57.4 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   18.2    0.0 2256.8  9.0  1.0  494.5   54.9 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c12d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c8t0d0
    0.0   14.8    0.0 1835.2  9.0  1.0  608.1   67.5 100 100 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.0    0.1    0.0  0.0  0.0    0.1    0.1   0   0 c12d0
    0.0    1.4    0.0    0.6  0.0  0.0    0.0    0.2   0   0 c8t0d0
    0.0   49.0    0.0 6049.6  6.7  0.5  137.6   11.2 100  55 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   55.4    0.0 7091.2  1.9  0.6   34.9    9.9  27  28 c12d0
    0.2  126.0    8.6 9347.7  1.4  0.1   11.4    0.6  20   7 c8t0d0
    0.0  120.8    0.0 9340.4  4.9  0.2   40.5    1.5  77  18 c8t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   57.0  153.6 7271.2  1.8  0.5   31.0    9.4  26  28 c12d0
    0.2  108.4   12.8 6498.9  0.3  0.1    2.5    0.6   6   5 c8t0d0
    0.2  104.8    5.2 6506.8  4.0  0.2   38.2    1.4  67  15 c8t1d0

The stall occurs when the drive c8t1d0 sits at 100% wait and 100% busy (the %w and %b columns), doing only slow I/O, typically writing about 2MB/s. Meanwhile, the other drive is all zeros: doing nothing.

The drives are:
c8t1d0 – Western Digital Green – SATA_____WDC_WD15EARS-00Z_____WD-WMAVU2582242
c8t0d0 – Samsung Silencer – SATA_____SAMSUNG_HD154UI_______S1XWJDWZ309550

I’ve installed smartmontools and run short and long self-tests on both drives, all reporting no errors.
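
For the record, the self-tests were along these lines; smartctl is part of smartmontools, and device paths (and the need for extra -d options) on Solaris derivatives may take some fiddling:

# smartctl -t long /dev/rdsk/c8t1d0s0
# smartctl -l selftest /dev/rdsk/c8t1d0s0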

I expect the c8t1d0 WD Green is the lemon here, and for some reason it gets stuck in periods where it can write no faster than about 2MB/s. Why? I don’t know…

Secondly, I wonder why the whole file system seems to hang up at this time. Surely, while the slow drive (c8t1d0) is stuck writing, a read (serving a web page, say) could be satisfied from the idle drive (c8t0d0). Is this a bug in ZFS?

If anyone has any ideas, please let me know!

ZFS saved my Time Machine backup

For a while now, I’ve been using Time Machine to back up to an AFP share hosted by netatalk on a low-powered OpenIndiana home server.

Last night, Time Machine stopped, with an error message: “Time Machine completed a verification of your backups. To improve reliability, Time Machine must create a new backup for you.”

Periodically I create ZFS snapshots of the volume containing my Time Machine backup. I haven’t enabled any automatic snapshots yet (like OpenIndiana/Solaris’s Time Slider service), so I just do it manually every now and then.
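
Taking one is a single command; the snapshot name here is the one I roll back to below:

# zfs snapshot rpool/MacBackup/TimeMachine@20100130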

So I shut down netatalk, rolled back the snapshot, checked the netatalk database, and restarted netatalk, and was back in business:

# /etc/init.d/netatalk stop
# zfs rollback rpool/MacBackup/TimeMachine@20100130
# /usr/local/bin/dbd -r /MacBackup/TimeMachine
# /etc/init.d/netatalk start

I lost only a day or two of incremental backups, which was much more palatable than having to do another complete backup of more than 250GB.

ZFS is certainly proving to be useful, even in a low powered home backup scenario.

Goodbye OpenSolaris, Hello OpenIndiana

After the demise of OpenSolaris, no thanks to Oracle, there’s finally a community fork available: OpenIndiana. I did the upgrade from OpenSolaris following the instructions here, and it all seemed pretty straightforward. There were a few things I’d installed (e.g. WordPress) which had dependencies on the older OpenSolaris packages, but apart from those, everything appears to have moved over to the new OpenIndiana package server nicely.

Netatalk (for my Time Machine backup) still runs perfectly.

It certainly will be interesting to see what comes from the community fork!