Matt Connolly's Blog

my brain dumps here…

Tag Archives: SmartOS

Happy Holidays

This is a quick happy holidays and thank you to all the people and companies that have done great things in 2013. In no particular order:

Podcasters:

I’ve enjoyed many a podcast episode this year. My favourites are the Edge Cases featuring Wolf Rentzsch and Andrew Pontious, Accidental Tech Podcast featuring John Siracusa, Casey Liss and Marco Arment and Rails Casts by Ryan Bates.
Thank you all for your hard work putting your respective shows together. Your efforts are greatly appreciated, and I hope you are getting enough out of it so that it’s worthwhile continuing in 2014!

Companies:

JetBrains, makers of Rubymine. These guys pump out great work. If you’re keen to get involved in the early access program you can get nightly or weekly builds. Twice this year I’ve submitted a bug and within a week had it verified by JetBrains, fixed, in a build and in my hands. Their CI system even updates the bug with the build number including the fix. Seriously impressive. They set the bar so high, I challenge any company (including myself) to match their effective communication rapid turn around on issues.

Joyent for actually innovating in the cloud, and your contributions to open source projects such as NodeJS and SmartOS! Pretty impressive community engagement, not only in open source code, but community events too… What a shame I don’t live in San Francisco to attend and thank you guys in person.

Github for helping open source software and providing an awesome platform for collaboration. So many projects benefit from being on Github.

Apple, thanks for making great computers and devices. Well done on 64 bit ARM. The technology improvements in iOS 7 are great, however, my new iPhone 5S doesn’t feel a single bit faster than my previous iPhone 5 due to excessive use of ease out animations which have no place in a User Interface. Too many of my bug reports have been closed as “works as intended”, when the problem is in the design not the implementation. Oh well.

Products / Services:

Strava has helped me improve in my cycling and fitness. The website and iPhone apps are shining examples of a great user experience: works well, easy to use, functional and good looking. Thanks for a great product.

Reveal App is a great way to break down the UI of an iOS app. Awesome stuff.

Twitter has been good, mostly because of how people use it. I suppose it’s more thanks to the people on Twitter who I follow.

Black Star Coffee, it’s how I start my day! Great coffee.

Technologies:

ZeroMQ: This is awesome. Reading the ZeroMQ guide was simply fantastic. This has changed my approach to communications in computing. Say goodbye to mutexes and locks and hello to messages and event driven applications. Special thanks to Pieter Hintjens for his attention to the ZeroMQ mailing lists, and to all of the contributors to a great project.

SmartOS: Totally the best way to run a hypervisor stack. The web page says it all: ZFS + DTrace + Zones + KVM. Get into it. Use ZFS. You need a file system that can verify your data. Hard drives cannot be trusted. I repeat, use ZFS.

Using ZFS Snapshots on Time Machine backups.

I use time machine because it’s an awesome backup program. However, I don’t really trust hard drives that much, and I happen to be a bit of a file system geek, so I backup my laptop an iMac to another machine that stores the data on ZFS.

I first did this using Netatalk on OpenSolaris, then OpenIndiana, and now on SmartOS. Netatalk is an open source project for running AFP (Apple Filesharing Protocol) services on unix operatings systems. It has great support for new features in the protocol required for Time Machine. As far as I’m aware, all embedded NAS devices use this software.

Sometimes, Time Machine “eats itself”. A backup will fail with a message like “Verification failed”, and you’ll need to make a new one. I’ve never managed to recover the disk from this point using Disk Utility.

My setup is RaidZ of 3 x 2TB drives, giving a total of 4TB of storage space (and 2TB redundancy). In the four years I’ve been running this, I have had 3 drives go bad and replace them. They’re cheap drives, but I’ve never lost data due to a bad disk and having to replace it. I’ve also seen silent data corruptions, and know that ZFS has corrected them for me.

Starting a new backup is a pain, so what do I do?

ZFS Snapshots

I have a script, which looks like this:

ZFS=zones/MacBackup/MattBookPro
SERVER=vault.local
if [ -n "$1" ]; then
  SUFFIX=_"$1"
fi
SNAPSHOT=`date "+%Y%m%d_%H%M"`$SUFFIX
echo "Creating zfs snapshot: $SNAPSHOT"
ssh -x $SERVER zfs snapshot $ZFS@$SNAPSHOT

This uses the zfs snapshot command to create a snapshot of the backup. There’s another one for my iMac backup. I run this script manually for the ZFS file system (directory) for each backup. I’m working on an automatic solution that listens to system logs to know when the backup has completed and the volume is unmounted, but it’s not finished yet (like many things). Running the script takes about a second.

Purging snapshots

My current list of snapshots looks like this:

matt@vault:~$ zfs list -r -t all zones/MacBackup/MattBookPro
NAME                                      USED  AVAIL  REFER  MOUNTPOINT
zones/MacBackup/MattBookPro               574G   435G   349G  /MacBackup/MattBookPro
...snip...
zones/MacBackup/MattBookPro@20131124_1344 627M      -   351G  -
zones/MacBackup/MattBookPro@20131205_0813 251M      -   349G  -
zones/MacBackup/MattBookPro@20131212_0643 0         -   349G  -

The used at the top shows the space used by this file system and all of its snapshots. The used column for the snapshot shows how much space is used by that snapshot on its own.

Purging old snapshots is a manual process for now. One day I’ll get around to keeping a snapshots on a rule like time machine’s hourly, daily, weekly rules.

Rolling back

So when Time Machine goes bad, it’s as simple as rolling back to the latest snapshot, which was a known good state.

My steps are:

  1. shut down netatalk service
  2. zfs rollback
  3. delete netatalk inode database files
  4. restart netatalk service
  5. rescan directory to recreate inode numbers (using netatalks “dbd -r ” command.)

This process is a little more involved, but still much faster than making a whole new backup.

The main reason for this is that HFS uses an “inode” number to uniquely identify each file on a volume. This is one trick that Mac Aliases use to track a file even if it changes name and moves to another directory. This concept doesn’t exist in other file systems, and so Netatalk has to maintain a database of which numbers to use for which files. There’s some rules, like inode numbers can’t be reused and they must not change for a given file.

Unfortunately, ZFS rollback, like any other operation on the server that changes files without netatalk knowing, ends up with files that have no inode number. The bigger problem seems to be deleting files and leaving their inodes in that database. This tends to make Time Machine quite unhappy about using that network share. So after a rollback, I have a rule that I nuke netatalk’s database and recreate it.

This violates the rule that inode numbers shouldn’t change (unless they magically come out the same, which I highly doubt), but this hasn’t seemed to cause a problem for me. Imagine plugging a new computer into a time machine volume, it has no knowledge of what the inode numbers were, so it just starts using them as is. It’s more likely to be an issue for Netatalk scanning a directory and seeing inodes for files that are no longer there.

Recreating the netatalk inode database can take an hour or two, but it’s local to the server and much faster than a complete network backup which also looses your history.

Conclusion

This used to happen a lot. Say once every 3-4 months when I first started doing it. This may have been due to bugs in Time Machine, bugs in Netatalk or incompatibilities between them. It certainly wasn’t due to data corruptions.

Pros:

  • Time Machine, yay!
  • ZFS durability and integrity.
  • ZFS snapshots allow point in time recovery of my backup volume.
  • ZFS on disk compression to save backup space!
  • Netatalk uses standard AFP protocol, so time machine volume can be accessed from your restore partition or a new mac – no extra software required on the mac!

Cons:

  • Effort – complexity to manage, install & configure netatalk, etc.
  • Rollback time.
  • Network backups are slow.

As time has gone on, both Time Machine and Netatalk have improved substantially. And I’ve added an SSD cache to the server, and its is swimmingly fast and reliable. And thanks to ZFS, durable and free of corruptions. I think I’ve had this happen only twice in the last year, and both times was on Mountain Lion. I haven’t had to do a single rollback since starting to use Mavericks beta back around June.

Where to from here?

I’d still like to see a faster solution, and I have a plan: a network block device.

This would, however, require some software to be installed on the mac, so it may not be as easy to use in a disaster recover scenario.

ZFS has a feature called a “volume”. When you create one, it appears to the system (that’s running zfs) as another block device, just like a physical hard disk, or file. A file system can be created on this volume which can then be mounted locally. I use this for the disks in virtual machines, and can snapshot them and rollback just as if they were a file system tree of files.

There’s an existing server module that’s been around for a while: http://nbd.sourceforge.net

If this volume could be mounted across the network on a mac, the volume could be formatted as HFS+ and Time Machine could backup to it using local disk mode, skipping all the slow sparse image file system work. And there’s a lot of work. My time machine backup of a Mac with a 256GB disk creates a whopping 57206 files in the bands directory of the sparseimage. It’s a lot of work to traverse these files, even locally on the server.

This is my next best solution to actually using ZFS on mac. Whatever “reasons” Apple has for ditching them are not good enough simply because we don’t know what they are. ZFS is a complex beast. Apple is good at simplifying things. It could be the perfect solution.

Network latency in SmartOS virtual machines

Today I decided to explore network latency in SmartOS virtual machines. Using the rbczmq ruby gem for ZeroMQ, I made two very simple scripts: a server that replies “hello” and a benchmark script that times how long it takes to send and receive 5000 messages after establishing the connection.

The server code is:

require 'rbczmq'
ctx = ZMQ::Context.new
sock = ctx.socket ZMQ::REP
sock.bind("tcp://0.0.0.0:5555")
loop do
  sock.recv
  sock.send "reply"
end

The benchmark code is:

require 'rbczmq'
require 'benchmark'

ctx = ZMQ::Context.new
sock = ctx.socket ZMQ::REQ
sock.connect(ARGV[0])

# establish the connection
sock.send "hello"
sock.recv

# run 5000 cycles of send request, receive reply.
puts Benchmark.measure {
  5000.times {
    sock.send "hello"
    sock.recv
  }
}

The test machines are:

* Mac laptop – server & benchmarking
* SmartOS1 (SmartOS virtual machine/zone) server & benchmarking
* SmartOS2 (SmartOS virtual machine/zone) benchmarking
* Linux1 (Ubuntu Linux in KVM virtual machine) server & benchmarking
* Linux2 (Ubuntu Linux in KVM virtual machine) benchmarking

The results are:

Source      Dest        Connection      Time          Req-Rep/Sec
------      ----        ----------      ----          --------
Mac         Linux1      1Gig Ethernet   5.038577      992.3
Mac         SmartOS1    1Gig Ethernet   4.972102      1005.6
Linux2      Linux1      Virtual         1.696516      2947.2
SmartOS2    Linux1      Virtual         1.153557      4334.4
Linux2      SmartOS1    Virtual         0.952066      5251.8
Linux1      Linux1      localhost       0.836955      5974.0
Mac         Mac         localhost       0.781815      6395.4
SmartOS2    SmartOS1    Virtual         0.470290      10631.7
SmartOS1    SmartOS1    localhost       0.374373      13355.7

localhost tests use 127.0.0.1

SmartOS has an impressive network stack. Request-reply times from one SmartOS machine to another are over 3 times faster than when using Linux under KVM (on the same host). This mightn’t make much of a difference to web requests coming from slow mobile device connections, but if your web server is making many requests to internal services (database, cache, etc) this could make a noticeable difference.

Building netatalk in SmartOS

I’m looking at switching my home backup server from OpenIndiana to SmartOS. (there’s a few reasons, and that’s another post).

One of the main functions of my box is to be a Time Machine backup for my macs (my laptop and my wife’s iMac). I found this excellent post about building netatalk 3.0.1 in SmartOS, but it skipped a few of the dependencies, and did the patch after configure, which means if you change you reconfigure netatalk, then you need to reapply the patch.

Based on that article, I came up with a patch for netatalk, and here’s a gist of it: https://gist.github.com/mattconnolly/5230461

Prerequisites:

SmartOS already has most of the useful bits installed, but these are the ones I needed to install to allow netatalk to build:

$ sudo pkgin install gcc47 gmake libgcrypt

Build netatalk:

Download the latest stable netatalk. The netatalk home page has a handy link on the left.

$ cd netatalk-3.0.2
$ curl 'https://gist.github.com/mattconnolly/5230461/raw/27c02a276e7c2ec851766025a706b24e8e3db377/netatalk-3.0.2-smartos.patch' > netatalk-smartos.patch
$ patch -p1 < netatalk-smartos.patch
$ ./configure --with-bdb=/opt/local --with-init-style=solaris --with-init-dir=/var/svc/manifest/network/ --prefix=/opt/local
$ make
$ sudo make install

With the prefix of ‘/opt/local’ netatalk’s configuration file will be at ‘/opt/local/etc/afp.conf’

Enjoy.

[UPDATE]

There is a very recent commit in the netatalk source for an `init-dir` option to configure which means that in the future this patch won’t be necessary, and adding `--with-init-dir=/var/svc/manifest/network/` will do the job. Thanks HAT!

[UPDATE 2]

Netatalk 3.0.3 was just released, which includes the –init-dir option, so the patch is no longer necessary. Code above is updated.

Comparing Amazon EC2 to Joyent SmartOS

Recently, I’ve been using Amazon web services (EC2, especially) quite a bit more at work. At home, I still use OpenIndiana, so I’ve been really interested in comparing Joyent’s offerings against Amazons first hand. In particular, my tasks I have in Amazon’s cloud always feel CPU bound, so I’ve decided to do a comparison of just CPU performance, giving some context to Amazon’s jargon ECU (Elastic Compute Unit) by comparing it with a Joyent SmartOS instance, as well as my MacBook Pro, iMac and OpenIndiana server.

So I spun up a Joyent Micro SmartOS instance and an Amazon EC2 linux Micro and small instances.

Joyent startup is impressive. The workflow is simple and easy to understand. I chose the smartosplus64 machine just because it was near the top of the list.

Amazon startup is about what I’ve learned to expect. Many more pages of settings later we’re up and running.

Installing ruby 1.9.3 with RVM

Ubuntu linux has fantastic community support, and many packages just work out of the box. Following the RVM instructions was easy to get it installed.

SmartOS, like OpenIndiana often requires a bit more work.

I made this patch to get ruby to compile: https://gist.github.com/4104287
Thanks to this article: http://www.hiawatha-webserver.org/forum/topic/1177

A Simple Benchmark

Here’s a really quick ruby benchmark, that will sort 5 million random numbers in a single thread:

require 'benchmark'

array = (1..5000000).map { rand }
Benchmark.bmbm do |x|
  x.report("sort!") { array.dup.sort! }
  x.report("sort") { array.dup.sort }
end

I also tested my MacBook Pro, my iMac and my Xeon E3 OpenIndiana server to get some perspective.

Here’s the results:

Machine Benchmark (sec)
MacBook Pro 2.66gHz core i7 (2010) 86.99
iMac 24″ 2.5GHz core i5 (2012) 19.30
Xeon E3-1230 3.2GHz OpenIndiana server 35.57
Joyent EXTRA SMALL SmartOS 64-bit 55.10
Amazon MICRO Ubuntu 64-bit 361.42
Amazon SMALL Ubuntu 64-bit 123.69

Snap. Amazon is *SLOW*! And iMac the surprise winner!

And so what is this Elastic Compute Unit (ECU) jargon that Amazon have created? Since the Amazon Small instance is 1 ECU, we can reverse measure the others into compute units. And by converting their hourly price to a monthly price (* 24 hours * 365.25 days / 12 months), we can also determine the price per ECU:

Machine Benchmark (sec) $/hour ECUs $/month/ECU
MacBook Pro 2.66gHz core i7 (2010) 86.99 1.422
iMac 24″ 2.5GHz core i5 (2012) 19.30 6.409
Xeon E3-1230 3.2GHz OpenIndiana server 35.57 3.477
Joyent EXTRA SMALL SmartOS 64-bit ruby 55.10 $0.03 2.245 $9.76
Amazon MICRO Ubuntu 64-bit 361.42 $0.02 0.342 $42.69
Amazon SMALL Ubuntu 64-bit 123.69 $0.07 1.000 $47.48

Snap. Amazon is *EXPENSIVE*!

My laptop with 4 threads could do the CPU work of 5.7 small amazon EC2 instances, worth $270/month. And my Xeon box with 8 threads could do the work of 27.8 small instances, worth $1320/month. (I built the whole machine for $1200!!). Mind you, these comparisons are on the native operating system, but if you’re running a machine in house this is an option, so might be worth consideration.

I’ve read that comparing SmartOS to Linux in a virtual machine isn’t a fair comparison because you’re not comparing apples with apples; one is operating system level virtualisation (Solaris Zones), the other is a full virtual machine (Xen Hypervisor). Well tough. All I need to do is install tools and my code and get work done. And if I can do that faster then that is a fair comparison.

Conclusion

Joyent CPU comes in more than 4 times cheaper than Amazon EC2.

Amazon need to lift their game in terms of CPU performance. They offer a great service that obviously extends far beyond a simple CPU benchmark. But when you can get the same work done in Joyent significantly faster for the comparable price, you’ll get far more mileage per instance, which is ultimately going to save the dollars.

 

EDIT: 19/11/12: Joyent’s machine is called “Extra Small”, not Micro as I originally had it.