Hating on HATEOAS

Wed Jan 5 13:39:09 2011

In 2011 I will not start blog posts with the word "so".

Lately I've been thinking about RESTfulness again. An observation that has been widely made is that Roy T. Fielding's definition of REST differs wildly from what most of the rest (sorry) of the world thinks it is - while all people with taste and discrimination must surely agree that the trend from evil-tasting SOAPy stuff back to simple HTTP-based APIs is a Good Thing, the "discoverability" and "hypertext" aspects of Canonical REST are apparently not so widely considered as important for practical use.

My own small contribution to this debate is that the reason people are not trying to do HATEOAS is that they've been told that the web at large - the large part of the WWW that's mediated through ordinary web browsers under the direction of human brains - is an example of how it works. And the more I think about it the more I think that the example is rubbish and unhelpful.

It's a rubbish example because the browsers through which we're viewing these resources have very limited support for most of the HTTP verbs and HTTP response codes that REST requires, hence silly workarounds like tunnelling PUT inside POST using _method=put. In a way that's a trivial complaint because the workarounds do exist, but it's still a mess. (Note for purists: I write "REST requires" when what I really mean is "HTTP defines", but you don't qualify for the "RESTful" badge if you're misusing HTTP, and there's little social cachet in describing an API as consensual HTTP.)
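To make the _method workaround concrete, here is a minimal Python sketch of how a server might recover the intended verb from a browser form POST. The function name effective_method and the set of verbs it accepts as overrides are assumptions made up for this illustration, not taken from any particular framework.

```python
# Verbs we are willing to accept as overrides (an assumption for this sketch).
OVERRIDABLE = {"PUT", "DELETE", "PATCH"}


def effective_method(http_method, form):
    """Return the verb the client meant, honouring a tunnelled _method field.

    Only POST requests may carry an override; anything else is returned as-is.
    """
    method = http_method.upper()
    if method != "POST":
        return method
    override = form.get("_method", "").upper()
    # Unknown or missing overrides fall back to plain POST.
    return override if override in OVERRIDABLE else method
```

With this, effective_method("POST", {"_method": "put"}) gives "PUT", while the same field on a GET is ignored.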

It's also a rubbish example because humans tend to expect a multi-stage "workflow" or "wizard" interaction, but HTML has lousy support for updating state and indicating a transition at the same time. A representation of a resource might include a form to update the state of that resource, but it says nothing about what you can do when it's been updated. Alternatively (or additionally) it might include a navigation link to another resource, but that will be fetched with a GET and won't change anything server-side. Let's take a typical shopping cart as an example: a form with two buttons for "update quantity" and "go to checkout". Whichever button you press, the resource that gets POSTed to is the same, and any application state transition that might happen after you click is driven by the server sending a redirect (or not). In effect, the data sent by the client smooshes together both the updated resource state and the navigation, which doesn't smell to me like hypertext. And as a side note, we may yet decide to ignore the client's indication of where it wants to go next if the data supplied is not valid for the current state of the resource, and instead send another copy of the shopping cart page prefixed with a pretty red box that says "sorry, you can't have 3.2j widgets" - and in all probability send it with a "200 OK" response code, because there's no point sending any fancy kind of 40x when you don't know whether the browser will display it or will substitute its own error page.
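The shopping cart case can be sketched in a few lines of Python. This is only an illustration of the shape of the problem: the field names quantity and action, the /checkout path and the handle_cart_post function are all hypothetical, and the actual state update is left out.

```python
def handle_cart_post(form):
    """Handle the single cart form, whichever button was pressed.

    Returns a (status, headers, body) tuple. One handler receives both the
    new resource state (quantity) and the navigation intent (action).
    """
    raw = form.get("quantity", "")
    try:
        quantity = int(raw)
    except ValueError:
        quantity = 0
    if quantity < 1:
        # Invalid data: ignore where the client wanted to go next and
        # re-show the cart with an error, sent as a plain 200.
        return 200, {}, "sorry, you can't have %s widgets" % raw
    # (Updating the stored cart would happen here.)
    if form.get("action") == "checkout":
        # The transition is driven by the server's redirect, not by a link.
        return 303, {"Location": "/checkout"}, ""
    return 200, {}, "cart updated: %d widgets" % quantity
```

Note that nothing in the form itself tells the client which of the three outcomes to expect.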

And thirdly it's a rubbish example because of the browser history stack and the defensive server-side programming that becomes necessary when your users start to treat your story as a Choose Your Own Adventure game. The set of state transitions available to the user is in practice not just the ones in the document you're showing him, but also all the other ones you've shown him in any of n previous documents: some of them may still be allowed, but others (changing the order details after you've charged his card) may not. Sending him "409 conflict" in these situations is probably not going to make him any wiser - you're going to have to think about the intention behind his navigational meander and do something that makes sense for the mental model you think he has. Once the user has hit the Back button and desynced the application state from the server-side resource state, you're running to catch up.
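As a rough sketch of that defensive programming, assume a hypothetical order with three states and a table of which transitions each state still permits. The state and transition names below are invented for the example.

```python
# Hypothetical order states and the transitions each one still permits.
ALLOWED = {
    "cart": {"update_quantity", "checkout"},
    "checkout": {"update_quantity", "pay"},
    "paid": set(),  # card charged: order details are frozen
}


def attempt(state, transition):
    """Return an HTTP status for a transition requested from a possibly stale page."""
    if transition in ALLOWED.get(state, set()):
        return 200
    # Correct as far as HTTP goes, but not much use to a human who pressed Back.
    return 409
```

A user who goes Back to the cart page after paying and submits the quantity form gets attempt("paid", "update_quantity"), which is 409: technically right and practically unhelpful, which is the point.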

To summarise, a web application designed for humans needs to support human-friendly navigation and validation in ways which current browsers can't while keeping true to the intended uses of HTML and HTTP and RESTful style in general. This doesn't mean I think HATEOAS is bad as a concept - I just think we should be looking elsewhere than the human-driven web for an example of where it's good (and I haven't really found a compelling one yet).

I have a nasty feeling that the comments on this site are presently broken, but responses by email (dan @ telent.net) are welcome - please say if you want your email published or not.

How to create a diskless elastichosts node

Sat Jan 22 21:35:28 2011

Elastichosts is a PAYG (or monthly contract) "cloud" virtual server provider based on the Linux kvm technology. At $WORK we use it to provide a horizontally scalable app service, and we need to be able to add new app servers in less time than it takes to copy a complete working Debian system. Also we want to be running the same version of the same software on every server (think "security updates") and we don't want to be paying for another 3GB of Debian that we don't really need on each box. So, we need that stuff to be shared.

Elastichosts don't directly support kvm snapshots (or they didn't when I asked them about it) which leaves us looking for alternative ways to do the same thing. This blog entry describes one such approach: we use a read-only CD image for the root filesystem and then mount /usr and /home over NFS and a ramdisk (populated at boot) on /var. It's all done using standard Debian tools and Debian setup as of the "squeeze" 6.0 release.

The finished thing is on github at https://github.com/telent/squeeze-cd-nfsroot/ . To use, basically you clone the repo into /usr/local/client, edit the files, and run make. Slightly less basically, you almost certainly need to know what edits to make to which files, and you may also want to know how it works anyway. So read on ...

(Yes, you should be able to clone it elsewhere because I shouldn't have hardcoded that directory name into the Makefile. This may be fixed in a future version if I ever find the need to install it somewhere else myself. Or see the 'conclusion' section if you want to fix it yourself)

How the client boots

the client boots off a CD (ISO9660) image created by initramfs-tools which is configured to look for an nfsroot directory. This directory is created on the server by a Makefile rule that copies the server's root dir and then replaces, renames and changes a bunch of stuff in /etc

it then mounts a ramdisk on /tmp and another on /var. There is an initscript populate_var which creates all the empty directories that daemons will expect when they start up. Note that these directories are entirely ephemeral, which means for example that syslog must be configured to log remotely

it mounts /usr and /home (readonly) directly from the server. This means that most of the packages on the server are available immediately on the clients - unless they include config files in /etc, in which case they aren't until you rerun the Makefile that creates the nfsroot (after, possibly, adjusting the config appropriately for the client)
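The real populate_var is an init script in the repository. Purely to illustrate the idea of recreating an empty directory skeleton on a fresh ramdisk, here is a Python sketch; the directory list is a hypothetical example and will not match what your installed packages need.

```python
import os

# Hypothetical skeleton; the real list depends on the installed packages.
VAR_DIRS = ["log", "run", "lock", "tmp", "lib/dhcp", "spool/cron"]


def populate_var(root, dirs=VAR_DIRS):
    """Create each directory under root, leaving existing ones alone.

    Returns the list of paths that were actually created.
    """
    created = []
    for rel in dirs:
        path = os.path.join(root, rel)
        if not os.path.isdir(path):
            os.makedirs(path)  # also creates missing parents such as lib/
            created.append(path)
    return created
```

Running it twice against the same root is harmless: the second call creates nothing and returns an empty list.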

A short guide to customising the system

These files are copied to the client - you may want to review their contents

template/etc/fstab needs to have the right hostname for your NFS server
template/etc/initramfs-tools/initramfs.conf - check DEVICE and NFSROOT settings
template/etc/network/interfaces may need tweaking
template/etc/resolv.conf is set up for our network, not yours
template/etc/init.d/populate_var might need directories added or chown invocations removed, depending on what packages you have installed
template/etc/rsyslog.conf needs editing for the syslog server's IP address

And also

insserv calls in Makefile may need adjusting if you have other services on the server that you don't want to also run on the client

And on the server

you'll need to be exporting the nfsroot/ directory as NFS, ditto /home and /usr. My /etc/exports looks something like this

/usr/local/client/nfsroot 10.0.0.0/24(ro,no_root_squash,no_subtree_check)
/home 10.0.0.0/24(ro,no_root_squash,no_subtree_check)
/usr 10.0.0.0/24(ro,no_root_squash,no_subtree_check)

you need to run a dhcp server (I use "dnsmasq", which also provides DNS service). Make sure this is only running on your vlan address: I don't know whether elastichosts will filter out rogue DHCP servers running on their network or will just come around and break your fingers for trying, but either way it's not a good idea.

If you want the clients able to syslog, you need to configure the syslog server to accept syslog messages from them. rsyslogd seems to be standard in Debian these days - I mention this because it does remote syslogging over TCP not the traditional UDP, so make sure both ends are speaking the same protocol and you don't have iptables rules between them that are dropping your messages on the floor.
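When debugging whether both ends are speaking the same protocol, it can help to push a single hand-built message at the server over TCP. The following is a minimal Python sketch, assuming the server accepts plain newline-delimited syslog lines on a TCP port of your choosing; the helper names frame_syslog and send_tcp are made up for this example.

```python
import socket


def frame_syslog(facility, severity, tag, message):
    """Build one newline-terminated syslog line with a <PRI> prefix.

    PRI is facility * 8 + severity, as in the traditional BSD syslog format.
    """
    pri = facility * 8 + severity
    return ("<%d>%s: %s\n" % (pri, tag, message)).encode("utf-8")


def send_tcp(host, port, payload, timeout=5.0):
    """Send a pre-framed message over TCP; raises OSError if the connection fails."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
```

For instance, frame_syslog(1, 6, "app", "hello") yields b"<14>app: hello\n" (facility 1 is "user", severity 6 is "info"). If send_tcp raises a connection error, or succeeds but nothing shows up in the server's logs, suspect a UDP-only listener or an iptables rule in the way.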

How we build the files

The nfsroot

Creating the nfsroot is done by the Makefile rootfs target

It starts by rsyncing the real root into nfsroot/ with a whole bunch of exclusions, then copies files from template/ over the copied files to cater for the bits that need a different configuration on the client than they do on the server, then does some other fiddling around. Most notable:

we have to copy libcrypto and libz into /usr because the dhcp client needs those libraries and it runs before /usr is mounted (see http://bugs.debian.org/592361 - though according to that page this bug is now fixed)

we blat the generated files etc/udev/rules.d/*persistent* which are correct for the server but not for the client.

Debian will run better with no /etc/hostname than it will with the wrong one

Debian squeeze uses a slightly exciting parallelising dependency-based system for running init scripts, so we can't just copy files into init.d, we need to run insserv to make it see them. (As a long-time Unix user who doesn't pay enough attention when these kinds of changes are made, this took ages to work out). Similarly, to disable daemons that run only on the server, we use insserv -r.

a couple of files need to be writable, so we replace them with symlinks
- /etc/network/run is pointed to /lib/init/rw
- /etc/mtab is pointed to /proc/mounts

We create our own etc/resolv.conf. Our elastichosts clients generally have a public (dynamically allocated) IP address assigned to eth0 and a vlan attached to eth1. DHCP gets exciting here: the client boots off eth1 and gets the address of that interface using boot-time kernel code, then runs the user-space dhclient tool to get an eth0 address, and we'd rather not rely on the conjunction of all that to get /etc/resolv.conf right.

populate_var pretty much does what it says on the tin but might need more directories adding/removing depending on what you have installed
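The first two steps of the rootfs target (copy the real root with exclusions, then lay the template over the top) could be expressed as a pair of rsync invocations. This Python sketch only builds the argument lists and does not run them. The exclusion list is a guess for illustration; the real one lives in the Makefile, which may also do the overlay step differently.

```python
# Hypothetical exclusion list; the real one lives in the Makefile.
EXCLUDES = ["/proc/*", "/sys/*", "/tmp/*", "/var/*", "/usr/*", "/home/*"]


def rootfs_steps(dest="nfsroot/", template="template/", excludes=EXCLUDES):
    """Return the two rsync command lines as argv lists, without running them."""
    # Step 1: copy the server's root into dest, skipping the excluded paths.
    copy_root = ["rsync", "-a"] + ["--exclude=" + e for e in excludes] + ["/", dest]
    # Step 2: overlay the client-specific files on top of the copy.
    overlay = ["rsync", "-a", template, dest]
    return [copy_root, overlay]
```

Each list could be handed to subprocess.check_call by a user with enough privileges to read the whole root; the remaining fiddling (insserv, symlinks, removing the udev rules) is not covered here.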

The initramfs

The Makefile ramfs.img target makes an initramfs image which knows how to mount root on nfs. This particular magic is built into Debian and the only particular point of note here is that we use nfsroot/etc/initramfs-tools as the config directory so we know we're generating a config for the client without treading on the server's usual initramfs config (which it might need when it boots itself). In our setup the only file that's actually changed is template/etc/initramfs-tools/initramfs.conf which has settings for BOOT, DEVICE, NFSROOT that probably differ from what the server wants for itself
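Since those three settings are the ones most likely to be wrong for your network, a quick check of what the client's initramfs.conf actually says can save a boot cycle. Here is a small Python sketch that pulls KEY=value assignments out of shell-style config text. The example values are assumptions: eth1 matches the vlan interface described above, but the server address and export path are placeholders.

```python
def read_settings(text, keys=("BOOT", "DEVICE", "NFSROOT")):
    """Extract the named KEY=value assignments from shell-style config text."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() in keys:
            found[key.strip()] = value.strip().strip('"')
    return found


# Placeholder values for illustration only.
EXAMPLE = """\
# client settings
BOOT=nfs
DEVICE=eth1
NFSROOT=10.0.0.1:/usr/local/client/nfsroot
"""
```

Calling read_settings(EXAMPLE) returns a dict with the three keys, which you can compare against the server name and export path you put in /etc/exports.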

Creating the cd image

This is pretty straightforward too. The Makefile boot_cd.iso target runs mkisofs to generate a CD image using the initramfs image and other files taken from isolinux.
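For reference, a typical isolinux-style mkisofs invocation looks something like the argument list built below. This is a sketch rather than a copy of the Makefile rule: the flags are the usual ones for an isolinux boot CD, and it assumes the tree directory contains an isolinux/ subdirectory holding isolinux.bin alongside the kernel and initramfs image.

```python
def mkisofs_command(output, tree):
    """Build (but do not run) an argv for a bootable isolinux CD image."""
    return [
        "mkisofs",
        "-o", output,                    # where to write the ISO image
        "-b", "isolinux/isolinux.bin",   # boot image, relative to tree
        "-c", "isolinux/boot.cat",       # boot catalog, created by mkisofs
        "-no-emul-boot",
        "-boot-load-size", "4",
        "-boot-info-table",
        tree,                            # directory to turn into the CD
    ]
```

For example, mkisofs_command("boot_cd.iso", "cdroot") gives a list you could pass to subprocess.check_call, where cdroot is a hypothetical staging directory.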

Uploading it

We had to slightly patch the elastichost-upload script to add the ability to create shared images as well as exclusive ones. This is controlled by the api key claim:type, which the elastichosts API docs describe as follows: "either 'exclusive' (the default) or 'shared' to allow multiple servers to access a drive simultaneously"

The patched version is in the git repo, accompanied by the patch

Once you've uploaded the first one you can uncomment the DRIVE_UUID param at the top of Makefile so that subsequent attempts update the same drive instead of creating a new one every time.

Conclusion

There you have it. It's certainly a bit rough and ready right now and requires editing a few too many files to be completely turnkey, but hopefully it will save someone somewhere some time. If you have bug fixes, send me patches (or fork it on github and send me pull requests); if you have suggestions, my inbox is open; if you know you need something like this but can't understand what I'm writing about, my consulting rates are reasonable ;-)
