NixOS (again) - declarative VMs with QEMU
Fri, 20 Oct 2017 08:43:49 +0000
I built a new PC to sit in the study at home. This isn't going to be a blog post about that, though: it all worked the first time and so there is nothing to rant about. The new box is smaller, quieter and faster than the old one, as it should be given that the old one is about 9 years old now.
Having got it running I wanted to put some VMs on it (hey, just like last time), but this time around I want to do it slightly less ad-hoc (hey, not just like last time) so I have been playing with creating them declaratively.
Goals (and non-goals)
- I want the host machine to describe the "raw machine" characteristics (ram and disk size, etc) and to perform an initial install onto the VMs. For reasons of convenience this should include getting an ssh key onto the box so that subsequent work (next bullet point) can be automated.
- I want the VMs to be standalone after that: I will manage them
ad-hoc or I will use them as NixOps targets but I
don't want
nixos-rebuild
on the host to be affecting the configuration of the VMs or to take down running services unexpectedly when I run it.
- I want "real" virtualization, not just containers
- I expressly don't want to use VirtualBox, which I have to deal with at work and which never fails to be annoying, buggy and weird. QEMU/KVM will be fine - it's good enough for Bytemark (and many other commercial VPS providers). Xen would be OK too, but as I can't easily tell whether Xen plus NixOS plus UEFI works right now, QEMU it is.
Prior art
The Virtualization in NixOS page on the new Wiki is thoroughly worth reading and I am very much indebted to it for ideas and even some bits of code. The author has different requirements to me and therefore has different answers in places, but I borrowed a lot. In particular you should read that if you are wondering "doesn't {nixos, nixops} do this out of the box already?"
My approach
Note to the reader: there are many snippets of code in the rest of this post. They are all extracted from the actual system at the time I write this, and provided to help explain the approach, but probably not the best place to start if you just want something you can run. If you want something you can run, look instead at telent/nixos-configs on github which at time of writing is basically the same thing, but more likely to be updated, refined, bugfixed etc than I am ever to revisit this blog post.
Describe the guests
Unlike the Wiki author, I am managing the host machine as a plain NixOS machine and not using NixOps here. So I have created a module `/etc/nixos/virtual.nix` and added it to my imports in `/etc/nixos/configuration.nix`:
```nix
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
      # [ ... ]
      ./virtual.nix
    ];
```
In that module, I define the VMs I want using an attribute set bound to a local variable. I know, I should do this properly with the module config system. Some day I will.
```nix
let
  guests = {
    alice = {
      memory = "1g"; diskSize = "40g";
      vncDisplay = "localhost:1"; netDevice = "tap0";
    };
    bob = {
      memory = "1g"; diskSize = "20g";
      vncDisplay = "localhost:2"; netDevice = "tap1";
    };
  };
```
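For what it's worth, the "proper" version would declare an option through the module system rather than a bare let binding. A rough, untested sketch of the shape it might take - the option name `virtualisation.declaredGuests` is made up for illustration, it is not what `virtual.nix` actually does:

```nix
{ config, lib, pkgs, ... }:
with lib;
{
  # hypothetical option, shown only to illustrate the module-system version
  options.virtualisation.declaredGuests = mkOption {
    default = {};
    description = "Attribute set of QEMU guests to create and run";
    type = types.attrsOf (types.submodule {
      options = {
        memory     = mkOption { type = types.str; default = "1g"; };
        diskSize   = mkOption { type = types.str; };
        vncDisplay = mkOption { type = types.str; };
        netDevice  = mkOption { type = types.str; };
      };
    });
  };
}
```

The rest of this post sticks with the plain let binding.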
Start the guest VM processes
We map over the `guests` variable to make a systemd service for each VM that checks it has a disk image and brings it up (or takes it down, as appropriate):
```nix
systemd.services = lib.mapAttrs' (name: guest:
  lib.nameValuePair "qemu-guest-${name}" {
    wantedBy = [ "multi-user.target" ];
    script = ''
      disks=/var/lib/guests/disks/
      mkdir -p $disks
      hda=$disks/${name}.img
      if ! test -f $hda; then
        ${firstRunScript} $hda ${guest.diskSize}
      fi
      sock=/run/qemu-${name}.mon.sock
      ${pkgs.qemu_kvm}/bin/qemu-kvm -m ${guest.memory} \
        -display vnc=${guest.vncDisplay} \
        -monitor unix:$sock,server,nowait \
        -netdev tap,id=net0,ifname=${guest.netDevice},script=no,downscript=no \
        -device virtio-net-pci,netdev=net0 \
        -usbdevice tablet \
        -drive file=$hda,if=virtio,boot=on
    '';
    preStop = ''
      echo 'system_powerdown' | ${pkgs.socat}/bin/socat - UNIX-CONNECT:/run/qemu-${name}.mon.sock
      sleep 10
    '';
  }) guests;
```
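Once this is rebuilt, each guest is just another systemd unit, and the monitor socket means you can poke the QEMU monitor directly if you need to. For example (using the guest names from the `guests` attrset above):

```sh
systemctl status qemu-guest-alice          # is alice up?
systemctl restart qemu-guest-bob           # cleanly power-cycle bob
# ask the QEMU monitor what block devices a guest has
echo 'info block' | socat - UNIX-CONNECT:/run/qemu-alice.mon.sock
```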
Create the guest disk images
These systemd services expect the guest machine to have a working disk image, so we need some way to create those.
The recipe for this on the Wiki creates a partition image, resizes it appropriately, then uses `pkgs.vmTools.runInLinuxVM` to install NixOS on it. The way it does this is somewhat low-level and to my mind uncomfortably close to Dark Arts: it manually creates `/nix/store` and calculates package closures and makes directories and runs Grub and ...
I took a different approach which I feel is both cleaner and more hacky: I created a custom CD image which has a service on it that looks for a disk called `vda` and runs `nixos-generate-config` and `nixos-install` on it. When a new VM is needed, it boots from this virtual CD instead of from its own disk. Note that the auto-install service has no safeguards or checks - this is definitely not a CD image that you would burn onto an actual disk and leave around the office.
(I claim it's more clean because it uses the "standard" installation method, but it is definitely more hacky because it uses `sed` on the generated `configuration.nix` to enable ssh and configure grub, and we all know what happens when sed is invited to the party.)
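To give a flavour of the sed in question without pasting the whole service: it boils down to splicing a couple of extra settings into the configuration that `nixos-generate-config` produced, before `nixos-install` runs. Something along these lines - an illustration of the shape rather than a verbatim copy, the real commands live in `nixos-auto-install-service.nix`:

```sh
# illustration only: splice ssh and grub settings into the generated config
# just before its closing brace (the real script is on Github)
sed -i 's|^}|  services.openssh.enable = true;\n  boot.loader.grub.device = "/dev/vda";\n}|' \
  /mnt/etc/nixos/configuration.nix
```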
The dangerous unattended install service is defined in `nixos-auto-install-service.nix`, which I'm not going to copy and paste here but which you can view on Github. In `virtual.nix` we write a derivation to create a NixOS config including it and build an ISO image:
```nix
iso = system: (import <nixpkgs/nixos/lib/eval-config.nix> {
  inherit system;
  modules = [
    <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix>
    ./nixos-auto-install-service.nix
  ];
}).config.system.build.isoImage;
```
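One bit of glue worth mentioning: the `firstRunScript` below expects to find the ISO under `/etc/nixos-cdrom.iso/`, so the host config has to put it there. One way to do that - I won't swear this is character-for-character what's in the repo, so check `virtual.nix` on Github - is an `environment.etc` entry pointing at the `iso` output:

```nix
# the isoImage derivation puts the .iso in an iso/ subdirectory of its output,
# which is what firstRunScript globs for under /etc/nixos-cdrom.iso/
environment.etc."nixos-cdrom.iso".source = "${iso "x86_64-linux"}/iso";
```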
and then we need something to create the disk image and run a QEMU which boots the ISO:
```nix
firstRunScript = pkgs.writeScript "firstrun.sh" ''
  #!${pkgs.bash}/bin/bash
  hda=$1
  size=$2
  iso=$(echo /etc/nixos-cdrom.iso/nixos-*-linux.iso)
  PATH=/run/current-system/sw/bin:$PATH
  ${pkgs.qemu_kvm}/bin/qemu-img create -f qcow2 $hda.tmp $size
  mkdir -p /tmp/keys
  cp ${pubkey} /tmp/keys/ssh.pub
  ${pkgs.qemu_kvm}/bin/qemu-kvm -display vnc=127.0.0.1:99 -m 512 \
    -drive file=$hda.tmp,if=virtio \
    -drive file=fat:floppy:/tmp/keys,if=virtio,readonly \
    -drive file=$iso,media=cdrom,readonly \
    -boot order=d \
    -serial stdio > $hda.console.log
  if grep INSTALL_SUCCESSFUL $hda.console.log ; then
    mv $hda.tmp $hda
  fi
'';
```
(This is called from the systemd service defined previously, if you hadn't noticed and were wondering)
SSH keys
Eagle-eyed readers might notice the shenanigans with `/tmp/keys` and `file=fat:floppy` in that script. I didn't really want to bake my ssh public key into the ISO, because that means a vast amount of churn every time the key changes, so this is how we get an SSH key into the image. We're using a feature of QEMU that I did not previously know about - it can create a virtual FAT filesystem from the contents of a directory on the host machine.
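On the guest side, the auto-install service then just has to mount that FAT volume and drop the key somewhere useful before it runs `nixos-install`. Conceptually it's something like the following - again an illustration of the idea rather than the actual code, which is in `nixos-auto-install-service.nix`:

```sh
# the fat:floppy drive shows up in the guest as a second virtio disk
# (exactly which device name it gets depends on the -drive order)
mkdir -p /tmp/hostkeys
mount -t vfat /dev/vdb /tmp/hostkeys
mkdir -p /mnt/root/.ssh
install -m 0600 /tmp/hostkeys/ssh.pub /mnt/root/.ssh/authorized_keys
```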
Networking
The guests are bridged onto the host LAN, because there is too much NAT in the world already and I do not wish to be the cause of more.
```nix
networking.interfaces =
  lib.foldl (m: g: m // { ${g} = { virtual = true; virtualType = "tap"; }; }) {}
    (map (g: g.netDevice) (builtins.attrValues guests));
networking.bridges.vbridge0.interfaces =
  [ hostNic ] ++ (map (g: g.netDevice) (builtins.attrValues guests));
```
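(`hostNic` there is just a variable naming the physical interface that gets enslaved to the bridge - nothing more exciting than, for example:)

```nix
# the physical NIC to add to vbridge0; yours will be called something else
hostNic = "enp2s0";
```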
A note of caution here: messing with bridges while connected via ssh is a bad idea, if your connection is through one of the interfaces you want to add to the bridge. As soon as you add `eth0` (or `wlp3s0` or `enp0s31f6zz9pluralzalpha` or whatever systemd thinks your network card should be called today) to the bridge, it will lose its IP address and things will probably not Be Right until `dhclient` next refreshes. Learn from my mistakes: do this at the console or have some kind of backup connection.
In practice
So far, It Seems To Work. There are some points you may want to note:
- the VMs are configured using virtio network and disk drivers. Any vaguely modern Linux should have no problem with this; note that it means your disk drive is called `vda`, not `hda` or `sda`
- if you change the disk image size after the image is created, nothing will happen until you remove the old disk image by hand, at which time a new one is created. I don't have a clear idea of how I want resizing to work yet, but I am clear that I don't want it to wipe out my disk image when I was not expecting it, so I have gone for a conservative approach (if you want to grow a disk by hand in the meantime, see the sketch after this list)
- the process of installing NixOS onto the guest disk is probably quite brittle and doesn't have any timeouts, so if it should fail for some reason you'll quite likely have an idle `qemu` process sitting there indefinitely displaying an error message on its console. You can check this by looking in `/var/lib/guests/disks/vmname.img.console.log` and/or by connecting a VNC client to `localhost:99`
- I have not done any kind of security audit here and make no warranties that it will not add your compute hardware to a Russian botnet mining bitcoins. I can think of no obvious reason why it should, but auditing it is something you need to do for yourself.
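On that resizing point: if you do want to grow a guest's disk by hand in the meantime, `qemu-img` will do it offline - stop the guest first, and remember you still have to grow the partition and filesystem inside the guest afterwards. For example:

```sh
systemctl stop qemu-guest-alice
qemu-img resize /var/lib/guests/disks/alice.img 60G
systemctl start qemu-guest-alice
# then, inside the guest: grow the partition (fdisk/parted) and the
# filesystem (e.g. resize2fs) to use the new space
```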