NixOS (again) - declarative VMs with QEMU
Fri, 20 Oct 2017 08:43:49 +0000
I built a new PC to sit in the study at home. This isn't going to be a blog post about that, though: it all worked the first time and so there is nothing to rant about. The new box is smaller, quieter and faster than the old one, as it should be given that the old one is about 9 years old now.
Having got it running I wanted to put some VMs on it (hey, just like last time), but this time around I want to do it slightly less ad-hoc (hey, not just like last time) so I have been playing with creating them declaratively.
Goals (and non-goals)
- I want the host machine to describe the "raw machine" characteristics (ram and disk size, etc) and to perform an initial install onto the VMs. For reasons of convenience this should include getting an ssh key onto the box so that subsequent work (next bullet point) can be automated.
- I want the VMs to be standalone after that: I will manage them
ad-hoc or I will use them as NixOps targets but I
don't want
nixos-rebuild
on the host to be affecting the configuration of the VMs or to take down running services unexpectedly when I run it.
- I want "real" virtualization, not just containers
- I expressly don't want to use VirtualBox, which I have to deal with at work and which never fails to be annoying, buggy and weird. QEMU/KVM will be fine - it's good enough for Bytemark (and many other commercial VPS providers). Xen would be OK too, but as I can't easily tell whether Xen plus NixOS plus UEFI works right now, QEMU it is.
Prior art
The Virtualization in NixOS page on the new Wiki is thoroughly worth reading and I am very much indebted to it for ideas and even some bits of code. The author has different requirements to me and therefore has different answers in places, but I borrowed a lot. In particular you should read that if you are wondering "doesn't {nixos, nixops} do this out of the box already?"
My approach
Note to the reader: there are many snippets of code in the rest of this post. They are all extracted from the actual system at the time I write this, and provided to help explain the approach, but probably not the best place to start if you just want something you can run. If you want something you can run, look instead at telent/nixos-configs on github which at time of writing is basically the same thing, but more likely to be updated, refined, bugfixed etc than I am ever to revisit this blog post.
Describe the guests
Unlike the Wiki author, I am managing the host machine as a plain NixOS machine and not using NixOps here. So I have created a module `/etc/nixos/virtual.nix` and added it to my imports in `/etc/nixos/configuration.nix`:
```nix
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
      # [ ... ]
      ./virtual.nix
    ];
```
In that module, I define the VMs I want using an attribute set bound to a local variable. I know, I should do this properly with the module config system. Some day I will.
```nix
let
  guests = {
    alice = {
      memory = "1g"; diskSize = "40g";
      vncDisplay = "localhost:1"; netDevice = "tap0";
    };
    bob = {
      memory = "1g"; diskSize = "20g";
      vncDisplay = "localhost:2"; netDevice = "tap1";
    };
  };
```
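For what it's worth, the "proper" version would declare an option through the module system rather than a bare let binding. A rough, untested sketch of the shape it might take - the option name `virtualisation.declaredGuests` is made up for illustration, it is not what `virtual.nix` actually does:

```nix
{ config, lib, pkgs, ... }:
with lib;
{
  # hypothetical option, shown only to illustrate the module-system version
  options.virtualisation.declaredGuests = mkOption {
    default = {};
    description = "Attribute set of QEMU guests to create and run";
    type = types.attrsOf (types.submodule {
      options = {
        memory     = mkOption { type = types.str; default = "1g"; };
        diskSize   = mkOption { type = types.str; };
        vncDisplay = mkOption { type = types.str; };
        netDevice  = mkOption { type = types.str; };
      };
    });
  };
}
```

The rest of this post sticks with the plain let binding.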
Start the guest VM processes
We map over the `guests` variable to make a systemd service for each VM that checks it has a disk image and brings it up (or takes it down, as appropriate):
```nix
systemd.services = lib.mapAttrs' (name: guest:
  lib.nameValuePair "qemu-guest-${name}" {
    wantedBy = [ "multi-user.target" ];
    script = ''
      disks=/var/lib/guests/disks/
      mkdir -p $disks
      hda=$disks/${name}.img
      if ! test -f $hda; then
        ${firstRunScript} $hda ${guest.diskSize}
      fi
      sock=/run/qemu-${name}.mon.sock
      ${pkgs.qemu_kvm}/bin/qemu-kvm -m ${guest.memory} \
        -display vnc=${guest.vncDisplay} \
        -monitor unix:$sock,server,nowait \
        -netdev tap,id=net0,ifname=${guest.netDevice},script=no,downscript=no \
        -device virtio-net-pci,netdev=net0 \
        -usbdevice tablet \
        -drive file=$hda,if=virtio,boot=on
    '';
    preStop = ''
      echo 'system_powerdown' | ${pkgs.socat}/bin/socat - UNIX-CONNECT:/run/qemu-${name}.mon.sock
      sleep 10
    '';
  }) guests;
```
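Once this is rebuilt, each guest is just another systemd unit, and the monitor socket means you can poke the QEMU monitor directly if you need to. For example (using the guest names from the `guests` attrset above):

```sh
systemctl status qemu-guest-alice          # is alice up?
systemctl restart qemu-guest-bob           # cleanly power-cycle bob
# ask the QEMU monitor what block devices a guest has
echo 'info block' | socat - UNIX-CONNECT:/run/qemu-alice.mon.sock
```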
Create the guest disk images
These systemd services expect the guest machine to have a working disk image, so we need some way to create those.
The recipe for this on the Wiki creates a partition image, resizes it appropriately, then uses `pkgs.vmTools.runInLinuxVM` to install NixOS on it. The way it does this is somewhat low-level and to my mind uncomfortably close to Dark Arts: it manually creates `/nix/store` and calculates package closures and makes directories and runs Grub and ...
I took a different approach which I feel is both cleaner and more hacky: I created a custom CD image which has a service on it that looks for a disk called `vda` and runs `nixos-generate-config` and `nixos-install` on it. When a new VM is needed, it boots from this virtual CD instead of from its own disk. Note that the auto-install service has no safeguards or checks - this is definitely not a CD image that you would burn onto an actual disk and leave around the office.
(I claim it's more clean because it uses the "standard" installation method, but it is definitely more hacky because it uses `sed` on the generated `configuration.nix` to enable ssh and configure grub, and we all know what happens when sed is invited to the party.)
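To give a flavour of the sed in question without pasting the whole service: it boils down to splicing a couple of extra settings into the configuration that `nixos-generate-config` produced, before `nixos-install` runs. Something along these lines - an illustration of the shape rather than a verbatim copy, the real commands live in `nixos-auto-install-service.nix`:

```sh
# illustration only: splice ssh and grub settings into the generated config
# just before its closing brace (the real script is on Github)
sed -i 's|^}|  services.openssh.enable = true;\n  boot.loader.grub.device = "/dev/vda";\n}|' \
  /mnt/etc/nixos/configuration.nix
```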
The dangerous unattended install service is defined in `nixos-auto-install-service.nix`, which I'm not going to copy and paste here but which you can view on Github. In `virtual.nix` we write a derivation to create a NixOS config including it and build an ISO image:
```nix
iso = system: (import <nixpkgs/nixos/lib/eval-config.nix> {
  inherit system;
  modules = [
    <nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix>
    ./nixos-auto-install-service.nix
  ];
}).config.system.build.isoImage;
```
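One bit of glue worth mentioning: the `firstRunScript` below expects to find the ISO under `/etc/nixos-cdrom.iso/`, so the host config has to put it there. One way to do that - I won't swear this is character-for-character what's in the repo, so check `virtual.nix` on Github - is an `environment.etc` entry pointing at the `iso` output:

```nix
# the isoImage derivation puts the .iso in an iso/ subdirectory of its output,
# which is what firstRunScript globs for under /etc/nixos-cdrom.iso/
environment.etc."nixos-cdrom.iso".source = "${iso "x86_64-linux"}/iso";
```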
and then we need something to create the disk image and run a QEMU which boots the ISO:
```nix
firstRunScript = pkgs.writeScript "firstrun.sh" ''
  #!${pkgs.bash}/bin/bash
  hda=$1
  size=$2
  iso=$(echo /etc/nixos-cdrom.iso/nixos-*-linux.iso)
  PATH=/run/current-system/sw/bin:$PATH
  ${pkgs.qemu_kvm}/bin/qemu-img create -f qcow2 $hda.tmp $size
  mkdir -p /tmp/keys
  cp ${pubkey} /tmp/keys/ssh.pub
  ${pkgs.qemu_kvm}/bin/qemu-kvm -display vnc=127.0.0.1:99 -m 512 \
    -drive file=$hda.tmp,if=virtio \
    -drive file=fat:floppy:/tmp/keys,if=virtio,readonly \
    -drive file=$iso,media=cdrom,readonly \
    -boot order=d \
    -serial stdio > $hda.console.log
  if grep INSTALL_SUCCESSFUL $hda.console.log ; then
    mv $hda.tmp $hda
  fi
'';
```
(This is called from the systemd service defined previously, if you hadn't noticed and were wondering)
SSH keys
Eagle-eyed readers might notice the shenanigans with `/tmp/keys` and `file=fat:floppy` in that script. I didn't really want to bake my ssh public key into the ISO, because that means a vast amount of churn every time the key changes, so this is how we get an SSH key into the image. We're using a feature of QEMU that I did not previously know about - it can create a virtual FAT filesystem from the contents of a directory on the host machine.
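On the guest side, the auto-install service then just has to mount that FAT volume and drop the key somewhere useful before it runs `nixos-install`. Conceptually it's something like the following - again an illustration of the idea rather than the actual code, which is in `nixos-auto-install-service.nix`:

```sh
# the fat:floppy drive shows up in the guest as a second virtio disk
# (exactly which device name it gets depends on the -drive order)
mkdir -p /tmp/hostkeys
mount -t vfat /dev/vdb /tmp/hostkeys
mkdir -p /mnt/root/.ssh
install -m 0600 /tmp/hostkeys/ssh.pub /mnt/root/.ssh/authorized_keys
```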
Networking
The guests are bridged onto the host LAN, because there is too much NAT in the world already and I do not wish to be the cause of more.
```nix
networking.interfaces =
  lib.foldl (m: g: m // { ${g} = { virtual = true; virtualType = "tap"; }; }) {}
    (map (g: g.netDevice) (builtins.attrValues guests));
networking.bridges.vbridge0.interfaces =
  [ hostNic ] ++ (map (g: g.netDevice) (builtins.attrValues guests));
```
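(`hostNic` there is just a variable naming the physical interface that gets enslaved to the bridge - nothing more exciting than, for example:)

```nix
# the physical NIC to add to vbridge0; yours will be called something else
hostNic = "enp2s0";
```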
A note of caution here: messing with bridges while connected via ssh is a bad idea, if your connection is through one of the interfaces you want to add to the bridge. As soon as you add `eth0` (or `wlp3s0` or `enp0s31f6zz9pluralzalpha` or whatever systemd thinks your network card should be called today) to the bridge, it will lose its IP address and things will probably not Be Right until `dhclient` next refreshes. Learn from my mistakes: do this at the console or have some kind of backup connection.
In practice
So far, It Seems To Work. There are some points you may want to note:
- the VMs are configured using virtio network and disk drivers. Any vaguely modern Linux should have no problem with this; note that it means your disk drive is called `vda`, not `hda` or `sda`
- if you change the disk image size after the image is created, nothing will happen until you remove the old disk image by hand, at which time a new one is created. I don't have a clear idea of how I want resizing to work yet, but I am clear that I don't want it to wipe out my disk image when I was not expecting it, so I have gone for a conservative approach (if you want to grow a disk by hand in the meantime, see the sketch after this list)
- the process of installing NixOS onto the guest disk is probably quite brittle and doesn't have any timeouts, so if it should fail for some reason you'll quite likely have an idle `qemu` process sitting there indefinitely displaying an error message on its console. You can check this by looking in `/var/lib/guests/disks/vmname.img.console.log` and/or by connecting a VNC client to `localhost:99`
- I have not done any kind of security audit here and make no warranties that it will not add your compute hardware to a Russian botnet mining bitcoins. I can think of no obvious reason why it should, but auditing it is something you need to do for yourself.
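On that resizing point: if you do want to grow a guest's disk by hand in the meantime, `qemu-img` will do it offline - stop the guest first, and remember you still have to grow the partition and filesystem inside the guest afterwards. For example:

```sh
systemctl stop qemu-guest-alice
qemu-img resize /var/lib/guests/disks/alice.img 60G
systemctl start qemu-guest-alice
# then, inside the guest: grow the partition (fdisk/parted) and the
# filesystem (e.g. resize2fs) to use the new space
```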