Centralised logging with Liminix and VictoriaLogs

Tue Oct 21 17:52:38 2025

Topics: liminix

It's a year since I wrote Log off, in which I described some ongoing-at-the-time work to make Liminix devices log over the network to a centralised log repo. It's also, and this is entirely a coincidence, a year since I made any kind of progress on it: since that time all my log messages have continued to be written to a ramdisk, to be lost forever like tears in the rain.

This situation was not ideal. I had some time and energy recently to see if I could finish it up and, well, I haven't done that exactly, but whereas last time I only believed it was substantially finished, this time I believe it is substantially finished.

It goes a little something like this:

Tap the log pipeline

Each service in Liminix is connected to its own log process, which is (for 98% of the services) connected to the "fallback logger", which writes the logs to disk (ramdisk) and takes care of log rotation etc. This is standard s6 stuff; we're not innovating here.

Into the middle of this pipeline we insert a program called logtap which copies its input to its output and also to a fifo - but only writes to the fifo if the previous writes worked (i.e. it doesn't back up or stop working if the tap is not connected). The standard output from logtap goes on to the default logger, so local logging is unaffected - which is important if the network is down or hasn't come up yet.

This is a change from last year's version, which used a unix domain socket instead of a fifo. Two reasons: first, we need to know which messages were sent successfully and which weren't. It was difficult to tell reliably and without latency whether there was anything at the other end of the socket, whereas we learn almost instantly when a fifo write fails. Second, it makes it easier to implement a shipper because it can just open the fifo and read from it, instead of having to call socket functions.
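
Purely as illustration, here's a rough C sketch of the shape of that program - not the actual logtap source, just the behaviour described above: copy everything to stdout, attach to the fifo opportunistically, and detach again the moment a write to it fails.

  /* Sketch of the logtap idea (not the real source): copy stdin to
   * stdout unconditionally, and opportunistically to the fifo named in
   * argv[1].  If the fifo has no reader, or a write to it fails, we
   * detach and keep going, so local logging never blocks on the tap. */
  #include <fcntl.h>
  #include <signal.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char *argv[]) {
      char buf[4096];
      ssize_t n;
      int tap = -1;

      if (argc < 2) {
          fprintf(stderr, "usage: %s /path/to/fifo\n", argv[0]);
          return 1;
      }
      signal(SIGPIPE, SIG_IGN);          /* a vanished reader must not kill us */

      while ((n = read(0, buf, sizeof buf)) > 0) {
          if (write(1, buf, n) != n)     /* the normal local log path */
              return 1;
          if (tap < 0)                   /* (re)try to attach the tap */
              tap = open(argv[1], O_WRONLY | O_NONBLOCK);
          if (tap >= 0 && write(tap, buf, n) != n) {
              close(tap);                /* reader gone or blocked: detach */
              tap = -1;
          }
      }
      return 0;
  }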

Hang a reader on the tap

The log shipper opens the other end of the fifo and ... ships the logs. I've chosen VictoriaLogs (wrapped in an HTTPS reverse proxy) as my centralised log service, so my log shipper has to connect with HTTPS to the service endpoint and send "jsonline" log messages. In fact, my log shipper just speaks pidgin HTTP on file descriptors 6 and 7 and leverages s6-tlsclient to do the actual TCP/TLS heavy lifting.

This is all new since last year when we were just splatting raw logs over a socket connection instead of doing this fancy JSON stuff. It did mean writing a parser for TAI64N external timestamps and some functions to convert them to UTC: as a matter of principle (read: stubbornness) I do appreciate that my log message timestamps won't go forwards and backwards arbitrarily when leap seconds are decreed, but I guess almost nobody else (at least, neither VictoriaLogs nor Zinc) thinks it's important.
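
For the curious, the conversion is roughly this - a minimal C sketch rather than the shipper's actual code, which assumes the stamps are real TAI (as s6 writes them when it has a leap-second table) and hard-codes the current 37-second TAI-UTC offset:

  /* Minimal sketch of decoding an s6/TAI64N "external" timestamp --
   * '@' plus 24 hex digits -- into UTC seconds and nanoseconds.
   * A proper conversion consults a leap-second table instead of a
   * fixed offset. */
  #include <stdio.h>
  #include <time.h>

  #define TAI_EPOCH_LABEL 0x4000000000000000ULL   /* 2^62 labels 1970 TAI */
  #define TAI_MINUS_UTC   37

  static int tai64n_to_utc(const char *stamp, time_t *sec, long *nsec) {
      unsigned long long label;
      unsigned int nano;
      if (stamp[0] != '@' || sscanf(stamp + 1, "%16llx%8x", &label, &nano) != 2)
          return -1;
      *sec  = (time_t)(label - TAI_EPOCH_LABEL) - TAI_MINUS_UTC;
      *nsec = nano;
      return 0;
  }

  int main(void) {
      time_t sec; long nsec; char out[32];
      if (tai64n_to_utc("@4000000068f7a1b62f3d6b44", &sec, &nsec) == 0) {
          strftime(out, sizeof out, "%Y-%m-%dT%H:%M:%S", gmtime(&sec));
          printf("%s.%09ldZ\n", out, nsec);    /* ISO 8601, UTC */
      }
      return 0;
  }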

  # in liminix config
  logging.shipping = {
    enable = true;
    command =
      let certc = config.services.client-cert;
      in ''
        export CERTFILE=$(output_path ${certc} certificate)
        export CAFILE=$(output_path ${certc} ca-certificate)
        export KEYFILE=$(output_path ${certc} key)
        ${pkgs.s6-networking}/bin/s6-tlsclient -j -y -k loghost.example.org \
          10.0.0.1 443 \
          ${pkgs.logshippers}/bin/victorialogsend https://loghost.example.org/insert/jsonline
      '';
    dependencies = [services.qemu-hyp-route services.client-cert];
  };
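
The victorialogsend end of that command line never opens a socket itself: s6-tlsclient hands it the decrypted connection as file descriptor 6 (reading from the peer) and 7 (writing to it). A toy version of a single insert might look like the following - the _time/_msg field names are my reading of the VictoriaLogs jsonline API, and the real shipper of course loops over the fifo rather than sending one hard-coded record:

  /* Toy client for the fd 6/7 convention used by s6-tlsclient: the
   * spawned program reads the peer on fd 6 and writes to it on fd 7.
   * It sends one hard-coded jsonline record and prints the response. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void) {
      const char *body =
          "{\"_time\":\"2025-10-21T17:52:38Z\",\"_msg\":\"hello from liminix\"}\n";
      char req[512], resp[1024];
      ssize_t n;

      snprintf(req, sizeof req,
               "POST /insert/jsonline HTTP/1.1\r\n"
               "Host: loghost.example.org\r\n"
               "Content-Length: %zu\r\n"
               "\r\n%s", strlen(body), body);
      if (write(7, req, strlen(req)) < 0)      /* fd 7: to the TLS peer */
          return 1;
      n = read(6, resp, sizeof resp - 1);      /* fd 6: from the TLS peer */
      if (n > 0) { resp[n] = 0; fputs(resp, stdout); }
      return 0;
  }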

... using the TLS cert you previously requested

Before the log shipper can start, it needs to get its TLS client certificate by making a CSR and sending it to Certifix. The certifix-client is almost the same as last year's version except that it uses lua-http instead of fetch-freebsd as the HTTP interface. This is because last year's version didn't work when asked to traverse the baroque maze of iptables forwarding and QEMU Slirp networking that lies between my Liminix test network and my VictoriaLogs instance. After a long time staring at pcap dumps I gave up trying to work out why and just rewrote that bit.

It's important to have an (at least vaguely) accurate clock before attempting HTTPS, because the server certificate has a "not valid before" field, so OpenSSL won't like it if you say it's still 1970.

  # in liminix config
  services.client-cert = svc.tls-certificate.certifix-client.build {
    caCertificate = builtins.readFile /var/lib/certifix/certs/ca.crt;
    subject = "C=GB,ST=London,O=Example Org,OU=devices,CN=${config.hostname}";
    secret = builtins.readFile /var/lib/certifix/challengePassword;
    serviceUrl = "https://loghost.lan:19613/sign";
    dependencies = [ config.services.ntp ] ;
  };

... to connect to an HTTPS reverse proxy

Originally I planned to put a Let's Encrypt cert in front of VictoriaLogs, but that would need 500k of CA certificate bundle on each device, which is quite a lot on devices with little flash. So it makes more sense to use the Certifix CA here too.

Persuading the OpenSSL command line tools to make a CSR with a challengePassword was probably as much work as writing something with luaossl would have been - it was certainly messier - but the point is I didn't know that when I started.

  # in nixos configuration.nix
  systemd.services."loghost-certificate" =
    let
      dir = "/var/lib/certifix";
      pw = builtins.readFile "${dir}/private/challengePassword";
    in {
      script = ''
        set -eu
        cd ${dir}
        PATH=${pkgs.openssl}/bin:${pkgs.curl}/bin:$PATH
        openssl req \
          -config <(printf '[req]\nprompt=no\nattributes=attrs\ndistinguished_name=DN\n[DN]\nC=GB\nST=London\nO=Example Org\nCN=loghost\n[attrs]\nchallengePassword=${pw}') \
          -newkey rsa:2048 \
          -addext "extendedKeyUsage = serverAuth" \
          -addext "subjectAltName = DNS:loghost.lan,DNS:loghost,DNS:loghost.example.org" \
          -nodes -keyout private/loghost.key -out certs/loghost.csr
        curl --cacert certs/ca.crt -H 'content-type: application/x-pem-file' --data-binary @certs/loghost.csr https://localhost:19613/sign -o certs/loghost.crt
      '';
      serviceConfig = {
        Type = "oneshot";
        User = "root";
        ReadWritePaths = ["/var/lib/certifix"];
        StateDirectory = "certifix";
      };
      startAt = "monthly";
    };

The proxy itself is just Nginx with ssl_verify_client set, but the log shipper holds its HTTPS connection open, so remember to disable proxy buffering or you aren't getting your logs in any kind of timely fashion.

  # in nixos configuration.nix
  services.nginx.virtualHosts."loghost.example.org" = {
    forceSSL = true;
    sslTrustedCertificate = /var/lib/certifix/certs/ca.crt;
    sslCertificateKey = "/var/lib/certifix/private/loghost.key";
    sslCertificate = "/var/lib/certifix/certs/loghost.crt";

    extraConfig = ''
      ssl_verify_client on;
      proxy_buffering off;
      proxy_request_buffering off;
    '';

    locations."/".proxyPass = "http://127.0.0.1:9428/";
  };

Just as I did last year, I'm going to finish by claiming that this is basically finished and it just needs installing on some real devices. Hopefully I'm right this time, though.