diary @ telent

Flash is fast, flash is cool#

Sun Jan 25 09:05:44 2026

Topics: eculocate rust esp32

This week has been about implementing OTA flashing.

The ESP32 supports an A/B partitioning scheme for the application partition, so you can safely install a new firmware without destroying the firmware that's currently running. You write to partition ota_0 having booted from ota_1 and then flip the active boot partition and do the opposite on the next update. Rust support for this is in the esp_hal_ota crate, but if you want to understand what it's doing you should first read the official Espressif C API documentation.

It's pretty easy to use: as per the example you make an Ota object, call ota_begin on it, then call ota_write_chunk every time you get some new data (from the network or from BLE or from the USB stick or wherever you are getting the update from) until the whole thing's transferred, and call ota_flush at the end.
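
In outline it's something like this (a sketch from memory of the crate's example, so treat the exact signatures with suspicion; next_chunk is a stand-in for wherever your data actually arrives from, and as I recall ota_write_chunk tells you when the image is complete):

  // FlashStorage comes from the esp-storage crate
  let mut ota = Ota::new(FlashStorage::new()).unwrap();
  ota.ota_begin(total_size, expected_crc).unwrap();
  loop {
      let chunk = next_chunk();              // network, BLE, USB stick ...
      if ota.ota_write_chunk(&chunk).unwrap() {
          break;                             // that was the last chunk
      }
  }
  ota.ota_flush(true, true).unwrap();        // then flip the boot partition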

It does/did seem to have a bug (as far as I can tell): if you turn off the built-in CRC checking (because you are using a reliable network, or you have some other form of integrity check), it will try to do it anyway. It's a one-line fix which I will open a PR for as soon as I'm a bit more convinced I haven't misunderstood the whole thing.

Anyway, I plugged it in, hooked it up and ... it worked. It was, however, a lot slower (it took minutes) than flashing with espflash over USB, which takes seconds, and this is because it (more precisely, the API afforded by embedded-storage) takes a quite naive approach to managing erase blocks: if you get 1187 bytes from the network and ask embedded-storage to write them, it will read the whole erase block (4096 bytes) into RAM, erase it - you can only erase in units of one block - and then write the modified block back. Given you've just written less than an entire block, the next packet will cause it to erase the same block again to write a different part of it. You're probably erasing each block two or three times.

Rather than change esp-hal-ota I decided to take the coward's way out and buffer data in my application until I had a block's worth. Suddenly it got massively faster.
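
The buffering is nothing clever - a minimal sketch, with write standing in for the ota_write_chunk call:

  const BLOCK: usize = 4096; // erase block size, as above

  struct BlockBuffer {
      buf: [u8; BLOCK],
      len: usize,
  }

  impl BlockBuffer {
      fn new() -> Self {
          BlockBuffer { buf: [0; BLOCK], len: 0 }
      }

      // accumulate incoming data, flushing a whole block at a time so
      // that each erase block is erased exactly once
      fn push(&mut self, mut data: &[u8], write: &mut impl FnMut(&[u8])) {
          while !data.is_empty() {
              let n = (BLOCK - self.len).min(data.len());
              self.buf[self.len..self.len + n].copy_from_slice(&data[..n]);
              self.len += n;
              data = &data[n..];
              if self.len == BLOCK {
                  write(&self.buf);
                  self.len = 0;
              }
          }
      }

      // write out whatever is left once the transfer is finished
      fn finish(&mut self, write: &mut impl FnMut(&[u8])) {
          if self.len > 0 {
              write(&self.buf[..self.len]);
              self.len = 0;
          }
      }
  }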

The other feature this week was authentication: the device should only accept legitimate (read: cryptograpically signed) firmware. The constraint here is that we can't read the whole firmware into RAM before verifying the signature - there's 384K of RAM and 4MB of flash, it's not going to fit. Our self-imposed additional constraint is that we don't want to write the firmware to flash as we go along and then validate the signature at the end, because if it's wrong then we've already overwritten a working firmware with malicious data. I say this is self-imposed because it's not actually obvious that it's a real problem unless there's some way for the bad actor to switch to the new firmware, but it doesn't seem pretty even so.

So tl;dr we need to verify the firmware before we've read it all, and before we've written any of it. We do this using the approach explained by Gennaro and Rohatgi in How to Sign Digital Streams - the first block is digitally signed and each block contains a sha256 of the following block.
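
In Rust terms the verification loop looks something like this - a sketch of the scheme rather than the actual eculocate code: it assumes each block carries the digest of its successor in its final 32 bytes, and glosses over how the last block is marked:

  use ed25519_dalek::{Signature, VerifyingKey};

  fn verify_stream<'a>(
      key: &VerifyingKey,
      sig: &Signature,
      mut blocks: impl Iterator<Item = &'a [u8]>,
      sha256: impl Fn(&[u8]) -> [u8; 32],
  ) -> Result<(), ()> {
      // only the first block is signed
      let first = blocks.next().ok_or(())?;
      key.verify_strict(first, sig).map_err(|_| ())?;
      let mut expected: [u8; 32] = first[first.len() - 32..].try_into().unwrap();
      for block in blocks {
          // each later block only has to hash to what its predecessor
          // promised, so we can reject it before writing anything
          if sha256(block) != expected {
              return Err(());
          }
          expected = block[block.len() - 32..].try_into().unwrap();
      }
      Ok(())
  }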

We use ed25519-dalek for signature checking, because it looked easy to get running with no_std and I've heard of at least two of the authors. For SHA256 - it turns out that the ESP32-C3 has a hardware SHA accelerator, so we use esp_hal::sha here and don't need to do it in software.

I bashed my head against this for a while because I didn't read the F manual closely enough. hasher.update doesn't eat all the bytes that you feed it, and it returns the unconsumed data - so you have to run it in a loop until it comes back empty. If you just call it once, as I did, you get the hash of the first 32 bytes only.
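
For the record, the loop I should have written in the first place (based on the esp-hal sha example; hasher is the esp_hal::sha digest object, and nb::block! spins while the accelerator is busy):

  let mut remaining: &[u8] = &data;
  while !remaining.is_empty() {
      // update() consumes as much as the hardware will take right now
      // and hands back the part it didn't eat
      remaining = nb::block!(hasher.update(remaining)).unwrap();
  }
  let mut digest = [0u8; 32];
  nb::block!(hasher.finish(&mut digest)).unwrap();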

(In the end, this - it seems to me - doesn't protect much better against wiping good images than the verify-at-the-end approach. If a MITM is able to substitute block 22 of 150 blocks with their own data, we will write blocks 1-21 and then abort, whereas verifying at the end means we'll write blocks 1-150 but not switch to the new image. Either way we've trashed whatever was there before.)

That's basically all there is to relate this week, although I also invested a little more time reading bits of the Rust Book that our Rust study group at $work hasn't reached yet, in order to make the error handling less crappy. Next steps are to write the session registration thing so that UDP is authenticated, and to add wifi provisioning so we're not hardcoding my wifi network details.

After shock#

Sun Jan 18 15:11:06 2026

Topics: bike

What's missing from this motorbike? The answer is shocking.

I removed the shock from my motorbike today so I can take it to ABE tomorrow to be rebuilt. Some notes for posterity and so that I remember how to reinstall it.

I mostly followed the Haynes manual: the words are good but the pictures are awful. They say to remove the fuel tank, but I didn't really want to, on account of how it's full of fuel. I found it worked to lift the tank a bit and stick some wooden blocks underneath to hold it up. While doing this the vent/breather pipe popped off, as it always does.

In the order that I tackled them:

Hopefully having now written this down I'll not forget to reattach all the bits.

A joke about UDP#

Sat Jan 17 08:30:09 2026

Topics: rust eculocate esp32

I'll tell you a joke about UDP, but you might not get it.


We have a new name. "Thing I can plug into my motorbike ECU to log the data (rpm, speed, throttle position, temperatures etc etc) it produces" is Leonard-of-Quirm-level naming. I'd provisionally been calling it "eculogical" which I didn't like, and now it's called "eculocate" which I ... can tolerate.

And I've got it to the point where it (kind of) works - but, now I've decided I need to semi-fundamentally break it again. I'll get to that.

On the server side we have a UDP socket that listens for a subscription message containing [(interval, table-number, start, end), ...] (actually binary encoded) and then sends back the requested table data once every interval milliseconds for the next minute. Then it stops, because this is UDP and we can't reliably tell when the peer has gone away, so the peer should send another subscription message in the meantime if it wants to carry on receiving.
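
Decoded, a subscription is more or less this (field names and integer widths are illustrative, not the actual wire format):

  struct Subscription {
      interval_ms: u16, // send the requested range this often
      table: u8,        // which ECU table
      start: u8,        // first offset within the table
      end: u8,          // last offset within the table
  }

  // subscriptions lapse after a minute unless the client renews them
  const SUBSCRIPTION_LIFETIME_MS: u32 = 60_000;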

For now we're just offering the raw tables, because I'm going to need much more example data to figure out the structure. Eventually we'll do some processing on device so that clients can query "RPM" or "TPS" without having to know their table/offset - as that varies between bike models.

Notes:

And we have an Android client. Well, it's Android insofar as it runs on my phone, but I don't think it'd qualify for Android Market or the Play Store or whatever it's called now. I sidestepped the whole Android app development slog by installing Termux and Termux:GUI on my phone and writing the client side as a Python script. I don't even like Python and I still found this preferable to the Android Studio build/run process: I simply sshed into my phone and used tramp to edit the script. I believe that Termux:GUI doesn't support the full range of Android widgets but it has buttons and labels and text boxes and LinearLayouts, which is enough for me. Adding dns-sd (zeroconf) support was the work of about 20 minutes, which was nice.

Having achieved that milestone I made a list of what's left before I can plug it into my motorbike and take it for a ride (cable, power supply, some form of protective casing) and realised that once I detach it from its USB umbilical I will no longer be able to release new versions simply by invoking cargo run. So, it needs a mechanism for OTA updates, and this should probably come with some kind of auth[nz] so that not just any Tom, Dick or Harry on the same wifi network could flash random crap onto it. Then I considered that if we're not trusting the wifi, the actual UDP service (which is currently read-only but maybe some day might include a means of writing to (and therefore probably bricking) the ecu) is also sensitive.

Here's the plan:

Additionally, we need to change the dns-sd stuff to advertise a TCP service, the client to register a session key when it starts, the subscription message format to include the session key, and the UDP listener to check it. Which is what I meant when I said "semi-fundamentally break it".
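
The UDP-side check is small; something like this sketch, where the session key is assumed to be an opaque 16-byte token registered over the TCP channel (the size and framing are invented for illustration):

  fn check_subscription<'a>(known_key: &[u8; 16], msg: &'a [u8]) -> Option<&'a [u8]> {
      let (key, rest) = msg.split_at_checked(16)?;
      // a real version would compare in constant time
      if key != known_key.as_slice() {
          return None; // it's UDP, so no error reply: just silence
      }
      Some(rest) // the subscription message proper
  }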

If this were commercial/proprietary software then we'd have separate keys for the firmware signing and for the client. That seems less of an issue when it's most likely the same person building the software as is using the client, but it might be worth doing anyway.

Current status: bodged together a TCP listener, haven't touched the crypto yet, and so far it only pretends to do the OTA update.

Magic DNS#

Fri Jan 2 09:58:45 2026

Topics: rust eculocate esp32

It's all very 2026 around here, isn't it? I am reminded by

to jot down some of what I've been doing in the past month or so. The tl;dr is "making a thing I can plug into my motorbike ECU to log the data (rpm, speed, throttle position, temperatures etc etc) it produces". For reasons mostly of ramifying the learning opportunities, I decided the best way would be to get a cheap ESP32 device (it's RISC-V - isn't that cool?) and hook it up to a level converter, and then write a program for it in Rust (Rust learning opportunity ahoy) to twiddle the serial line appropriately and send the data over the network to the mobile phone which sits on my handlebars.

It turns out that I spent way less time getting the serial interface to the Honda K-line ECU signal to reveal its secrets than on the "why don't you just ..." part where I want to stream the data over wifi to another device. So this post is actually not at all about hardware hacking.

The constraints I have imposed on myself here are

- the device shouldn't have my wifi network name and password baked into the firmware
- the phone shouldn't need to be told the device's IP address in order to find it

These are both in principle solved problems.

There's a convention for provisioning wifi on these devices which involves using a mobile phone app to connect to them over BLE and then send the ssid/password of the chosen access point. In fact there's even a prebuilt Android app which we can use and an esp32 arduino library which we can't (because we have elected to make our lives difficult and use Rust instead). But I am led to believe that "rewrite everything in Rust" is idiomatic for Rust programmers anyway. I haven't done this yet.

And for the "what's my IP address" problem there is a standard way, by combining Multicast DNS and DNS-based Service Discovery, for computers to publish their services on the LAN. When I say "computers": if this household is typical, mostly they're set-top boxes, printers, light bulbs, smart speakers and thermostats rather than general-purpose computing devices. I've mostly done this bit.

Terms

Multicast DNS is DNS, but peer-to-peer: it reuses mostly the same packet formats but instead of requiring a centralised server which knows all the names, every device listens on a multicast address for DNS queries for its own name.
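
Every responder therefore listens on the same well-known multicast group and port. In std-flavoured Rust (which, to be clear, is not what runs on the ESP32 - this is just to show the shape):

  use std::net::{Ipv4Addr, UdpSocket};

  // mDNS is always 224.0.0.251 on port 5353
  fn mdns_socket() -> std::io::Result<UdpSocket> {
      let sock = UdpSocket::bind(("0.0.0.0", 5353))?;
      sock.join_multicast_v4(&Ipv4Addr::new(224, 0, 0, 251), &Ipv4Addr::UNSPECIFIED)?;
      Ok(sock)
  }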

DNS-SD is a convention for which records you can query/need to send in order to advertise what kind of services you have and where they are. Because sending an A record alone is not sufficient for anyone with a Mac and a fancy-schmancy service browser to know what kind of service is on offer at that address. Is it a printer? A dishwasher? An IoT air fryer?

The RFCs for each (which are, by the way, much easier reads than a lot of RFCs and contain no EBNF at all) go to great lengths to point out that each is independent of the other. But they stack well.

DNS-SD, 3048 metre view

DNS-SD is based on a paradigm of "services" and "service instances". A "service" is the general "kind" of thing on offer and is named something like _http._tcp.local - it will always end in _tcp.local if it is TCP or _udp.local if it is anything other than TCP. For our ECU project we chose the service name _keihin._udp.local after the manufacturer of the ECUs that the device knows how to talk to. A service instance might be something like WiserHeat05AB12._http._tcp.local. Service names aren't usually hierarchical but there are a few with a second level like _printer._sub._http._tcp

The minimum/usual set of records you need to publish for DNS-SD is this (pseudocode):

myinstancename._theservicename._udp.local SRV, data: (target: myhostname.local, port: nnnn)
myhostname.local A, data: a.b.c.d
myinstancename._theservicename._udp.local TXT, data: "txtvers=1"
_theservicename._udp.local PTR, data: myinstancename._theservicename._udp.local
_services._dns-sd._udp.local PTR, data: _theservicename._udp.local

Your service instance needs a SRV and a TXT; then there's a PTR connecting the service instance to the service, for people who are browsing the service - think of e.g. an "Add a printer" dialog box; then there's a PTR from _services._dns-sd._udp.local to the service name, for people who are running avahi-browse -a or its moral equivalent in GUI-land. And not forgetting there's an A record matching the one in the SRV record data.

MDNS

The single biggest problem when implementing MDNS is the lack of tooling to test it against. In my experience:

Where are we now?

I believe that it now does everything an mdns responder SHOULD(sic) do except

and I can't decide, in the context of this being a program that probably nobody else in the world will ever use and even I will only use on one single piece of hardware (I only have one motorbike), whether implementing those things is a good and laudable decision because spec compliance is important, or just a way of further putting off the inevitable next step, which involves writing the Android app to collect the data.

It also could do with being extracted into its own module/crate/thing to be more modular. I'd say "to aid reuse" but I don't think anyone really wants to (or should want to) reuse my novice-level Rust code. Learning in public.

Centralised logging with Liminix and VictoriaLogs#

Tue Oct 21 17:52:38 2025

Topics: liminix

It's a year since I wrote Log off, in which I described some ongoing-at-the-time work to make Liminix devices log over the network to a centralised log repo. It's also, and this is entirely a coincidence, a year since I made any kind of progress on it: since that time all my log messages have continued to be written to a ramdisk that will be lost forever like tears in the rain.

This situation was not ideal. I had some time and energy recently to see if I could finish it up and, well, I haven't done that exactly but whereas last time I only believed it was substantially finished, this time I believe it is substantially finished.

It goes a little something like this:

Tap the log pipeline

Each service in Liminix is connected to its own log process, which is (for 98% of the services) connected to the "fallback logger" which writes the logs to disk (ramdisk) and takes care of log rotation etc. This is standard s6 stuff, we're not innovating here.

Into the middle of this pipeline we insert a program called logtap which copies its input to its output and also to a fifo - but only writes to the fifo if the previous writes worked (i.e. it doesn't back up or stop working if the tap is not connected). The standard output from logtap goes on to the default logger, so local logging is unaffected - which is important if the network is down or hasn't come up yet.

This is a change from last year's version, which used a unix domain socket instead of a fifo. Two reasons: first, we need to know which messages were sent successfully and which weren't. It was difficult to tell reliably and without latency whether there was anything at the other end of the socket, whereas we learn almost instantly when a fifo write fails. Second, it makes it easier to implement a shipper because it can just open the fifo and read from it, instead of having to call socket functions.
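
In Rust-flavoured pseudocode the whole trick is about this big (this isn't the actual logtap, /run/log.fifo is an invented path, and it needs the libc crate for the open flag):

  use std::fs::OpenOptions;
  use std::io::{self, Read, Write};
  use std::os::unix::fs::OpenOptionsExt;

  fn main() -> io::Result<()> {
      let mut buf = [0u8; 4096];
      loop {
          let n = io::stdin().read(&mut buf)?;
          if n == 0 {
              return Ok(()); // upstream logger went away
          }
          // local logging is sacrosanct: always forward to the fallback logger
          io::stdout().write_all(&buf[..n])?;
          // opening a fifo O_WRONLY|O_NONBLOCK fails with ENXIO when nobody
          // has the read end open, and writing to a full fifo gives EAGAIN -
          // either way we drop the copy and carry on
          if let Ok(mut fifo) = OpenOptions::new()
              .write(true)
              .custom_flags(libc::O_NONBLOCK)
              .open("/run/log.fifo")
          {
              let _ = fifo.write(&buf[..n]);
          }
      }
  }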

Hang a reader on the tap

The log shipper opens the other end of the fifo and ... ships the logs. I've chosen VictoriaLogs (wrapped in an HTTPS reverse proxy) as my centralised log service, so my log shipper has to connect with HTTPS to the service endpoint and send "jsonline" log messages. In fact, my log shipper just speaks pidgin HTTP on file descriptors 6 and 7 and leverages s6-tlsclient to do the actual TCP/TLS heavy lifting.
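
The fd 6 and 7 arrangement is the UCSPI convention that the s6 tools follow: the child program reads the decrypted stream on fd 6 and writes to it on fd 7, so the shipper itself contains no TLS or even TCP code. A hypothetical fragment (not the actual victorialogsend):

  use std::fs::File;
  use std::io::Write;
  use std::os::fd::FromRawFd;

  fn open_upload_stream() -> std::io::Result<File> {
      // s6-tlsclient execs us with the connection writable on fd 7
      // (and readable on fd 6, for the server's responses)
      let mut to_tls = unsafe { File::from_raw_fd(7) };
      write!(to_tls,
          "POST /insert/jsonline HTTP/1.1\r\n\
           Host: loghost.example.org\r\n\
           Transfer-Encoding: chunked\r\n\r\n")?;
      Ok(to_tls)
  }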

This is all new since last year, when we were just splatting raw logs over a socket connection instead of doing this fancy JSON stuff. It did mean writing a parser for TAI64N external timestamps and some functions to convert them to UTC: as a matter of principle (read: stubbornness) I do appreciate that my log message timestamps won't go forwards and backwards arbitrarily when leap seconds are decreed, but I guess almost nobody else (at least, neither VictoriaLogs nor Zinc) thinks it's important.
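
The TAI64N external format, for reference, is an "@" followed by 24 hex digits: eight bytes of seconds offset by 2^62, then four bytes of nanoseconds. A sketch of the conversion, with the leap-second table collapsed to the single current offset (which is, yes, exactly the corner everyone else cuts):

  // parse an s6/daemontools "@..." TAI64N label into unix seconds + nanos
  fn parse_tai64n(label: &str) -> Option<(u64, u32)> {
      let hex = label.strip_prefix('@')?;
      let secs = u64::from_str_radix(hex.get(..16)?, 16).ok()?;
      let nanos = u32::from_str_radix(hex.get(16..24)?, 16).ok()?;
      // TAI64 labels count seconds from 2^62 = 1970-01-01 00:00:00 TAI
      let tai = secs.checked_sub(1 << 62)?;
      // TAI has been 37s ahead of UTC since 2017; a proper implementation
      // carries the whole leap second table
      Some((tai.checked_sub(37)?, nanos))
  }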

  # in liminix config
  logging.shipping = {
    enable = true;
    command =
      let certc = config.services.client-cert;
      in ''
        export CERTFILE=$(output_path ${certc} certificate)
        export CAFILE=$(output_path ${certc} ca-certificate)
        export KEYFILE=$(output_path ${certc} key)
        ${pkgs.s6-networking}/bin/s6-tlsclient -j -y -k loghost.example.org \
          10.0.0.1 443 \
          ${pkgs.logshippers}/bin/victorialogsend https://loghost.example.org/insert/jsonline
      '';
    dependencies = [services.qemu-hyp-route services.client-cert];
  };

... using the TLS cert you previously requested

Before the log shipper can start, it needs to get its TLS client certificate, by making a CSR and sending it to Certifix. The certifix-client is almost the same as last year's version except that it uses lua-http instead of fetch-freebsd as the http interface. This is because last year's version didn't work when asked to traverse the baroque maze of iptables forwarding and QEMU Slirp networking that lies between my Liminix test network and my VictoriaLogs instance. After a long time staring at pcap dumps I gave up trying to work out why and just rewrote that bit.

It's important to have an (at least vaguely) accurate clock before attempting HTTPS, because the server certificate has a "not valid before" field, so OpenSSL won't like it if you say it's still 1970.

  # in liminix config
  services.client-cert = svc.tls-certificate.certifix-client.build {
    caCertificate = builtins.readFile /var/lib/certifix/certs/ca.crt;
    subject = "C=GB,ST=London,O=Example Org,OU=devices,CN=${config.hostname}";
    secret = builtins.readFile /var/lib/certifix/challengePassword;
    serviceUrl = "https://localhost.lan:19613/sign";
    dependencies = [ config.services.ntp ] ;
  };

... to connect to an HTTPS reverse proxy

Originally I planned to put a Let's Encrypt cert in front of VictoriaLogs, but that would need 500k of CA certificate bundle on each device, which is quite a lot on devices with little flash. So it makes more sense to use the Certifix CA here too.

Persuading the OpenSSL command line tools to make a CSR with a challengePassword was probably as much work as writing something with luaossl would have been - it was certainly messier - but the point is I didn't know that when I started.

  # in nixos configuration.nix
  systemd.services."loghost-certificate" =
    let
      dir = "/var/lib/certifix";
      pw = builtins.readFile "${dir}/private/challengePassword";
    in {
      script = ''
        set -eu
        cd ${dir}
        PATH=${pkgs.openssl}/bin:${pkgs.curl}/bin:$PATH
        openssl req \
          -config <(printf '[req]\nprompt=no\nattributes=attrs\ndistinguished_name=DN\n[DN]\nC=GB\nST=London\nO=Example Org\nCN=loghost\n[attrs]\nchallengePassword=${pw}') \
          -newkey rsa:2048 \
          -addext "extendedKeyUsage = serverAuth" \
          -addext "subjectAltName = DNS:loghost.lan,DNS:loghost,DNS:loghost.example.org" \
          -nodes -keyout private/loghost.key -out certs/loghost.csr
        curl --cacert certs/ca.crt -H 'content-type: application/x-pem-file' --data-binary @certs/loghost.csr https://localhost:19613/sign -o certs/loghost.crt
      '';
      serviceConfig = {
        Type = "oneshot";
        User = "root";
        ReadWritePaths = ["/var/lib/certifix"];
        StateDirectory = "certifix";
      };
      startAt = "monthly";
    };

The proxy itself is just Nginx with ssl_verify_client set, but the log shipper holds the HTTPS connection open, so remember to disable proxy buffering or you aren't getting your logs in any kind of timely fashion.

  # in nixos configuration.nix
  services.nginx.virtualHosts."loghost.example.org" = {
    forceSSL = true;
    sslTrustedCertificate = /var/lib/certifix/certs/ca.crt;
    sslCertificateKey = "/var/lib/certifix/private/loghost.key";
    sslCertificate = "/var/lib/certifix/certs/loghost.crt";

    extraConfig = ''
      ssl_verify_client on;
      proxy_buffering off;
      proxy_request_buffering off;
    '';

    locations."/".proxyPass = "http://127.0.0.1:9428/";
  };

Just as I did last year, I'm going to finish by claiming that this is basically finished and it just needs installing on some real devices. Hopefully I'm right this time, though.