diary at Telent Netowrks

Can I speak to your supervisor?#

Fri, 30 Oct 2020 17:22:36 +0000

When we left off last time I was teasing some words about L2TP ("PPP over the internet") and how it would let me test the actual routing/gatewaying/firewalling etc. in NixWRT without breaking the internet access for everyone else in the house.

That's part of what I've been doing since, though not exactly all of it. Mostly I have been drowning in alphabet soup.

Clearly, there are a number of dependencies here (and not forgetting you can't even start the process until you have a transport interface configured for the l2tp to be tunneled over), and modeling these dependencies in Monit was becoming a bit unwieldy. I'm not 100% sure that writing a whole new service supervision system in Lua will be net better, but that's what I've set out to do and early signs are promising. Two sentence summary:

It's very much WIP. If I can arrive at a general design that will let me cleanly address (1) the case above; (2) l2tp over lte modem as backup for pppoe; (3) automatically recovering from a buggy ethernet driver/device that panics under heavy load, I will tentatively conclude that it is a good piece of software.
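To make the dependency problem concrete, here is a minimal sketch in Python (not the Lua supervisor itself, whose design the post does not show). The service names and the dependency table are hypothetical, chosen to mirror the "transport interface, then l2tp, then PPP" chain described above. It computes a start order and works out which services are affected when one of them fails.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each maps to the set of services it depends on.
DEPENDS_ON = {
    "transport": set(),
    "l2tp-tunnel": {"transport"},
    "ppp-session": {"l2tp-tunnel"},
    "default-route": {"ppp-session"},
}


def start_order(depends_on):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by_failure(depends_on, failed):
    """Return every service that directly or indirectly depends on `failed`."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for service, deps in depends_on.items():
            # A service is affected if any dependency has failed or is affected.
            if service not in affected and deps & (affected | {failed}):
                affected.add(service)
                changed = True
    return affected
```

A supervisor built along these lines would stop (or restart) everything in `affected_by_failure(...)` when a dependency dies, which is the part that gets unwieldy to express as individual Monit rules.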

In other news, I have revived (more accurately, rewritten) the QEMU target for NixWRT so now you (and I) can build and run it without access to real hardware. I am only mildly infuriated that QEMU doesn't appear to support device trees on MIPS (or perhaps it does but the Malta MIPS subarch init code doesn't know how to load the dtb that QEMU passes) but it hasn't added a whole lot of complexity to the build to have one target without. See the instructions in the README.

If I get much more fed up of having to build a new image and reboot qemu every time I move the bugs around in my Lua scripts, I may also look into adding 9p support so that the emulator can read its root fs direct from the host.

NixWRT - 5.4 branch#

Wed, 07 Oct 2020 23:08:15 +0000

I am starting this entry on Sunday evening, which means it might be finished in time for my once-traditional Tuesday blog post. No promises though.

Long story short: NixWRT has now switched to using Linux 5.4.64 as the base kernel version, and to using modules based on Linux 5.9 for wireless devices/802.11 code. It works for ramips and for ath79 targets: ar71xx has been removed.

Long story, unshortened: I have given up on trying to build monolithic kernels. Now I build the Linux 5.4 kernel with support for most of the stuff that doesn't change a lot, then build wireless drivers and the wireless protocol stuff (lib80211, mac80211) using code from the linux-backports project. This is all made possible by the Linux Backports Project.

Linux-backports is quite clearly a labour of love and a tremendous engineering feat, involving some really neat tech in the shape of Coccinelle - an OCaml program that accepts "semantic patches" describing program refactorings in a much richer and more general sense than the straightforward line-based diff format.

That said, if I had known three weeks ago what I now know, I would not have spent quite as long trying to get its "integration mode" to work, because it doesn't. What we need to do instead is

The first of these is not super-simple in Nix because gentree.py - the script that does the work - copies a bunch of read-only files from the source tree in /nix/store/ and then tries to overwrite them. I had to do an ugly Python hack to work around this. Then I did a bunch of other barbarous hacks that may be necessary or may just be because I don't know what I'm doing.

The second is in theory straightforward except for the interaction between CONFIG_FOO settings in the base kernel and CPTCFG_BAR settings in the backported-modules tree that gentree.py has made. In brief, some experimentation is needed to figure out which options need enabling in the base kernel (typically a bunch of crypto code) to unlock the ability to configure modules in the modules tree that depend on them. I probably still don't have the minimal set, but nor do I have the patience to dig in and find out.

The third is just legwork. I decided to load all the modules at boot time (dynamic device hotplugging is not a supported use case for NixWRT at this time) so I could skip all the work of having NixWRT figure out what the inter-module dependencies are. Instead I do that myself and load them one at a time in the correct order.
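Maintaining the load order by hand is easy to get subtly wrong, so a small check can help. The following is a sketch only: the dependency table is an illustrative assumption (it is not taken from NixWRT or read from the modules themselves), and `first_misordered` is a hypothetical helper that reports the first module listed before something it needs.

```python
# Illustrative dependency table (assumed for the example, not from NixWRT).
MODULE_DEPS = {
    "compat": [],
    "cfg80211": ["compat"],
    "mac80211": ["cfg80211"],
    "ath": ["cfg80211"],
    "ath9k_hw": ["ath"],
    "ath9k_common": ["ath9k_hw"],
    "ath9k": ["ath9k_common", "mac80211"],
}


def first_misordered(load_order, deps):
    """Return (module, missing_dep) for the first module that appears
    before one of its dependencies, or None if the order is valid."""
    loaded = set()
    for module in load_order:
        for dep in deps.get(module, []):
            if dep not in loaded:
                return (module, dep)
        loaded.add(module)
    return None


GOOD_ORDER = ["compat", "cfg80211", "mac80211", "ath",
              "ath9k_hw", "ath9k_common", "ath9k"]
BAD_ORDER = ["compat", "mac80211", "cfg80211"]
```

Running the check at build time would catch an ordering mistake before it turns into an "unknown symbol" failure on the device.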

I will concede that although modules means more moving parts, this approach does have one big advantage over the monolithic tree it replaces - apart from relieving me of the hassle in trying to e.g. maintain the rt2x00 monster patch - which is that I no longer need the same degree of contortionism to embed device firmware into the kernel. I can just drop it in the filesystem.

A couple of wrong turns were also involved: most notably the day I spent thinking I was building for a GL-MT300N V2 and being puzzled why it wasn't finding the console device, before looking under my desk and realising it was in fact a GL-MT300A (similar but different SoC). Well, I found it funny. Only in retrospect, obviously.

The position right now is that

Not much spam - now even less#

Wed, 26 Aug 2020 23:28:47 +0000

After a long period of not getting around to it, finally I have added spam/ham training support to my self-hosted mail config. Key words: emacs, notmuch, muchsync, rspamd, nixos

The plan was: I want to be able to add a tag while reading mail on my laptop that would indicate we need to retrain using it, then after syncing back to the server I could run some notmuch search and feed the results through rspamc to rspamd.

First, something on my laptop to mark emails that need retraining. This is emacs, "obviously". (Seriously, it's unlikely to be obvious to most people but if it's not obvious to you personally, dear reader, you won't get much value from this blog post)

(define-key notmuch-search-mode-map (kbd "#")
  (lambda ()
    (interactive nil)
    (let* ((was-spam-p (member "spam" (notmuch-search-get-tags (point))))
	   (tag-changes (list "+retrain" 
			      (if was-spam-p "-spam" "+spam"))))
      (notmuch-search-tag tag-changes (point) (point)))))

Syncing the tags between laptop and server is ably handled already by muchsync, so no change needed there.

To run rspamc on the server I first had to enable the rspamd "controller" worker: adding

services.rspamd.workers.controller = {
  enable = true;
};

was necessary but not sufficient: running rspamc learn_spam would give an error message HTTP error : 500, Unknown statistics error. Apparently it defaults to requiring a redis backend for something, so this is not wholly surprising as I wasn't running redis.

services.redis = {
   enable = true;
   bind = "";
};

and then we need to tell rspamd where to find redis:

services.rspamd.locals."redis.conf".text = "servers = \"\"";

and then all that remains is to figure out the correct commandline (still on the server) to query the wrongly classified emails and send them to rspamd:

notmuch search --output=files --format=text0 --exclude=false is:retrain and is:spam | \
  xargs -0  rspamc learn_spam -c bayes
notmuch search --output=files --format=text0 --exclude=false is:retrain and not is:spam | \
  xargs -0  rspamc learn_ham -c bayes
notmuch tag -retrain tag:retrain

I am writing this as a blog post in the vague hope that it will be useful to someone, but also as a prompt to myself so that at some point I will nixify the bits of this that aren't already. I have to say, it's overall been a whole lot less frustrating than my efforts trying to manage my firefox config (yesterday evening and ongoing).

Fixing Firefox#

Wed, 06 May 2020 08:52:30 +0000

You can interpret "fixing" here in any of its popular senses.

Presenting one approach for getting from freshly installed Firefox to something you'd actually want to use (preferences, extensions, etc) without tedious manual configuration.

The mechanism:

So it's not wholly automatic but it's pretty close: after running nix-env -r -iA nixos.desktop - which is my usual way of setting up my user environment - I needed to do systemctl --user enable configure-firefox. Yes, I could probably put that in my nixos configuration.nix but then I've coupled my per-user and per-host configuration and I'm not completely certain I want to do that.

Radio Free Europe#

Thu, 26 Dec 2019 23:46:56 +0000

The GL-AR750 now has working (though not particularly fast) wifi on both 2.4GHz and 5GHz bands. A fair amount of fiddling was required to get us to this point, so in the best tradition of the Christmas Radio Times (a pre-digital British institution, don't know if still a thing) here is a double-length post about it all.

The GL-AR750 has two distinct sets of wifi hardware. The 2.4GHz stuff is part of the QCA9531 SoC, i.e. it's on the same silicon as the CPU, the Ethernet, the USB etc. The device is connected to the host via AHB (the Advanced High-performance Bus from ARM's AMBA family, rather than anything Atheros-specific), and it is supported in Linux using the ath9k driver. The 5GHz support, on the other hand, is provided by a QCA9887 PCIe (PCI Express) WLAN chip: I haven't looked closely at the router innards to see if this is actually physically a separate board that could be unplugged, but as far as Linux is concerned it behaves as one. This is supported by the ath10k driver. Clear so far?

Five giga hertz, four calling birds, three French hens ...

My approach to porting NixWRT was basically

and the answer, at least initially, was that I got no kind of anything from the ath9k driver and some error messages from ath10k, so I thought I'd start there.

A firmware hand on the tiller

There are two things that ath10k devices need from their host environment that are not provided directly by the driver: the firmware and the calibration data. The firmware is the code that the wifi chip runs and we have to upload into it when it boots, and the calibration data, by my somewhat hazy impression, is stuff like tuning parameters for e.g. knowing which amplitudes correspond to what power outputs (which is obviously going to depend on the amplifiers, antenna design, etc, and therefore will differ depending on how the device is wired up).

On a proper PC the driver obtains the firmware by doing some kind of call-out-to-udev dance that makes userspace find it on the filesystem and feed it back into the kernel, and then feeds it into the device using the BMI (Bootloader Messaging Interface). For NixWRT I want a monolithic kernel, but happily there is a config option for people like me: CONFIG_EXTRA_FIRMWARE takes a space-separated list of files that it expects to find in a location given in CONFIG_EXTRA_FIRMWARE_DIR, and bakes their contents into the kernel in some way such that the generated kernel can make calls with names like request_firmware to find them. So we can do the firmware using that.

Calibrate good times (come on!)

The calibration data is a little bit more involved, though .... On a PC or other "proper" computer, there's some kind of storage on the wireless card (this might be a so-called "OTP", which surprisingly enough stands for "One-Time Programmable" - or maybe an EEPROM - or according to some parts of the internet maybe even both? I'm sketchy here) which the manufacturer has set up with the cal data. When the driver initializes the card, it reads from the OTP or maybe the EEPROM or maybe it tries both (if they're not the same thing) and pushes that data into the device proper.

For a device which is intended for embedded systems, like the QCA9887, the manufacturers might not incorporate an OTP. It's destined for use in something that already has non-volatile memory on the host, why not just use some of that?

On some devices, it can be a little more involved, though ... the calibration data comes in two parts. There's the so-called pre-cal data, plus the board data file (BDF), and the two are combined somehow inside the device. Courtesy of a mailing list post:

  1. load a firmware(-5).bin from /lib/firmware/ath10k/QCA4019/hw1.0/
  2. load the pre-cal (aka first part of calibration) data from /lib/firmware/ath10k/pre-cal-*
  3. do some firmware magic to identify the reference design
  4. load board data "files" (BDF) for this reference design from /lib/firmware/ath10k/QCA4019/hw1.0/board-2.bin
  5. send the BDF data to the firmware to let it compute the final calibration data
  6. start the actual wifi stuff

but wait! On a board which is not the Atheros reference design, it can be a little more involved ... only the reference boards get assigned board ids, and everyone else just borrows a board ID from something that they don't share electrical/RF/whatever characteristics with. Yay.

The IPQ4018/4019 SoC doesn't contain the actual RF parts. There are a couple of reference designs (SoC+RF parts) from QCA which got official numbers. These numbers identify the BDFs inside the board-2.bin. And the board-2.bin is not the firmware - it is a container for multiple BDFs.

Having said all that, I believe that for the QCA9887 we can skip some of this, because the ART partition in the flash (aka MTD) contains the final combined calibration data, so all we need to do is retrieve that and splat it into the device. For this relief much thanks - no worries about which board id we're improperly appropriating, just a lovely blob of binary mystery meat we need not examine closely. I hope.
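For what "retrieve that" might look like once you have a dump of the ART partition, here is a rough sketch. It assumes the calibration data sits at a known offset and length inside the dump; both are device-specific and not given in this post, so they are left as parameters. `extract_blob` and `looks_erased` are hypothetical helper names.

```python
def extract_blob(image: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes of `image` starting at `offset`.

    Raises ValueError if the dump is too short, rather than silently
    returning a truncated blob.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    blob = image[offset:offset + length]
    if len(blob) != length:
        raise ValueError("partition dump is shorter than offset + length")
    return blob


def looks_erased(blob: bytes) -> bool:
    """Erased flash reads back as all 0xFF bytes, which would suggest
    there is no calibration data at this location."""
    return len(blob) > 0 and all(b == 0xFF for b in blob)
```

The erased-flash check is only a sanity test; it says nothing about whether the bytes are valid calibration data for the chip.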

Of course, that does still require us to be able to read the MTD. The ath10k driver doesn't already know how (as far as I can tell): it can get it from OTP or by asking via request_firmware or from the device tree.

Colonic irritation

So after obtaining a copy from my ART partition by booting OpenWRT and copying it across, my first thought was to add it to CONFIG_EXTRA_FIRMWARE, except haha that doesn't actually work because the driver requests a filename containing colon characters (it's something like cal-pci-0000:00:00.0.bin) and the thing that bakes firmwares into the kernel is written as a baroque piece of Makefile rule, and make is prejudiced against filenames that contain colons. The exact error message was target pattern contains no '%' and I am quite proud of myself for not having spent even longer than I did working out the actual problem.

Stuck between the orang-utan and one of the boats

There's a very funny children's story about sticking things in a tree that probably shouldn't be there, and this next bit reminds me of it. If the device tree actually were a creation of the bootloader and it was passing configuration data into the freshly booted kernel as a parameter, I would willingly accept that 2k of binary blob encoding the length of the wireless antenna and the setting of the RF amplifier's volume knob is an appropriate part of that configuration data. As we are instead creating the device tree elsewhere on a build server and glomming it onto the end of the kernel we deploy to our target device, I am less convinced. But I am nothing if not pragmatic, and it beats coming up with an actual kernel patch to change the expected name of the calibration data file.

So, the cal data is now part of the device tree. To make this simpler we rearranged some of the code that builds the device tree from source, such that it's now its own derivation instead of part of the uimage derivation.

This actually works!

The animals went in 2.4

So, all that's left to do is add the ath9k.

First, it turns out we have the same problem here as we did with the mt7620 - the driver doesn't have the OF metadata to say it's compatible with the device.

Next, it turns out the mainline (4.19) kernel ath9k driver doesn't even support AHB anyway, only PCI. There's a patch in OpenWRT for that though, which also teaches it how to get its calibration data straight from MTD. This is a different way of doing it than we did in ath10k which offends my sense of perfect symmetry, but my occasional streak of pragmatism is kicking my sense of perfect symmetry under the table and my sense of perfect symmetry is keeping schtum.

Next next, when I added this, it stopped the ath10k from working. Argh.

Rules and regulations

This next bit I am slightly sketchy about, but this is the internet so here goes anyway. Different countries have different laws about what you can broadcast on the radio, and even in parts of the spectrum like the 2.4GHz ISM band which are supposedly available globally, there are different power limits in various places. In Linux, there is a CRDA (Central Regulatory Database Agent) which can be queried to find out what you can do at any given frequency, but again there are kernel config flags to let us bake this into the kernel.

The problem is made more complicated by Atheros, who have decided that they also should lock the hardware to a particular set of local rules (anyone remember DVD region locks?) by having the EEPROM say which reg domain is supported then restricting you to the intersection of those rules and the rules of the regulatory domain that you've said are applicable in your location. Again I am a trifle sketchy here (because my other device has the same dmesg output but not the same problem) but this seems to cause problems because the EEPROM settings are for regdomain 0 - which is either "international" or "US", depending on who you believe - and the combined effect of that and requesting UK region is to disallow any operation on 5GHz channels.

(In passing: what I find odd about this is that it seems that a setting in the ath9k eeprom can change the behaviour of the entirely separate ath10k.)
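The "intersection" idea can be sketched in a few lines. This is an illustration of the concept only, not how the kernel implements it, and the frequency ranges and power limits below are made-up numbers rather than real regulatory data. Each rule is a hypothetical (start MHz, end MHz, max power dBm) tuple.

```python
def intersect_rules(a, b):
    """Intersect two lists of (start_mhz, end_mhz, max_power_dbm) rules.

    A frequency is allowed only where both rule sets allow it, and the
    power limit is the stricter (lower) of the two.
    """
    result = []
    for a_lo, a_hi, a_pwr in a:
        for b_lo, b_hi, b_pwr in b:
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo < hi:  # the two ranges overlap
                result.append((lo, hi, min(a_pwr, b_pwr)))
    return sorted(result)


# Made-up example: the EEPROM's domain has no 5GHz rule at all,
# while the requested domain allows part of the 5GHz band.
EEPROM_RULES = [(2402, 2472, 20)]
REQUESTED_RULES = [(2402, 2482, 20), (5170, 5250, 23)]
```

With these inputs the result contains only the 2.4GHz range: a band that is missing from either side disappears from the intersection, which matches the "no operation on 5GHz channels" symptom described above.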

The Onus is Upon Us

Long story short(ened, but still really rather long): we have to add CONFIG_CFG80211_CERTIFICATION_ONUS to make it work. As far as I can work out this means "turn off all the safeties that ensure your transmitter is legal", so I'm not altogether happy about this. I need to do a bit more digging to ascertain whether there are different applicable restrictions for APs than there are for stations, because it would be much cleaner if we could enforce some appropriate restrictions instead of just disabling inappropriate ones. In OpenWRT there's a patch to disable enforcing the EEPROM regulatory restrictions which might be a less nuclear option if it works.


It works, but it needs tuning. Next steps:

The other thing that might be worth looking at, which I have only recently learned about, is the Linux Backports Project, which "enables old kernels to run the latest drivers".