diary at Telent Netowrks

Radio Free Europe

Thu, 26 Dec 2019 23:46:56 +0000

The GL-AR750 now has working (though not particularly fast) wifi on both 2.4GHz and 5GHz bands. A fair amount of fiddling was required to get us to this point, so in the best tradition of the Christmas Radio Times (a pre-digital British institution, don't know if still a thing) here is a double-length post about it all.

The GL-AR750 has two distinct sets of wifi hardware. The 2.4GHz stuff is part of the QCA9531 SoC, i.e. it's on the same silicon as the CPU, the Ethernet, the USB etc. The device is connected to the host via AHB, which as far as I can tell stands for Advanced High-performance Bus (part of ARM's AMBA family, not an Atheros invention), and it is supported in Linux using the ath9k driver. The 5GHz support, on the other hand, is provided by a QCA9887 PCIe (PCI Express) WLAN chip: I haven't looked closely at the router innards to see if this is actually physically a separate board that could be unplugged, but as far as the Linux is concerned it behaves as one. This is supported by the ath10k driver. Clear so far?

Five giga hertz, four calling birds, three French hens ...

My approach to porting NixWRT was basically

and the answer, at least initially, was that I got no kind of anything from the ath9k driver and some error messages from ath10k, so I thought I'd start there.

A firmware hand on the tiller

There are two things that ath10k devices need from their host environment that are not provided directly by the driver: the firmware and the calibration data. The firmware is the code that the wifi chip runs, which we have to upload into it when it boots. The calibration data, by my somewhat hazy impression, is stuff like tuning parameters, e.g. which amplitudes correspond to what power outputs (which is obviously going to depend on the amplifiers, antenna design, etc, and therefore will differ depending on how the device is wired up).

On a proper PC the driver obtains the firmware by doing some kind of call-out-to-udev dance that makes userspace find it on the filesystem and feed it back into the kernel, and then feeds it into the device using the BMI (Bootloader Messaging Interface). For NixWRT I want a monolithic kernel, but happily there is a config option for people like me: CONFIG_EXTRA_FIRMWARE takes a space-separated list of files that it expects to find in a location given in CONFIG_EXTRA_FIRMWARE_DIR, and bakes their contents into the kernel in some way such that the generated kernel can make calls with names like request_firmware to find them. So we can do the firmware using that.
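A minimal sketch of what those two options end up looking like, written as a small Python helper that prints the config lines. The firmware file name and the directory are assumptions for illustration (they are not taken from the NixWRT source), so check them against whatever firmware files you actually have.

```python
# Sketch: build the kernel config lines that bake firmware into the image.
# The file name and directory used in the demo are illustrative assumptions.

def extra_firmware_config(files, firmware_dir):
    """Return .config lines for CONFIG_EXTRA_FIRMWARE and its directory."""
    for name in files:
        # The option is a space-separated list, so a name containing a
        # space could not be expressed in it.
        if " " in name:
            raise ValueError(f"firmware name contains a space: {name!r}")
    return [
        'CONFIG_EXTRA_FIRMWARE="{}"'.format(" ".join(files)),
        'CONFIG_EXTRA_FIRMWARE_DIR="{}"'.format(firmware_dir),
    ]


lines = extra_firmware_config(
    ["ath10k/QCA9887/hw1.0/firmware-5.bin"],  # assumed file name
    "/path/to/firmware",                      # hypothetical directory
)
print("\n".join(lines))
```

The names in the list are relative to the directory, which is why the two options are kept separate.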

Calibrate good times (come on!)

The calibration data is a little bit more involved, though .... On a PC or other "proper" computer, there's some kind of storage on the wireless card (this might be a so-called "OTP", which surprisingly enough stands for "One-Time Programmable" - or maybe an EEPROM - or according to some parts of the internet maybe even both? I'm sketchy here) which the manufacturer has set up with the cal data. When the driver initializes the card, it reads from the OTP or maybe the EEPROM or maybe it tries both (if they're not the same thing) and pushes that data into the device proper.

For a device which is intended for embedded systems, like the QCA9887, the manufacturers might not incorporate an OTP. It's destined for use in something that already has non-volatile memory on the host, why not just use some of that?

On some devices, it can be a little more involved, though ... the calibration data comes in two parts. There's the so-called pre-cal data, plus the board data file (BDF), and the two are combined somehow inside the device. Courtesy of a mailing list post, the sequence goes:

  1. load a firmware(-5).bin from /lib/firmware/ath10k/QCA4019/hw1.0/
  2. load the pre-cal (aka first part of calibration) data from /lib/firmware/ath10k/pre-cal-*
  3. do some firmware magic to identify the reference design
  4. load board data "files" (BDF) for this reference design from /lib/firmware/ath10k/QCA4019/hw1.0/board-2.bin
  5. send the BDF data to the firmware to let it compute the final calibration data
  6. start the actual wifi stuff

but wait! On a board which is not the Atheros reference design, it can be a little more involved ... only the reference boards get assigned board ids, and everyone else just borrows a board ID from something that they don't share electrical/RF/whatever characteristics with. Yay.

The IPQ4018/4019 SoC doesn't contain the actual RF parts. There are a couple of reference designs (SoC+RF parts) from QCA which got official numbers. These numbers identify the BDFs inside the board-2.bin. And the board-2.bin is not the firmware - it is a container for multiple BDFs.

Having said all that, I believe that for the QCA9887 we can skip some of this, because the ART partition in the flash (aka MTD) contains the final combined calibration data, so all we need to do is retrieve that and splat it into the device. For this relief much thanks - no worries about which board id we're improperly appropriating, just a lovely blob of binary mystery meat we need not examine closely. I hope.
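For what "retrieve that" might look like once you have a dump of the ART partition in a file, here is a hedged Python sketch. The offset and length are hypothetical parameters: where the ath10k data sits inside ART depends on the board, and nothing here tells you the right numbers.

```python
# Sketch: cut a calibration blob out of a dump of the ART partition.
# The offset and length are caller-supplied; the values in the demo are
# made up and do not describe any real board.

def extract_cal(art, offset, length):
    """Return `length` bytes of `art` starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    if offset + length > len(art):
        raise ValueError("requested range runs past the end of the dump")
    blob = art[offset:offset + length]
    # Erased flash reads back as 0xff, so an all-0xff result most likely
    # means the offset is wrong.
    if all(b == 0xFF for b in blob):
        raise ValueError("range is all 0xff: wrong offset or erased flash?")
    return blob


# Demo with a fake 64 KiB "partition": erased flash plus a marker blob.
fake_art = bytearray(b"\xff" * 0x10000)
fake_art[0x5000:0x5004] = b"\x01\x02\x03\x04"
cal = extract_cal(bytes(fake_art), offset=0x5000, length=4)
print(cal.hex())
```

The all-0xff check is only a sanity test; it cannot tell you the blob is valid calibration data, just that you have not sliced empty flash.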

Of course, that does still require us to be able to read the MTD. The ath10k driver doesn't already know how (as far as I can tell): it can get the calibration data from OTP, by asking for a file via request_firmware, or from the device tree.

Colonic irritation

So after obtaining a copy from my ART partition by booting OpenWRT and copying it across, my first thought was to add it to CONFIG_EXTRA_FIRMWARE except haha that doesn't actually work because the driver requests a filename containing colon characters (it's something like cal-pci-0000:00:00.0.bin) and the thing that bakes firmwares into the kernel is written as a baroque piece of Makefile rule, and make is prejudiced against filenames that contain colons. The exact error message was target pattern contains no '%' and I am quite proud of myself for not having spent even longer than I did working out the actual problem.
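A small Python sketch of a check that would have flagged this earlier. Two things are assumptions: the name format is inferred purely from the example filename above, and the set of "hostile" characters is a conservative guess at what make treats specially in a rule, not a complete list.

```python
# Sketch: spot firmware names that are likely to upset a Makefile rule.
# Both the name format and the character set are assumptions.

MAKE_HOSTILE = set(":#$% \t")  # conservative guess, not exhaustive


def cal_file_name(bus, device):
    """Build a calibration file name shaped like the example in the text."""
    return f"cal-{bus}-{device}.bin"


def make_hostile_chars(name):
    """Return the sorted list of characters in `name` that make dislikes."""
    return sorted(set(name) & MAKE_HOSTILE)


name = cal_file_name("pci", "0000:00:00.0")
print(name, make_hostile_chars(name))
print(make_hostile_chars("ath10k/QCA9887/hw1.0/firmware-5.bin"))
```

The colon is the troublemaker here because make uses it to separate targets from prerequisites; more than one in a rule line is enough to make it think you meant a pattern rule.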

Stuck between the orang-utan and one of the boats

There's a very funny children's story about sticking things in a tree that probably shouldn't be there, and this next bit reminds me of it. If the device tree actually were a creation of the bootloader and it was passing configuration data into the freshly booted kernel as a parameter, I would willingly accept that 2k of binary blob encoding the length of the wireless antenna and the setting of the RF amplifier's volume knob is an appropriate part of that configuration data. As we are instead creating the device tree elsewhere on a build server and glomming it onto the end of the kernel we deploy to our target device, I am less convinced. But I am nothing if not pragmatic, and it beats coming up with an actual kernel patch to change the expected name of the calibration data file.

So, the cal data is now part of the device tree. To make this simpler we rearranged some of the code that builds the device tree from source, such that it's now its own derivation instead of part of the uimage derivation.
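To give a feel for what putting a blob in the device tree involves, here is a hedged Python sketch that renders bytes as a device tree source bytestring property. The property name in the demo is what I understand the ath10k binding to use, but treat it as an assumption and check the binding documentation for your kernel. The helper name is made up for this example, and this is not the code NixWRT actually uses.

```python
# Sketch: render a binary blob as a DTS bytestring property.
# dts_byte_property is a hypothetical helper; the property name used in
# the demo is an assumption about the ath10k binding.

def dts_byte_property(name, data, per_line=16):
    """Return DTS text of the form: name = [ aa bb cc ... ];"""
    if not data:
        raise ValueError("refusing to emit an empty property")
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        # Each byte becomes two lowercase hex digits.
        rows.append(" ".join(f"{b:02x}" for b in chunk))
    body = "\n\t\t".join(rows)
    return f"\t{name} = [\n\t\t{body}\n\t];"


snippet = dts_byte_property("qcom,ath10k-calibration-data", b"\x01\xab")
print(snippet)
```

The resulting text would sit inside the node for the wifi device; generating it at build time keeps the blob itself out of the hand-written device tree source.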

This actually works!

The animals went in 2.4

So, all that's left to do is add the ath9k.

First, it turns out we have the same problem here as we did with the mt7620 - the driver doesn't have the OF metadata to say it's compatible with the device.

Next, it turns out the mainline (4.19) kernel ath9k driver doesn't even support AHB anyway, only PCI. There's a patch in OpenWRT for that though, which also teaches it how to get its calibration data straight from MTD. This is a different way of doing it than we did in ath10k which offends my sense of perfect symmetry, but my occasional streak of pragmatism is kicking my sense of perfect symmetry under the table and my sense of perfect symmetry is keeping schtum.

Next next, when I added this, it stopped the ath10k from working. Argh.

Rules and regulations

This next bit I am slightly sketchy about, but this is the internet so here goes anyway. Different countries have different laws about what you can broadcast on the radio, and even in parts of the spectrum like the 2.4GHz ISM band which are supposedly available globally, there are different power limits in various places. In Linux, there is a CRDA (Central Regulatory Domain Agent) which can be queried to find out what you can do at any given frequency, but again there are kernel config flags to let us bake this into the kernel.

The problem is made more complicated by Atheros, who have decided that they also should lock the hardware to a particular set of local rules (anyone remember DVD region locks?) by having the EEPROM say which reg domain is supported then restricting you to the intersection of those rules and the rules of the regulatory domain that you've said are applicable in your location. Again I am a trifle sketchy here (because my other device has the same dmesg output but not the same problem) but this seems to cause problems because the EEPROM settings are for regdomain 0 - which is either "international" or "US", depending on who you believe - and the combined effect of that and requesting UK region is to disallow any operation on 5GHz channels.
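A toy Python model of the "intersection" idea, to show how combining two rule sets can leave a whole band with nothing permitted. Every number below is invented for illustration and describes no real regulatory domain, and the function is not a rendering of what the kernel does internally.

```python
# Toy model: intersect two sets of regulatory rules.
# Each rule is (start_mhz, end_mhz, max_power_dbm). All values in the
# demo are hypothetical.

def intersect_rules(a, b):
    """Keep only the frequency ranges allowed by both rule sets."""
    out = []
    for a_lo, a_hi, a_pwr in a:
        for b_lo, b_hi, b_pwr in b:
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo < hi:
                # Where both allow the range, the stricter power limit wins.
                out.append((lo, hi, min(a_pwr, b_pwr)))
    return sorted(out)


eeprom_domain = [(2402, 2472, 20)]                        # invented
requested_domain = [(2402, 2482, 20), (5170, 5250, 23)]   # invented
allowed = intersect_rules(eeprom_domain, requested_domain)
print(allowed)
```

In this made-up case the hardware-side rules say nothing about 5GHz, so the intersection contains no 5GHz range at all, however permissive the requested domain is.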

(In passing: what I find odd about this is that it seems that a setting in the ath9k eeprom can change the behaviour of the entirely separate ath10k)

The Onus is Upon Us

Long story short(ened, but still really rather long): we have to add CONFIG_CFG80211_CERTIFICATION_ONUS to make it work. As far as I can work out this means "turn off all the safeties that ensure your transmitter is legal", so I'm not altogether happy about this. I need to do a bit more digging to ascertain whether there are different applicable restrictions for APs than there are for stations, because it would be much cleaner if we could enforce some appropriate restrictions instead of just disabling inappropriate ones. In OpenWRT there's a patch to disable enforcing the EEPROM regulatory restrictions which might be a less nuclear option if it works.

Conclusion

It works, but it needs tuning. Next steps:

The other thing that might be worth looking at, which I have only recently learned about, is the Linux Backports Project, which "enables old kernels to run the latest drivers".