diary at Telent Netowrks

Switched out

Wed, 26 Jun 2019 20:55:18 +0000

I have spent an inordinate amount of time lately trying to get a working NixWRT wireless extender running on my MT300A. The symptom is that whenever I reboot it, about 30-60 seconds later it will lock up every device on the switch it is plugged into for a minute - and as this presently includes my build system, which also serves as my NAS, the problem is a high-priority one.

I've had all kinds of hypotheses, many of which involved bridging loops somewhere on the LAN, but no amount of staring at network diagrams, poring over Wireshark captures or fiddling with STP seemed to have any effect. Until last night, when I tried to do a TFTP boot forgetting that I'd unplugged it from the TFTP server, and that resulted in exactly the same problem without even starting the Linux kernel.

So, I guess we can eliminate NixWRT from our enquiries. Whether the problem is hardware, or is something to do with how U-Boot initializes the hardware - or both, even - I am in some sense cheered to learn that it was probably nothing I did. At this point I'm going to flash the latest extensino.nix build to it, put the cover back on, and go and plug it in somewhere I have wireless dead spots, so that I can actually start working on the new router that arrived last week. And maybe not reboot it too often, but actually I'm hoping that once the image can be booted from flash, U-Boot won't need to initialize the network device at all and there will be no problem.

Well, I might just add in support for STP first, it seems like a sensible thing to have.

Blogging on logging

Fri, 14 Jun 2019 08:34:20 +0000

Before I put NixWRT on my primary internet connection, I want to deploy this wireless range extender, which means unplugging the serial connection. So, I really need to make it send syslog over the network.

"Just [*] install syslogd", you say, "how hard can it be?". There's a syslogd applet in Busybox; all I need is something elsewhere on the LAN to receive the messages (hint: not Journald). But, before I can send log messages over the network, I need the network to be available:

Which in sum is about 96% of everything it needs to do when it boots up, and whether that all goes to plan or not, any log messages it generates will be lost like tears in rain. Forgotten like a politician's manifesto commitments. Cast to the ground like a toddler's breakfast. I wanted something a bit more comprehensive, and didn't want to write the messages to flash because I'm not using any writable flash filesystem.

Avery Pennarun's blog article The log/event processing pipeline you can't have is not only a fun read about log processing at scale, but contains a really neat solution to this problem: instead of having klogd suck kernel messages and push them into syslogd which spits them out to files on disk, why not run the pump in reverse and push all the userland log messages into the kernel printk buffer? We have lots of RAM (I mean, by comparison with how much flash we can spare) and can squirrel all the boot time messages away until the network is up and ready to send them to the loghost.

Copying from /dev/log to /dev/kmsg is either simple or as complicated as you care to make it (the Google Fiber app has clearly encountered and addressed a whole bunch of real-world issues I haven't had to deal with).
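For the "simple" end of that range, here is a minimal sketch in Python rather than the C that klogcollect is actually written in. It assumes the C library's syslog() sends datagrams to a Unix socket at /dev/log, and that each write to /dev/kmsg becomes one kernel log record. The names copy_messages and main are made up for illustration and are not taken from klogcollect.

```python
import os
import socket


def copy_messages(sock, kmsg, max_messages=None):
    """Copy datagrams from a bound socket into a writable /dev/kmsg-like file."""
    copied = 0
    while max_messages is None or copied < max_messages:
        message = sock.recv(8192)
        # syslog() may leave a trailing newline or NUL: normalise to one newline
        kmsg.write(message.rstrip(b"\0\n") + b"\n")
        copied += 1
    return copied


def main(log_path="/dev/log", kmsg_path="/dev/kmsg"):
    """Take over log_path and pump everything that arrives into kmsg_path."""
    if os.path.exists(log_path):
        os.unlink(log_path)  # remove a stale socket left by an earlier run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(log_path)
    os.chmod(log_path, 0o666)  # let unprivileged processes log
    # Unbuffered, so every write() reaches the kernel as a separate record
    with sock, open(kmsg_path, "wb", buffering=0) as kmsg:
        copy_messages(sock, kmsg)
```

Calling main() as root would run the pump forever. Everything the real-world tools worry about (rate limiting, oversized messages, stream sockets) is left out.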

Sending messages from the kernel printk buffer is slightly more complicated but only in the incidentals not in the essentials. We need to

1. Read from /dev/kmsg and parse the strings we get. The format is documented and the only challenge here is that for reasons of keeping the system size down I felt obliged to write it in C, which is peculiarly suitable for text processing tasks. Emphasis here on "peculiar" not "suitable". Slightly annoyingly, the log entries don't include a timestamp field which can be related to time of day: instead they use a monotonic timestamp (in microseconds, according to the kernel's dev-kmsg documentation) which counts from boot but doesn't account for time while the system is suspended.

2. Transform it into something J Random Syslog Server will recognise. There are two contenders here: RFC 5424 says how you're supposed to do it, with options for PROCID, MSGID and STRUCTURED-DATA which I probably don't need, and RFC 3164 says how everyone actually does it, which as you can imagine encompasses a wide variety of exciting brokenness (it's descriptive not prescriptive, and what it describes is little more constrained than "there might be a facility in angle brackets, there might be a timestamp, there's probably some other stuff"). Initially I tried 3164 with ISO8601 timestamps instead of the weird legacy date format it recommends, but rsyslog declined to parse the rest of the line, so I switched to 5424. After making this decision I noticed that the author of rsyslog is also the author of RFC 5424, so, uh, I guess there's that. I had to add my own timestamps here because see complaint about step 1, and I had to add in the hostname.

3. Send it to the internet. This bit at least is well-trodden ground.
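The three steps above can be sketched roughly as follows, again in Python for brevity and not as a transcription of klogforward. The record layout (priority, sequence number, microsecond timestamp, flags, then a semicolon and the text) follows the kernel's dev-kmsg documentation. Using CLOCK_MONOTONIC to estimate the boot time, sending to UDP port 514, and all the function names are my assumptions.

```python
import socket
import time
from datetime import datetime, timezone


def parse_kmsg(record):
    """Split one /dev/kmsg record into (facility, level, seconds_since_boot, text)."""
    prefix, _, text = record.partition(";")
    fields = prefix.split(",")
    priority = int(fields[0])  # facility * 8 + level
    usec = int(fields[2])      # microseconds on the kernel's monotonic clock
    # Lines after the first are key=value metadata; this sketch drops them
    text = text.split("\n", 1)[0]
    return priority >> 3, priority & 7, usec / 1_000_000, text


def boot_epoch():
    """Estimate the wall-clock time of boot (ignores time spent suspended)."""
    return time.time() - time.clock_gettime(time.CLOCK_MONOTONIC)


def to_rfc5424(facility, level, when, hostname, app, text):
    """Format a message per RFC 5424 with nil PROCID, MSGID and STRUCTURED-DATA."""
    pri = facility * 8 + level
    stamp = datetime.fromtimestamp(when, timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"<{pri}>1 {stamp} {hostname} {app} - - - {text}"


def send(line, loghost, port=514):
    """Send one formatted line to the loghost as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (loghost, port))
```

A forwarder would read records from /dev/kmsg in a loop, add boot_epoch() to each record's seconds-since-boot to get a wall-clock time, and pass the formatted line to send().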

There's only one other thing to say here, which is that the lines you write into /dev/kmsg are whatever you want to write. My klogcollect just writes more or less anything that turns up in /dev/log, without attempting to parse it, which means whatever format the C library syslog() function writes. So when forwarding the messages I have to check the message origin (kernel or userland) before deciding whether to format it for RFC 5424 or whether to wing it as-is.
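One way to make that origin check, sketched in Python under an assumption I believe is right but have only from the kernel's dev-kmsg documentation: messages injected from userspace can never carry facility 0, so facility 0 reliably means "kernel". The function name wire_format and its parameters are hypothetical.

```python
LOG_KERN = 0  # syslog facility number reserved for the kernel


def wire_format(priority, text, stamp, hostname):
    """Pick the on-the-wire form of one /dev/kmsg record according to its origin."""
    facility = priority >> 3
    if facility == LOG_KERN:
        # Kernel text carries no timestamp or hostname, so wrap it in RFC 5424
        return f"<{priority}>1 {stamp} {hostname} kernel - - - {text}"
    # Userland text is whatever syslog() wrote: pass it on behind its priority
    return f"<{priority}>{text}"
```

For example, wire_format(13, "Jun 14 08:34:20 dropbear[42]: started", stamp, "extender") leaves the userland line alone apart from restoring its priority prefix.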

Summarising

After the monolog(ue), the epilog(ue)...

klogforward is on GitHub, and you can have this working in your NixWRT by adding

(syslog { loghost = "loghost.example.com" ; })

to your modules.

When I next come back to this I am going to play with the PRINTK_PERSIST patch that Avery talks about, and also I should have a proper look at logos - it probably solves problems I haven't seen yet. But more pressing right now is working out why I can't reboot my wireless extender without freezing every other device on the switch it's attached to for a minute. Wireshark says it's "MAC PAUSE" frames, and my working hypothesis is a switching loop.

Configuring Homeplug from Linux

Thu, 23 May 2019 22:42:08 +0000

The new house doesn't have structured cabling, and I won't be doing anything to address that until we start work on the extension. In the meantime, therefore, we're using Homeplug AV (networking over the power line). I had two plugs, I needed a third so I bought the cheapest one I could find on eBay.

When it turned up it had no buttons - which made it a little difficult to add to the existing network. Usually you press the "pair" button on the new plug and then on the existing plug and count to ten and wave a dead chicken around[*] and do a little dance, but that doesn't work when there is no button. Here's how to do it, assuming (1) it's an Intellon chipset, and (2) you have a Linux or other Unix-like box with an ethernet adaptor to plug it into.

1. You need faifa. To build it, you need libpcap and libevent: here's a quick and hacky Nix derivation.

2. Pick 8 random bytes to use for your encryption key. I did dd bs=8 count=1 if=/dev/random |od -x. This is a shared secret which you will need to install on all your adaptors, so probably you should save it somewhere until you're done. Note: the faifa source code seems to suggest that the key you provide is further hashed before being used, so maybe any amount of random stuff is OK here and there's no need to be precious about the format.

3. For each adaptor you want to configure, you will need its MAC address. Hopefully this is printed somewhere on the device itself: I don't know how to get it programmatically.

4. Plug the first adaptor into the mains and attach it to the machine where you built faifa. Now run it

$ sudo faifa -i enp0s31f6 -m

5. You should get a long list of "supported frames" which I will assume are about as meaningful to you as they were to me, followed by a prompt to choose one. If you were to choose, for example, a000 (which is "Get Device/SW Version Request"), it might respond something like this:

Choose the frame type (Ctrl-C to exit): 0xa000
Frame: Get Device/SW Version Request (0xA000)

Dump: Frame: Get Device/SW Version Confirm (A001), HomePlug-AV Version: 1.0 Status: Success Device ID: INT6400, Version: INT6000-MAC-4-1-4102-00-3679-20090724-FINAL-C, upgradeable: 0

6. To set the encryption key, the frame type is 0xa050 (Set Encryption Key Request). It prompts you further for "local or remote" (I have no idea what this means, but "local" worked), for the key itself, and for the MAC address.

Choose the frame type (Ctrl-C to exit): 0xa050
Frame: Set Encryption Key Request (0xA050)
Local or distant setting ?
0: distant
1: local
1
AES NMK key ?0001020304050607
Destination MAC address ?b0:48:7a:b9:00:00

Dump: Frame: Set Encryption Key Confirm (A051), HomePlug-AV Version: 1.0 Status: Success

7. Repeat for the next adaptor, until you have done them all.
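If you'd rather not squint at od output for the key in step 2, a small sketch like this prints 8 random bytes as the 16 hex digits that the "AES NMK key ?" prompt appeared to accept in the transcript above. The function name is made up, and it draws from the operating system's random source via os.urandom rather than reading /dev/random directly.

```python
import os


def random_key_hex(nbytes=8):
    """Return nbytes of OS-provided randomness as a lowercase hex string."""
    return os.urandom(nbytes).hex()


print(random_key_hex())
```

Unlike od -x, this prints the bytes in order with no grouping into 16-bit words, though given the note about hashing in step 2 that probably doesn't matter.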

I make no claim that this is correct, but it seems to work for me, and now I can plug my new Odroid C2 (I'll write about that another time, but there's little to say so far, it just runs Kodi) into the TV without the use of a 5m HDMI cable.

[*] make sure it's dead. Waving a live chicken is stressful for all involved.

NixWRT next words

Mon, 06 May 2019 10:02:50 +0000

Happy New Year!

Uh, yeah. I moved house quite recently and am only just starting to get back on top of stuff. Without further ado:

Things I have done

Some forward progress and some sideways movement. The forward progress is that I have been able to bring up WiFi on the GL-MT300A which was until recently my primary domestic router. I don't know how fast, reliable or stable it is, but judging by the number of OpenWRT patches I haven't applied, I suspect the answer is "not very, yet".

The sideways movement is that the top-level configs (backuphost.nix, wap.nix etc) are moved into the examples/ subdirectory and I have ceased pretending they're device-independent, because in practice they turn out not to be. Building abstraction layers over things like switch vlan settings is a distraction from building the things themselves - which is not to deny its importance, just to say that I don't want always to be having to do both at once.

Note that as of the time I write this, the repo is still in a state of some flux: the only thing that has a hope of building/booting is `defaultroute.nix` and that doesn't even work for use as a router as I haven't even tried to configure pppoe yet.

Things I have learned

The GL-MT300A is based on the MediaTek MT7620A SoC (hopefully this means the hard parts are also done for the MT300Nv2, based on the very similar MT7628NN - when I eventually find it again after moving house, I will try it). The Linux driver for this is rt2x00 but verifying this is correct/current/best was a long process because when I first compiled it into my kernel it utterly failed to find any hardware.

This took a certain amount of digging[1] to find out what the problem was. It turns out that although there's a config option for RT2800SOC which is enabled by SOC_MT7620 (which might make you think it's supposed to work) there is no description of this hardware in the upstream mt7620a device tree, so the kernel (reasonably) has no idea that it should be using this code for anything. The fix is two-fold: first, use the much more fully-featured device tree files from OpenWRT instead of upstream, and second, patch the driver so that it advertises a compatible attribute which matches the compatible attribute in the device tree node. It turns out that OpenWRT has done this already as well and now my kernel knows it has wlan hardware and a driver which are compatible with each other.

Well, allegedly compatible.

<6>[    1.310547] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 6352, rev 0500 detected                
<3>[    1.318533] ieee80211 phy0: rt2800_init_eeprom: Error - Invalid RF chipset 0xbadd detected          
<3>[    1.326981] ieee80211 phy0: rt2x00lib_probe_dev: Error - Failed to allocate device

The fix for this is also in OpenWRT patches: it needs to be told to get the eeprom whereabouts from the device tree, and then it can read the actual eeprom instead of I-have-no-idea-where-it's-getting-that-from and it gets the right RF chipset ID.

One other thing I noticed while doing this is that the OpenWRT patch set here didn't apply cleanly to any kernel version I had previously associated with OpenWRT, so I went looking a bit deeper to find out what they did.

What they did will surprise you! At least, I need to dig further to confirm this absolutely, but on the evidence so far, it's surprising me. Here's how it looks: Not only do they have different kernel minor versions for different target devices (currently 4.9 for Atheros devices and 4.14 for everything else) but they build these kernels to exclude all wireless support, and then build a completely separate out-of-tree module based on a completely different tree to provide the appropriate wifi support modules. See the mac80211 Makefile.

I have not replicated this pattern for NixWRT, mostly because I want to see if I can get away without having to. Right now I'm picking patches from that set and applying them selectively to my basic monolithic kernel, and it seems to be working. In this regard, let me briefly tout the last item on my "things I learned" list: filterdiff, a command line tool which manipulates diff files so that you can drop patches applying to files you don't have, rewrite file names, skip hunks and so on, which means I can apply patches cleanly direct from OpenWRT and don't have to maintain patches-on-patches.
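To give a flavour of the "drop patches applying to files you don't have" part, here is a toy Python illustration of the idea. It is emphatically not filterdiff and doesn't use it. It assumes plain unified diffs where each file section starts with a "--- " line immediately followed by a "+++ " line, and it will be fooled by hunks whose content happens to look like that. The function name keep_files is made up.

```python
import fnmatch


def keep_files(patch_text, pattern):
    """Return only the per-file sections of a unified diff whose new path matches pattern."""
    lines = patch_text.splitlines(keepends=True)
    sections = []  # list of (new_path, lines_of_that_section)
    current = None
    for i, line in enumerate(lines):
        starts_file = (
            line.startswith("--- ")
            and i + 1 < len(lines)
            and lines[i + 1].startswith("+++ ")
        )
        if starts_file:
            # The path follows "+++ " and may be trailed by a tab and a date
            path = lines[i + 1][4:].split("\t")[0].strip()
            current = (path, [])
            sections.append(current)
        if current is not None:  # anything before the first file header is dropped
            current[1].append(line)
    return "".join(
        "".join(body) for path, body in sections if fnmatch.fnmatch(path, pattern)
    )
```

Something like keep_files(patch, "b/drivers/net/wireless/*") would then keep only the sections touching that directory. The real tool also handles hunk selection, path rewriting and the many diff dialects this sketch ignores.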

Next step: I will probably turn this into a wireless range extender on a separate SSID from my usual, just because everything I read about the reliability of the rt2x00 driver suggests that if I want something stable and performant I should also apply the rest of those (currently inapplicable) patches to fix txpower and other random stuff which I have no idea what it is, and my userbase - the family - won't like dogfooding on my behalf. One of the (many, mostly irrelevant) changes in the new place is the internet connection: my new ISP sent me a free router with a dual band radio, which is in some ways much nicer than the ones I'm building NixWRT on (came pre-configured with all the IPv6 setup, plus 5GHz band is practically unused around here whereas 2.4G is more congested than the A406 on a Saturday afternoon) and in other ways is doing my head in (weird slow GUI, no real shell, no way to tell which of the 14 devices in the house is saturating the line) but it'll do for the moment. Apparently it's a Broadcom chipset so porting NixWRT to it might not be the best possible use of time.

Next next steps, therefore: research dualband router SoCs I would like to port NixWRT to, and also find out how to set up all the IPv6, because native IPv6 is kinda new and kinda fun.

[1] Which is to say, alternately staring at it and googling randomly

Moon on a stick

Tue, 18 Dec 2018 22:31:25 +0000

Latest in my ever-expanding series of diversions from NixWRT is this foolish attempt to write a Wayland compositor using a language I don't actually understand. I'd add: not for the first time, except that now I check the link I see that was going to be a text editor not a compositor.

Anyway. This time it's a Wayland compositor, using the wlroots library (described by its author as "about 50,000 lines of code you were going to write anyway") and written in Lua. Mostly because when I get a bit further along with it I'm going to integrate Fennel into it, and then with a bit of luck will have a respectable replacement for the once-brilliant but now rather long in the tooth Sawfish. I say "when" but we all know I really mean "if".

At the time of writing it works to the extent that I have a Lua script that can set up the display appropriately then allow me to display a Konsole window into which I can type, and it renders a pointer which moves around in expected and unsurprising ways when I stroke my touchpad. It has no knowledge, though, of window decorations or of stacking order or even of focus or the channeling of pointer button/movement events onto a client.

(Decorations? Yes, I'm with the KDE guys on this one. Requiring clients to render their own window borders and resize themselves on the screen is like a bank requiring customers to track their own account balances and know which market funds their deposits were invested in: how can you ever be sure the app you thought was "Password Wallet Master Key Entry" was actually that and not, I dunno, "I'd like to add you to my professional Botnet" if you don't have some kind of trustable fence around the rectangle it's permitted to draw into. Honour system?)

Some brief notes on how I'm doing it, for the benefit of future-me and potential benefit of present-other. I'm using LuaJIT: not for speed but because the FFI can parse (most) C header files and there's a lot of FFI in this project. So far I have made a fairly literal transcription of the code in Drew DeVault's blog posts parts I-III, and then a rather sketchier interpretation of the code in Input handling in wlroots.

Some stuff has changed since the blog posts were written. Here are the differences and/or relevant gotchas that I noted or can remember