diary at Telent Netowrks

Blogging on logging#

Fri, 14 Jun 2019 08:34:20 +0000

Before I put NixWRT on my primary internet connection, I want to deploy this wireless range extender, which means unplugging the serial connection. So, I really need to make it send syslog over the network.

"Just [* ] install syslogd", you say, "how hard can it be?". There's a syslogd applet in Busybox, all I need is something elsewhere on the LAN to receive the messages (hint: not Journald ). But, before I can send log messages over the network, I need the network to be available:

Which in sum is about 96% of everything it needs to do when it boots up, and whether that all goes to plan or not, any log messages it generates will be lost like tears in rain. Forgotten like a politician's manifesto commitments. Cast to the ground like a toddler's breakfast. I wanted something a bit more comprehensive, and didn't want to write the messages to flash because I'm not using any writable flash filesystem.

Avery Pennaruns blog article The log/event processing pipeline you can't have is not only a fun read about log processing at scale, but contains a really neat solution to this problem: instead of having klogd suck kernel messages and push them into syslogd which spits them out to files on disk, why not run the pump in reverse and push all the userland log messages into the kernel printk buffer? We have lots of RAM (I mean, by comparison with how much flash we can spare) and can squirrel all the boot time messages away until the network is up and ready to send them to the loghost.

Copying from /dev/log to /dev/kmsg is either simple or as complicated as you care to make it (the Google Fiber app has clearly encountered and addressed a whole bunch of real-world issues I haven't had to deal with).

Sending messages from the kernel printk buffer is slightly more complicated but only in the incidentals not in the essentials. We need to

1. read from /dev/kmsg and parse the strings we get. The format is documented and the only challenge here is that for reasons of keeping the system size down I felt obliged to write it in C, which is peculiarly suitable for text processing tasks. Emphasis here on "peculiar" not "suitable". Slightly annoyingly, the log entries don't include a timestamp field which can be related to time of day: instead they use a monotonic timestamp which is probably the number of seconds since boot but doesn't account for time while the system is suspended.

2. transform it into something J Random Syslog Server will recognise. There are two contenders here: RFC 5424 says how you're supposed to do it, with options for PROCID, MSGID and STRUCTURED-DATA which I probably don't need, and RFC 3164 says how everyone actually does it, which as you can imagine encompasses a wide variety of exciting brokenness (it's descriptive not prescriptive, and what it describes is little more constrained than "there might be a facility in angle brackets, there might be a timestamp, there's probably some other stuff). Initially I tried 3164 with ISO8601 timestamps instead of the weird legacy date format it recommends, but rsyslog declined to parse the rest of the line, so I switched to 5424. After making this decision I noticed that the author of rsyslog is also the author of RFC5242, so, uh, I guess there's that. I had to add my own timestamps here because see complaint about step 1, and I had to add in the hostname.

3. Send it to the internet. This bit at least is well-trodden ground.

There's only one other thing to say here, which is that the lines you write into /dev/kmsg are whatever you want to write. My klogcollect just writes more or less anything that turn up in /dev/log, without attempting to parse it, which means whatever format the C library syslog() function writes. So when forwarding the messages I have to check the message origin (kernel or userland) before deciding whether to format it for RFC5424 or whether to wing it as-is.


After the monolog(ue), the epilog(ue)...

klogforward is on Github, and you can have this working in your NixWRT by adding

(syslog { loghost = "loghost.example.com" ; })

to your modules.

When I next come back to this I am going to play with the PRINTK_PERSIST patch that Avery talks about, and also I should have a proper look at logos - it probably solves problems I haven't seen yet. But more pressing right now is working out why I can't reboot my wireless extender without freezing every other device on the switch it's attached to for a minute. Wireshark says it's "MAC PAUSE" frames, and my working hypothesis is a switching loop.

Switched out#

Wed, 26 Jun 2019 20:55:18 +0000

I have spent an inordinate amount of time lately to get a working NixWRT wireless extender running on my MT300A. The symptom is that whenever I reboot it, about 30-60 seconds later it will lock up every device on the switch it is plugged into for a minute - and as this presently includes my build system, which also serves as my NAS, the problem is a high-priority one.

I've had all kinds of hypotheses, many of which involved bridging loops somewhere on the LAN, but no amount of staring at network diagrams, poring over wireshark captures or or fiddling with STP seemed to have any effect. Until last night when I tried to do a TFTP boot forgetting that I'd unplugged it from the TFTP server, and that resulted in exactly the same problem without even starting the Linux kernel.

So, I guess we can eliminate NixWRT from our enquiries. Whether the problem is hardware, or is something to do with how U-Boot initializes the hardware - or both, even - I am in some sense cheered to learn that it was probably nothing I did. At this point I'm going to flash the latest extensino.nix build to it, put the cover back on, and go and plug it in somewhere I have wireless dead spots, so that I can actually start working on the new router that arrived last week. And maybe not reboot it too often, but actually I'm hoping that once the image can be booted from flash, U-boot won't need to initialize the network device at all and there will be no problem.

Well, I might just add in support for STP first, it seems like a sensible thing to have.