Blogging on logging#
Fri, 14 Jun 2019 08:34:20 +0000
Before I put NixWRT on my primary internet connection, I want to deploy this wireless range extender, which means unplugging the serial connection. So, I really need to make it send syslog over the network.
"Just [* ] install syslogd", you say, "how hard can it be?". There's a syslogd applet in Busybox, all I need is something elsewhere on the LAN to receive the messages (hint: not Journald ). But, before I can send log messages over the network, I need the network to be available:
- the DHCP client on
br0
device has been allocated an address - which means the bridge between wireless and wired is up
- which means
hostapd
is running - and
swconfig
has run - which means
monit
(process supervisor) has started - which means the kernel booted successfully and started userland
Which in sum is about 96% of everything it needs to do when it boots up, and whether that all goes to plan or not, any log messages it generates will be lost like tears in rain. Forgotten like a politician's manifesto commitments. Cast to the ground like a toddler's breakfast. I wanted something a bit more comprehensive, and didn't want to write the messages to flash because I'm not using any writable flash filesystem.
Avery Pennaruns blog article The log/event processing pipeline you can't have is not only a fun read about log processing at scale, but contains a really neat solution to this problem: instead of having klogd suck kernel messages and push them into syslogd which spits them out to files on disk, why not run the pump in reverse and push all the userland log messages into the kernel printk buffer? We have lots of RAM (I mean, by comparison with how much flash we can spare) and can squirrel all the boot time messages away until the network is up and ready to send them to the loghost.
Copying from /dev/log
to /dev/kmsg
is either
simple
or as complicated as you care to make it
(the Google Fiber app has clearly encountered and addressed a
whole bunch of real-world issues I haven't had to deal with).
Sending messages from the kernel printk buffer is slightly more complicated but only in the incidentals not in the essentials. We need to
1. read from /dev/kmsg
and parse the strings we get. The format is
documented
and the only challenge here is that for reasons of keeping the system
size down I felt obliged to write it in C, which is peculiarly
suitable for text processing tasks. Emphasis here on "peculiar" not
"suitable". Slightly annoyingly, the log entries don't include a
timestamp field which can be related to time of day: instead they use
a monotonic timestamp which is probably the number of seconds since
boot but doesn't account for time while the system is suspended.
2. transform it into something J Random Syslog Server will recognise. There are two contenders here: RFC 5424 says how you're supposed to do it, with options for PROCID, MSGID and STRUCTURED-DATA which I probably don't need, and RFC 3164 says how everyone actually does it, which as you can imagine encompasses a wide variety of exciting brokenness (it's descriptive not prescriptive, and what it describes is little more constrained than "there might be a facility in angle brackets, there might be a timestamp, there's probably some other stuff). Initially I tried 3164 with ISO8601 timestamps instead of the weird legacy date format it recommends, but rsyslog declined to parse the rest of the line, so I switched to 5424. After making this decision I noticed that the author of rsyslog is also the author of RFC5242, so, uh, I guess there's that. I had to add my own timestamps here because see complaint about step 1, and I had to add in the hostname.
3. Send it to the internet. This bit at least is well-trodden ground.
There's only one other thing to say here, which is that the lines you
write into /dev/kmsg
are whatever you want to write. My
klogcollect
just writes more or less anything that turn up in
/dev/log
, without attempting to parse it, which means whatever format
the C library syslog()
function writes. So when forwarding the
messages I have to check the message origin (kernel or userland)
before deciding whether to format it for RFC5424 or whether to wing it as-is.
Summarising
After the monolog(ue), the epilog(ue)...
klogforward is on Github, and you can have this working in your NixWRT by adding
(syslog { loghost = "loghost.example.com" ; })
to your modules.
When I next come back to this I am going to play with the PRINTK_PERSIST patch that Avery talks about, and also I should have a proper look at logos - it probably solves problems I haven't seen yet. But more pressing right now is working out why I can't reboot my wireless extender without freezing every other device on the switch it's attached to for a minute. Wireshark says it's "MAC PAUSE" frames, and my working hypothesis is a switching loop.