diary at Telent Netowrks

Left to my own devices

Thu, 12 Jul 2018 00:10:47 +0000

I bought another GL-Inet router so that I would have a device for testing on. Because it was a tenner cheaper, I bought the MT300N instead of the 300A.

So far:

[ 6186.097454] mtk_soc_eth 10100000.ethernet eth0: transmit timed out

I am probably going to stick with 4.14 anyway, even if it's not strictly necessary: it looks like OpenWrt is already using it as the default on ralink/mediatek boards, and there seems no point in being intentionally out of date.

Current mood: quite stoked that a new device I received on a Friday is running NixWRT by the following Wednesday - maybe 10-12 hours of actual hacking time later.

Thinking about what's next: how to do upgrades of production devices without needing to take the covers off again. I think it's going to involve kexec.

Looks plausible so far, which probably means there's something I've forgotten.

I will run in the path of your commands

Wed, 04 Jul 2018 00:33:38 +0000

I found last week's weird bug not long after posting. Debugging really got underway when I tried setting LD_LIBRARY_PATH to include /nix/store/...-zlib-1.2.11.../lib and observed that then my binaries were able to start. From this I inferred that the libraries themselves were most probably fine and the problem must be in the binaries referring to them or in the dynamic linker (or "ELF program interpreter" as we're apparently supposed to call it)

Running strings on broken or on working binaries didn't turn up much of note, but when I ran readelf -d monit I got

Dynamic section at offset 0x230 contains 31 entries:                                
  Tag        Type                         Name/Value                                
 0x00000001 (NEEDED)                     Shared library: [libz.so.1]                
 0x00000001 (NEEDED)                     Shared library: [libc.so]                  
 0x0000001d (RUNPATH)                    Library runpath: [/nix/store/6qw1h5hwikg4wv9dhfhyk08pzskph6y1-zlib-1.2.11-mips-unknown-linux-musl/lib:/nix/store/mkvy309rmdjzrj81j8hmc13j2fq6dpl1-musl-1.1.18-mips-unknown-linux-musl/lib]                         
 0x0000000c (INIT)                       0x4078b4                                   
...

for a working monit and something more like

Dynamic section at offset 0x230 contains 31 entries:                                
  Tag        Type                         Name/Value                                
 0x00000001 (NEEDED)                     Shared library: [libz.so.1]                
 0x00000001 (NEEDED)                     Shared library: [libc.so]                  
 0x1d000000 (<unknown>: 1d000000)        0x278e                                     
 0x0000000c (INIT)                       0x4078b4
...

and wait what why's there that 1d in the MSB instead of in the LSB where we'd recognise it? Either gcc (or ld or something) is misgenerating the ELF tags, or something afterwards is trashing them. Long story short, it turns out that something is patchelf and that I am not the first person to find the bug.
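To make the symptom concrete, here is a minimal Python sketch of how a 32-bit tag can end up with its 1d at the wrong end. It assumes the tag was written in little-endian byte order and then read back as big-endian, which is what the mips target expects. This illustrates the symptom only, not the actual patchelf code, and the helper names write_tag and read_tag are made up for the example.

```python
import struct

DT_RUNPATH = 0x1D  # ELF dynamic tag value that readelf prints as RUNPATH


def write_tag(tag, byte_order):
    """Serialise a 32-bit tag; byte_order is '<' (little) or '>' (big)."""
    return struct.pack(byte_order + "I", tag)


def read_tag(raw, byte_order):
    """Parse a 32-bit tag the way a reader with that byte order would."""
    return struct.unpack(byte_order + "I", raw)[0]


# Writer and reader agree on big-endian: the tag survives.
good = read_tag(write_tag(DT_RUNPATH, ">"), ">")

# Writer uses little-endian by mistake, reader is big-endian: the tag is swapped.
bad = read_tag(write_tag(DT_RUNPATH, "<"), ">")

print(hex(good))  # 0x1d
print(hex(bad))   # 0x1d000000
```

The swapped value matches the unknown 1d000000 tag in the broken readelf output above.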

Given the patch in the PR (thanks UraniumKnight), it was comparatively simple to add it locally to my overlay and now everything is working. I still can't run with exact nixpkgs master, but there are only two changes and I have submitted PRs for both: #42795 and #42794.

Next steps (ongoing): bring the mt300a config up to date so I can get cracking on replacing the OS on my primary internet router. Probably I should buy another one for this purpose so that I actually have a test device; I don't think the family will appreciate it if I kill the live one.

Shrunk but it came unlunk

Fri, 29 Jun 2018 14:17:56 +0000

I finally "persuaded" NixWRT to produce an image smaller than 4MB and am successfully posting this through it. But then I decided to update the Nixpkgs it's built on from a fork that diverged last February to current master, and guess what? It all broke again!

Presently I'm at the "thinking hard about the problem" stage of debugging, but this may soon progress to the git bisect stage of debugging, because I haven't had any good ideas yet.

I will try to fix you

Thu, 21 Jun 2018 19:30:36 +0000

Last week I had hooked up the serial port on my upstairs wifi access point. This week I'm posting this blog entry through it. Which is a win, I think.

All is not finished yet, though, because the NixWRT install on it is too big for the teeny tiny 4MB flash chip and is running in RAM. Which means I can't put the case back on yet. So I've been working on ways to make the image smaller, which mostly has been about removing kernel options I don't need and busybox applets it doesn't use.

To do this, I have changed the module system to introduce - without, I admit, fully understanding - the fix-point pattern as used in package overlays. The deal here is that many packages may wish to change the compile-time options for busybox/the kernel, and simultaneously they also wish to use the resulting binary - and as packagers we want them all to use the same binary instead of making a new kernel/busybox for every package that needs a slightly different one. This is a mutually recursive dependency, but mercifully it's a mutually recursive dependency with a base case. What this means is that a module now takes self and super parameters where the super is a configuration attrset that it can modify and return, and the self is the "final" shape of that attrset that it can use for references to other packages and things that it needs but is not modifying. That's not an exact explanation of how to use this pattern, but it's basically the limit of my understanding of it.
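The self/super arrangement described above can be sketched in Python as a rough analogy. This is not the NixWRT module code: Nix is lazy and Python is not, so the sketch assumes that any reference to the final configuration is wrapped in a lambda and only resolved after all modules have run. The names Final, compose, base, dhcp_client and syslog, and the "busybox(...)" strings standing in for a built package, are all hypothetical.

```python
class Final:
    """Handle on the finished config; lookups are only valid after compose()."""

    def __init__(self):
        self.config = None

    def get(self, key):
        value = self.config[key]
        # Thunks (lambdas) are evaluated on demand, imitating Nix laziness.
        return value() if callable(value) else value


def compose(modules):
    """Thread each module's output into the next one as its super config."""
    self_cfg = Final()
    config = {}
    for module in modules:
        config = module(self_cfg, config)
    self_cfg.config = config  # tie the knot: self now means the final result
    return self_cfg


def base(self_cfg, super_cfg):
    return {
        "applets": ["sh"],
        # One busybox, built from the *final* applet list, hence the thunk.
        "busybox": lambda: "busybox(" + ",".join(sorted(self_cfg.get("applets"))) + ")",
    }


def dhcp_client(self_cfg, super_cfg):
    return {
        **super_cfg,
        "applets": super_cfg["applets"] + ["udhcpc"],  # modify via super
        "dhcp_cmd": lambda: self_cfg.get("busybox") + " udhcpc",  # use via self
    }


def syslog(self_cfg, super_cfg):
    return {
        **super_cfg,
        "applets": super_cfg["applets"] + ["syslogd"],
        "syslog_cmd": lambda: self_cfg.get("busybox") + " syslogd",
    }


final = compose([base, dhcp_client, syslog])
print(final.get("dhcp_cmd"))    # busybox(sh,syslogd,udhcpc) udhcpc
print(final.get("syslog_cmd"))  # busybox(sh,syslogd,udhcpc) syslogd
```

Both commands end up pointing at the same busybox, which includes every applet any module asked for. The base case is that "applets" itself never depends on self, only on super.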

Rearranging things in this way has led to a couple of nice features: for example, I can now add a module to introduce the kernel phram support that is only needed for TFTP boots (and thus have it excluded from the flashable image) or the 9p filesystem support that's only relevant to qemu builds. It also means my image size is now down to 1635k (kernel) plus 2600k (root), but squeezing out the final 300k or so that I'll need (I don't have an exact number but u-boot and ART partition will want space) is proving to be somewhat challenging. When this device was produced it was using a 3.x series kernel and I think 4.9 just unavoidably has a bit more heft in it.
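As a back-of-envelope check on those numbers, here is a small Python sketch. It assumes "k" means KiB and that the 4MB flash is exactly 4096 KiB. The space u-boot and the ART partition need is not known here, so it is left out rather than guessed.

```python
FLASH_KIB = 4 * 1024  # assumption: 4MB flash chip = 4096 KiB

kernel_kib = 1635
root_kib = 2600

image_kib = kernel_kib + root_kib
overshoot_kib = image_kib - FLASH_KIB  # before reserving anything for u-boot/ART

print(image_kib)      # 4235
print(overshoot_kib)  # 139
```

So the image alone is about 139k over the raw flash size, and whatever u-boot and ART require comes on top of that.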

Incidentally, backuphost.nix is assumed broken by this latest set of changes. Some day soon I will figure out a workable CI process for this project.

In other news, finally finished moving stuff off my old Debian shell host to a NixOS installation. Obligatory Bytemark plug goes here.

Solder but no wiser

Wed, 13 Jun 2018 07:17:21 +0000

Everyone should have at least one hobby that they're no good at and/or dislike doing.

For me that was once playing guitar (still have the instrument, no longer have the calluses) but these days it's soldering. So after having stuck three short pieces of paperclip into adjacent holes inside my Trendnet TEW712BR and showered them with blobs of molten tin, I was as surprised as anybody when I plugged my USB TTL serial converter in and found I had a root console.

Notes for others who may follow this way:

It turns out I'm slightly ahead of myself in doing this right now, because although I have an image for Milestone 1 that runs on an Arduino Yun, it's too big - there is only 4MB flash in this little box. Still, at least I am able to capture a boot log and find out its partition scheme and kernel load address.

Yes, I have working wifi, and the use case for "WiFi access point/range extender" (WiFi in AP mode via hostapd, bridged to Ethernet, and a DHCP client). This entailed

Mostly fairly straightforward stuff so far - or at least, at a remove of anything up to two weeks since I did some of this work, I've forgotten whatever problems I ran into. It's all in wap.nix, which still copies a bit too much code from backuphost.nix, but that will change as soon as I can get my head around fixpoints. To get it to fit into flash on the device I'm going to need a smaller kernel and perhaps a smaller busybox.