diary at Telent Networks

Flash! (ah-ah)#

Mon, 16 Apr 2018 23:18:57 +0000

This week I successfully flashed NixWRT to my GL-MT300A, such that it runs whenever I turn the device on. And then I found it slowly fills up all RAM and the process table over about half a day then stops working.

But we'll get to that in a minute. Let's talk about the good bits first.

Graven image

Given we have a kernel and a root filesystem, how exactly should we mash them up into a flashable image such that (i) uboot will find and run the kernel, (ii) the kernel will find its filesystem? Let's look at the dmesg output from booting OpenWRT -

[    0.620000] m25p80 spi32766.0: w25q128 (16384 Kbytes)
[    0.630000] 5 ofpart partitions found on MTD device spi32766.0
[    0.630000] Creating 5 MTD partitions on "spi32766.0":
[    0.640000] 0x000000000000-0x000000030000 : "u-boot"
[    0.650000] 0x000000030000-0x000000040000 : "u-boot-env"
[    0.650000] 0x000000040000-0x000000050000 : "factory"
[    0.660000] 0x000000050000-0x000000fd0000 : "firmware"
[    0.780000] 2 uimage-fw partitions found on MTD device firmware
[    0.790000] 0x000000050000-0x000000174720 : "kernel"
[    0.790000] 0x000000174720-0x000000fd0000 : "rootfs"
[    0.800000] mtd: device 5 (rootfs) set to be root filesystem
[    0.810000] 1 squashfs-split partitions found on MTD device rootfs
[    0.810000] 0x000000890000-0x000000fd0000 : "rootfs_data"
[    0.820000] 0x000000ff0000-0x000001000000 : "art"

It has five "ofpart" partitions. I'm guessing the "of" stands for "open firmware" and indeed if we look at the DTS file (remember if you will from blog entries passim that the device tree is a representation of "what you'd get from open firmware if the hardware had open firmware") we can see five partitions defined there.

It has two "uimage-fw" partitions which further subdivide the firmware partition. Unlike the ofpart partitions these are not actually defined anywhere: they are the result of kernel code (in drivers/mtd/mtdsplit/mtdsplit{,_uimage}.c) which looks for partitions which start with a uimage, parses the image length, and then looks for a filesystem signature of some kind on the next erase block boundary. (This is very convenient magic not least because it means we don't have to update some partition table each time our kernel size changes, but it still makes me uneasy; I have a very low threshold for magic)
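The arithmetic behind that split can be sketched in a few lines of Python. This is not a port of the kernel's mtdsplit code, just an illustration of "read the length out of the uimage header, then round up to an erase block". It assumes the legacy 64-byte U-Boot image header (big-endian, magic 0x27051956); the function names and the `erase_size` parameter are made up for this sketch.

```python
import struct

UIMAGE_MAGIC = 0x27051956                  # magic number of the legacy U-Boot image header
UIMAGE_HEADER = struct.Struct(">7I4B32s")  # 64 bytes, all fields big-endian


def uimage_total_size(blob: bytes) -> int:
    """Return header length plus payload length of a legacy uimage."""
    if len(blob) < UIMAGE_HEADER.size:
        raise ValueError("too short to hold a uimage header")
    fields = UIMAGE_HEADER.unpack_from(blob)
    magic, data_size = fields[0], fields[3]  # ih_magic and ih_size
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a uimage: bad magic")
    return UIMAGE_HEADER.size + data_size


def next_erase_block(offset: int, erase_size: int) -> int:
    """Round offset up to the next multiple of erase_size (hypothetical helper)."""
    return -(-offset // erase_size) * erase_size
```

Given the first 64 bytes of the firmware partition, `next_erase_block(uimage_total_size(header), erase_size)` is the offset at which this sketch would expect the filesystem signature.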

Hypothesis (subsequently proven): if our firmware file consists of a kernel wrapped in a uimage, plus padding to the next erase block boundary, plus the filesystem image, Linux will report an MTD uimage-fw partition that starts where the filesystem starts. We can copy this combined image into flash at offset 0x000000050000 to overwrite the existing kernel/root fs while leaving the rest of the flash undisturbed.

(To get the erase block size, we could (a) guess it's probably 128k, or (b) if we were more prudent, check it in /proc/mtd.)

Short version: that half of the puzzle is solved by creative use of dd.
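For anyone who would rather not do it with dd, the same layout can be sketched in Python: kernel uimage, padding up to the next erase block boundary, then the filesystem image. The function name is hypothetical, the default erase size is the 128k guess from above rather than a checked value, and padding with 0xff (the erased state of flash) is an assumption of this sketch, not necessarily what the dd version produces.

```python
def build_firmware(kernel_uimage: bytes, rootfs: bytes,
                   erase_size: int = 128 * 1024) -> bytes:
    """Concatenate kernel and rootfs with the rootfs on an erase block boundary."""
    # Number of bytes needed to reach the next multiple of erase_size
    # (zero if the kernel already ends exactly on a boundary).
    pad = -len(kernel_uimage) % erase_size
    # 0xff padding is an assumption; any filler would do for alignment.
    return kernel_uimage + b"\xff" * pad + rootfs
```

The result is what gets copied into flash at the start of the "firmware" partition.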

Flash override

So the general principle for flashing from U-Boot, once you have a suitable image, is (1) download it into RAM somewhere, (2) erase an appropriate section of flash, (2.1) hope the power doesn't fail before you finish step 3, (3) copy the image from RAM into flash, (4) reboot and see whether you've bricked the device. Unless you've done something badly wrong (like overwriting U-Boot itself or the ART partition), though, it doesn't matter too much if the image you've uploaded doesn't actually work, because you can just go back to the U-Boot prompt and try again. There's an explanation of how to do this on the Yun, which I had previously followed successfully; converting that to another board is just a matter of working out whereabouts in the address space the flash chip is mapped.

And the simplest way of doing this is to look at what uboot does by default when the board is powered on. If we run printenv we see i.a.

   bootcmd=bootm 0xbc050000

With 99.8% certainty, we think that this is a jump to offset 0x50000 (remember, this is the offset of the "firmware" partition) of a flash chip that starts at 0xbc000000, and this is probably all we need. So: on the build machine

$ nix-build -I nixpkgs=../nixpkgs-for-nixwrt/ backuphost.nix \
 -A firmwareImage --argstr targetBoard mt300a -o mt300a.bin
$ cp mt300a.bin /tftp

and then on the device, run these u-boot commands

setenv serverip 192.168.0.2 
setenv ipaddr 192.168.0.251 
tftp 0x80060000 /tftp/mt300a.bin
erase 0xbc050000 0xbcfd0000
cp.b 0x80060000 0xbc050000 ${filesize};
reset

and then offer up a silent prayer because 99.8% is still less than 100%. Punch the air shortly thereafter :-)
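The addresses in those U-Boot commands are just the "firmware" partition offsets from the dmesg output added to the guessed flash base. A small Python sketch of that arithmetic, plus a check that an image fits in the partition before it is flashed (the constant and function names are made up for illustration, and the base address is the inference from bootcmd, not a documented value):

```python
FLASH_BASE = 0xBC000000    # inferred from bootcmd=bootm 0xbc050000
FIRMWARE_START = 0x050000  # start of the "firmware" partition in dmesg
FIRMWARE_END = 0xFD0000    # end of the "firmware" partition in dmesg


def flash_address(offset: int, base: int = FLASH_BASE) -> int:
    """Translate an offset within the flash chip to a mapped address."""
    return base + offset


def fits_in_firmware(image_size: int) -> bool:
    """True if an image of image_size bytes fits in the firmware partition."""
    return image_size <= FIRMWARE_END - FIRMWARE_START
```

`flash_address(FIRMWARE_START)` and `flash_address(FIRMWARE_END)` give the two arguments passed to erase above.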

Flash I love you but we only have 14 hours

As alluded to above, there is something weird going on in userland that makes it presently less than useful: the ntpd and syslogd processes (both actually BusyBox applets) don't write pid files when they start up, causing monit to decide they have failed to start and spawn another. One of each of them added every 30 seconds soon leads to a poorly computer.

No idea why, yet. I'd run strace but it doesn't want to build (maybe a MIPS thing, maybe a musl thing). Hopefully next week...

Completely random aside

And I mean completely random. Is it just me or does anyone else find that the taillight cluster on the new Prius reminds them of Ming the Merciless?

Maybe just me.

The GL(.Inet) has landed#

Tue, 10 Apr 2018 22:57:27 +0000

See subject. Most of the work this last week has been moving things around in the hope of making it possible to support more than one device, and then merging the gl-mt300a branch into master. (Nitpick: for reasons I can't remember, and which are unlikely to be convincing, the primary development branch is actually called nixwrt not master. Probably something to do with git filter-branch.)

This breaks the Yun code that was previously there, because "possible" and "actually implemented" are two different things. But I am not using it, I am pretty certain nobody else is either, and at least now I can see how to fix it.

I have added a targetBoard argument which will soon allow choice of mt300a, yun or malta (that last one for qemu), so the build command is presently:

nix-build -I nixpkgs=../nixpkgs-for-nixwrt/ backuphost.nix \
  -A tftproot --argstr targetBoard mt300a -o mt300a 

I have added a `swconfig` invocation to the monit config so that networking comes up. I've done it in a totally hacky way until I decide how to represent the switch in configuration, but that might involve learning how the damn thing works.

That's about all for now, save to say that yesterday I went to the NixOS London meetup which in the event did not involve an install party (everyone present had installed it already) but did involve some interesting conversations, learning something about overlays and giving a quick demo of NixWRT so far. Albeit that I was demonstrating by ssh back to my hardware at home: no actual hardware at the venue that could be waved around or kicked. I had intended to do something a bit more visual but thought better of it when I realised how much of my home kit I'd have to unplug. Next time, definitely.

Flip the switch#

Mon, 02 Apr 2018 22:07:34 +0000

[ meta: I wanted to call this one "sudo make me a LAN switch" but now the MRAs have ruined that phrase for everyone ]

I got networking working on the GL-MT300A. This entailed:

Patching the device tree definition.

This is slightly cargo-culted (I Read It On The Internet), but by removing the pinctrl-0 entry from the &ethernet stanza I managed to change the bootup messages from saying

[    1.873672] rt2880-pinmux pinctrl: could not request pin 40 (io40) from group ephy  on device rt2880-pinmux
[    1.883620] mtk_soc_eth 10100000.ethernet: Error applying setting, reverse things back
[    1.891724] mtk_soc_eth: probe of 10100000.ethernet failed with error -22

to saying

<6>[    2.586201] mtk_soc_eth 10100000.ethernet eth0 (uninitialized): port 1 link up (100Mbps/Full duplex)
<6>[    2.595753] mtk_soc_eth 10100000.ethernet: loaded mt7620 driver
<6>[    2.602600] mtk_soc_eth 10100000.ethernet eth0: mediatek frame engine at 0xb0100000, irq 5

which felt a lot like progress but did not result in actual connectivity.

Building swconfig

After trying the obvious culprits (firewall rules? weird routing?) for why my board was not seeing the network - indeed, not even able to ping its own IP address - I studied the dmesg output a bit more closely, and noticing the line

<6>[    2.581845] gsw: setting port4 to ephy mode

I took a wild-ass guess that, given I knew the device contains some kind of network switch, maybe the switch doesn't come up in a useful state. So we needed a tool of some kind to reconfigure it, and apparently the appropriate tool is swconfig.

(If you followed that link and struggled to understand what it was talking about, be assured that it means nothing to me either. Ure not alone)

Building swconfig is easier when you start with a fork for Debian instead of the original OpenWRT package: I simply made my kernel derivation install the header files (first time I have written a multi-output derivation, but turned out that in this case it was a one-line change), wrote a derivation with libnl as a dependency, and created an 80MB filesystem image.

PRO TIP: don't override phases in a Nixpkgs derivation unless you understand what is done by all the phases you didn't include. In this case, not running the fixup phase meant that nothing ran the "shrink rpath" magic which removes unneeded compile-time dependencies from the runtime dependency list. Image size go boom.

Running swconfig

[Image: dog with network cable]

I spent some time trying to figure out what I was doing with vlans and port configuration and worrying that when it said link: ?unknown-type? against each port that would mean I had to tell it somehow what kind of link type to use. Then eventually I hit on

swconfig dev switch0 set enable_vlan 0
swconfig dev switch0 set apply

and as if by magic (= "insufficiently advanced understanding of technology") it all started working. When I eventually want it to function as a switch, obviously I will need to revisit this. But that is Milestone 1 and this is Milestone 0 - sufficient unto the day etc etc.

The U is for urgghle#

Wed, 28 Mar 2018 00:36:42 +0000

A more productive week than the previous one, on the whole.

(Once I got warmed up, at least. I returned from my holiday to find that the entire local network had stopped working because, after mucking around with U-boot on the device while I was away, I'd inadvertently let the default openwrt installation on the MT300A start, and it was running a DHCP server. Shouldn't have put it on the LAN, I suppose. One round of reboots later ... but apart from that it has been a productive week.)

Let's start by spoiling the ending, so you don't have to read the rest of this post: said MT300A now boots to user space and runs init (and monit). I hope that all that remains is to get the Ethernet working and to build a flashable image.

On our way to that destination, we ... basically, this is another thrilling instalment of "don't trust the bootloader"

This board builds with device tree, but its u-boot has no way to provide a device tree blob at start up, so what we have to do instead is bodge the device tree into the kernel itself.

The actual mechanics of glomming a device tree binary blob onto the kernel image are fairly straightforward if you have an openwrt build to crib from: generate the ELF vmlinux as usual, convert it to a raw binary, compile the DTS file (which involves preprocessing it with cpp because - I don't know why - it contains two different kinds of include directive and the dtc tool only understands one of them) and then use some magic patch-dtb tool to stick them together.

I previously described device tree as "let's pretend that the hardware description was provided to us by open firmware." and insofar as this means the hardware is described by a data file instead of by the effect of running imperative C code then it's definitely good. But when that data file is provided by the kernel source [*] and attached to the kernel image, instead of being passed in by the bootloader as an input to the kernel, things get weird. For example, the default config for the MT300A is that the kernel command line is coming from DT - effectively this means that the kernel provides its own command line. And you may ask yourself: where does that highway go to? It would make sense if the command line had been provided by the user to the bootloader which then merged it into the DT, but in this scenario ... not so much.

[*] splitting hairs here: the particular DTS we're using comes from LEDE and not from the mainline kernel. But that doesn't invalidate my point, which is that it doesn't come from the bootloader.

At this point, and skating over a minor digression where I had accidentally built a kernel that thought it was little endian but was compiled with a big endian compiler (tl;dr - didn't work), I had a kernel that booted most of the way to mounting root but then failed.

The reasons it failed were approximately legion in number, or at least that's how it felt. In the order I discovered them:

I. The kernel was configured to get its command line - see above - from DT not the bootloader, so was ignoring all my phram options because the hardcoded command line overrode them.

II. For reasons that no doubt made perfect sense to the people who were there at the time, the u-boot build on this board doesn't support bootargs anyway. So the kernel was ignoring all my phram options because it wasn't even seeing them. Remember, kids - the U in "u-boot" stands for #undef. (I am assuming the sources there correspond with the binary that shipped on my device; to be quite honest I haven't checked this properly for myself.)

III. Another bit of hardcoded magic ... there is a kernel config option in LEDE, again enabled by default, that looks through the MTD partition table and, if it finds a partition named rootfs, sets it up as the root filesystem. This overrides any root= option given on the command line. So the openwrt rootfs on the actual flash was being used instead of the custom root fs in phram - and failing to work, not least because it's a jffs filesystem and I'd set rootfstype=squashfs.

IV. Whereas the Yun wanted the address of the filesystem image as an offset from 0x80000000, the MT300A wants it offset from 0. I haven't got the MIPS memory map entirely straight in my head but I think I would be using the right words if I said it wants a kuseg memory address not a kseg0 address. Or perhaps it wants a physical address not a kseg0 address. (The same physical RAM is mapped in both places but the cache behaviour is different). Whichever is the right explanation I don't know, but when I fixed that, bingo, it boots!

I don't know if the Yun would work with kuseg addresses too, but next time I plug it back in I'll try it.
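For reference, here is the address arithmetic in Python, using the standard MIPS32 segment boundaries: kseg0 (cached) and kseg1 (uncached) are the two fixed windows onto the low 512MB of physical memory, while kuseg addresses go through the TLB. The function names are made up for this sketch, and it does not settle which of the two explanations above applies to the MT300A; numerically, a physical address and a kuseg address below 0x80000000 look the same.

```python
KSEG0_BASE = 0x80000000    # cached, unmapped window onto physical memory
KSEG1_BASE = 0xA0000000    # uncached, unmapped window onto the same memory
SEGMENT_MASK = 0x1FFFFFFF  # low 29 bits: the 512MB each window covers


def segment(addr: int) -> str:
    """Name the MIPS32 segment a 32-bit virtual address falls in."""
    if addr < KSEG0_BASE:
        return "kuseg"
    if addr < KSEG1_BASE:
        return "kseg0"
    if addr < 0xC0000000:
        return "kseg1"
    return "kseg2"


def kseg_to_physical(addr: int) -> int:
    """Strip the segment bits from a kseg0 or kseg1 address."""
    if segment(addr) not in ("kseg0", "kseg1"):
        raise ValueError("not a kseg0/kseg1 address")
    return addr & SEGMENT_MASK


def physical_to_kseg0(phys: int) -> int:
    """Return the cached kseg0 alias of a physical address."""
    if phys > SEGMENT_MASK:
        raise ValueError("physical address not reachable through kseg0")
    return KSEG0_BASE | phys
```

So the Yun-style 0x811f8000 corresponds to 0x11f8000 on the MT300A, and by the same arithmetic the 0xbc050000 in a bootm command is a kseg1 address.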

Also this week, significantly cleaned up how we provide config options to the nixwrt kernel derivation - now we read the appropriate defconfig file(s) into an attrset then pass that attrset into overrideConfig. overrideConfig is a function that accepts an attrset of default config options and is expected to return a customised set. It's a lot prettier than the rather awful mess of echo and grep -v it replaces, but it's not 100% perfect because it doesn't know the type of each config option value - if you have a string value you need to quote that string yourself (tip: use builtins.toJSON) but it will do for now. Probably I should extract it into a function so that I could use the same code to configure Busybox.
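The real thing is written in Nix, but the shape of the idea translates to a short Python analogue: read a defconfig into a dictionary, hand it to an override function, and write the result back out. Everything here is illustrative; the function names are invented, the option values are hypothetical examples, and `json.dumps` stands in for builtins.toJSON as the way to quote a string value.

```python
import json


def read_defconfig(text: str) -> dict:
    """Parse CONFIG_FOO=value lines into a dict, skipping comments and blanks."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key] = value
    return config


def override_config(defaults: dict) -> dict:
    """Return a customised copy of the defaults (example values are hypothetical)."""
    custom = dict(defaults)
    # The values are untyped, so string options must be quoted by the caller.
    custom["CONFIG_CMDLINE"] = json.dumps("console=ttyS0,115200")
    custom["CONFIG_SQUASHFS"] = "y"
    return custom


def write_config(config: dict) -> str:
    """Serialise the dict back into .config syntax, one option per line."""
    return "".join(f"{key}={value}\n" for key, value in sorted(config.items()))
```

Because the override is an ordinary function from one dictionary to another, the same three steps would work for any tool that uses this config file syntax.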

That's about it for this week.

Lede by example#

Tue, 20 Mar 2018 21:51:52 +0000

My GL-MT300A arrived just as I was about to go on holiday. This is how far I've got -

Serial console

These things are, if not actually made for DIY purposes, at least very tolerant to such uses. Take it out of its case and you find three standard 0.1" header pins on the PCB labelled "TX", "RX", "GND" - connect each of them to something that speaks TTL serial (I used a Raspberry Pi) and set the baud rate to 115200. Worked first time.

"The U is for Uninitialized"-Boot

I commented previously about the differences one may encounter between two devices both of which run the allegedly "Universal" U-Boot boot loader. This time I couldn't work out why my tftp downloads were loading into memory at offset 0 instead of, say, 0x811f8000. Until I realised that (i) it no longer sufficed to say

setenv rootaddr 11f8000
setenv rootaddr_useg 0x$rootaddr
setenv rootaddr_ks0 0x8$rootaddr

and I must now surround environment variable references with curly braces.

setenv rootaddr 11f8000
setenv rootaddr_useg 0x${rootaddr}
setenv rootaddr_ks0 0x8${rootaddr}

and (ii) on this device, double quotes around the value of a setenv are no longer special, so

  setenv bootn "foo;bar"

will set the value of bootn to `"foo` and then attempt to run the command `bar"`. Which typically doesn't work all that well.

Hello darkness my old friend

Having made the relevant changes I was able to get the following output:

## Booting image at 81000000 ...
   Image Name:   Linux-4.9.76
   Image Type:   MIPS Linux Kernel Image (lzma compressed)
   Data Size:    1705466 Bytes = 1.6 MB
   Load Address: 80001000
   Entry Point:  803fa9c0
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
No initrd
## Transferring control to Linux (at address 803fa9c0) ...
## Giving linux memsize in MB, 128

Starting kernel ...

followed by indefinite but emphatic silence, and various bouts of fiddling with CONFIG_EARLY_PRINTK and stuff have not yet persuaded it to loosen up. Currently I am running LEDE in a Docker container to see what it does, and diffing its .config with mine. This has shown up a couple of things that I've now added to my configuration, but I am only going to get one shot at running it before I go home, because at 70 miles distant from the hardware I can't reach across and power cycle it.