A short post this week, but this is because I need to sleep, not
because there is nothing to write about.
First up, NixWRT has moved. It is no longer part of a "lightly forked
nixpkgs" repo; it has its own repo containing only NixWRT stuff at
https://github.com/telent/nixwrt . Instead of embedding the Nix package
collection, it now requires that you provide one, e.g. by using the -I
flag to nix-build:
nix-build -I nixpkgs=../nixpkgs-for-nixwrt/ -A tftproot backuphost.nix
Presently there is still a mildly forked Nix package collection
involved, but it is now available separately, and I have started the
process of feeding the changes back into upstream so I hope to be able
to eliminate that dependency in time.
Second, it builds with musl - which is great news as the image for
`backuphost` is too big to fit in 8MB flash when using glibc. The
changes required to switch to musl are - apart from a small bug in the
nixpkgs libiconv derivation - ludicrously trivial.
Third, I was not entirely correct last week when I said that upgrading
to nixpkgs master caused nixwrt to break "almost not at all", because
after I actually split the repos up I found a couple more patches
needed than just the two mentioned. But nothing too serious.
Here's what it looks like:
[dan@loaclhost:~/src/nixwrt]$ ls -l yun/
total 9608
-r--r--r-- 1 root root 1565199 Jan 1 1970 kernel.image
-r--r--r-- 1 root root 2568192 Jan 1 1970 rootfs.image
-r-xr-xr-x 1 root root 5698784 Jan 1 1970 vmlinux
(vmlinux is not actually required on the target, it's a leftover)
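As a rough sanity check on the listing above, the two images that actually go onto the device can be added up and compared with the flash size. This is a minimal sketch, assuming "8MB" means 8 MiB and that all of it is available for these two images (in practice the bootloader and other partitions need room too):

```python
# Sizes in bytes, copied from the `ls -l yun/` listing.
IMAGE_SIZES = {
    "kernel.image": 1565199,
    "rootfs.image": 2568192,
}

# Assumption: "8MB flash" means 8 MiB, all of it usable for these two images.
FLASH_BYTES = 8 * 1024 * 1024


def flash_headroom(sizes, flash_bytes=FLASH_BYTES):
    """Return the bytes left after storing every image (negative if they don't fit)."""
    return flash_bytes - sum(sizes.values())


if __name__ == "__main__":
    used = sum(IMAGE_SIZES.values())
    print(f"{used} bytes used, {flash_headroom(IMAGE_SIZES)} bytes spare")
```

Kernel and root filesystem together come to a little under 4 MiB, so on these numbers the musl build leaves roughly half the flash free.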
Next up will be more patch upstreaming, and making it generate an
image I can actually flash onto a TL-WR842. It is claimed that the
emergency debricking TFTP client only works when fed with actual
TP-Link images and not with OpenWRT, which is going to be a bit of a
drag if true.
No Nix content at all this week, as all I've done is flash (please
refer back to blog post title) my TL-WR842ND back to the factory
firmware in preparation for figuring out how to get NixWRT onto it.
There's some discussion of how to do this on the OpenWRT wiki - attach
the router to a wired network, configure a tftp server to answer on
192.168.1.66 and respond to requests for a file called
wr842ndv1_tp_recovery.bin (previously downloaded from the TP-Link
site), then turn the router on while holding RESET and wait for stuff
to happen.
As always, however, there is a wrinkle. The firmware I downloaded was
a ZIP which contained a file called
wr842ndv1_en_3_12_25_up_boot(130322).bin, and according to most
sources (most sources parrot the OpenWRT wiki),
"in case the file name of this firmware file does contain the word
“boot” in it, you need to cut off parts of the image file before
flashing it": specifically, remove the first 131584 bytes. Why that
number? It doesn't say.
So there you are: the emergency tftp restore expects an image with a
TP-Link firmware header followed by a kernel followed by a filesystem,
which roughly corresponds with the description of mtd5 in the openwrt
flash layout. The image on the TP-Link site, however, prefaces that
with about 128k of something that might be U-Boot, which roughly
corresponds with the layout of the entire flash chip.
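For what it's worth, 131584 is exactly 128 KiB plus 512 bytes (or 257 blocks of 512), which fits the "about 128k" of bootloader at the front. Cutting that prefix off is simple enough to sketch. The function names here are hypothetical and the offset is only the figure quoted from the wiki, not something I have verified against the flash layout:

```python
from pathlib import Path

# 131584 = 128 KiB + 512 bytes; the figure itself is the one quoted from the wiki.
BOOT_PREFIX_BYTES = 128 * 1024 + 512


def strip_boot_prefix(image: bytes, prefix_len: int = BOOT_PREFIX_BYTES) -> bytes:
    """Return the image with the leading bootloader region removed."""
    if len(image) <= prefix_len:
        raise ValueError("image is shorter than the prefix to remove")
    return image[prefix_len:]


def strip_boot_prefix_file(src: str, dst: str) -> int:
    """Write a stripped copy of src to dst and return the number of bytes written."""
    stripped = strip_boot_prefix(Path(src).read_bytes())
    return Path(dst).write_bytes(stripped)
```

Something like strip_boot_prefix_file("wr842ndv1_en_3_12_25_up_boot(130322).bin", "wr842ndv1_tp_recovery.bin") would then produce the file the tftp recovery asks for, assuming the wiki's number is right for this image.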
Going forward this is relevant insofar as it means we really have two
problems, not just one:
1. creating a firmware layout for NixWRT which is acceptable to some
   flashing tool or other - the tftp emergency flash, or the "Firmware
   upgrade" web ui in the OEM firmware, or some facility offered by
   OpenWRT if I flash that first.
2. creating a kernel and fs which will boot successfully on the
   hardware and work well enough to bring up networking. Because, as
   previously mentioned, I have not yet been able to make the serial
   console work.
Currently thinking: we can tackle problem 2 first. Let's put OpenWRT
on the machine (then at least I have ssh available) and then build a
kernel/fs I can start with kexec and iterate on that until I know it
works on the hardware. Once we have the right code then we can
start figuring out how to put it at the right offset.
last week: https://ww.telent.net/2018/2/28/musl_memory
Das U-Boot is billed as "the Universal Boot Loader", but sometimes I
wonder if in practice the U stands for "unique per board" or
"unco-ordinated" or even "uninstallable" - simply because the actual
version of u-boot that comes installed on your cheap consumer router or IoT device board is a forked and undocumented
mess based on an upstream release that's probably about ten years old,
and if you want to replace it with mainline U-Boot you have to either
(1) be lucky enough to have your new build work perfectly first time,
or (2) have access to JTAG or a serial programmer in case it doesn't.
Unfortunately, as it doesn't support my device (it supports some
varieties of TL-WR841 and a later revision of WR842 than mine), I'm
disinclined to try building it: it's sensitive to things like gcc
version, and if it doesn't work there is again no way to resurrect the
device without special hardware.
Excuses, excuses. What's the answer?
New hardware
I ordered this yesterday, so when Amazon eventually deign to deliver it,
development will/may resume.
My GL-MT300A arrived just as I was about to go on holiday. This is how far I've got -
Serial console
These things are, if not actually made for DIY purposes, at least
very tolerant to such uses. Take it out of its case and you find
three standard 0.1" header pins on the PCB labelled "TX", "RX", "GND"
- connect each of them to something that speaks TTL serial (I used a
Raspberry Pi) and set the baud rate to 115200. Worked first time.
"The U is for Uninitialized"-Boot
I commented previously about the differences one may encounter
between two devices both of which run the allegedly "Universal" U-Boot
boot loader. This time I couldn't work out why my tftp downloads were
loading into memory at offset 0 instead of, say, 0x811f8000. Until I
realised that (i) it no longer sufficed to say
and (ii) on this device, double quotes around the value of a setenv
are no longer special, so
setenv bootn "foo;bar"
will set the value of bootn to `"foo` and then attempt to run the
command `bar"`. Which typically doesn't work all that well.
Hello darkness my old friend
Having made the relevant changes I was able to get the following output:
## Booting image at 81000000 ...
Image Name: Linux-4.9.76
Image Type: MIPS Linux Kernel Image (lzma compressed)
Data Size: 1705466 Bytes = 1.6 MB
Load Address: 80001000
Entry Point: 803fa9c0
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK
No initrd
## Transferring control to Linux (at address 803fa9c0) ...
## Giving linux memsize in MB, 128
Starting kernel ...
followed by indefinite but emphatic silence, and various bouts of
fiddling with CONFIG_EARLY_PRINTK and stuff have not yet persuaded
it to loosen up. Currently I am running LEDE in a Docker container to
see what it does, and diffing its .config with mine. This has
shown up a couple of things that I've now added to my configuration, but I am
only going to get one shot at running it before I go home, because at
70 miles distant from the hardware I can't reach across and power
cycle it.
(Once I got warmed up, at least. I returned from my holiday to find
that the entire local network had stopped working because after
mucking around with U-boot on the device while I was away, I'd
inadvertently let the default openwrt installation on the MT300A
start, and it was running a DHCP server. Shouldn't have put it on the
LAN, I suppose. One round of reboots later ... but apart from that it
has been a productive week.)
Let's start by spoiling the ending, so you don't have to read the rest of this
post: said MT300A now boots to user space and runs init (and monit). I hope
that all that remains is to get the Ethernet working and to build a flashable
image.
On our way to that destination, we ... basically, this is another
thrilling instalment of "don't trust the bootloader".
This board builds with device tree, but its u-boot has no way to
provide a device tree blob at start up, so what we have to do instead
is bodge the device tree into the kernel itself.
The actual mechanics of glomming a device tree binary blob onto the kernel image
are fairly straightforward if you have an openwrt build to crib from:
generate the ELF vmlinux as usual, convert it to a raw binary, compile the DTS
file (which involves preprocessing it with cpp because - I don't know why -
it contains two different kinds of include directive and the dtc tool only
understands one of them) and then use some magic patch-dtb tool to stick
them together.
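The steps above can be sketched as a list of command lines. The two kinds of include are presumably the C-style #include, which needs cpp, and dtc's own /include/. This is a sketch only: it builds the argv lists and runs nothing. The cross-compiler prefix and file names are hypothetical, and the argument order for patch-dtb is an assumption based on how the OpenWRT build appears to call it.

```python
import shlex


def dtb_build_commands(vmlinux, dts, include_dirs, cross_prefix="mipsel-linux-gnu-"):
    """Return one argv list per build step; nothing is executed here."""
    raw_kernel = vmlinux + ".bin"
    stem = dts[:-len(".dts")] if dts.endswith(".dts") else dts
    preprocessed = stem + ".dts.tmp"
    dtb = stem + ".dtb"

    # Step 2a: run the DTS through the C preprocessor to resolve #include.
    cpp = ["cpp", "-nostdinc", "-undef", "-x", "assembler-with-cpp"]
    for directory in include_dirs:
        cpp += ["-I", directory]
    cpp += ["-o", preprocessed, dts]

    return [
        # Step 1: ELF vmlinux -> raw binary.
        [cross_prefix + "objcopy", "-O", "binary", vmlinux, raw_kernel],
        cpp,
        # Step 2b: compile the preprocessed source to a binary blob.
        ["dtc", "-I", "dts", "-O", "dtb", "-o", dtb, preprocessed],
        # Step 3: stick the blob into the raw kernel (argument order assumed).
        ["patch-dtb", raw_kernel, dtb],
    ]


if __name__ == "__main__":
    for argv in dtb_build_commands("vmlinux", "GL-MT300A.dts", ["include"]):
        print(shlex.join(argv))
```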
I previously described device tree as "let's pretend that the hardware
description was provided to us by open firmware", and insofar as this
means the hardware is described by a data file instead of by the
effect of running imperative C code, it's definitely good. But
when that data file is provided by the kernel source [*] and attached
to the kernel image, instead of being passed in by the bootloader as
an input to the kernel, things get weird. For example, the default
config for the MT300A is that the kernel command line is coming from
DT - effectively this means that the kernel provides its own command
line. And you may ask yourself: where does that highway go to? It
would make sense if the command line had been provided by the user to
the bootloader which then merged it into the DT, but in this scenario
... not so much.
[*] splitting hairs here: the particular DTS we're using comes from
LEDE and not from the mainline kernel. But that doesn't invalidate
my point, which is that it doesn't come from the bootloader.
At this point, and skating over a minor digression where I had accidentally
built a kernel that thought it was little endian but using a big endian
compiler (tl;dr - didn't work), I had a kernel that booted most of the way to
mounting root but then failed.
The reasons it failed were approximately legion in number, or at least that's
how it felt. In the order I discovered them:
1. The kernel was configured to get its command line - see above - from DT, not
   the bootloader, so it was ignoring all my phram options because the hardcoded
   command line overrode them.
2. For reasons that no doubt made perfect sense to the people who
   were there at the time, the u-boot build on this board doesn't
   support bootargs anyway. So the kernel was ignoring all my phram
   options because it wasn't even seeing them. Remember, kids - the U
   in "u-boot" stands for #undef. (I am assuming the sources there
   correspond with the binary that shipped on my device; to be quite
   honest I haven't checked this properly for myself.)
3. Another bit of hardcoded magic ... there is a kernel config option in
   LEDE, again enabled by default, that looks through the MTD partition table
   and, if it finds a partition named rootfs, sets it up as the root filesystem.
   This overrides any root= option given on the command line. So the openwrt
   rootfs on the actual flash was being used instead of the custom root fs in
   phram - and failing to work, not least because it's a jffs filesystem and
   I'd set rootfstype=squashfs.
4. Whereas the Yun wanted the address of the filesystem image as an
   offset from 0x80000000, the MT300A wants it offset from 0. I haven't
   got the MIPS memory map entirely straight in my head but I think I
   would be using the right words if I said it wants a kuseg memory
   address not a kseg0 address. Or perhaps it wants a physical address
   not a kseg0 address. (The same physical RAM is mapped in both places
   but the cache behaviour is different.) Whichever is the right
   explanation I don't know, but when I fixed that, bingo, it boots!
   I don't know if the Yun would work with kuseg addresses too, but
   next time I plug it back in I'll try it.
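On the address question in the last item: on 32-bit MIPS, kseg0 (0x80000000-0x9fffffff, cached) and kseg1 (0xa0000000-0xbfffffff, uncached) are both fixed windows onto the low 512MB of physical memory, whereas kuseg (0x00000000-0x7fffffff) goes through the TLB. So "offset from 0x80000000" versus "offset from 0" looks like kseg0 versus physical, and converting between them is a matter of masking off the top three bits. A minimal sketch, with hypothetical function names, and using the tftp load address mentioned earlier purely as an example value:

```python
KSEG0_BASE = 0x80000000       # unmapped, cached window
KSEG_END = 0xC0000000         # first address past kseg1
PHYS_MASK = 0x1FFFFFFF        # low 512MB reachable through kseg0/kseg1


def kseg_to_physical(addr: int) -> int:
    """Convert a kseg0 or kseg1 virtual address to its physical address."""
    if not KSEG0_BASE <= addr < KSEG_END:
        raise ValueError(f"{addr:#010x} is not in kseg0 or kseg1")
    return addr & PHYS_MASK


def physical_to_kseg0(phys: int) -> int:
    """Convert a physical address in the low 512MB to its kseg0 alias."""
    if not 0 <= phys <= PHYS_MASK:
        raise ValueError(f"{phys:#010x} is not reachable through kseg0")
    return phys | KSEG0_BASE


if __name__ == "__main__":
    print(f"{kseg_to_physical(0x811F8000):#010x}")
```

If this reading is right, an image loaded at 0x811f8000 would be described to the MT300A kernel as living at 0x011f8000, and the Yun would want the former.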
Also this week, I significantly cleaned up how we provide config
options to the nixwrt kernel derivation - now we read the appropriate
defconfig file(s) into an attrset then
pass that attrset into overrideConfig. overrideConfig is a
function that accepts an attrset of default config options and is
expected to return a customised set. It's a lot prettier than the
rather awful mess of echo and grep -v it replaces, but it's not
100% perfect because it doesn't know the type of each config option
value - if you have a string value you need to quote that string
yourself (tip: use builtins.toJSON) but it will do for now.
Probably I should extract it into a function so that I could use the
same code to configure Busybox.
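To illustrate the shape of the idea (this is a Python analogue, not the Nix code in the repo): read a defconfig into a dict, hand the dict to an override function that returns a customised copy, and render the result. The function names are hypothetical, the option values are made up for the example, and json.dumps stands in for builtins.toJSON as the way to quote a string value.

```python
import json


def read_defconfig(text):
    """Parse NAME=value lines into a dict, skipping blanks and # comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        options[name] = value
    return options


def override_config(defaults):
    """Accept the default options and return a customised copy."""
    custom = dict(defaults)
    custom["CONFIG_MTD_PHRAM"] = "y"
    # String values must be quoted by the caller; the value is illustrative.
    custom["CONFIG_CMDLINE"] = json.dumps("console=ttyS0,115200")
    return custom


def render_config(options):
    """Serialise the options back to .config syntax, one per line."""
    return "".join(f"{name}={value}\n" for name, value in sorted(options.items()))


if __name__ == "__main__":
    defaults = read_defconfig("# a comment\nCONFIG_MIPS=y\nCONFIG_CMDLINE_BOOL=y\n")
    print(render_config(override_config(defaults)), end="")
```

Because the override is just a function from one set of options to another, the same three steps would work unchanged for any other Kconfig-style configuration.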
That's about it for this week.
With luck and a following wind I'll be at the NixOS 18.03 install
party at Codenode - I hope to bring along some hardware - so if you're
interested and in London, sign up and come along.
If you'd like to get NixWRT updates more than once a week, I
encourage you to follow me on Mastodon.