diary at Telent Networks

gehen Sie bitte mit, hier ist Nix zu sehen#

Tue, 02 Jan 2018 12:08:23 +0000

[ Meta: I don't actually speak German. I hope the pun works, but I have no particular reason to suppose it should do. ]

Happy New Year, if you observe the Gregorian Calendar. This week in NixWRT was typified by lots of beating head on brick wall followed by an unexpected achievement: I have a working rootfs in qemu!

Look, isn't it cool?

[nix-shell:~/src/nixwrt]$ qemu-system-mipsel -M malta -m 64 -nographic -kernel linux-*/vmlinux -append 'root=/dev/sr0 console=ttyS0 init=/bin/sh' -blockdev driver=file,node-name=squashed,read-only=on,filename=tftproot/rootfs.image -blockdev driver=raw,node-name=rootfs,file=squashed,read-only=on -device ide-cd,drive=rootfs -nographic
Linux version 4.14.1 (dan@loaclhost) (gcc version 6.4.0 (GCC)) #2 SMP Tue Jan 2 14:58:10 UTC 2018
[...]
BusyBox v1.27.2 () built-in shell (ash)

# LD_TRACE_LOADED_OBJECTS=1 /nix/store/*-rsync*/bin/rsync --version
        linux-vdso.so.1 (0x77cc8000)
        libpopt.so.0 => /nix/store/79ffdcjvk5bpbm1vgrxii935vhjbdg5p-popt-1.16-mipsel-unknown-linux-gnu/lib/libpopt.so.0 (0x77c70000)
        libc.so.6 => /nix/store/7njknf9mhcj7jd3l0axlq8ql0x7396pk-glibc-2.26-75-mipsel-unknown-linux-gnu-mipsel-unknown-linux-gnu/lib/libc.so.6 (0x77ad4000)
        /nix/store/7njknf9mhcj7jd3l0axlq8ql0x7396pk-glibc-2.26-75-mipsel-unknown-linux-gnu-mipsel-unknown-linux-gnu/lib/ld.so.1 (0x77c98000)

Points of note here:

(-> head wall)

I spent a lot of time, with no actual result yet, on getting the Yun to tftp its kernel and rootfs and run them in-place without having to write anything to flash. Motivation here is: it's not my Yun, it belongs to my employer who will probably want it back next time we do a hackathon or something. So I don't want to brick the device accidentally, nor use all the flash erase cycles, and anyway it's probably slower than running from RAM.

This is my theory, which almost works but for some reason not quite: we should be able to tftp the root fs into RAM, then use the MTD "phram" driver to emulate an MTD device at that address, and the memmap option to hide that region of memory from the Linux system (so it doesn't overwrite it).

ar7240> setenv kernaddr 0x81000000
ar7240> setenv rootaddr 1178000
ar7240> setenv rootaddr_useg 0x$rootaddr
ar7240> setenv rootaddr_ks0 0x8$rootaddr
ar7240> setenv bootargs keep_bootcon console=ttyATH0,250000 panic=10 oops=panic init=/bin/sh phram.phram=rootfs,$rootaddr_ks0,9Mi root=/dev/mtdblock0 memmap=10M\$$rootaddr_useg
ar7240> setenv bootn "tftp $kernaddr /tftp/kernel.image ; tftp $rootaddr_ks0 /tftp/rootfs.image; bootm  $kernaddr"
ar7240> run bootn

Here's where it gets weird. With those options, it runs most of the way through boot then hangs after printing NET: Registered protocol family 17 (that's AF_PACKET, if you were wondering). If I misspell the console device name, though, it gets slightly further. wat?

NET: Registered protocol family 17
Warning: unable to open an initial console.
VFS: Mounted root (squashfs filesystem) readonly on device 31:0.
Freeing unused kernel memory: 208K
This architecture does not have kernel memory protection.
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

So it's identified that there is a squashfs filesystem there, which is a positive sign, but it's not going to run init without a console.

Also falling into the "known unknowns" quadrant: you will note that we randomly set and unset the high bit on some of our addresses there: this is because the same physical RAM is mapped into more than one place in the MIPS address space and I sort of think I have a handle on how it works but not really.
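For what it's worth, the address juggling can be sketched in a few lines of Python. This assumes the standard MIPS32 layout, in which kseg0 (0x80000000 upwards, cached) and kseg1 (0xa0000000 upwards, uncached) are both fixed windows onto the low 512MB of physical memory. The function names are mine, not anything from u-boot or the kernel.

```python
KSEG0_BASE = 0x80000000  # cached, unmapped window onto physical memory
KSEG1_BASE = 0xA0000000  # uncached, unmapped window onto the same memory
KSEG_SIZE = 0x20000000   # each window covers the low 512MB


def phys_to_kseg0(phys):
    """Physical address -> cached kernel virtual address."""
    if not 0 <= phys < KSEG_SIZE:
        raise ValueError("physical address outside the kseg0 window")
    return KSEG0_BASE | phys


def phys_to_kseg1(phys):
    """Physical address -> uncached kernel virtual address."""
    if not 0 <= phys < KSEG_SIZE:
        raise ValueError("physical address outside the kseg1 window")
    return KSEG1_BASE | phys


def kseg_to_phys(virt):
    """kseg0 or kseg1 address -> physical address."""
    if not KSEG0_BASE <= virt < KSEG1_BASE + KSEG_SIZE:
        raise ValueError("not a kseg0/kseg1 address")
    return virt & (KSEG_SIZE - 1)


rootaddr = 0x1178000                  # the value used for $rootaddr above
print(hex(phys_to_kseg0(rootaddr)))   # 0x81178000
print(hex(kseg_to_phys(0x81000000)))  # 0x1000000
```

Under that assumption, phram and tftp want the kseg0 form (high bit set) because they are handed addresses the CPU can dereference directly, whereas memmap wants the bare physical address.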

[ Postemporaneous edit: the next thrilling installment in this series is now up at https://ww.telent.net/2018/1/7/baud_games ]

Baud games#

Sun, 07 Jan 2018 11:46:21 +0000

Epiphany (n): (1) January 6 observed as a church festival in commemoration of the coming of the Magi as the first manifestation of Christ to the Gentiles or in the Eastern Church in commemoration of the baptism of Christ; (2) a moment of sudden and great revelation or realization.

Milestone

This week in NixWRT was typified by lots of trying stuff that didn't work followed by an unexpected achievement: I have a shell running on the actual hardware!

When we left off last week, if you will recall, we had a kernel that booted most of the way to mounting the root filesystem and executing init but not quite, and for some odd reason it booted a little bit further if I lied to it about the console device. Since then:

This is simultaneously a victory and a complete PITA, because there's no way to change the baud rate in this feature-impoverished branch of u-boot, so every time I reboot I have to change speed back and forth to talk to the bootloader. It would be nice if we could get it to work at 250000 (perhaps the u-boot console code has some pointers), or find a way to make u-boot speak more slowly, and I will probably look at that at some point.

Other things to do

[ Postemporaneous edit: the next thrilling installment in this series is now up at https://ww.telent.net/2018/1/15/in_the_nix_of_time ]

In the Nix of time#

Mon, 15 Jan 2018 18:35:04 +0000

[ I'm not sure I can keep up these puns in the blog post titles much longer. That may be welcome news for my readers, of course. ]

I was expecting this blog post to be along the lines of "there is no progress to report since last week but I am writing anyway just to maintain the weekly schedule", but happily, last night I saw the board boot with an ethernet driver and was even able to ping it.

<6>libphy: ag71xx_mdio: probed
<6>ag71xx-mdio.1: Found an AR7240/AR9330 built-in switch
<6>eth0: Atheros AG71xx at 0xba000000, irq 5, mode:GMII
<6>ag71xx ag71xx.0: connected to PHY at ag71xx-mdio.1:04 [uid=004dd041, driver=]
<6>eth1: Atheros AG71xx at 0xb9000000, irq 4, mode:MII

Once I realised I should be using eth1 not eth0, at least.

The things I have learnt this week were almost entirely not about Nix: instead I was looking at the kernel, and the OpenWRT (actually LEDE-which-will-soon-be-OpenWRT-again) build process. Which was to some extent what I was originally trying to avoid by basing this whole thing on Nix, but there we are.

What's the problem?

The Linux kernel 4.14.1 has no support for the wired Ethernet device built into the AR933x SoC.

(I was actually quite surprised to find this out)

Do you have any plausible but unworkable suggestions to fix it?

Porting the driver from OpenWRT should be pretty simple. Just copy some files across and patch the Makefile, right?

I infer from your use of the words "simple" and "just" that it turns out to be a bit more complicated?

Damn, you know me too well.

I am you

True dat.

So?

OpenWRT is based on the upstream kernel (score one over Android, at least) but diverges quite significantly, to the extent that the kernel stuff in the LEDE source repo contains about 250 extra source files you have to copy into your kernel source tree, and 2500 patch files that need to be applied on top. And a lot of the patches depend on previous patches in the series, and basically the upshot is that the chance of cherry-picking only the changes you want is kind of ... remote. Certainly not without first downloading and applying the whole series, by which time you have the whole series anyway.

There's another, slightly more long-term, problem with this suggestion, too: a tonne of those files are basically copy-paste jobs of each other, which makes me hope (admittedly against my own immediate self-interest) that upstream would refuse to adopt the resulting patch.

You're going to expound on this at tedious length, aren't you?

I'll try to keep it brief. Grown-up computers like PCs and SPARCs usually have standards by which an operating system may discover what hardware is attached/plugged in - PCI bus enumeration or Open Firmware or something like that. This is good because it means the kernel doesn't have to hardcode all this stuff. Embedded systems, on the other hand ...

... don't?

Often don't, no. Please stop finishing my sentences. So, historically, for every board or product that runs the Linux MIPS kernel, there is a chunk of code that registers all the devices and memory regions and all that stuff which the drivers will need, and this all gets a bit repetitive when there are a zillion of the buggers and they're all approximately the same but have slightly different base addresses for their USB ports, or they have two ethernets instead of 4, or the LEDs and the WPS buttons are hooked up to different GPIO pins.

Madness!

Understandable in context, because what router manufacturer really cares that much that the same Linux kernel image will run across not only their entire product range but also the product ranges of seventeen of their competitors? But still, for our purposes a PITA.

So what can be done?

The Device Tree, or "why write code when you can write data?". First mooted back in 2009 and gradually (tending sometimes to grudgingly) accepted over the following nine years, the device tree for some particular board is essentially a serialisation of the data structures that Open Firmware would provide the OS when running on that board, in the hypothetical event that the board had Open Firmware. The upstream ar71xx (a.k.a. ath79) code has rudimentary support for device tree, but no ethernet devices therein, and the old mach_* files have not yet been removed.

(Here's an example for the TL-MR3020, a device almost-but-not-quite-identical to the Yun, which is too long to paste but definitely short enough that you should have a look at it)
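To make the "data, not code" point concrete, here is a toy sketch in Python, which is neither real kernel code nor real device tree syntax. The board names are placeholders, every address, IRQ and GPIO number is invented for illustration, and `register_devices` is a hypothetical stand-in for whatever the kernel does when it walks the tree.

```python
# Old style: one hand-written function per board, each nearly a copy of the last.
def setup_board_a(register):
    register("uart", base=0x1000, irq=3)
    register("ethernet", base=0x2000, irq=4)
    register("led", gpio=13)


def setup_board_b(register):
    register("uart", base=0x1000, irq=3)
    register("ethernet", base=0x2800, irq=5)  # same device, different address
    register("led", gpio=7)                   # same LED, different pin


# Device-tree style: the differences live in data, and one generic loop
# serves every board. (All values here are made up.)
BOARDS = {
    "board-a": [
        {"type": "uart", "base": 0x1000, "irq": 3},
        {"type": "ethernet", "base": 0x2000, "irq": 4},
        {"type": "led", "gpio": 13},
    ],
    "board-b": [
        {"type": "uart", "base": 0x1000, "irq": 3},
        {"type": "ethernet", "base": 0x2800, "irq": 5},
        {"type": "led", "gpio": 7},
    ],
}


def register_devices(board, register):
    """Walk the description for one board and register each device in it."""
    for dev in BOARDS[board]:
        props = {k: v for k, v in dev.items() if k != "type"}
        register(dev["type"], **props)


def make_recorder():
    """Return a fake register() callback plus the list it appends to."""
    seen = []

    def register(kind, **props):
        seen.append((kind, props))

    return register, seen


old_register, old_seen = make_recorder()
setup_board_b(old_register)
new_register, new_seen = make_recorder()
register_devices("board-b", new_register)
print(old_seen == new_seen)  # True
```

Supporting a third board in the second style means adding another entry to the table rather than another function, which is roughly the trade the device tree offers.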

So that's the Right Answer: add the ag71xx ethernet driver to the tree. Forward port it from 4.9 to 4.14, abstract somehow over the eleventy-billion-branch switch statements it's littered with so it works on multiple SoCs, decide what to do about the driver for the SoC's network switch that it relies on, and ponder whether to delete some mach_*.c files that clearly shouldn't be needed before deciding not to make that many needless enemies among the commercial users of this code.

Contrast, however, with the Pragmatic Answer: for the moment at least, until the circular tuit drought ends, why don't we switch to the OpenWRT kernel? Which, as you can see from the printk output that started this entry, Already Just Works.

You said "just" again

Yeah. Sorry.

Finished?

Pretty much. Also this week I made the kernel image build process a teeny bit less hacky, and added some frivolous stuff like cat, ifconfig and mount to the root filesystem, but that was basically trivial. And I posted to nix-devel about it and several people were quite kind.

Next stop, some userland - including the thorny question of what shall we use for an init system - and maybe some forward porting to make it work on nixpkgs master.

[ 1 week on: the next installment is here ]

Userland? I monit, init#

Tue, 23 Jan 2018 23:58:57 +0000

[ ObTitleCommentary: sorry ]

This week really is not much more than the "there is no progress to report since last week but I am writing anyway just to maintain the weekly schedule" post I threatened last week.

"The question is," said Humpty Dumpty, "which is to be PID 1".

tl;dr I'm playing with the busybox init applet plus Monit run from an inittab entry. Haven't got very far yet: not had time.

Running the rootfs in ram

You may remember that a few weeks ago I was describing how I thought I ought to be able to use the MTD "phram" driver to download and use a root filesystem without having to write it into flash? For the record, it did work, it was just that the console bug made it impossible to test.

Well, it did work until I switched to the 4.9 kernel, in which the memmap= parameter has no effect, and even then it carried on apparently working for a while until I tried running a few more processes, and then I started getting filesystem corruption. Another one for the "obvious in hindsight" tally, when I checked the boot messages. So, I had to backport that code from 4.14.

<6>Determined physical RAM map: 
<6> memory: 04000000 @ 00000000 (usable)
<6>User-defined physical RAM map:
<6> memory: 04000000 @ 00000000 (usable)
<6> memory: 00b00000 @ 01178000 (reserved)
<5>Kernel command line: console=ttyATH0 panic=10 oops=panic init=/bin/init phram.phram=rootfs,0x81178000,10Mi root=/dev/mtdblock0 memmap=11M$0x1178000 ath79-wdt.from_boot=n ath79-wdt.timeout=30 ethaddr=90:A2:DA:F9:07:5A machtype=AP121 mem=64M                                      
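A quick way to sanity-check a command line like that one is to confirm that the phram region sits entirely inside the memmap reservation. The sketch below is mine and only handles the forms that appear in this post: memmap=SIZE$START with K/M/G suffixes, and phram sizes with ki/Mi/Gi suffixes. It also assumes the phram start is a kseg0 address, so it masks off the top three bits to get the physical address.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse '11M', '10Mi' or '0x1178000' into a number of bytes."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([kKMG]?)i?", text)
    if m is None:
        raise ValueError("unrecognised size: %r" % text)
    number, unit = m.groups()
    value = int(number, 16) if number[:2].lower() == "0x" else int(number)
    return value * UNITS[unit.upper()]


def parse_memmap(arg):
    """'11M$0x1178000' -> (physical start, size) of the reserved region."""
    size, start = arg.split("$")
    return parse_size(start), parse_size(size)


def parse_phram(arg):
    """'rootfs,0x81178000,10Mi' -> (name, physical start, size)."""
    name, start, size = arg.split(",")
    # Assumption: start is a kseg0 address; keep only the low 29 bits.
    return name, parse_size(start) & 0x1FFFFFFF, parse_size(size)


def phram_is_protected(memmap_arg, phram_arg):
    """True if the phram region lies wholly inside the reserved region."""
    res_start, res_size = parse_memmap(memmap_arg)
    _, ph_start, ph_size = parse_phram(phram_arg)
    return res_start <= ph_start and ph_start + ph_size <= res_start + res_size


print(hex(parse_size("11M")))  # 0xb00000
print(phram_is_protected("11M$0x1178000", "rootfs,0x81178000,10Mi"))  # True
```

The 0xb00000 agrees with the "memory: 00b00000 @ 01178000 (reserved)" line in the boot messages, which is the line that was missing while memmap= was being ignored.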

No time because ...

Because other things. One of them is that I'm moving my shell host from Debian to NixOS [*] and have been having fun with setting up email. Now running nixos-mailserver and fairly happy with it, but more on that subject forthcoming after I've refined my notmuch configuration a bit.

[*] Once upon a time it was running Debian stable. Then while I wasn't looking it became oldstable. Then I did a dist-upgrade and that upgraded Puppet to a version that no longer understood my Puppet manifest and then I decided that it was time to start again.

Little Nix#

Wed, 31 Jan 2018 00:50:01 +0000

In short, because There is Little To Relate and the Hour Is Late:

I don't have much to expound on here because nixwrt has taken a back seat to $dayjob and family this week. The only things I feel I should point out (because Pages I Have Googled are by and large not very good at pointing them out) are that if you ever get a message

mount: mounting tmpfs on /run failed: Invalid argument

it may well be because you didn't enable CONFIG_TMPFS when building the kernel, and if you get

tmpfs: No value for mount option 'defaults'

then ... actually I don't know what the correct way to deal with this one is, but the pragmatic response is to edit /etc/fstab and replace defaults with rw. Works For Me.
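If you would rather script that edit than do it by hand, something like the following Python would do. It is only a sketch: the sample fstab contents are invented, the function name is mine, and it rewrites whitespace between fields as single spaces.

```python
def fix_tmpfs_options(fstab_text):
    """Replace the 'defaults' option with 'rw' on tmpfs lines only."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        is_comment = line.lstrip().startswith("#")
        if len(fields) >= 4 and not is_comment and fields[2] == "tmpfs":
            opts = ["rw" if o == "defaults" else o for o in fields[3].split(",")]
            fields[3] = ",".join(opts)
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fstab contents, for illustration only.
sample = "tmpfs /run tmpfs defaults 0 0\nproc /proc proc defaults 0 0\n"
print(fix_tmpfs_options(sample), end="")
```

Only the tmpfs line changes; the proc line keeps its defaults.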

Hopefully next week some actual news.