gehen Sie bitte mit, hier ist Nix zu sehen#
Tue, 02 Jan 2018 12:08:23 +0000
[ Meta: I don't actually speak German. I hope the pun works, but I
have no particular reason to suppose it should do. ]
Happy New Year, if you observe the Gregorian Calendar. This week in
NixWRT was typified by lots of beating head on brick wall followed by
an unexpected achievement: I have a working rootfs in qemu!
Look, isn't it cool?
[nix-shell:~/src/nixwrt]$ qemu-system-mipsel -M malta -m 64 -nographic -kernel
linux-*/vmlinux -append 'root=/dev/sr0 console=ttyS0 init=/bin/sh' -blockdev d
river=file,node-name=squashed,read-only=on,filename=tftproot/rootfs.image -block
dev driver=raw,node-name=rootfs,file=squashed,read-only=on -device ide-cd,drive=
rootfs -nographic
Linux version 4.14.1 (dan@loaclhost) (gcc version 6.4.0 (GCC)) #2 SMP Tue Jan 2
14:58:10 UTC 2018
[...]
BusyBox v1.27.2 () built-in shell (ash)
# LD_TRACE_LOADED_OBJECTS=1 /nix/store/*-rsync*/bin/rsync --version
linux-vdso.so.1 (0x77cc8000)
libpopt.so.0 => /nix/store/79ffdcjvk5bpbm1vgrxii935vhjbdg5p-popt-1.16-mi
psel-unknown-linux-gnu/lib/libpopt.so.0 (0x77c70000)
libc.so.6 => /nix/store/7njknf9mhcj7jd3l0axlq8ql0x7396pk-glibc-2.26-75-m
ipsel-unknown-linux-gnu-mipsel-unknown-linux-gnu/lib/libc.so.6 (0x77ad4000)
/nix/store/7njknf9mhcj7jd3l0axlq8ql0x7396pk-glibc-2.26-75-mipsel-unknown
-linux-gnu-mipsel-unknown-linux-gnu/lib/ld.so.1 (0x77c98000)
Points of note here:
- I fixed the stupid-huge image size I talked about last week by
removing dontStrip in the glibc derivation. Although I don't know if
this is correct, it Seems To Work. My hypothesis is that the derivation
was previously running an x86 strip on target (in this case, MIPS)
binaries and trashing them, and so dontStrip was added to stop it doing
that. Now it's using a strip that understands MIPS, so it can be
re-enabled. Is my guess.
- mksquashfs is super fussy about trailing slashes on filenames when
you use the -root-becomes option to regraft directories. For best
results, don't use any (-> head wall).
I spent a lot of time, with no actual result yet, on getting the Yun
to tftp its kernel and rootfs and run them in-place without having to
write anything to flash. Motivation here is: it's not my Yun, it
belongs to my employer who will
probably want it back next time we do a hackathon or something. So I
don't want to brick the device accidentally, nor use all the flash
erase cycles, and anyway it's probably slower than running from RAM.
This is my theory which almost works but for some reason not quite: we
should be able to tftp the root fs into RAM then use the MTD "phram"
driver to emulate an MTD device at that address, and the memmap
option to hide that region of memory from the Linux system (so it
doesn't overwrite it):
ar7240> setenv kernaddr 0x81000000
ar7240> setenv rootaddr 1178000
ar7240> setenv rootaddr_useg 0x$rootaddr
ar7240> setenv rootaddr_ks0 0x8$rootaddr
ar7240> setenv bootargs keep_bootcon console=ttyATH0,250000 panic=10 oops=panic init=/bin/sh
phram.phram=rootfs,$rootaddr_ks0,9Mi root=/dev/mtdblock0 memmap=10M\$$rootaddr_useg
ar7240> setenv bootn "tftp $kernaddr /tftp/kernel.image ; tftp $rootaddr_ks0
/tftp/rootfs.image; bootm $kernaddr"
ar7240> run bootn
Here's where it gets weird. With those options, it runs most of the
way through boot then hangs after printing NET: Registered protocol
family 17 (that's AF_PACKET, the packet socket family, if you were
wondering). If I misspell the console device name, though, it gets
slightly further. wat?
NET: Registered protocol family 17
Warning: unable to open an initial console.
VFS: Mounted root (squashfs filesystem) readonly on device 31:0.
Freeing unused kernel memory: 208K
This architecture does not have kernel memory protection.
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
So it's identified that there is a squashfs filesystem there, which is
a positive sign, but it's not going to run init without a console.
Also falling into the "known unknowns" quadrant: you will note that we
randomly set and unset the high bit on some of our addresses there:
this is because the same physical RAM is mapped into more than one
place in the MIPS address space
and I sort of think I have a handle on how it works but not really.
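For what it's worth, the high-bit juggling can be sketched in a few lines of Python. This assumes the standard MIPS32 layout, in which KSEG0 (from 0x80000000) is a cached, unmapped window onto the first 512MB of physical memory and KSEG1 (from 0xA0000000) is an uncached window onto the same memory. The function names are made up for illustration; nothing here comes from u-boot or the kernel.

```python
KSEG0_BASE = 0x80000000   # cached, unmapped window onto physical memory
KSEG1_BASE = 0xA0000000   # uncached, unmapped window onto the same memory
WINDOW_SIZE = 0x20000000  # each window covers the first 512MB of physical space


def phys_to_kseg0(phys):
    """Return the KSEG0 virtual address for a physical address."""
    if not 0 <= phys < WINDOW_SIZE:
        raise ValueError("physical address outside the KSEG0 window")
    return KSEG0_BASE | phys


def kseg_to_phys(virt):
    """Return the physical address behind a KSEG0 or KSEG1 address."""
    if not KSEG0_BASE <= virt < KSEG1_BASE + WINDOW_SIZE:
        raise ValueError("not a KSEG0/KSEG1 address")
    return virt & (WINDOW_SIZE - 1)


rootaddr = 0x1178000                  # $rootaddr in the u-boot session
print(hex(phys_to_kseg0(rootaddr)))   # the form given to tftp and phram
print(hex(kseg_to_phys(0x81000000)))  # physical address behind $kernaddr
```

In the session above, tftp and phram are given the KSEG0 form of the address, while memmap is given the physical form without the high bit.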
[ Postemporaneous edit: the next thrilling installment in this series is now up at https://ww.telent.net/2018/1/7/baud_games ]
Baud games#
Sun, 07 Jan 2018 11:46:21 +0000
Epiphany (n): (1) January 6 observed as a church festival in commemoration of the coming of the Magi as the first manifestation of Christ to the Gentiles or in the Eastern Church in commemoration of the baptism of Christ; (2) a moment of sudden and great revelation or realization.
Milestone
This week in NixWRT was typified by lots of trying stuff that didn't work followed by an unexpected achievement: I have a shell running on the actual hardware!
When we left off last week, if you will recall, we had a kernel that
booted most of the way to mounting the root filesystem and executing
init but not quite, and for some odd reason it booted a little bit
further if I lied to it about the console device. Since then:
- because only about half the time its failure mode was "reboot" and the other half it was "hang requiring someone to physically push the reset button", I wrote a small and shonky kernel patch to allow enabling the watchdog timer at boot. This makes remote development much more pleasant. Or indeed, possible.
- I learnt quite a lot (though probably not enough to be dangerous) about how the kernel boots. Specifically, that it attempts to open /dev/console on the root filesystem before it has even mounted the root filesystem you asked it to mount - and that apparently this should never fail. Turns out this is probably even true - although the logic in the Makefile is a bit weird and I haven't fully understood it, one of initramfs.o or noinitramfs.o sets up a ramfs containing (your specified cpio archive) or (a device node for /dev/console) as root, and then you can open the console device before you have mounted the actual root filesystem. I don't know why initramfs is dependent on the initrd config (they are two quite different things), but I took that option out to be sure.
- None of this is at all new, of course, but it is newer than last
time I really dug into Linux booting, which was back when LILO was
normal and rdev was still even a thing.
- But it still wasn't actually working, so I decided to try downgrading to 4.4 on the basis that that's what the newest and spangliest version of OpenWRT^WLEDE uses. This time it got as far as displaying
ar933x-uart: ttyATH0 at MMIO 0x18020000 (irq = 11, base_baud = 1562500) is a ARy
(slightly earlier in the boot sequence) and then started printing gibberish. "Aha", says I, "that looks like wrong baud rate, maybe it only thinks it knows how to speak at 250000", and indeed, when I switched from console=250000 to console=115200 (and changed the Arduino baud rate to match) I had a shell prompt!
- 24 hours later it occurred to me to wonder if the problem we started with is that 4.14 also can't talk at 250000 but is just differently broken, so I tried switching to 115200 there as well, and bingo - that worked too. tl;dr it all turned out to be very simple.
This is simultaneously a victory and a complete PITA, because there's
no way to change the baud rate in this feature-impoverished branch of
u-boot, so every time I reboot I have to change speed back and forth to
talk to the bootloader. It would be nice if we could get it to work at
250000 (perhaps the u-boot console code has some pointers), or find a
way to make u-boot speak more slowly, and I will probably look at that
at some point.
Other things to do
- The next priority is to try porting the OpenWRT ag71xx ethernet
driver so it can speak to the network: whether I embark on the (rather
high-risk) attempt to reinstall u-boot will depend mostly on how much
the ensuing baud rate twiddling annoys me.
- convert to musl or uclibc for smaller binaries
- flesh out the filesystem a bit so we can run anything useful
- try it on the actual target device and not the Yun (which, as I keep saying, is not mine) - or if it turns out I've completely toasted that router, pick up a cheap GL-AR150 or something
- forward port to nixpkgs master (after #30882 has landed)
[ Postemporaneous edit: the next thrilling installment in this series is now up at https://ww.telent.net/2018/1/15/in_the_nix_of_time ]
In the Nix of time#
Mon, 15 Jan 2018 18:35:04 +0000
[ I'm not sure I can keep up these puns in the blog post titles much
longer. That may be welcome news for my readers, of course. ]
I was expecting this blog post to be along the lines of "there is no
progress to report since last week but I am writing anyway just to
maintain the weekly schedule", but happily, last night I saw the board boot with an
ethernet driver and was even able to ping it.
<6>libphy: ag71xx_mdio: probed
<6>ag71xx-mdio.1: Found an AR7240/AR9330 built-in switch
<6>eth0: Atheros AG71xx at 0xba000000, irq 5, mode:GMII
<6>ag71xx ag71xx.0: connected to PHY at ag71xx-mdio.1:04 [uid=004dd041, driver=]
<6>eth1: Atheros AG71xx at 0xb9000000, irq 4, mode:MII
Once I realised I should be using eth1 not eth0, at least.
The things I have learnt this week were almost entirely not about Nix:
instead I was looking at the kernel, and the OpenWRT (actually
LEDE-which-will-soon-be-OpenWRT-again) build process. Which was to
some extent what I was originally trying to avoid by basing this whole
thing on Nix, but there we are.
What's the problem?
The Linux kernel 4.14.1 has no support for the wired Ethernet device
built in to the AR933x SoC.
(I was actually quite surprised to find this out)
Do you have any plausible but unworkable suggestions to fix it?
Porting the driver from OpenWRT should be pretty simple. Just copy
some files across and patch the Makefile, right?
I infer from your use of the words "simple" and "just" that it turns out to be a bit more complicated?
Damn, you know me too well.
I am you
True dat.
So?
OpenWRT is based on the upstream kernel (score one over Android, at
least) but diverges quite significantly, to the extent that the kernel
stuff in the LEDE source repo contains about 250
extra source files you have to copy into your kernel source tree, and
2500 patch files that need to be applied on top. And a lot of the patches
depend on previous patches in the series, and basically the upshot is
that the chance of cherry-picking only the changes you want is kind of
... remote. Certainly not without at least downloading and
applying the whole series, by which time you have the whole series
anyway.
There's another, slightly more long-term, problem with this suggestion,
too: a tonne of those files are basically copy-paste jobs of each
other, which makes me hope (admittedly against my own immediate
self-interest) that upstream would refuse to adopt the resulting
patch.
You're going to expound on this at tedious length, aren't you?
I'll try to keep it brief. Grown-up computers like PCs and SPARCs
usually have standards by which an operating system may discover what
hardware is attached/plugged in - PCI bus enumeration or Open Firmware
or something like that. This is good because it means the kernel
doesn't have to hardcode all this stuff. Embedded systems, on the
other hand ...
... don't?
Often don't, no. Please stop finishing my sentences. So,
historically, for every board or product that runs the Linux MIPS
kernel, there is a chunk of code that registers all the devices and
memory regions and all that stuff which the drivers will need, and
this all gets a bit repetitive when there are a zillion of the buggers
and they're all approximately the same but have slightly different
base addresses for their USB ports, or they have two ethernets instead
of 4, or the LEDs and the WPS buttons are hooked up to different GPIO
pins.
Madness!
Understandable in context, because what router manufacturer really cares that much that the same Linux kernel image will run across not only their entire product range but also the product ranges of seventeen of their competitors? But still, for our purposes a PITA.
So what can be done?
The Device Tree, or "why write code when you can write data?". First
mooted back in 2009 and gradually (tending sometimes to grudgingly)
accepted over the following nine years, the device tree for some
particular board is essentially a serialisation of the data structures
that Open Firmware would provide the OS when running on that board, in
the hypothetical event that the board had Open Firmware. The upstream
code for the ar71xx (a.k.a. ath79) has rudimentary support for device
tree, but no ethernet devices therein, and the old mach_* files have
not yet been removed.
(Here's an example for the TL-MR3020, a device almost-but-not-quite-identical to the Yun, which is too long to paste but definitely short enough that you should have a look at it)
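To make the "code versus data" point concrete, here is a toy sketch. It is Python rather than kernel C or device tree source, and every board name, address, IRQ and GPIO number in it is invented for illustration. It only shows the shape of the argument: one generic routine walking a per-board description, instead of one hand-written setup function per board.

```python
# Hypothetical board descriptions: all values are invented for illustration.
BOARDS = {
    "board-a": {
        "ethernet": [{"base": 0x19000000, "irq": 4}],
        "leds": {"wlan": 0, "usb": 1},
    },
    "board-b": {
        "ethernet": [{"base": 0x19000000, "irq": 4},
                     {"base": 0x1A000000, "irq": 5}],
        "leds": {"wlan": 13},
    },
}


def register_devices(description):
    """Walk a board description and return the devices it declares."""
    devices = []
    for index, eth in enumerate(description.get("ethernet", [])):
        devices.append(("eth%d" % index, eth["base"], eth["irq"]))
    for name, gpio in sorted(description.get("leds", {}).items()):
        devices.append(("led:" + name, gpio, None))
    return devices


for board in sorted(BOARDS):
    print(board, register_devices(BOARDS[board]))
```

Supporting a third board in this scheme means adding another dictionary entry, not another function, which is roughly the trade the device tree offers.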
So that's the Right Answer: add the ag71xx ethernet driver to the
tree. Forward port it from 4.9 to 4.14, abstract somehow over the
eleventy-billion-branch switch statements it's littered with so it
works on multiple SoCs, decide what to do about the driver for the
SoC's network switch that it relies on, and ponder whether to delete
some mach_*.c files that clearly shouldn't be needed before deciding
not to make that many needless enemies among the commercial users of
this code.
Contrast, however, with the Pragmatic Answer: for the moment at least,
until the circular tuit drought ends, why don't we switch to the
OpenWRT kernel? Which, as you can see from the printk output that
started this entry, Already Just Works.
You said "just" again
Yeah. Sorry.
Finished?
Pretty much. Also this week I made the kernel image build process a
teeny bit less hacky, and added some frivolous stuff like cat, ifconfig
and mount to the root filesystem, but that was basically trivial.
And I posted to nix-devel about it and several people were quite kind.
Next stop, some userland - including the thorny question of what shall
we use for an init system - and maybe some forward porting to make it
work on nixpkgs master.
[ 1 week on: the next installment is here ]
Userland? I monit, init#
Tue, 23 Jan 2018 23:58:57 +0000
[ ObTitleCommentary: sorry ]
This week really is not much more than the "there is no progress to report since last week but I am writing anyway just to maintain the weekly schedule" post I threatened last week.
"The question is,", said Humpty Dumpty, "which is to be PID 1".
- DO WANT processes restarted when they die
- DO WANT the option for services to be restarted (or some other
remedial action taken) when they become unhealthy. For example, the
wireless driver in my GL-MT300A
periodically gets confused and I would like to automate restarting
it. Note that "services" here is used in a quite broad sense that
doesn't necessarily mean "processes"
- DO NOT WANT systemd (not because I'm especially anti-systemd, but it does a lot of things that desktops want and I don't know yet that I need all of them because this is not a desktop)
tl;dr I'm playing with the busybox init applet plus Monit run from an
inittab entry. Haven't got very far yet: not had time.
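The first requirement (restart processes when they die) boils down to a respawn loop. The sketch below is only an illustration of that idea in Python; it is not how busybox init or Monit are implemented, and the function name and the restart limit are assumptions added so that the example terminates.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.0):
    """Run argv, starting it again each time it exits.

    Returns the list of exit codes seen. A real init would loop forever;
    max_restarts exists only so that this example finishes.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        exit_codes.append(subprocess.call(argv))
        time.sleep(delay)  # avoid spinning if the service dies instantly
    return exit_codes


# A stand-in "service" that dies immediately with exit status 1.
failing_service = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(supervise(failing_service, max_restarts=2))
```

The second requirement (acting on services that are alive but unhealthy) does not fit this loop, since there is no exit to react to; it needs some kind of periodic health check instead.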
Running the rootfs in ram
You may remember that a few weeks ago
I was describing how I thought I ought to be able to use the MTD
"phram" driver to download and use a root filesystem without having to
write it into flash? For the record, it did work, it was just that
the console bug made it impossible to test.
Well, it did work until I switched to the 4.9 kernel, in which the
memmap= parameter has no effect, and even then it carried on
apparently working for a while until I tried running a few more
processes, and then I started getting filesystem corruption. Another
one for the "obvious in hindsight" tally, when I checked the boot
messages. So, I had to backport that code from 4.14:
<6>Determined physical RAM map:
<6> memory: 04000000 @ 00000000 (usable)
<6>User-defined physical RAM map:
<6> memory: 04000000 @ 00000000 (usable)
<6> memory: 00b00000 @ 01178000 (reserved)
<5>Kernel command line: console=ttyATH0 panic=10 oops=panic init=/bin/init phram.phram=rootfs,0x81178000,10Mi root=/dev/mtdblock0 memmap=11M$0x1178000 ath79-wdt.from_boot=n ath79-wdt.timeout=30 ethaddr=90:A2:DA:F9:07:5A machtype=AP121 mem=64M
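As a sanity check on those numbers, here is a small Python sketch that parses the memmap value from the command line above and confirms that the phram region sits inside the reserved area. The parser handles only the "decimal size with K/M/G suffix, then $, then start address" form used here, not the full kernel syntax, and the helper names are made up.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_memmap(arg):
    """Parse a 'SIZE$START' memmap value into (size, start) in bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)\$(0x[0-9a-fA-F]+|\d+)", arg)
    if match is None:
        raise ValueError("unsupported memmap value: %r" % arg)
    size = int(match.group(1)) * UNITS[match.group(2)]
    start_text = match.group(3)
    base = 16 if start_text.lower().startswith("0x") else 10
    return size, int(start_text, base)


def contains(outer_start, outer_size, inner_start, inner_size):
    """True if the inner region lies entirely inside the outer one."""
    return (outer_start <= inner_start
            and inner_start + inner_size <= outer_start + outer_size)


size, start = parse_memmap("11M$0x1178000")
print("memory: %08x @ %08x (reserved)" % (size, start))

# phram.phram=rootfs,0x81178000,10Mi: drop the KSEG0 high bits to get the
# physical start address; 10Mi is 10 * 2**20 bytes.
phram_start, phram_size = 0x81178000 & 0x1FFFFFFF, 10 << 20
print(contains(start, size, phram_start, phram_size))
```

The first line printed has the same form as the "reserved" entry in the boot log, which is the line to look for when checking that memmap= has actually taken effect.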
No time because ...
Because other things. One of them is that I'm moving my shell host
from Debian to NixOS [*] and have been having fun with setting up email.
Now running nixos-mailserver and fairly happy with it, but more on that
subject forthcoming after I've refined my notmuch configuration a bit.
[*] Once upon a time it was running Debian stable. Then while I wasn't
looking it became oldstable. Then I did a dist-upgrade and that
upgraded Puppet to a version that no longer understood my Puppet
manifest and then I decided that it was time to start again.
Little Nix#
Wed, 31 Jan 2018 00:50:01 +0000
In short, because There is Little To Relate and the Hour Is Late:
- Adopted a slightly more elegant way to specify files in /etc (the
interface looks a lot like NixOS environment.etc but the
implementation is creating an input file for the mksquashfs -pf
option)
- Brought the README up to date
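To give a flavour of the /etc change, here is a rough Python sketch of turning a description of files into pseudo-file lines. It is not the NixWRT implementation (which is written in Nix). It assumes the pseudo-file line formats "name d mode uid gid" for a directory and "name f mode uid gid command" for a file whose content is the command's output, and the source paths and function name are hypothetical.

```python
def pseudo_file_lines(etc_files):
    """Turn {name: {"source": path, "mode": mode}} into pseudo-file lines.

    Assumed formats: 'name d mode uid gid' for a directory and
    'name f mode uid gid command' for a file filled from a command.
    """
    lines = ["/etc d 0755 0 0"]
    for name in sorted(etc_files):
        entry = etc_files[name]
        mode = entry.get("mode", "0644")
        lines.append("/etc/%s f %s 0 0 cat %s" % (name, mode, entry["source"]))
    return lines


# Hypothetical inputs: the source paths do not refer to real files.
EXAMPLE = {
    "hosts": {"source": "./files/hosts"},
    "shadow": {"source": "./files/shadow", "mode": "0600"},
}
print("\n".join(pseudo_file_lines(EXAMPLE)))
```

The resulting text would be written to a file and passed to mksquashfs with the -pf option.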
I don't have much to expound on here because nixwrt has taken a back
seat to $dayjob and family this week. The only things I feel I should
point out (because Pages I Have Googled are by and large not very good
at pointing them out) are that if you ever get a message
mount: mounting tmpfs on /run failed: Invalid argument
it may well be because you didn't enable CONFIG_TMPFS when building the
kernel, and if you get
tmpfs: No value for mount option 'defaults'
then ... actually I don't know what the correct way to deal with
this one is, but the pragmatic response is to edit /etc/fstab and
replace defaults with rw. Works For Me.
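For anyone who would rather script that workaround than do it by hand, here is a minimal Python sketch. Restricting the change to tmpfs entries is my own choice, not something the error message requires, and the function name is made up; it also normalises whitespace on the lines it touches.

```python
def replace_defaults(fstab_text):
    """Replace the 'defaults' mount option with 'rw' on tmpfs lines only."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        is_comment = line.lstrip().startswith("#")
        if len(fields) >= 4 and not is_comment and fields[2] == "tmpfs":
            options = ["rw" if opt == "defaults" else opt
                       for opt in fields[3].split(",")]
            fields[3] = ",".join(options)
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


example = "tmpfs /run tmpfs defaults 0 0\nproc /proc proc defaults 0 0\n"
print(replace_defaults(example), end="")
```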
Hopefully next week some actual news.