[ meta: I wanted to call this one "sudo make me a LAN switch" but now the
MRAs have ruined that phrase for everyone ]
I got networking working on the GL-MT300A. This entailed:
Patching the device tree definition.
This is slightly cargo-culted (I Read It On The Internet), but by
removing the pinctrl-0 entry from the &ethernet stanza I managed
to change the bootup messages from saying
[ 1.873672] rt2880-pinmux pinctrl: could not request pin 40 (io40) from group ephy on device rt2880-pinmux
[ 1.883620] mtk_soc_eth 10100000.ethernet: Error applying setting, reverse things back
[ 1.891724] mtk_soc_eth: probe of 10100000.ethernet failed with error -22
to saying
<6>[ 2.586201] mtk_soc_eth 10100000.ethernet eth0 (uninitialized): port 1 link up (100Mbps/Full duplex)
<6>[ 2.595753] mtk_soc_eth 10100000.ethernet: loaded mt7620 driver
<6>[ 2.602600] mtk_soc_eth 10100000.ethernet eth0: mediatek frame engine at 0xb0100000, irq 5
which felt a lot like progress but did not result in actual connectivity.
Building swconfig
After trying the obvious culprits (firewall rules? weird routing?) for
why my board was not seeing the network - indeed, not even able to
ping its own IP address - I studied the dmesg output a bit more
closely, and noticing the line
<6>[ 2.581845] gsw: setting port4 to ephy mode
I took a wild-ass guess that given I knew the device contains
some kind of network switch, maybe the switch doesn't come up in a
useful state. So we needed a tool of some kind to reconfigure it and
apparently the appropriate tool is
swconfig
(If you followed that link and struggled to understand what it was
talking about, be assured that it means nothing to me either. You're not
alone)
Building swconfig is easier when you start with a fork for Debian
instead of the original OpenWRT package: I simply made my kernel
derivation install the header
files (first time I have written a multi-output derivation, but turned
out that in this case it was a one-line change), wrote a derivation with
libnl as a dependency, and
created an 80MB filesystem image.
PRO TIP: don't override phases in a Nixpkgs derivation unless you
understand what is done by all the phases you didn't include. In
this case, not running the fixup phase meant that
nothing ran the "shrink rpath" magic which removes unneeded
compile-time dependencies from the runtime dependency list. Image
size go boom.
Running swconfig
I spent some time trying to figure out what I was doing with vlans and port configuration and worrying that when it said link: ?unknown-type? against each port that would mean I had to tell it somehow what kind of link type to use. Then eventually I hit on
swconfig dev switch0 set enable_vlan 0
swconfig dev switch0 set apply
and as if by magic (= "insufficiently advanced understanding of
technology") it all started working. When I eventually want it to
function as a switch, obviously I will need to revisit this. But that
is Milestone 1 and this is Milestone 0 - sufficient unto the day etc etc.
See subject. Most of the work this last week has been moving things
around in the hope of making it possible to support more than one
device, and then merging the gl-mt300a branch into master (nitpick:
for reasons I can't remember, and which are unlikely to be convincing, the
primary development branch is actually called nixwrt not master.
Probably something to do with git filter-branch).
This breaks the Yun code that was previously there, because "possible"
and "actually implemented" are two different things. But I am not
using it, I am pretty certain nobody else is either, and at least now
I can see how to fix it.
I have added a targetBoard argument which will soon allow choice of
mt300a, yun or malta (that last one for qemu), so the build command is
presently:
I have added a `swconfig` invocation to the monit config so that
networking comes up. I've done it in a totally hacky way until I
decide how to represent the switch in configuration, but that might
involve learning how the damn thing works.
That's about all for now, save to say that yesterday I went to the
NixOS London meetup, which in the event did not involve an install
party (everyone present had
installed it already) but did involve some interesting conversations,
learning something about overlays and giving a quick demo of NixWRT so
far. Albeit that I was demonstrating by ssh back to my hardware at
home: no actual hardware at the venue that could be waved around or
kicked. I had intended to do something a bit more visual but thought
better of it when I realised how much of my home kit I'd have to
unplug. Next time, definitely.
This week I successfully flashed NixWRT to my GL-MT300A, such that it
runs whenever I turn the device on. And then I found it slowly fills
up all RAM and the process table over about half a day then stops
working.
But we'll get to that in a minute. Let's talk about the good bits
first.
Graven image
Given we have a kernel and a root filesystem, how exactly should we
mash them up into a flashable image such that (i) uboot will find and
run the kernel, (ii) the kernel will find its filesystem? Let's look
at the dmesg output from booting OpenWRT -
[ 0.620000] m25p80 spi32766.0: w25q128 (16384 Kbytes)
[ 0.630000] 5 ofpart partitions found on MTD device spi32766.0
[ 0.630000] Creating 5 MTD partitions on "spi32766.0":
[ 0.640000] 0x000000000000-0x000000030000 : "u-boot"
[ 0.650000] 0x000000030000-0x000000040000 : "u-boot-env"
[ 0.650000] 0x000000040000-0x000000050000 : "factory"
[ 0.660000] 0x000000050000-0x000000fd0000 : "firmware"
[ 0.780000] 2 uimage-fw partitions found on MTD device firmware
[ 0.790000] 0x000000050000-0x000000174720 : "kernel"
[ 0.790000] 0x000000174720-0x000000fd0000 : "rootfs"
[ 0.800000] mtd: device 5 (rootfs) set to be root filesystem
[ 0.810000] 1 squashfs-split partitions found on MTD device rootfs
[ 0.810000] 0x000000890000-0x000000fd0000 : "rootfs_data"
[ 0.820000] 0x000000ff0000-0x000001000000 : "art"
It has five "ofpart" partitions. I'm guessing the "of" stands for
"open firmware" and indeed if we look at the DTS file (remember if you
will from blog entries passim that the device tree is a representation
of "what you'd get from open firmware if the hardware had open
firmware") we can see five partitions defined there.
It has two "uimage-fw" partitions which further subdivide the
firmware partition. Unlike the ofpart partitions these are not
actually defined anywhere: they are the result of kernel code (in
drivers/mtd/mtdsplit/mtdsplit{,_uimage}.c) which looks for
partitions which start with a uimage, parses the image length, and
then looks for a filesystem signature of some kind on the next erase
block boundary. (This is very convenient magic not least because it
means we don't have to update some partition table each time our
kernel size changes, but it still makes me uneasy; I have a very low
threshold for magic)
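To make that magic slightly less magical, here is a minimal sketch of the arithmetic as described above: read the payload length out of the uimage header, add the header itself, and round up to the next erase block boundary. It assumes the legacy 64-byte uImage header layout (payload size stored big-endian in bytes 12 to 15); the function names are my own inventions, and the real kernel code does rather more (magic number and signature checks, for a start).

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the kernel source.

# uimage_data_size FILE: payload length from a legacy uImage header.
# Assumes a 64-byte header with the size as a big-endian 32-bit
# integer at byte offset 12.
uimage_data_size() {
    hex=$(od -An -tx1 -j12 -N4 "$1" | tr -d ' \n')
    echo $((0x$hex))
}

# round_up N STEP: smallest multiple of STEP that is >= N.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# rootfs_offset FILE ERASESIZE: where the filesystem would be expected,
# relative to the start of the uimage, if it sits on the next erase
# block boundary after the header plus payload.
rootfs_offset() {
    size=$(uimage_data_size "$1")
    round_up $((64 + size)) "$2"
}
```

Something like `rootfs_offset firmware.bin 65536` (file name made up) then prints the offset in bytes from the start of the image.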
Hypothesis (subsequently proven): if our firmware file consists of a
kernel wrapped in a uimage, plus padding to the next erase block
boundary, plus the filesystem image, Linux will report an MTD
uimage-fw partition that starts where the filesystem starts. We can
copy this combined image into flash at offset 0x000000050000 to
overwrite the existing kernel/root fs while leaving the rest of the
flash undisturbed.
(To get the erase block size, we could (a) guess it's probably 128k, or
(b) if we were more prudent, check it in /proc/mtd)
Short version: that half of the puzzle is solved by creative use of dd.
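For the curious, a sketch of what "creative use of dd" might look like. This is not the actual NixWRT build step: the helper names and file names are hypothetical, and it assumes the padding is measured from the start of the image (which lines up with flash erase blocks only if the partition offset is itself aligned). The first helper reads the erase size from a /proc/mtd style table, whose columns are device, size, erasesize and name.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# erase_size_of NAME [TABLE]: erase size in decimal bytes of the MTD
# partition called NAME, read from /proc/mtd (or a file in that format).
erase_size_of() {
    awk -v want="\"$1\"" '$4 == want { print $3 }' "${2:-/proc/mtd}" |
        while read -r hex; do echo $((0x$hex)); done
}

# make_firmware KERNEL ROOTFS OUT ERASESIZE
make_firmware() {
    kernel=$1 rootfs=$2 out=$3 erasesize=$4
    # conv=sync pads the final short block with NULs up to bs bytes, so
    # the output is the kernel rounded up to a multiple of ERASESIZE.
    dd if="$kernel" of="$out" bs="$erasesize" conv=sync 2>/dev/null
    # The filesystem image then starts on an erase block boundary.
    cat "$rootfs" >> "$out"
}

# Example (file names made up):
#   make_firmware kernel.uimage rootfs.squashfs firmware.bin 131072
```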
Flash override
So the general principle for flashing from U-Boot, once you have a
suitable image, is (1) download it into RAM somewhere, (2) erase an
appropriate section of flash, (2.1) hope the power doesn't fail before
you finish step 3, (3) copy the image from RAM into flash, (4) reboot
and see whether you've bricked the device. Although unless you've
done something badly wrong (like overwrite uboot itself or the ART
partition) then it doesn't matter too much if the image you've
uploaded doesn't actually work because you can just go back to the
u-boot prompt and try again. There's an explanation of how to do
this on the Yun which I had previously successfully followed; converting
that to another board is just a matter of working out whereabouts in
the address space the flash chip is mapped.
And the simplest way of doing this is to look at what uboot does by default
when the board is powered on. If we run printenv we see i.a.
bootcmd=bootm 0xbc050000
With 99.8% certainty, we think that this is a jump to offset 0x50000
(remember, this is the offset of the "firmware" partition) of a flash
chip that starts at 0xbc000000, and this is probably all we need. So:
on the build machine
and then offer up a silent prayer because 99.8% is still less than 100%.
Punch the air shortly thereafter :-)
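The address guesswork behind that bootcmd line is easy to sanity-check with shell arithmetic. This sketch only restates the reasoning above: the flash base of 0xbc000000 is the assumption being tested, the 0x50000 offset comes from the "firmware" partition in the dmesg output, and flash_addr is a made-up helper name.

```shell
#!/bin/sh
# flash_addr BASE OFFSET: address at which a flash offset would appear
# if the chip were mapped at BASE. Hypothetical helper.
flash_addr() {
    printf '0x%x\n' $(($1 + $2))
}

# Assumed flash base plus the "firmware" partition offset; this should
# match the address in bootcmd if the guess is right.
flash_addr 0xbc000000 0x50000    # prints 0xbc050000
```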
Flash I love you but we only have 14 hours
As alluded to above, there is something weird going on in userland
that makes it presently less than useful: the ntpd and syslogd
processes (both actually BusyBox applets) don't write pid files when
they start up, causing monit to decide they have failed to start and
spawn another. One of each of them added every 30 seconds soon leads
to a poorly computer.
No idea why, yet. I'd run strace but it doesn't want to build (maybe
a MIPS thing, maybe a musl thing). Hopefully next week...
Completely random aside
And I mean completely random. Is it just me or does anyone else
find that the taillight cluster on the new Prius reminds them of Ming
the Merciless?
When I blogged last week
I left off at the point that syslogd and ntpd were not writing their
pid files into /run.
Having basically no hypothesis[*] as to why this might be, I looked into
building gdbserver and I looked into building strace and then I
decided to try good old printf debugging first. To do this I wanted
a slightly faster development cycle than having to build and tftp an
entire image for each change.
Shared folders
It is probably fair to say I had never heard of the Plan 9 Filesystem
Protocol (9P for short) before I started messing around with Nix, but
it turns out not only is
it a Thing That Replaces NFS, but that it is a Thing which has client
support in the Linux kernel and server support in qemu. So now I
start qemu with some command rather like
where for the purpose of this blog post the interesting bit is the
line starting with virtfs. This exports the current working
directory such that inside the qemu VM I can mount it with
mount -t 9p -o trans=virtio,version=9p2000.L host0 /run/mnt
There is one weirdness that I've found, which is that trying to run a
binary from inside /run/mnt fails with "Not a socket" errors. I
don't know why, but if I copy the same binary into /tmp it works fine. Don't ask me, because I don't know.
Anyway, this is all rather lovely because now I can build binaries -
such as, for completely random example, a busybox executable with
calls to fprintf(stderr, "here %s:%d\n", __FILE__, __LINE__) every
four lines - and try them instantly in a running emulator without
restarting anything. Doing this led me to the discovery that it's
trying to open /dev/null as part of daemonising itself, and that
that file is somehow mode 0660 instead of 0666. I took a guess (yes,
ladies and gentlemen, the "null hypothesis" [*]) that this might make
a difference, and turns out I was right. So, fix is to add an mdev.conf
entry. It would be interesting to know why it makes a difference,
considering that both of the daemons involved run as root, but I haven't
really dug into it.
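If you want to check for the same problem on your own image, something like the following would do. It is a sketch under assumptions: it relies on a stat that understands -c '%a' (GNU coreutils does, and BusyBox does when built with format support), and both function names are hypothetical.

```shell
#!/bin/sh
# perm_bits PATH: octal permission bits of PATH, e.g. 666.
# Assumes stat supports the -c FORMAT option.
perm_bits() {
    stat -c '%a' "$1"
}

# check_mode PATH [WANT]: complain and return non-zero if PATH does not
# have the wanted mode. WANT defaults to 666, the usual mode of /dev/null.
check_mode() {
    want=${2:-666}
    mode=$(perm_bits "$1")
    if [ "$mode" = "$want" ]; then
        echo "$1: ok ($mode)"
    else
        echo "$1: mode $mode, expected $want"
        return 1
    fi
}

# Example: check_mode /dev/null
```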
Change my switch up
In other news, after some experimentation with iproute2 and some
more reconfiguring the kernel, I believe I might just understand
how the builtin switch works and can do networking the Right
Way (i.e. treat the WAN and LAN ports separately). But more about
that in another thrilling installment.
[*] if you're reading this footnote for the first time, it's not just
a lame joke, it's foreshadowing an even worse one. See you again in about five paragraphs.
Achieving anything new this week has been rather hampered by (1) my
decision to try out XMonad, and then (2) the kids all picked up some
kind of vomiting bug. I do not intend you to infer any connection.
(XMonad: dunno yet, haven't really tried using it for long enough.
The mouse pointer is impossibly small and I'm going to have to fix
that sooner or later, but I only need it for gui apps and all I've
really tried using thus far is emacs and rxvt)
But let's see if I can explain why I've been hung up on switches
lately. If you've ever wanted to know why your OpenWRT router has
network interfaces with names like eth0.1 (no, it's not a misguided
decision to do semantic versioning on them)
maybe this is for you.
First, you need to know about VLANs. A VLAN is a "virtual LAN": a way
to multiplex traffic for multiple independent LANs onto the same
cable. In the picture on your right, VLAN 5 connects ports 1 and 3 on
switch A with ports 2 and 4 on switch B, and VLAN 6 connects port 2 on
A with 3 on B.
The switches do this by "tagging" the packets (frames) according to
per-port rules. If alice sends to fred, her frames will be tagged
with VLAN ID 5 when they enter switch A, sent (with the tag intact) to
switch B, and then untagged again as they are sent out of port 3 to
fred. Long story short -
frames from end-user devices get tagged
frames to end-user devices get detagged
frames between VLAN-aware devices (usually, one switch and another)
have had tags applied already and are transmitted without change
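For concreteness, the "tag" in question is the four-byte 802.1Q header that gets inserted into the frame after the source MAC address: a fixed 0x8100 marker followed by sixteen bits holding a 3-bit priority, a 1-bit drop-eligible flag and the 12-bit VLAN ID. A small sketch that prints the tag bytes in hex for a given VLAN (dot1q_tag is a made-up helper name, and the drop-eligible bit is simply left at zero):

```shell
#!/bin/sh
# dot1q_tag VID [PCP]: print the 4-byte 802.1Q tag as hex.
# 0x8100 is the tag protocol identifier; the next 16 bits carry the
# priority (PCP, top 3 bits) and the VLAN ID (bottom 12 bits).
dot1q_tag() {
    vid=$1
    pcp=${2:-0}
    printf '8100%04x\n' $(( (pcp << 13) | (vid & 0xfff) ))
}

dot1q_tag 5    # alice-to-fred traffic on VLAN 5: 81000005
dot1q_tag 6    # VLAN 6: 81000006
```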
This is super useful, I have no doubt, if you're running an enterprise
network and need to keep devices separated without having to run
multiple sets of cables to every desk. But how is it relevant to *WRT? Because the hardware you're running it on, despite any
impressions you might have had from its inclusion of two or five or
eight RJ45 sockets, quite likely only has one ethernet device that
Linux can see. This device is connected (internally, inside the SoC)
to a builtin switch, which is also connected to all the sockets you
can see. So if you want to address them separately - for example, you
want to connect your upstream connection to one of them without giving
it full access to your LAN and vice versa - you do it with VLAN
configuration.
Make the switch
Set the upstream port to tag VLAN 1 and the LAN ports to tag VLAN 2,
and the "CPU port" (the one connected to the eth0 device that Linux
sees) to allow VLANs 1 and 2 but not to tag/untag either - i.e. it
receives tagged packets and particapates in the VLAN as if it were
another switch.
We would do this with swconfig but we haven't yet because this turns
out to be the default configuration anyway, at least on the MT300A.
(I have no idea whether it's hardware or u-boot or the devicetree
config that makes it be this way - at some point I dare say I will
find out though)
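Had it not been the default, I think the swconfig incantation would look something like the output of the sketch below. Treat it as untested guesswork: the port numbers passed in are entirely hypothetical (I have not worked out which switch port is which on the MT300A), switch_vlan_cmds is a made-up helper, and it only prints the commands rather than running them. The "t" suffix marks a port as tagged in that VLAN.

```shell
#!/bin/sh
# switch_vlan_cmds WAN_PORT CPU_PORT LAN_PORT...
# Print (not run) swconfig commands putting the WAN port in VLAN 1 and
# the LAN ports in VLAN 2, with the CPU port a tagged member of both.
switch_vlan_cmds() {
    wan_port=$1
    cpu_port=$2
    shift 2
    lan_ports=$*
    echo "swconfig dev switch0 set enable_vlan 1"
    echo "swconfig dev switch0 vlan 1 set ports \"$wan_port ${cpu_port}t\""
    echo "swconfig dev switch0 vlan 2 set ports \"$lan_ports ${cpu_port}t\""
    echo "swconfig dev switch0 set apply"
}

# Hypothetical numbering: WAN on port 0, CPU on port 6, LAN on 1-4.
switch_vlan_cmds 0 6 1 2 3 4
```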
Configure Linux
So we have a switch on
the SoC which is sending VLAN tagged frames to the ethernet interface
- we need to tell Linux to expect them. (You might say it needs
configuring to serve VLAN).
ip link add link eth0 name eth0.1 type vlan id 1
ip link add link eth0 name eth0.2 type vlan id 2
ip addr add 192.168.0.251/24 dev eth0.2
ip link set dev eth0 up
and now for most purposes you can treat eth0.1 and eth0.2 as
though you have two network interfaces.
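If you end up with more than two VLANs, the same pattern generalises. A small sketch that prints the ip commands for any list of VLAN IDs, so they can be reviewed before being piped to sh as root (vlan_iface_cmds is a made-up helper; address assignment is left out because it depends on your network):

```shell
#!/bin/sh
# vlan_iface_cmds PARENT VID...
# Print the ip commands that create one VLAN sub-interface per VID on
# PARENT, named PARENT.VID, and then bring the parent device up.
vlan_iface_cmds() {
    parent=$1
    shift
    for vid in "$@"; do
        echo "ip link add link $parent name $parent.$vid type vlan id $vid"
    done
    echo "ip link set dev $parent up"
}

vlan_iface_cmds eth0 1 2
```

Afterwards, `ip -d link show eth0.1` should report the VLAN ID the interface was created with.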
I'd like to close by saying "Simples!" but when I googled that term I
learned of the existence of contemporary mereology as a field of
philosophical study, and I am not sure anything can ever be simple
again.