diary at Telent Netowrks

Illogical Volume Management

Sun, 16 Jul 2023 12:29:18 +0000

I bought a new SSD for my primary desktop system, because the spinning rust storage I originally built it with is not keeping up with all the new demands I'm making of it lately: when I sit down in front of it in the morning and wave the mouse around, I have to sit listening to rattly disk sounds for tens of seconds while it pages my desktop session back in. For reasons I can no longer remember, the primary system partition /dev/sda3 was originally set up as an LVM PV/VG/LV with a bcache layered on top of that. I took the small SSD out to put the big SSD in, so this seemed like a good time to straighten that all out.

Removing bcache

Happily, a bcache backing device is still readable even after the cache device has been removed.

echo eb99feda-fac7-43dc-b89d-18765e9febb6 > /sys/block/bcache0/bcache/detach

where the value of the uuid eb99...ebb6 is determined by looking in /sys/fs/bcache/ (h/t DanielSmedegaardBuus on Stack Overflow).
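A minimal sketch of that lookup, assuming (as the kernel's bcache documentation describes) that each registered cache set shows up as a UUID-named directory under /sys/fs/bcache. The function name list_cache_sets and its optional directory argument are made up here, the latter only so it can be tried against a scratch directory:

```shell
# Print the name of every UUID-shaped directory under /sys/fs/bcache
# (or under the directory given as $1). Other entries in that directory,
# such as the "register" file, are skipped because they are not directories.
list_cache_sets() {
    dir="${1:-/sys/fs/bcache}"
    [ -d "$dir" ] || return 0          # no bcache support loaded: print nothing
    for entry in "$dir"/*-*-*-*-*; do
        [ -d "$entry" ] && basename "$entry"
    done
    return 0
}
```

Run with no arguments on the machine in question, it should print the value that gets echoed into the detach file above.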

It took either a couple of attempts or some elapsed time for this to work, but eventually resulted in

# cat /sys/block/bcache0/bcache/state
no cache

so I was able to boot the computer from the old HDD without the old SSD present.
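Since the detach took a while to show up, one way to avoid re-running cat by hand is to poll the state file. This is only a sketch: wait_for_state is a hypothetical helper, not part of any bcache tooling, and the 30-try default is an arbitrary choice.

```shell
# Poll a file once a second until its contents equal the wanted string,
# giving up after a number of tries (default 30).
# Returns 0 on a match, 1 if the tries run out.
wait_for_state() {
    file="$1"
    want="$2"
    tries="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        [ "$(cat "$file" 2>/dev/null)" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

As root, something like wait_for_state /sys/block/bcache0/bcache/state "no cache" would then block until the cache has been detached, or fail after about half a minute.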

Where is my mind?

At this time I did a fresh barebones NixOS 23.05 install onto the new SSD from an ISO image on a USB stick. Then I tried mounting the old disk to copy user files across, but it wouldn't mount. Not even, for some reason, after I did modprobe bcache. Maybe weird implicit module dependencies?

The internet says that you can mount a bcache backing device even without bcache kernel support, using a loop device with an offset:

If bcache is not available in the kernel, a filesystem on the backing device is still available at an 8KiB offset.

... but, that didn't work either? binwalk will save us:

$ nix-shell -p binwalk --run "sudo binwalk /dev/backing/nixos"|head

4194304       0x400000        Linux EXT filesystem, blocks count: 730466304, image size: 747997495296, rev 1.0, ext4 filesystem data, UUID=37659245-3dd8-4c60-8aec-cdbddcb4dcb4, volume name "nixos"

The offset is not 8K, it's 8K * 512, which is 4194304 bytes or 4MiB. Don't ask me why, I only work here. So we can get to the data using
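Spelling the arithmetic out, on the assumption that the "8K" is a count of 512-byte sectors (the variable names are just for illustration):

```shell
# 8192 sectors of 512 bytes each, converted to the byte offset mount wants.
sectors=8192
sector_size=512
offset=$((sectors * sector_size))
echo "offset=$offset bytes, or $((offset / 1024 / 1024))MiB"
```

That byte count matches the 0x400000 binwalk reported.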

$ sudo mount /dev/backing/nixos /mnt -o loop,offset=4194304

and copy across the important stuff like /home/dan/src and my .emacs. But I'd rather like a more permanent solution as I want to carry on using the HDD for archival (it's perfectly fast enough for my music, TV shows, Linux ISOs etc) and nixos-generate-config gets confused by loop devices with offsets.

If it were an ordinary partition I'd simply edit the partition table to add 8192 sectors to the start address of sda3, but I don't see a straightforward way to do the analogous thing with a logical volume.


Courtesy of Andy Smith's helpful blog post (you should read it and not rely on my summary) and a large degree of luck, I was able to remove the LV completely and turn sda3 back into a plain ext4 partition. We follow the steps in his blog post to find out how many sectors at the start of sda3 are reserved for metadata (8192) and how big each extent is (8192 sectors again, or 4MiB). Then when I looked at the mappings:

$ sudo pvdisplay --maps /dev/sda3
  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               backing
  PV Size               2.72 TiB / not usable 7.44 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              713347
  Free PE               0
  Allocated PE          713347
  PV UUID               7ec302-b413-8611-ea89-ed1c-1b0d-9c392d

  --- Physical Segments ---
  Physical extent 0 to 713344:
    Logical volume	/dev/backing/nixos
    Logical extents	2 to 713346
  Physical extent 713345 to 713346:
    Logical volume	/dev/backing/nixos
    Logical extents	0 to 1

It's very nearly a continuous run, except that the first two 4MiB chunks are at the end. But ... we know there's a 4MiB offset from the start of the LV to the ext4 filesystem (because of bcache). Do the numbers match up? Yes!

Physical extents 713345 to 713346 are the first two 4MiB chunks of /dev/backing/nixos. 0-4MiB is bcache junk and 4-8MiB is the beginning of the ext4 filesystem, so all we need to do is copy that second chunk into the gap at the start of sda3 which was reserved for PV metadata:

# check we've done the calculation correctly
# (extent 713346 + 4MiB for PV metadata)
$ sudo dd if=/dev/sda3 bs=4M skip=713347  count=1 | file -
/dev/stdin: Linux rev 1.0 ext4 filesystem data, UUID=37659245-3dd8-4c60-8aec-cdbddcb4e3c8, volume name "nixos" (extents) (64bit) (large files) (huge files)
# save the data
$ sudo dd if=/dev/sda3 bs=4M skip=713347  count=1 of=ext4-header

# backup the start of the disk, in case we got it wrong
$ sudo dd if=/dev/sda3 bs=4M  count=4 of=sda3-head

# deep breath, in through nose
# exhale
# at your own risk, don't try this at home, etc etc
$ sudo dd bs=4M count=1 conv=nocreat,notrunc,fsync if=ext4-header of=/dev/sda3

It remains only to fsck /dev/sda3, just in case, and then it can be mounted somewhere useful.
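For anyone checking the skip=713347 figure, here is the calculation as a sketch. The helper name extent_to_dd_skip is made up, and it assumes the metadata area is a whole number of extents long, which it is here (8192 sectors of metadata, 8192 sectors per extent):

```shell
# Convert a physical extent number into a dd skip= value, for a dd
# block size equal to the extent size.
#   $1 = physical extent number
#   $2 = sectors reserved for PV metadata at the start of the partition
#   $3 = sectors per extent
extent_to_dd_skip() {
    extent="$1"
    pe_start="$2"
    pe_size="$3"
    echo $(( (pe_start + extent * pe_size) / pe_size ))
}

extent_to_dd_skip 713346 8192 8192   # the extent holding the start of the ext4 filesystem
```

The same function gives 1 for physical extent 0, which is why writing 4MiB at the very start of sda3 lands immediately before the rest of the filesystem.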

With hindsight, the maths is too neat to be a coincidence, so I think I must have used some kind of "make-your-file-system-into-a-bcache-device tool" to set it all up in the first place. I have absolutely no recollection of doing any such thing, but Firefox does say I've visited that repo before ...

Turning the nftables

Fri, 02 Jun 2023 23:16:32 +0000

In the course of Liminix hacking it has become apparent that I need to understand the new Linux packet filtering ("firewall") system known as nftables.

The introductory documentation for nftables is a textbook example of pattern 1 in Julia Evans' "Patterns in confusing explanations" document. I have, nevertheless, read enough of it that I now think I understand what is going on, and am ready to attempt the challenge of describing

nftables without comparing to ip{tables,chains,fw}

We start with a picture:

This picture shows the flow of a network packet through the Linux kernel. Incoming packets are received from the driver on the far left and flow up to the application layer at the top, or rightwards to be transmitted through the driver on the right. Locally generated packets start at the top and flow right.

The round-cornered rectangles depict hooks, which are the places where we can use nftables to intercept the flow and handle packets specially. For example:

The picture is actually part of the docs and I think it should be on the first page.

Chains and rules

A chain (more specifically, a "base chain") is registered with one of the hooks in the diagram, meaning that all the packets seen at that point will be sent to the chain. There may be multiple chains registered to the same hook: they get run in priority order (numerically lowest to highest), and packets accepted by an earlier chain are passed to the next one.

Each chain contains rules. A rule has a match - some criteria to decide which packets it applies to - and an action which says what should be done when the match succeeds.

A chain has a policy (accept or drop) which says what happens if a packet gets to the end of the chain without matching any rules.

You can also create chains which aren't registered with hooks, but are called by other chains that are. These are termed "regular chains" (as distinct from "base chains"). A rule with a jump action will execute all the rules in the chain that's jumped to, then resume processing the calling chain. A rule with a goto action will execute the new chain's rules in place of the rest of the current chain, and then the packet will be accepted or dropped as per the policy of the base chain.
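To make that concrete, here is a small illustrative ruleset wrapped in a shell function so that nothing is applied by accident. This is a sketch, not a recommended firewall: the table and chain names (demo, input, allow_services) are invented, and the rules are only there to show a base chain with a hook, priority and policy, plus a jump into a regular chain.

```shell
# Print an example ruleset on stdout; it does not touch the kernel.
demo_ruleset() {
    cat <<'EOF'
table inet demo {
    # regular chain: no hook, only reached when another chain jumps to it
    chain allow_services {
        tcp dport 22 accept
    }

    # base chain: registered with the input hook at priority 0
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iifname "lo" accept
        # run allow_services, then carry on here if nothing matched
        jump allow_services
        # falling off the end of the base chain applies the policy: drop
    }
}
EOF
}
```

Piping it through demo_ruleset | sudo nft -c -f - asks nft to check the syntax without loading anything. Dropping the -c would load it for real, and the drop policy would then apply to every incoming packet the rules don't accept.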

[ Open question: the doc claims that a regular chain may also have a policy, but doesn't describe how/whether the policy applies when processing reaches the end of the called chain. I think this omission may be because it is incorrect in the first claim: a very sketchy reading of the source code suggests that you can't specify policy when creating a chain unless you also specify the hook. Also, it hurts my brain to think about it. ]

Chain types

A chain has a type, which is one of filter, nat or route.


Chains are contained in tables, which also contain sets, maps, flowtables, and stateful objects. The things in a table must all be of the same family, which is one of ip, ip6, inet, arp, bridge or netdev.

There's a handy summary in the docs describing which chains work with which families and which tables.

What next?

I hope that makes sense. I hope it's correct :-). I haven't explained anything about the syntax or CLI tools because there are perfectly good docs for that already which you now have the background to understand.

Now I'm going to read the script I cargo-culted when I wanted to see if Liminix packet forwarding was working, and replace/update it to perform as an adequate and actually useful firewall.

Self-ghosting email

Tue, 21 Mar 2023 22:13:55 +0000

[ Reminder: more regular updates on what I'm spending most of my time on lately are at https://www.liminix.org/ ]

I had occasion recently to set up some mailing lists and although the subject matter for those lists is Liminix-relevant, the route to their existence really isn't. So, some notes before I forget:

Anyway, that's where we are. I'm quite certain I've done something wrong, but I'm yet to discover what.

Sub-liminix messaging

Wed, 15 Feb 2023 22:23:48 +0000

I am restarting/rewriting NixWRT,

he said, a few months ago. This is a short follow-up announcement to say that

I am very stoked about this. I'm aiming for ~ weekly updates in that place.

Crossing the threshold - Liminix

Wed, 19 Oct 2022 21:20:32 +0000

I am restarting/rewriting NixWRT, which has seen no real development in, erm, about four years (my, how the time has flown) and is showing its age and showing my Nix inexperience.

līmen (genitive līminis) (neut.)

  1. threshold, doorstep, sill (bottom-most part of a doorway)
  2. lintel
  3. threshold, entrance, doorway, approach; door
  4. house, home, abode, dwelling
  5. beginning, commencement
  6. end, termination

Thus: Liminix, which stands at the threshold of your home network. According to the commit history I've been playing around with it for about a month now (so, since shortly after I broke the family internet for most of a morning while trying to upgrade OpenWrt), so although it doesn't actually do anything useful yet perhaps it's time to break cover.

The objectives are quite similar to the NixWRT objectives in that I want to have congruent configuration management on the "infrastructure" devices that make up my home network, and those devices are typically underpowered for running full-blown NixOS. I do though have a shopping list of things I want to do better/differently:

So far: we're using s6-rc for services, which seems to be quite nice and well put together, but I haven't tried too hard to hurt it yet. We're using the NixOS module system infra for declaring configuration option types and merging logic. We have significantly more in the way of automated testing than NixWRT had - admittedly not a high bar - and an entirely unrealised/untested idea of how we might do secrets. And the "we" there is, yes, editorial.

We don't yet have: writable filesystem (ubifs?); anything o11y; more than one hardware device. And it's not yet at the point that I can dogfood it. Although technically it boots and runs on my spare GL-AR750, I haven't ported wifi across yet.

The primary repo is at https://gti.telent.net/dan/liminix because the older I get the more stubborn I become about free "if you're not paying for it you're the product" services, but there's a mirror on Github for everyone who's not me. Because federated Gitea is not yet an available thing, and I don't want to throw up all the barriers to contribution.