Illogical Volume Management
Sun, 16 Jul 2023 12:29:18 +0000
I bought a new SSD for my primary desktop system, because the spinning rust storage I originally built it with is not keeping up with all the new demands I'm making of it lately: when I sit down in front of it in the morning and wave the mouse around, I have to sit listening to rattly disk sounds for tens of seconds while it pages my desktop session back in. For reasons I can no longer remember, the primary system partition
/dev/sda3 was originally set up as an LVM PV/VG/LV with a bcache layered on top of that. I took the small SSD out to put the big SSD in, so this seemed like a good time to straighten that all out.
Happily, a bcache backing device is still readable even after the cache device has been removed.
echo eb99feda-fac7-43dc-b89d-18765e9febb6 > /sys/block/bcache0/bcache/detach
where the value of the uuid
eb99...ebb6 is determined by looking in
/sys/fs/bcache/ (h/t DanielSmedegaardBuus on Stack Overflow).
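A minimal sketch of that lookup, for anyone who would rather script it than eyeball it. It assumes the usual bcache sysfs layout, in which each registered cache set shows up as a directory named after its UUID under /sys/fs/bcache. The helper name list_cset_uuids is made up for illustration and is not part of any bcache tooling.

```shell
#!/bin/sh
# Print the UUID of every cache set registered with bcache.
# Assumption: one directory per cache set, named by its UUID, under the
# directory given as $1. list_cset_uuids is a hypothetical helper name.
list_cset_uuids() {
    for d in "$1"/*-*-*-*-*; do
        # An unmatched glob stays literal, so check it is really a directory.
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

list_cset_uuids /sys/fs/bcache
```

On a machine with a single cache set this prints one line, which is the value written to the detach file above. On a machine without bcache it prints nothing.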
It took either a couple of attempts or some elapsed time for this to work, but eventually resulted in
# cat /sys/block/bcache0/bcache/state
no cache
so I was able to boot the computer from the old HDD without the old SSD present.
Where is my mind?
At this point I did a fresh barebones NixOS 23.05 install onto the new SSD from an ISO image on a USB stick. Then I tried mounting the old disk to copy user files across, but it wouldn't mount. Not even, for some reason, after I did
modprobe bcache. Maybe weird implicit module dependencies?
The internet says that you can mount a bcache backing device even without bcache kernel support, using a loop device with an offset:
If bcache is not available in the kernel, a filesystem on the backing device is still available at an 8KiB offset.
... but, that didn't work either? binwalk will save us:
$ nix-shell -p binwalk --run "sudo binwalk /dev/backing/nixos"|head
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
4194304       0x400000        Linux EXT filesystem, blocks count: 730466304, image size: 747997495296, rev 1.0, ext4 filesystem data, UUID=37659245-3dd8-4c60-8aec-cdbddcb4dcb4, volume name "nixos"
The offset is not 8KiB, it's 8K * 512 bytes: 8192 sectors of 512 bytes each, which is 4MiB. Don't ask me why, I only work here. So we can get to the data using
$ sudo mount /dev/backing/nixos /mnt -o loop,offset=4194304
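The arithmetic behind that offset value, as a small sketch. It assumes 512-byte sectors and an 8192-sector gap, which is what the 0x400000 reported by binwalk works out to. The variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: 512-byte sectors, and 8192 sectors in front of the filesystem.
sector_bytes=512
gap_sectors=8192
offset=$((gap_sectors * sector_bytes))

echo "offset in bytes: $offset"
echo "offset in MiB:   $((offset / 1024 / 1024))"
printf 'offset in hex:   0x%x\n' "$offset"
```

This prints 4194304 bytes, 4 MiB and 0x400000, which is the figure handed to the offset= mount option.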
and copy across the important stuff like
/home/dan/src and my
.emacs. But I'd rather like a more permanent solution as I want to carry on using the HDD for archival (it's perfectly fast enough for my music, TV shows, Linux ISOs etc) and
nixos-generate-config gets confused by loop devices with offsets.
If it were an ordinary partition I'd simply edit the partition table to add 8192 sectors to the start address of
sda3, but I don't see a straightforward way to do the analogous thing with a logical volume.
Courtesy of Andy Smith's helpful blog post (you should read it and not rely on my summary) and a large degree of luck, I was able to remove the LV completely and turn
sda3 back into a plain ext4 partition. Following the steps in his blog post, I found out how many sectors at the start of
sda3 are reserved for metadata (8192) and how big each extent is (8192 sectors again, or 4MiB). Then I looked at the mappings:
sudo pvdisplay --maps /dev/sda3
  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               backing
  PV Size               2.72 TiB / not usable 7.44 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              713347
  Free PE               0
  Allocated PE          713347
  PV UUID               7ec302-b413-8611-ea89-ed1c-1b0d-9c392d

  --- Physical Segments ---
  Physical extent 0 to 713344:
    Logical volume      /dev/backing/nixos
    Logical extents     2 to 713346
  Physical extent 713345 to 713346:
    Logical volume      /dev/backing/nixos
    Logical extents     0 to 1
It's very nearly a continuous run, except that the first two 4MiB chunks are at the end. But ... we know there's a 4MiB offset from the start of the LV to the ext4 filesystem (because of bcache). Do the numbers match up? Yes!
Physical extents 713345 to 713346 are the first two 4MiB chunks of
/dev/backing/nixos. 0-4MiB is bcache junk and 4-8MiB is the beginning of the ext4 filesystem, so all we need to do is copy that second chunk into the gap at the start of sda3 which was reserved for PV metadata:
# check we've done the calculation correctly
# (extent 713346 + 4MiB for PV metadata)
$ sudo dd if=/dev/sda3 bs=4M skip=713347 count=1 | file -
/dev/stdin: Linux rev 1.0 ext4 filesystem data, UUID=37659245-3dd8-4c60-8aec-cdbddcb4e3c8, volume name "nixos" (extents) (64bit) (large files) (huge files)

# save the data
$ sudo dd if=/dev/sda3 bs=4M skip=713347 count=1 of=ext4-header

# backup the start of the disk, in case we got it wrong
$ sudo dd if=/dev/sda3 bs=4M count=4 of=sda3-head

# deep breath, in through nose
# exhale
# at your own risk, don't try this at home, etc etc
$ sudo dd bs=4M count=1 conv=nocreat,notrunc,fsync if=ext4-header of=/dev/sda3
$
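The same shuffle can be rehearsed on a scratch file before going anywhere near a real disk. The following is a sketch under stated assumptions rather than a record of what was actually run: it shrinks the layout to four 4KiB "extents" plus one extent of fake metadata, and every file and variable name in it is made up for the purpose.

```shell
#!/bin/sh
# Rehearsal on a scratch file; nothing here touches a real disk.
# Assumption: same layout as above, scaled down to 4 extents of 4 KiB.
ext=4096
work=$(mktemp -d)

# mk_extent CHAR FILE: write one extent filled with a single character.
mk_extent() {
    dd if=/dev/zero bs="$ext" count=1 2>/dev/null | tr '\0' "$1" > "$2"
}

mk_extent M "$work/meta"   # PV metadata area (one extent long)
mk_extent J "$work/le0"    # logical extent 0: bcache junk
mk_extent A "$work/le1"    # logical extent 1: start of the filesystem
mk_extent B "$work/le2"    # logical extents 2 and 3: rest of the filesystem
mk_extent C "$work/le3"

# Physical order: metadata, LE2, LE3, then LE0 and LE1 at the very end.
cat "$work/meta" "$work/le2" "$work/le3" "$work/le0" "$work/le1" > "$work/part"

# LE1 sits in the last physical extent (index 3); add 1 for the metadata.
last_pe=3
skip=$((last_pe + 1))

# Save the chunk, then write it over the metadata gap without truncating.
dd if="$work/part" bs="$ext" skip="$skip" count=1 of="$work/header" 2>/dev/null
dd if="$work/header" of="$work/part" bs="$ext" count=1 conv=notrunc 2>/dev/null

# The partition should now begin with LE1, LE2, LE3 in order.
cat "$work/le1" "$work/le2" "$work/le3" > "$work/expected"
if dd if="$work/part" bs="$ext" count=3 2>/dev/null | cmp -s - "$work/expected"; then
    result=contiguous
else
    result=broken
fi
echo "$result"
rm -rf "$work"
```

With the real numbers, the last physical extent is 713346, so the same sum gives skip=713347 at bs=4M, which is the value used in the dd commands above.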
It remains only to
fsck /dev/sda3, just in case, and then it can be mounted somewhere useful.
With hindsight, the maths is too neat to be a coincidence, so I think I must have used some kind of "make-your-file-system-into-a-bcache-device tool" to set it all up in the first place. I have absolutely no recollection of doing any such thing, but Firefox does say I've visited that repo before ...