Cracking over the papers

Fri, 30 Aug 2019 11:19:35 +0000

One of the several billion things that needs sorting in our new house is a filing system for all the paper junk we get sent that we don't immediately need to deal with but probably shouldn't throw away: bills, PAYE coding notices, letters from the council, bank statements etc. After some thought and thanks to @antifuchs alerting me to the existence of the Paperless project I decided to go digital. Recommend.

Notes, in the order they occur to me

I bought a Fujitsu fi-5120C scanner on eBay, because it was cheap (around £30 for the scanner, another £15 for the PSU it didn't come with), is allegedly supported by SANE, supports duplex (i.e. it scans both sides of the page) and has an automatic document feeder.

when I received it, I initially could not make it work in Linux - at least with my motherboard. The Internet thinks this may be an incompatibility with USB 3, and indeed, after careful reading of my motherboard manual I found a USB 2.0 port and plugged the scanner into that instead - now it's working fine.

paperless exists as a Nix package and NixOS module in nixos-unstable as of now. I upgraded my home server to -unstable because it seemed less complicated than trying to run modules from -unstable on a 19.03 base.

it logs to the systemd journal. If it fails to start and the message "Origin 'localhost:8080' in CORS_ORIGIN_WHITELIST is missing scheme or netloc" appears in the logs, this is because of something something django mutter mutter. There is a patch to paperless that purports to fix this, but the workaround is to change the configuration to use a proper URL instead of just a hostname/port:

    services.paperless = {
      enable = true;
      extraConfig = {
        PAPERLESS_CORS_ALLOWED_HOSTS="http://localhost:8080";
      };
    };
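
If you want to check whether this is the error you are hitting before changing the config, something like the following might help. It is only a sketch: the function name cors_origin_errors is made up for illustration, and it does nothing cleverer than filter whatever log text arrives on stdin for the message quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of paperless): read log lines on stdin and
# print only the ones that report the CORS origin error quoted above.
cors_origin_errors() {
    # -F matches the text literally rather than as a regular expression
    grep -F 'CORS_ORIGIN_WHITELIST is missing scheme or netloc'
}
```

With that defined, journalctl -b | cors_origin_errors would show any matching lines from the current boot. No output means the message is not in the journal and the startup failure is probably something else.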

to scan my docs into the system I run something like

    sudo -u scanner scanimage --format png --batch=/var/spool/scans/$(date +%Y%m%d%H%M%S)Z_p%04d.png --resolution 300 --source 'ADF Duplex'

where scanner is the username that paperless is running as and /var/spool/scans/ is where I arbitrarily decided the consumption directory should be. I scan to png not pdf because scanimage was silently failing to convert to pdf and instead leaving the files as pbm images. (1) pbm images are huge; (2) pbm image files with .pdf suffixes confuse the paperless web frontend and they confuse me too. I would like to automate this so it runs whenever I (or a family member) press the "scan" button on the scanner itself, but haven't got that far yet. scanbd will probably do it but seems excessively featureful for my needs.
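
As a first step towards automating it, the command above could be wrapped in a small script. This is a minimal sketch and not something wired up to the scanner button: the function names batch_pattern and scan_batch are invented for illustration, while the user, directory and scanimage options are simply the ones from the command above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Build the value for scanimage's --batch option.
#   $1 = consumption directory
#   $2 = timestamp (optional, defaults to the current time)
# The %04d is left in place for scanimage to replace with the page number.
batch_pattern() {
    dir=$1
    stamp=${2:-$(date +%Y%m%d%H%M%S)}
    printf '%s/%sZ_p%%04d.png\n' "$dir" "$stamp"
}

# Scan everything in the document feeder, both sides, into the
# consumption directory, running as the paperless user.
scan_batch() {
    sudo -u scanner scanimage --format png \
        --batch="$(batch_pattern /var/spool/scans)" \
        --resolution 300 --source 'ADF Duplex'
}
```

Calling scan_batch does the same thing as the one-liner. Keeping the filename logic in its own function just makes it easier to check without a scanner attached, and gives whatever eventually handles the button press a single thing to call.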

the paperless documentation mentions that you can run manage.py document_retagger to retag your documents after you change tagging rules. It doesn't mention (as far as I can see) that there is a similar document_correspondents command to assign your documents to correspondents when you add or update the rules that assign those. See issue 347 for details.

to run manage.py with the right nix config, you will want to find the wrapper that sets up all the environment variable blah and run that instead. Do systemctl cat paperless-consumer.service and snag the pathname of the script it calls in its ExecStart entry.

    [dan@loaclhost:~]$ systemctl cat paperless-consumer.service| grep 'ExecStart='
    ExecStart=/nix/store/2jaqzp6yhqwb2p0vs93whkwj0r0jf509-paperless document_consumer
    [dan@loaclhost:~]$ sudo -u scanner /nix/store/2jaqzp6yhqwb2p0vs93whkwj0r0jf509-paperless document_correspondents
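
The store path will change whenever the package is rebuilt, so copying it by hand gets old. Here is a rough sketch of pulling it out of the unit file automatically. The function name wrapper_from_execstart is made up, and it assumes the ExecStart line looks like the one above, i.e. the wrapper path followed by a space and the command name.

```shell
#!/bin/sh
# Hypothetical helper for illustration: read unit file text on stdin and
# print the program path from the first "ExecStart=<path> <args...>" line.
wrapper_from_execstart() {
    # Keep everything after "ExecStart=" up to the first space;
    # -n plus the p flag prints only lines where the substitution happened.
    sed -n 's/^ExecStart=\([^ ]*\).*/\1/p' | head -n 1
}
```

Usage would be along the lines of wrapper=$(systemctl cat paperless-consumer.service | wrapper_from_execstart) followed by sudo -u scanner "$wrapper" document_correspondents.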

the sheet feeder is ... wow. I make some token effort to smooth out the crumples and folds before adding pages to the feeder, but the five-year-old pile of invoices and council tax notices and letters from home insurance companies that comprise the figurative grist to my mill are never going to be as smooth and flat as a newly cracked ream of paper, and I am never going to be less than impressed by the way it just deals with them. What is this technology and why can't printer manufacturers use it?
