Cracking over the papers
Fri, 30 Aug 2019 11:19:35 +0000
One of the several billion things that needs sorting in our new house is a filing system for all the paper junk we get sent that we don't immediately need to deal with but probably shouldn't throw away: bills, PAYE coding notices, letters from the council, bank statements etc. After some thought and thanks to @antifuchs alerting me to the existence of the Paperless project I decided to go digital. Recommend.
Notes, in the order they occur to me
- I bought a Fujitsu fi-5120C scanner on Ebay, because it was cheap (around £30 for the scanner, another £15 for the PSU it didn't come with), allegedly supported by SANE, supports duplex (i.e. it scans both sides of the page) and has an automatic document feed.
- when I received it, I initially could not make it work in Linux - at least with my motherboard. The Internet thinks this may be an incompatibility with USB 3, and indeed, after careful reading of my motherboard manual I found a USB 2.0 port and swapped it - now it's working fine.
- paperless exists as a Nix package and NixOS module in nixos-unstable as of now. I upgraded my home server to -unstable because it seemed less complicated than trying to run modules from -unstable on a 19.03 base.
- it logs to the systemd journal. If it fails to start and the message
Origin 'localhost:8080' in CORS_ORIGIN_WHITELIST is missing scheme or netloc
appears in the logs, this is because of something something django mutter mutter. There is a patch to paperless that purports to fix this, but the workaround is to change the configuration to use a proper URL instead of just a hostname/port
services.paperless = {
  enable = true;
  extraConfig = {
    PAPERLESS_CORS_ALLOWED_HOSTS = "http://localhost:8080";
  };
};
- to scan my docs into the system I run something like
sudo -u scanner scanimage --format png --batch=/var/spool/scans/$(date +%Y%m%d%H%M%S)Z_p%04d.png --resolution 300 --source 'ADF Duplex'
where scanner is the username that paperless is running as and /var/spool/scans/ is where I arbitrarily decided the consumption directory should be. I scan to png not pdf because scanimage was silently failing to convert to pdf and instead leaving the files as pbm images. (1) pbm images are huge; (2) pbm image files with .pdf suffixes confuse the paperless web frontend and they confuse me too.
I would like to automate this so it runs whenever I (or a family member) press the "scan" button on the scanner itself, but haven't got that far yet. scanbd will probably do it but seems excessively featureful for my needs.
- the paperless documentation mentions that you can run manage.py document_retagger to retag your documents after you change tagging rules. It doesn't mention (as far as I can see) that there is a similar document_correspondents command to assign your documents to correspondents when you update/add rules to assign those. See issue 347 for details.
- to run manage.py with the right nix config, you will want to find the wrapper that sets up all the environment variable blah and run that instead. Do systemctl cat paperless-consumer.service and snag the pathname of the script it calls in its ExecStart entry.
[dan@loaclhost:~]$ systemctl cat paperless-consumer.service| grep 'ExecStart='
ExecStart=/nix/store/2jaqzp6yhqwb2p0vs93whkwj0r0jf509-paperless document_consumer
[dan@loaclhost:~]$ sudo -u scanner /nix/store/2jaqzp6yhqwb2p0vs93whkwj0r0jf509-paperless document_correspondents
- the sheet feeder is ... wow. I make some token effort to smooth out the crumples and folds before adding pages to the feeder, but the five-year-old pile of invoices and council tax notices and letters from home insurance companies that comprise the figurative grist to my mill are never going to be as smooth and flat as a newly cracked ream of paper, and I am never going to be less than impressed by the way it just deals with them. What is this technology and why can't printer manufacturers use it?
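Postscript to the scanimage note above: whatever ends up handling the "scan" button will want a script to call, so here is a minimal sketch of the same command wrapped up as shell functions. The names batch_pattern and scan_batch and the SCAN_DIR variable are my own invention for illustration, not anything paperless or SANE provides, and I haven't wired this up to scanbd.

```shell
#!/bin/sh
# Sketch only: wraps the scanimage invocation from the note above.
# SCAN_DIR, batch_pattern and scan_batch are made-up names for illustration.

# Paperless consumption directory (the one I arbitrarily picked).
SCAN_DIR="${SCAN_DIR:-/var/spool/scans}"

# Print the --batch pattern for one run. The timestamp is taken once so
# that every page in a batch shares it; scanimage itself expands %04d
# to the page number. An explicit timestamp can be passed as $1.
batch_pattern() {
    stamp="${1:-$(date +%Y%m%d%H%M%S)}"
    printf '%s/%sZ_p%%04d.png\n' "$SCAN_DIR" "$stamp"
}

# Scan everything in the feeder, both sides, as 300 dpi PNGs.
scan_batch() {
    scanimage --format png \
        --batch="$(batch_pattern)" \
        --resolution 300 \
        --source 'ADF Duplex'
}
```

Source the file and call scan_batch as the scanner user, same as before, so the files land with an owner paperless can read. Calling batch_pattern with a fixed timestamp is a cheap way to check what filenames will come out without feeding any paper through.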