diary at Telent Netowrks

Not much spam - now even less

Wed, 26 Aug 2020 23:28:47 +0000

After a long period of not getting around to it, I have finally added spam/ham training support to my self-hosted mail config. Key words: emacs, notmuch, muchsync, rspamd, nixos

The plan was: while reading mail on my laptop I want to be able to add a tag that marks a message as misclassified and needing retraining, then after syncing back to the server I could run some notmuch search and feed the results through rspamc to rspamd.

First, something on my laptop to mark emails that need retraining. This is emacs, "obviously". (Seriously, it's unlikely to be obvious to most people but if it's not obvious to you personally, dear reader, you won't get much value from this blog post)

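;; In notmuch search results, "#" flips the spam tag on the thread at
;; point and adds "retrain" so the server knows to relearn it.
(define-key notmuch-search-mode-map (kbd "#")
  (lambda ()
    (interactive nil)
    (let* ((was-spam-p (member "spam" (notmuch-search-get-tags (point))))
           (tag-changes (list "+retrain"
                              (if was-spam-p "-spam" "+spam"))))
      (notmuch-search-tag tag-changes (point) (point)))))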

Syncing the tags between laptop and server is ably handled already by muchsync, so no change needed there.

To run rspamc on the server I first had to enable the rspamd "controller" worker: adding

services.rspamd.workers.controller = {
  enable = true;
};

was necessary but not sufficient: running rspamc learn_spam would give the error message "HTTP error : 500, Unknown statistics error". Apparently rspamd defaults to requiring a redis backend for something, so this is not wholly surprising as I wasn't running redis.

services.redis = {
   enable = true;
   bind = "";
};

and then we need to tell rspamd where to find redis:

services.rspamd.locals."redis.conf".text = "servers = \"\"";

and then all that remains is to figure out the correct command line (still on the server) to query the wrongly classified emails and send them to rspamd:

notmuch search --output=files --format=text0 --exclude=false is:retrain and is:spam | \
  xargs -0  rspamc learn_spam -c bayes
notmuch search --output=files --format=text0 --exclude=false is:retrain and not is:spam | \
  xargs -0  rspamc learn_ham -c bayes
notmuch tag -retrain tag:retrain
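
Those three commands could be bundled into a script along the following lines. This is a minimal sketch under some assumptions, not a tested part of the setup described here: the NOTMUCH and RSPAMC variables and the learn and retrain function names are made up for illustration, and -r is the GNU xargs flag that skips running the command when there is no input. The one deliberate difference from the commands above is that the retrain tag is only removed when both learning steps report success.

```shell
#!/bin/sh
# Sketch: feed messages tagged "retrain" to rspamd, then clear the tag.
# NOTMUCH and RSPAMC are hypothetical override variables (not from the
# post) so that the real commands can be swapped out, e.g. for a dry run.
NOTMUCH=${NOTMUCH:-notmuch}
RSPAMC=${RSPAMC:-rspamc}

# learn <learn_spam|learn_ham> <notmuch query terms...>
# Collects the files of all matching messages and hands them to rspamc.
learn() {
  learn_verb=$1
  shift
  learn_list=$(mktemp) || return 1
  if "$NOTMUCH" search --output=files --format=text0 --exclude=false "$@" \
       >"$learn_list"; then
    # -r (GNU xargs): do not run rspamc at all when nothing matched
    xargs -0 -r "$RSPAMC" "$learn_verb" -c bayes <"$learn_list"
    learn_status=$?
  else
    learn_status=1
  fi
  rm -f "$learn_list"
  return "$learn_status"
}

# Learn spam, then ham; drop the retrain tag only if both steps succeeded.
retrain() {
  learn learn_spam is:retrain and is:spam &&
    learn learn_ham is:retrain and not is:spam &&
    "$NOTMUCH" tag -retrain tag:retrain
}
```

Calling retrain at the end of the file (or from a cron job or systemd timer that sources it) would do the actual work. If a learning step fails, the messages keep their retrain tag and get picked up again on the next run.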

I am writing this as a blog post in the vague hope that it will be useful to someone, but also as a prompt to myself so that at some point I will nixify the bits of this that aren't already. I have to say, it's overall been a whole lot less frustrating than my efforts trying to manage my firefox config (yesterday evening and ongoing).