What's up, D?

Fri, 30 Apr 2021 17:27:22 +0000

One of the nice things about side projects (as opposed to the ones I get paid for) is not having to be accountable to anybody for timely delivery, and the freedom to head off on tangents or side investigations. I have recently picked up NixWRT again, but I regret to inform you that "off piste" is where I am right now.

Backstory: monit is not a great service monitor for my needs, mostly because it isn't the parent process of the service daemons it starts, so can't tell when they exit without polling/pid files etc - which is slow and unreliable. So I decided that the world does not have enough init replacements already, and because of the risk that anyone else might otherwise find it useful I decided to write it in Fennel. Then a little while later, I decided to do something else for about five months, and now I have no mental context of what I was doing.

So why not start again? This time it's called upd and although it shares some ideas with swarm, I am attempting to use tests to drive more of its design.

I started by sketching out the shape of a plausible service monitor for a pppoe daemon. You can see that the state logic is complicated and it has many collaborators, so I install mock versions of them by setting entries in the package.loaded table, which require checks before loading a file. The most complicated bit is probably the fake event loop: the test setup provides an array my-events of functions which may update state, replace mocks, perform assertions and so on, and the mock event loop runs them one by one. This lets me write my first test, to see that the daemon is started when the interface is present. A followup refactor replaces the longwinded calls to tset package.loaded with more intention-revealing names mock and mocks.
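To make the mechanics concrete, here is a minimal sketch of that setup. Only mock, mocks, my-events, package.loaded and next-event come from the text above; the bodies of the helpers, the module name event, the state table and the run function are assumptions for illustration, not the real upd code.

```fennel
;; mock and mocks are the names from the refactor; their bodies here are guesses.
(fn mock [name value]
  (tset package.loaded name value))

(fn mocks [tbl]
  (each [name value (pairs tbl)]
    (mock name value)))

;; Hypothetical shared state that the events and the script both look at.
(local state {:link-up? false :daemon-started? false})

;; Each event is a function that may update state or perform assertions.
(local my-events
  [(fn [] (tset state :link-up? true))
   (fn [] (assert state.daemon-started?
                  "daemon should start when the link is up"))])

;; The fake event loop: run the next event function, if any, and
;; report whether there was one.
(var next-index 1)
(mocks {:event {:next-event (fn []
                              (let [f (. my-events next-index)]
                                (set next-index (+ next-index 1))
                                (when f (f))
                                (not= f nil)))}})

;; Stand-in for the script under test. It picks up the mock through
;; require, because require consults package.loaded first.
(fn run []
  (let [event (require :event)]
    (while (event.next-event)
      (when (and state.link-up? (not state.daemon-started?))
        (tset state :daemon-started? true)))))

(run)
```

The loop ends when my-events is exhausted, so a test is just a list of things to do to the world interleaved with things to check about it.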

The second test checks that we observe an exponential backoff when the process fails to start. We don't have any visibility into the actual backoff state in the system under test, so we have to observe it from the outside. First we stop the process from an event function, and then poll the process state repeatedly while incrementing a counter until it restarts (commit 0e966459).

This is convoluted and might suggest that we should make that state visible. But it also prompted me to try making backoff state be a property of the process monitor itself, not the overall system under test - suppose we want to write a script that watches two or more processes? The next few commits:

move the backoff state variables into the process
make the process own backoff state mutation by adding a backoff method
move backoff testing into the process with a backoff-expired? method
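Roughly, a process that owns its backoff state might look like the sketch below. The names backoff, backoff-expired? and backoff-until are from the text and the diff that follows; new-process, backoff-interval, the doubling rule and passing the current time in as now are all assumptions made so the example is self-contained.

```fennel
;; Start a backoff period unless one is already pending, and double
;; the interval so that repeated failures wait longer each time.
(fn backoff [self now]
  (when (= self.backoff-until nil)
    (tset self :backoff-until (+ now self.backoff-interval))
    (tset self :backoff-interval (* 2 self.backoff-interval))))

;; True when no backoff is pending or the pending one has run out.
(fn backoff-expired? [self now]
  (or (= self.backoff-until nil)
      (>= now self.backoff-until)))

;; Hypothetical constructor: the backoff state lives on the process.
(fn new-process [name]
  {:name name
   :backoff-interval 1
   :backoff backoff
   :backoff-expired? backoff-expired?})

(local pppd (new-process "pppd"))

(pppd:backoff 100)             ; first failure: wait until 101
(pppd:backoff 100)             ; already backing off, so no change
(local early (pppd:backoff-expired? 100))
(local later (pppd:backoff-expired? 101))

;; Clearing backoff-until stands in for a restart attempt; the next
;; failure then waits twice as long.
(tset pppd :backoff-until nil)
(pppd:backoff 101)             ; second failure: wait until 103
```

Because the state is just fields on the table, a test can read or reset backoff-interval directly instead of counting polls from the outside.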

Of course, making the backoff state a property of the process has the side effect of making it visible. The next test is that the process is stopped if the underlying link is lost, and it's actually quite handy to be able to reset the backoff interval in the test setup (commit 0350c8aed6a).

Something was still bothering me here, though. In order to move the backoff testing from the script to the process, I'd done

-      (when (and (not (process.running? pppd))
-                 (nil? pppd.backoff-until))
+      (when (not (process.running? pppd))
         (pppd:backoff))

This is logically correct, as pppd:backoff does nothing anyway if pppd.backoff-until is already set. But it's really quite non-obvious. What we actually want to say here is "if the process died, back off" - so let's say it! Add a process.died? method and rewrite the test to use it.
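One plausible reading of died?, as a sketch: "not running, and we haven't already reacted to that". The definition below simply gives the condition from the removed lines a name. The pid field, the body of running? and the stub backoff method are assumptions; the real upd code may decide this quite differently.

```fennel
;; Assumption: a process with no pid is not running.
(fn running? [p]
  (not= p.pid nil))

;; Assumption: "died" means it is not running and no backoff has
;; been scheduled for it yet.
(fn died? [p]
  (and (not (running? p))
       (= p.backoff-until nil)))

(local process {:running? running? :died? died?})

;; A stub process whose backoff method just records a deadline.
(local pppd {:name "pppd"
             :backoff (fn [self] (tset self :backoff-until 10))})

(var backoffs 0)
(fn check []
  ;; The call site now reads the way we would describe it out loud.
  (when (process.died? pppd)
    (set backoffs (+ backoffs 1))
    (pppd:backoff)))

(check)   ; the process is dead, so we back off
(check)   ; a backoff is already pending, so nothing happens
```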

The last interesting change in this sequence was triggered by looking at the event loop, when I realised that we don't use the value of the event that comes back from next-event anywhere - we wait for something to happen, but then we discern what happened by testing various bits of state. To say the same thing again in fewer words: we don't receive events, we wait for state changes. So let's change the code to use better names.

What's the conclusion? Is there a conclusion? A few thoughts -

for the TDD purists, it wasn't "real" TDD because I wrote quite a lot of implementation code first instead of writing the test first.
by listening to the pain of testing I was able to improve the micro-level design in a few aspects: I am happier with the result than with the initial code
by having the tests I was able to make those changes while being reasonably confident that I hadn't broken anything
the next thing on my list (which is in progress already) is writing the service monitor for a hardware ethernet device. When it's in slightly better shape I will write another post full of netlink and qemu trivia, but until then, goodbye.