diary at Telent Netowrks

Mostly because I like to see my referrers#

Wed, 21 Sep 2016 21:56:44 +0000

I can think of no security justification for encrypting the pages served by this site: the data is public, viewing it is unlikely to get you arrested in most jurisdictions, it doesn't have any kind of forms or upload facilities or anything - but, I would like to see referers(sic) in my server logs. So now it's all HTTPS, using the rather fantastic Let's Encrypt service.

Let me know if you see anything broken as a result (i.e. anything broken that wasn't already broken) or can think of a principled reason by which I can justify the (admittedly rather paltry) time I spent on doing it.

Giving clojurescript the boot#

Mon, 19 Sep 2016 21:58:57 +0000

Recently I decided to unearth Sledge and fix enough of its bugs that I will actually want to use it day-to-day, and because changing only one thing at a time is too easy I thought I'd try updating all its dependencies and moving it from Leiningen to the new shiny Boot Clojure build tooling.

First impressions of Boot:

Point 1 is, in my judgment so far, a compelling reason to persevere through the pain of points 2 and 3.

I already wrote a little gist for making an uberjar with Boot, and you should definitely pay attention to line 25. But today I want to talk about defining your own tasks, and as an example I'm going to add support for building ClojureScript.

The well-informed reader will know that there is already a Boot task on GitHub for compiling ClojureScript programs. I chose not to use it, mostly because I was getting error messages I didn't understand, and partly because the ClojureScript Quick Start wiki page so strongly recommends understanding the fundamentals of compiling ClojureScript before plugging in a tool-based workflow.

So. Here are some things you may or may not already know about Boot if you've previously been using it cargo-cult fashion:

  1. You tell Boot what you want it to do by composing tasks into a build pipeline
  2. Each task defines a handler. A handler accepts a fileset, does something (e.g. compiles some files to create output files), and then calls the next handler with a new fileset which represents the passed fileset plus the result of whatever it did. It's a lot like nested Ring middleware. The first handler in the pipeline gets a fileset consisting only of the input files (your project source files), calls the second, which calls the third, etc., and once all the nested handlers have run the returned fileset contains all the output files.
  3. From the end-user or even the task author's point of view, filesets are immutable. Handlers never directly change or create files in the project directory - instead they always create a new fileset, and Boot copies/moves files around in temporary directories behind your back to maintain the abstraction. As a task writer you don't have to care too much how this works: there are functions to map between fileset entries and the full pathname you should use to read the corresponding file; also to create a new temporary directory for output and then to add the files in that directory to a fileset.
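The middleware nesting in point 2 can be sketched in plain Clojure, with a map standing in for the fileset (fake-task and pipeline are my names, not Boot's API):

```clojure
;; Toy model of Boot's middleware chain: a "task" returns a middleware,
;; a middleware wraps the next handler, a handler maps fileset -> fileset.
;; A plain map stands in for the fileset.
(defn fake-task [k v]
  (fn [next-handler]                      ; middleware
    (fn [fileset]                         ; handler
      ;; do some "work", then pass the augmented fileset on
      (next-handler (assoc fileset k v)))))

(def pipeline
  ;; composing middlewares nests the handlers, outermost first
  ((comp (fake-task :compiled "main.js")
         (fake-task :jarred "app.jar"))
   identity))                             ; identity = end of the chain

(pipeline {:src "core.clj"})
;; => {:src "core.clj", :compiled "main.js", :jarred "app.jar"}
```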

And here are some things about the Clojurescript compiler which probably should be apparent from reading the Quick Start and the code it refers to:

  1. The compiler API lives in the cljs.build.api namespace, and the important bits are inputs, which creates a 'compilable object' from the directories/files you give it, and compile, which does the compilation.
  2. Contrary to anything you might have thought by reading the doc string of inputs, it does not accept "a list": it accepts multiple arguments. If you have a list you will need to use apply here. I wasted a lot of time on this by not reading the code properly.
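To make the distinction in point 2 concrete, here's a toy variadic stand-in (inputs* is my name, not the real API):

```clojure
;; inputs is variadic; this toy version has the same calling convention
(defn inputs* [& paths] (vec paths))

(inputs* "src/cljs" "src/shared")          ; multiple args - fine
(inputs* ["src/cljs" "src/shared"])        ; one list arg - wrong, nests the list
(apply inputs* ["src/cljs" "src/shared"])  ; spread the list with apply - fine
```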

So, how do we marry the two up? Look upon my task, ye mighty, and despair...

A task is a function that returns a middleware, which is a function that returns a handler. This is not super-obvious from the code on display here, because we're using a small piece of handy syntactic sugar called with-pre-wrap, which lets us provide the body of the handler and returns a suitable middleware.
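Since the gist itself isn't inlined here, a minimal sketch of the shape of such a task (the deftask name, option handling, and output paths are my assumptions, not the original gist, and I've used cljs.build.api/build to drive the whole compile):

```clojure
;; Sketch: a Boot task wrapping the ClojureScript compiler.
(ns sledge.boot-build
  (:require [boot.core :as core]
            [cljs.build.api :as cljs]
            [clojure.java.io :as io]))

(core/deftask compile-cljs
  "Compile any .cljs files in the fileset."
  [o output-to PATH str "relative path of the generated js file"]
  (let [out-dir (core/tmp-dir!)]          ; scratch dir, managed by Boot
    (core/with-pre-wrap fileset
      (let [srcs (->> (core/input-files fileset)
                      (core/by-ext [".cljs"])
                      (map (comp #(.getPath %) core/tmp-file)))]
        ;; inputs is variadic, hence the apply (see point 2 above)
        (cljs/build (apply cljs/inputs srcs)
                    {:output-to  (.getPath (io/file out-dir output-to))
                     :output-dir (.getPath (io/file out-dir "out"))}))
      ;; return a new fileset containing the compiled output
      (-> fileset (core/add-resource out-dir) core/commit!))))
```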

What else? Not much else. This code lives in the sledge.boot-build namespace and gets required by build.boot. We have to override and/or augment what the user passes for output-to and output-dir to make sure output ends up somewhere it'll get added to the fileset instead of being written straight into the project working directory. And I haven't decided how to do a repl yet. I will probably add that (one way or another) before merging the das-boot branch into master.

The trouble with triples#

Sat, 26 Mar 2016 16:29:15 +0000

The other day I had occasion to write

(defn triples-to-map [triples]
  (reduce (fn [m row]
            (update-in m (butlast row)
                       (fn [old new] (if old (conj old new) [new]))
                       (last row)))
          {}
          triples))

and be surprised and delighted that it ran first time with the expected result. As witness:

foo.search=> (clojure.pprint/pprint triples_)
([:bnb:016691109 :published "2014"]
 [:bnb:016691109 :title "The Seven Streets of Liverpool"]
 [:bnb:016691109 :publisher "Orion"]
 [:bnb:016691109 :schema :shlv:Book]
 [:bnb:016691109 :author "Lee, Maureen"]
 [:bnb:016594932 :published "2013"]
 [:bnb:016594932 :title "Stephen Guy's forgotten Liverpool"]
 [:bnb:016594932 :publisher "Trinity Mirror"]
 [:bnb:016594932 :schema :shlv:Book]
 [:bnb:016594932 :author "Guy, Stephen"]
 [:bnb:016242841 :published "2012"]
 [:bnb:016242841
  :title
  "Robbed : my Liverpool life : the Rob Jones story"]
 [:bnb:016242841 :publisher "Kids Academy Publishing"]
 [:bnb:016242841 :schema :shlv:Book]
 [:bnb:016242841 :author "Jones, Rob, 1971-"]
 [:bnb:016744037 :published "2012"]
 [:bnb:016744037 :title "Steven Gerrard : my Liverpool story"]
 [:bnb:016744037 :publisher "Headline"]
 [:bnb:016744037 :schema :shlv:Book]
 [:bnb:016744037 :author "Gerrard, Steven, 1980-"])
foo.search=> (clojure.pprint/pprint (triples-to-map triples_))
{:bnb:016691109
 {:published ["2014"],
  :title ["The Seven Streets of Liverpool"],
  :publisher ["Orion"],
  :schema [:shlv:Book],
  :author ["Lee, Maureen"]},
 :bnb:016594932
 {:published ["2013"],
  :title ["Stephen Guy's forgotten Liverpool"],
  :publisher ["Trinity Mirror"],
  :schema [:shlv:Book],
  :author ["Guy, Stephen"]},
 :bnb:016242841
 {:published ["2012"],
  :title ["Robbed : my Liverpool life : the Rob Jones story"],
  :publisher ["Kids Academy Publishing"],
  :schema [:shlv:Book],
  :author ["Jones, Rob, 1971-"]},
 :bnb:016744037
 {:published ["2012"],
  :title ["Steven Gerrard : my Liverpool story"],
  :publisher ["Headline"],
  :schema [:shlv:Book],
  :author ["Gerrard, Steven, 1980-"]}}
nil

(Now that I've written that code down for the second time, I wonder whether using update-in is slightly overkill when I know the map will only ever be two levels deep. But that's not something I'm interested in right now.)
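For the record, a flat two-level version is possible with plain update and fnil (this is my sketch, not code from the project; it assumes every row has exactly three elements):

```clojure
;; Same result as triples-to-map, but with plain update at each level;
;; (fnil conj []) supplies the initial vector the first time a key appears
(defn triples-to-map2 [triples]
  (reduce (fn [m [s p o]]
            (update m s update p (fnil conj []) o))
          {}
          triples))

(triples-to-map2 [[:a :title "x"] [:a :title "y"] [:a :author "z"]])
;; => {:a {:title ["x" "y"], :author ["z"]}}
```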

What I'm interested in right now is that the input list for this function is itself the output of some other code which - mostly thanks to Instaparse - was unexpectedly easy to write. I've been playing around lately with RDF and the Semantic Web, and needed a way of parsing N-Triples - which looks superficially simple enough that Awk could do it, until you start thinking about comments and strings with spaces in them and escaped special characters and ...

Anyway, Instaparse steps in to save the day again. I believe I have written previously to give my opinion that Instaparse is awesome, and I will go on record to say that this fresh experience merely cements my first impression.

N-Triples has a published EBNF grammar. I had to monkey with it a bit to get it into a form Instaparse would accept.

Here's the final result:

ntriplesDoc ::= line*
line ::= WS* triple? EOL
triple ::= subject WS* predicate WS* object WS* '.' WS*
subject ::= IRIREF | BLANK_NODE_LABEL
predicate ::= IRIREF
object ::= IRIREF | BLANK_NODE_LABEL | literal
literal ::= STRING_LITERAL_QUOTED ('^^' IRIREF | LANGTAG)?
LANGTAG ::= '@' #"[a-zA-Z]"+ ('-' #"[a-zA-Z0-9]"+)*
EOL ::= #"[\n\r]"+
WS ::= #"[ \t]" | #"#.*"
IRIREF ::= '<' IRI '>'
IRI ::= (#"[^\u0000-\u0020<>\"{}|^`\\]" | UCHAR)*
STRING_LITERAL_QUOTED ::= '"' STRING_LITERAL '"'
STRING_LITERAL ::= ( #"[^\u0022\u005C\u000A\u000D]" | ECHAR | UCHAR)*
BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | #"[0-9]") ((PN_CHARS | '.')* PN_CHARS)?
UCHAR ::= '\\u' HEX HEX HEX HEX | '\\U' HEX HEX HEX HEX HEX HEX HEX HEX
ECHAR ::= "\\" #"[tbnrf\"\'\\]"
HEX ::= #"[0-9A-Fa-f]"
PN_CHARS_BASE ::= #"[A-Z]" | #"[a-z]" | #"[\u00C0-\u00D6]" | #"[\u00D8-\u00F6]" | #"[\u00F8-\u02FF]" | #"[\u0370-\u037D]" | #"[\u037F-\u1FFF]" | #"[\u200C-\u200D]" | #"[\u2070-\u218F]" | #"[\u2C00-\u2FEF]" | #"[\u3001-\uD7FF]" | #"[\uF900-\uFDCF]" | #"[\uFDF0-\uFFFD]" | #"[\x{10000}-\x{EFFFF}]"
PN_CHARS_U ::= PN_CHARS_BASE | ":" | "_"
PN_CHARS ::= PN_CHARS_U | "-" | #"[0-9]" | "\u00B7" | #"[\u0300-\u036F]" | #"[\u203F-\u2040]"
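Loading a grammar like this into Instaparse is a single call (the resource filename and var name here are my assumptions):

```clojure
(require '[instaparse.core :as insta]
         '[clojure.java.io :as io])

;; insta/parser accepts the whole grammar as a string
(def n-triple-parser
  (insta/parser (slurp (io/resource "ntriples.ebnf"))))
```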

Calling insta/parse with this grammar on a sample line gets you something looking like

[:ntriplesDoc
 [:line
  [:triple
   [:subject
    [:IRIREF "<" [:IRI  "h" "t" "t" "p" ":" "/" "/" "b" "n" "b" "."
                        "d" "a" "t" "a" "."  "b" "l" "."  "u" "k" "/" "i" "d"
                        "/" "r" "e" "s" "o" "u" "r" "c" "e" "/" "0" "1" "6" "7"
                        "0" "6" "8" "5" "5"] ">"]]
   [:WS " "]
   [:predicate
    [:IRIREF "<" [:IRI "h" "t" "t" "p" ":" "/" "/" "l" "o" "c" "a" "l"
                       "h" "o" "s" "t" ":" "3" "0" "3" "0" "/" "p" "u"
                       "b" "l" "i" "s" "h" "e" "d"] ">"]]
   [:WS " "] [:object [:literal [:STRING_LITERAL_QUOTED
      "\"" [:STRING_LITERAL "2" "0" "1" "4"] "\""]]] [:WS " "]
   "."]
  [:EOL "\n"]]]

which clearly is going to need some more attention before it's usable. We do this in two passes: first we visit the entire tree node-by-node to do things like turn literal node values into strings and IRI nodes into URI objects.

(defn visit-node [branch]
  (if (vector? branch)
    (case (first branch)
      :IRIREF
      (let [[_< [_iri_tok & letters] _>] (rest branch)
            iri (str/join letters)]
        (or (prefixize iri)
            (URI. iri)))
      :STRING_LITERAL (str/join (rest branch))
      :STRING_LITERAL_QUOTED (let [[_ string _] (rest branch)] string)
      :literal (second branch)
      :WS ""
      :UCHAR (let [[_ & hexs] (rest branch)]
               (String.
                (Character/toChars
                 (Integer/parseInt (str/join (map second hexs)) 16))))
      :triple (let [m (reduce (fn [m [k v]] (assoc m k v)) {}
                              (rest branch))]
                [:triple [(:subject m) (:predicate m) (:object m)]])
      branch)
    branch))

Then we transform the tree into a seq and filter the seq to get only the :triple nodes. Putting it all together:

(defn parse-n-triples [in-string]
  (->> in-string
       (insta/parse n-triple-parser)
       (walk/postwalk visit-node)
       (tree-seq #(and (vector? %)
                       (keyword? (first %))
                       (not (= (first %) :triple)))
                 #(rest %))
       (filter #(= (first %) :triple))
       (map second)))
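The tree-seq/filter step can be seen in isolation on a toy tree whose shape mimics the post-walk output above (the toy data is mine):

```clojure
;; Descend into vector nodes until a :triple node is reached,
;; then collect the :triple nodes and unwrap their payloads
(def toy-tree
  [:ntriplesDoc
   [:line [:triple [:s1 :p1 :o1]] [:EOL "\n"]]
   [:line [:triple [:s2 :p2 :o2]] [:EOL "\n"]]])

(def triples-only
  (->> toy-tree
       (tree-seq #(and (vector? %)
                       (keyword? (first %))
                       (not= (first %) :triple))
                 rest)
       (filter #(and (vector? %) (= (first %) :triple)))
       (map second)))

triples-only
;; => ([:s1 :p1 :o1] [:s2 :p2 :o2])
```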

I'm reasonably confident that the grammar is correct: I pushed all the official N-Triples Test Suite through it without error. My post-parsing massage passes, though, are possibly not correct and certainly not complete, which is one reason I'm just blogging about it instead of publishing it as a standalone library somewhere. Things I already know it doesn't do: blank node support, language tags, datatypes, escaped characters. Things I don't know it doesn't do: don't know. But it seems to work for my use case - of which, more later.

First steps in NixOS#

Mon, 15 Jun 2015 15:10:01 +0000

According to the mtime of /nix on this laptop I've been running NixOS since February, so I should be past "first steps" by now, really. But I decided last week to switch to the Nix package collection on my work Mac as well, and that has prompted me to learn how to package some of the stuff I use that isn't already available.

(tl;dr - It's here: https://github.com/telent/nix-local/ )

Item zero was to find a way of keeping local packages alongside the real nixpkgs collection without maintaining a permanently divergent fork of the nixpkgs repo. The approach I eventually settled on was to use packageOverrides to augment the upstream package list with my own packages in an entirely separate repo. See https://github.com/telent/nix-local/blob/master/README.md#installation for details.
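The shape of that override is roughly this sketch (the config location and checkout path are my assumptions; the README linked above has the real instructions):

```nix
# ~/.nixpkgs/config.nix - sketch; the nix-local checkout path is assumed
{
  packageOverrides = pkgs: {
    vault = pkgs.callPackage /home/me/src/nix-local/vault {};
  };
}
```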

With that out of the way, the first thing I needed to package was vault, which is a quite neat program for generating secure passwords given a master secret and a service name - i.e. you can have per-service passwords for each site you use without having to store the passwords anywhere.

It's Javascript/NPM. NPM is a bad fit for Nix because, as explained by Sander van der Burg, it does dependency management as well as building, and its model of dependencies (semver, in theory) is considerably more lax than the Nix model. So we use npm2nix to produce Nix expressions for all its dependencies from `package.json`.

$ git clone git@github.com:jcoglan/vault
$ cd vault
$ git checkout 0.3.0
$ `nix-build '<nixpkgs>' -A npm2nix`/bin/npm2nix package.json node-packages.generated.nix

then we copy the generated files into our nix-local repo.

$ mkdir -p ~/nix-local/vault/
$ cp node-packages.generated.nix default.nix ~/nix-local/vault/

The generated default.nix then needed significant manual editing:

deps = (filter (v: nixType v == "derivation") (attrValues nodePackages))

Finally the package can be installed with nix-env -iA nixpkgs.vault or nix-env -i nodejs-vault. I don't know which of these is stylistically preferable, but in this case they both have exactly the same effect. As far as I know.

Regaining my compojure#

Thu, 19 Feb 2015 11:16:22 +0000

Picking up $secret_project, which I put down in November to do Sledge and Yablog, I find that its routing of HTTP requests is a horrendous mess based on substring matching and ad hoc argument parsing, and would really benefit from some Compojure.

(Now, that is, that I've actually got my head around how Compojure works)

Because I'm using Hiccup to compose pages out of parts, I thought it would be neat if I could return hiccup page bodies directly as responses so that they get uniformly topped and tailed and turned into response maps. Turns out to be fairly straightforward:

  1. define a type for the hiccup response
  2. extend the compojure.response/Renderable protocol to deal with the new type

(deftype Hiccupage [body])
 
(defn page-surround [page]
  (let [head [:head
              [:link {:rel "stylesheet" :type "text/css"
                      :href "/static/default.css"}]
              [:script {:src "/static/stuff.js"}]
              [:title "Hey you"]]]
    ;; return head and body as siblings, not body inside head;
    ;; hiccup renders seqs inline, so html5 wraps both in <html>
    (list head (into [:body] (.body page)))))
 
(extend-protocol compojure.response/Renderable
  Hiccupage
  (render [page req]
    {:status  200
     :headers {"Content-Type" "text/html; charset=utf-8"}
     :body   (hiccup.page/html5 (page-surround page))}))

Now we can return Hiccupage (hiccup page? geddit? I'm still looking for a better name, yes) objects directly from our routes:

(defroutes app
  (GET "/hello/:name" [name]
    (Hiccupage.
     [[:h1 "hello"]
      [:p "hello " name]
      [:p "This is text. I like text"]]))
  ...
  )