diary at Telent Netowrks

Functionally I'll iterate

Fri, 18 Dec 2020 22:53:03 +0000

I'm going to attempt to explain Lua iterators, in the hope that by the time I've finished writing I'll understand them. So.

There's a construct in Lua called for which is used for iterating over a collection or other sequencey-type thing. You might write e.g.

> for k, v in pairs({a=2, b=5, c=9}) do print(k,v) end
a	2
b	5
c	9

or

> b=io.open("/etc/hosts", "r"); for l in b:lines() do print(l) end
127.0.0.1 localhost
::1 localhost
127.0.0.2 noetbook
::1 noetbook

Per the Lua manual, the syntax for the for is as follows:

    for <var-list> in <exp-list> do
      <body>
    end

where <exp-list> is (between one and) three values - or something that produces three values when evaluated. The first value is an "iterator function" f, the second is "invariant state" state and the third is the "control variable" a.

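The interpreter then runs the loop: repeatedly, it calls f(state, a), assigns the returned values to the variables in <var-list>, sets a to the first of those values, and runs <body>, until a becomes nil, which signals the end of the iteration (<body> is not run for that final call).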
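As a rough sketch of what that loop amounts to, here is a for over ipairs written out by hand as a while loop. The names t and seen are mine, made up for the illustration, and the real interpreter does this internally rather than in Lua source.

```lua
-- Hand expansion of: for k, v in ipairs(t) do ... end
local t = {"x", "y", "z"}
local seen = {}

-- <exp-list> is evaluated once, giving the three values
local f, state, a = ipairs(t)

while true do
  local k, v = f(state, a)    -- call the iterator function
  if k == nil then break end  -- nil first result ends the loop
  a = k                       -- first result becomes the new control variable
  seen[#seen + 1] = k .. "=" .. v  -- this line stands in for <body>
end

print(table.concat(seen, " "))  --> 1=x 2=y 3=z
```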

The builtin ipairs iterator, used for traversing "array" tables - tables in which the indices are consecutive integers - provides an iterator function which, when given a table and an index, returns the next index and the value at that index:

> gen,_,_ = ipairs({5,6,7,8})
> =gen
function: 0x41c600
> gen({5,6,7,8}, 0)    -- what's the first element?
1	5
> gen({5,6,7,8}, 3)    -- what's after the third element?
4	8
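To make the "stateless" nature of that iterator function concrete, here is a minimal sketch of how something like it could be written in plain Lua. array_iter and my_ipairs are hypothetical names of my own, not how the builtin is actually implemented.

```lua
-- Hypothetical stand-in for the iterator function that ipairs returns.
-- Everything it needs arrives as arguments: the table and the last index.
local function array_iter(t, i)
  i = i + 1
  local v = t[i]
  if v ~= nil then
    return i, v
  end
  -- falling off the end returns nothing, so the control variable becomes nil
end

-- Returns the iterator function, the invariant state and the initial
-- control variable, which is what for expects from <exp-list>.
local function my_ipairs(t)
  return array_iter, t, 0
end

for i, v in my_ipairs({5, 6, 7, 8}) do print(i, v) end
print(array_iter({5, 6, 7, 8}, 3))  --> 4	8
```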

The standard pairs iterator, used for traversing tables with arbitrary keys, uses the builtin next function as an iterator function. next works similarly to the iterator function in the previous example: given a table and a key, it returns the next key (for some value of "next" we aren't interested in the details of) and the value at that key:

> next({a=2, b=4}, "a")
b	4
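Since the for only wants three values, next can also be handed to it directly, without going through pairs. A small sketch (t and total are names I've made up for the example):

```lua
local t = {a = 2, b = 5, c = 9}
local total = 0

-- iterator function, invariant state, initial control variable
for k, v in next, t, nil do
  total = total + v
end

print(total)  --> 16
```

Summing the values sidesteps the question of which order next visits the keys in.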

These both depend on the for construct's behaviour of using the first return value from each call to the iterator function as the second argument to it on the subsequent call. Which is fine if we want that value, but quite often we don't, so we end up doing this:

for _, v in ipairs(an_array) do
  print(v)
end

If we want to write an iterator values_of that returns only the value and not the index, so that it could be used more like this:

for v in values_of(an_array) do
  print(v)
end

then the iterator function returned by values_of would be called each time only with the value v, which would not be sufficient for it to work out how far through the array it had got last time. Instead we have to make our iterator close over the state it needs:

function values_of(an_array)
  local index = 0
  return function()
    index = index+1
    return an_array[index]
  end
end

> for v in values_of({7,3,5}) do print(v) end
7
3
5

I note that the io.lines function has similar behaviour to values_of in that it returns only the next data value and not also the index of that value. Assuming it uses the standard C library functions for file input, I am guessing that it does this by reading from the current file position. So it also has internal state, but that state is implicit in the depths of stdio.

Yes, but why?

Why did I start looking at this? Reasonable question. Because I wanted to write an idiomatic find function in Fennel and thought it would be a lot neater to have it work on any iterator, not just on tables. So I could say something like

>> (find (fn [_ v] (= (% v 2) 0)) (pairs [ 1 2 3 4 5]))

or

>> (find #(even? $2) (pairs [ 1 2 3 4 5]))

but it does look messy having to accept and ignore that first parameter, especially if I want to pass bog standard predicate functions. I want to write something more like

>> (find even? (vals [ 1 2 3 4 5]))

and indeed this is possible, but only if I write the vals iterator to close over its state instead of being able to use the parameters that Lua passes into it when it's called.
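For what it's worth, here is a rough sketch in Lua (which is what Fennel compiles to) of a find that accepts any iterator triple. This is one way it might be done, not the finished article: find, is_even and the repeated values_of definition are my own illustrative names, and the predicate is simply passed whatever the iterator function returns.

```lua
-- Closure-based iterator: the index lives in an upvalue, so the
-- arguments that the caller passes in are ignored.
local function values_of(an_array)
  local index = 0
  return function()
    index = index + 1
    return an_array[index]
  end
end

-- Walks the iterator the same way for does and returns the first
-- set of values for which pred is true (or nil if there is none).
local function find(pred, f, state, ctrl)
  local function step(first, ...)
    if first == nil then return nil end            -- iteration finished
    if pred(first, ...) then return first, ... end -- found a match
    return step(f(state, first))                   -- first result is the next control value
  end
  return step(f(state, ctrl))
end

local function is_even(n) return n % 2 == 0 end

-- With a values-only iterator an ordinary predicate just works
print(find(is_even, values_of({1, 3, 4, 5})))  --> 4

-- With ipairs the predicate has to accept and ignore the index
print(find(function(_, v) return is_even(v) end, ipairs({1, 3, 4, 5})))  -- index 3, value 4
```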
