diary @ telent

On a train, on the way back from Alan & Telsa's, after my

Thu Nov 28 12:02:37 2002

Topics: lisp sbcl

On a train, on the way back from Alan & Telsa's, after my spur-of-the-moment decision to go to the SWLUG meeting last night. A more informal gathering than OxLUG, held in a café then moved to the pub next door when it became obvious that there were more people present than would fit. Fun. Back to railway station for what is apparently fairly normal confusion over the timetable: "where's this train going?" "the board says `terminates'" "ok, where's Terminates, then?" -- got on a train that showed some sign of wanting to head in the right direction, and after it pulled out, the train manager walked through with a notepad polling the passengers for where they wanted to go. OK, possibly just for the entirely prosaic reason that some of the stops en route are basically request stops, but I think that explanation is less entertaining.

Today, tour of Swansea ("here's the castle, here's the sheep shop, here's the market, here's the marina, this is the beach") and introduction to Welshcakes. These can be approximately described as half-height dense scones, dusted with sugar.

A few words about the progress of threading for SBCL, after my vague hints on Monday:

So far ...

  1. CL has dynamically scoped variables. Not, thank goodness, by default, but available. The semantics we've adopted are that (1) a symbol's global value can be set and is visible in all threads, and (2) it can also be dynamically bound: the dynamic binding is visible in only the thread that it was bound in. Other threads running concurrently still see and may also change the global value, but the thread that's bound the variable won't see those changes.

  2. This requires thread-local storage of some kind for the symbol values. On x86, we point to this with a segment register (%gs) as we're already using all the real registers. Each thread has a vector of values, and each symbol gains a tls-index slot which holds the index of its value in that vector.

  3. Spent much of the weekend fiddling with modify_ldt, and teaching the sbcl assembler how to assemble instructions with segment override prefixes (in fact, I still need to go back to it and teach it how to disassemble them again)

  4. On Monday, I looked through the diffs from the previous round of hacking and forward-ported them - or at least, the bits that still made sense - into the current version. This was for formerly-global variables which are actually supposed to be per-thread (stack pointers and the like)

  5. On Tuesday, I had other unrelated work to do, for which I actually get paid (note to interested readers: if you would like to see a threaded SBCL sooner rather than later, I am available to implement this and other SBCL/CMUCL enhancements on a contract basis. Email me for details). Spent a couple of hours on various kinds of public transport thinking about symbol binding and unbinding.

  6. Wednesday (yesterday) and today I wrote bind, unbind, set and symbol-value vops, and equivalent functionality in C (following a variation of the Extreme Programming methodologists' Once And Only Once principle known as "Once And Only Once More") for the bits of C code that need to do these things in situations where not enough of Lisp is running to use the normal vops. Then I debugged it. This is mostly a question of rebuilding, watching it segmentation fault, attaching gdb, disassembling, cursing, set disassembly-flavor intel, disassembling again, and scratching head. With intermissions for previously noted LUG meeting in Cardiff, and tour of Swansea.

    <dan_b> straw poll: how many special variables do we think that sbcl binds during cold init?
    <dan_b> I think 4096 sounds like a lot
    <Krystof> I'd have been surprised to find that we had 4096 distinct special variables
    <Krystof> * (length (apropos-list "*"))
    <Krystof> 1055
    <dan_b> hmm
    [...]
    <dan_b> aha.  got it, i think
    <dan_b>       (storew tls-index symbol symbol-tls-index-slot other-pointer-lowtag)
    <dan_b> having assigned a new tls index for the symbol, it would help if we actually stored it in the symbol structure for next time
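The binding semantics and the per-thread value vector described in the list above can be sketched as a toy model in portable Common Lisp. This is only an illustration under stated assumptions, not the SBCL implementation: the real work happens in vops and C, with %gs pointing at the vector. Every name here (toy-symbol, toy-thread, toy-bind and friends), the fixed 64-slot vector and the "no value" marker are made up for the example.

```lisp
;;; Toy model only: hypothetical names, not SBCL's real data structures.

(defvar *no-tls-value* (make-symbol "NO-TLS-VALUE")
  "Marker meaning: this thread has no dynamic binding in this slot.")

(defvar *next-tls-index* 0
  "Next unused slot number in the per-thread value vectors.")

(defstruct toy-symbol
  name
  (global-value nil)
  (tls-index nil))                      ; assigned lazily on first bind

(defstruct toy-thread
  (tls (make-array 64 :initial-element *no-tls-value*)) ; size is arbitrary
  (binding-stack '()))

(defun ensure-tls-index (sym)
  "Give SYM a slot the first time it is bound, and store it in the symbol."
  (or (toy-symbol-tls-index sym)
      (setf (toy-symbol-tls-index sym)
            (prog1 *next-tls-index* (incf *next-tls-index*)))))

(defun thread-local-value (thread sym)
  "Return THREAD's own value for SYM, or the marker if it has none."
  (let ((index (toy-symbol-tls-index sym)))
    (if index
        (aref (toy-thread-tls thread) index)
        *no-tls-value*)))

(defun toy-symbol-value (thread sym)
  "A thread sees its own binding if it has one, else the global value."
  (let ((local (thread-local-value thread sym)))
    (if (eq local *no-tls-value*)
        (toy-symbol-global-value sym)
        local)))

(defun toy-set (thread sym value)
  "Set the thread's binding if it has one, else the global value."
  (if (eq (thread-local-value thread sym) *no-tls-value*)
      (setf (toy-symbol-global-value sym) value)
      (setf (aref (toy-thread-tls thread) (toy-symbol-tls-index sym)) value)))

(defun toy-bind (thread sym value)
  "Save the old slot contents on the binding stack, then install VALUE."
  (let ((index (ensure-tls-index sym)))
    (push (cons index (aref (toy-thread-tls thread) index))
          (toy-thread-binding-stack thread))
    (setf (aref (toy-thread-tls thread) index) value)))

(defun toy-unbind (thread)
  "Undo the most recent toy-bind in THREAD."
  (destructuring-bind (index . old-value)
      (pop (toy-thread-binding-stack thread))
    (setf (aref (toy-thread-tls thread) index) old-value)))
```

With two toy threads A and B, binding a symbol in A leaves B looking at the global value, and a toy-set from B changes the global value without A noticing until A unbinds.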
    

    Right now work is paused, because recompiling takes an hour and eats battery life, and I'm on a train. So, time out to write diary and plan the next bit.

    (The next bit will actually be integration with the allocator and garbage collector, given that right now it would be completely unsafe for two threads to cons at once)

    Ext3 errors continue (though no panics today yet, at least) even with the new kernel, so that's not the problem. Would blame hardware except that there's none of the usual scary messages from the ide driver. Maybe a filesystem integrity problem caused (I'm guessing here) by apm forced shutdown when battery low, at a time that the disk was being written to, and not fully fixed by subsequent fscks. Tempted to try mkfs and reinstall (it's the root disk, not /home or anything important), and see if that helps. I probably have a lot of old config files, orphaned library packages and other similar stuff anyway, and it would be nice to get rid of them.

Weekend off, pretty much

Mon Dec 2 16:30:43 2002

Topics: lua lisp

Weekend off, pretty much. Weekend spent preparing for, co-ordinating and recovering from my friend Simon's stag night.

"Write diary and plan the next bit", he said on Thursday. Said plans have not yet reached the enemy, because still wrestling with bugs in the current bit. Most of them fairly silly bugs once found: what's wrong with this code?



(define-vop (bind)
  (:args (val :scs (any-reg descriptor-reg))
	 (symbol :scs (descriptor-reg)))
  (:temporary (:sc unsigned-reg) tls-index temp bsp)
  (:generator 5
    (let ((tls-index-valid (gen-label)))
      (load-tl-symbol-value bsp *binding-stack-pointer*)
      (loadw tls-index symbol symbol-tls-index-slot other-pointer-lowtag)
      (inst add bsp (* binding-size n-word-bytes))
      (store-tl-symbol-value bsp *binding-stack-pointer* temp)
      (inst jmp :ne tls-index-valid)
      ;; allocate a new tls-index
[...]

That's right. There's no test before we do the conditional jump. Let's try

(define-vop (bind)
  (:args (val :scs (any-reg descriptor-reg))
	 (symbol :scs (descriptor-reg)))
  (:temporary (:sc unsigned-reg) tls-index temp bsp)
  (:generator 5
    (let ((tls-index-valid (gen-label)))
      (load-tl-symbol-value bsp *binding-stack-pointer*)
      (loadw tls-index symbol symbol-tls-index-slot other-pointer-lowtag)
      (inst add bsp (* binding-size n-word-bytes))
      (store-tl-symbol-value bsp *binding-stack-pointer* temp)
      (or tls-index tls-index)
      (inst jmp :ne tls-index-valid)
      ;; allocate a new tls-index

Why doesn't that work either? This is (also) the kind of bug that's blindingly obvious once you realise it, of course: (or tls-index tls-index) is not a form that assembles into an instruction that sets the condition flags. It's a Lisp expression that evaluates tls-index, then evaluates it again if the first result was NIL, and, as you can see, nothing uses the result in any case. Eventually I noticed the missing bit:
      (inst or tls-index tls-index)

and things are substantially rosier. Now we make it all the way through cold initialization to the toplevel, but break with what looks like random memory corruption shortly afterwards.
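To spell out the difference between the two forms, here is a toy stand-in for the assembler, written as plain Common Lisp. It is a sketch under assumptions: in SBCL inst is a macro that emits machine code, whereas this hypothetical inst is an ordinary function that just records what it was asked to emit, and both generator functions are invented for the illustration.

```lisp
;;; Toy assembler: hypothetical, it only records the requested instructions.

(defvar *emitted* '()
  "Instructions recorded so far, most recent first.")

(defun inst (mnemonic &rest operands)
  "Record one instruction instead of assembling it."
  (push (cons mnemonic operands) *emitted*)
  nil)

(defun buggy-generator (tls-index)
  ;; A plain Lisp OR: evaluated while generating code, emits nothing,
  ;; and its result is thrown away.
  (or tls-index tls-index)
  (inst 'jmp :ne 'tls-index-valid))

(defun fixed-generator (tls-index)
  ;; Asks the assembler for an OR instruction, which sets the flags
  ;; that the conditional jump then tests.
  (inst 'or tls-index tls-index)
  (inst 'jmp :ne 'tls-index-valid))

(defun emitted-by (generator)
  "Run GENERATOR on a pretend register and return what it emitted, in order."
  (let ((*emitted* '()))
    (funcall generator 'eax)
    (reverse *emitted*)))
```

Calling (emitted-by #'buggy-generator) shows only the jump, while (emitted-by #'fixed-generator) shows the or followed by the jump.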

There is one and only one aspect in which doing low-level stuff on an

Tue Dec 3 23:01:41 2002

Topics: lisp

There is one and only one aspect in which doing low-level stuff on an x86 is appealing: hardware watchpoints.

After spending a lot of time today tracing through the disassembly for the note-undefined-reference function to find out how it was getting (0 . 0) into undefined-warnings - supposedly a list - I put a watchpoint on that memory location and reran. The value subsequently changed in get-output-stream-string, which is a perfectly normal library function that has nothing to do with the compiler and certainly no references to undefined-warnings. After seeing that, it was the work of mere minutes (although, I concede, entirely too many of them) to realise that things would probably work a Whole Lot Better if GC were allowed to see the thread-local value vectors so it could update pointers when the objects move. Sigh.

Now we're back to a system that actually gets all the way through cold-init and PCL compilation to produce a usable Lisp. Admittedly, still not one that lets the user actually create threads (not much point adding thread creation primitives until consing is thread-safe, after all), but it's a start.

So, it looks like I write these entries once per CVS commit - or at

Wed Dec 11 01:22:06 2002

Topics: sbcl

So, it looks like I write these entries once per CVS commit - or at least, once per version of sbcl/threads that I believe represents some kind of advance on the previous state.

GENCGC (the GENerational Conservative GC that we use on the SBCL x86 port) already has support for `allocation regions'. These are small areas (typically a couple of pages each) within which consing can be done very cheaply: by bumping a free pointer and returning its old value. If we hit the end of the area, we have to stop and allocate another, which doesn't have to be contiguous. So, all we really need for parallelisable allocation is to have one of these areas open per thread. When any thread runs out of open region it can stop and get a lock from somewhere before updating gory gc details, but doing that once every two pages (arbitrary number which in any case we can tune) has got to be better than every cons (8 bytes).
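The idea of one open region per thread can be modelled in a few lines of Common Lisp. This is a sketch under assumptions, not gencgc itself (which is C): addresses are plain integers, the region size assumes 4096-byte pages, and the names region, allocate and open-new-region are invented here. A counter stands in for the lock, since the point is only how rarely the slow path runs.

```lisp
;;; Toy model of per-thread allocation regions; all names are hypothetical.

(defparameter *region-bytes* (* 2 4096)
  "Two pages per region, assuming 4096-byte pages.")

(defvar *heap-free-pointer* 0
  "Start address of the next region to hand out.")

(defvar *lock-acquisitions* 0
  "How many times the slow path ran (a real system would take a lock there).")

(defstruct region
  (free-pointer 0)
  (end 0))

(defun open-new-region (region)
  "Slow path: point REGION at a fresh chunk of the shared heap."
  (incf *lock-acquisitions*)
  (setf (region-free-pointer region) *heap-free-pointer*
        (region-end region) (+ *heap-free-pointer* *region-bytes*))
  (incf *heap-free-pointer* *region-bytes*)
  region)

(defun allocate (region nbytes)
  "Fast path: bump the free pointer and return its old value."
  (when (> (+ (region-free-pointer region) nbytes)
           (region-end region))
    (open-new-region region))
  (let ((address (region-free-pointer region)))
    (incf (region-free-pointer region) nbytes)
    address))
```

With these numbers each region holds 1024 eight-byte conses, so two regions allocating 2048 conses apiece reach the slow path four times in total and never hand out the same address twice.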

So this is what I spent all week doing. Although the code was there, my guess is that it was several years old and had certainly never been tested with multiple regions open at once. On my first attempt it gave me two overlapping regions, so I added in some stuff to stop it allocating from apparently-empty-but-still-open regions, so in retaliation it blew its mind and spent the next several days randomly blowing up with the kind of memory corruption bugs that I love tracking down more than anything. So, I know substantially more about the operation of gencgc than I used to, and I've managed to get a spot of unrelated tidying up in there too. Which is mostly just as applicable to the base (unthreaded) SBCL and I might even backport, depending on how much easier/harder I think it might make the eventual merge.

YAY

Thu Dec 12 03:52:13 2002

Topics:

YAY

* (defvar *foo* nil)

*FOO*
* (defun thread-nnop () (loop (setf *foo* (not *foo*)) (sleep 1)))

THREAD-NNOP
* (make-thread #'thread-nnop)
child pid is 15070

#<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP #X4095F000>
* /pausing 15070 before entering funcall0(0x9269615)
/entering funcall0(0x9269615)
9

9
* 1

1
* *foo*

NIL
* *foo*

T
* *foo*

T
* *foo*

NIL
* *foo*

T
* *foo*

NIL

The answer, in case you were wondering, is "extremely cool"

Thanks to the Phoenix Picturehouse for managing to show LoTR tonight despite

Thu Dec 19 01:17:02 2002

Topics: lisp sbcl

Thanks to the Phoenix Picturehouse for managing to show LoTR tonight despite having had a power problem that knocked out half of their supply (and judging by the neighbourhood, approximately a third of the houses on the street. Looks like one of the phases had gone). Looking forward to the next round of Very Secret Diaries.

Thanks also to the Botley Road branch of Carphone Warehouse, for deciding that my mobile phone was in fact still under warranty and sending it back (again) for warranty repair (again). This time I showed them the engineers' reports from its last two holidays: one occasion they'd "reflowed filters and p.a." and the other time they'd "reflowed p.a.s and filters" - I think I managed to make the point that I would prefer they try something different this time (replacing the transmitter coil and upgrading the firmware is apparently the correct fix), but of course, I made that point to the staff in the shop, so it remains to be seen what the engineers will actually do.

No thanks to the Cornmarket branch of the same chain, who had decided that it was out of warranty, that the Sale of Goods Act was not relevant (personally I disagree, having old-fashioned notions that "fit for purpose sold" should usually imply "lasts for more than three months" when the item in question is a mobile telephone) and that really they could not offer any help at all. Next time I buy a phone, it won't be from you guys. I regret that the last one was, really.

And, really, no thanks to Ericsson, for (a) producing a phone with this bug in it anyway, (b) failing to spot it and fix it properly on the last two repairs. Five minutes with Google will tell you all you need to know about the T39m No Network bug - if it's been fixed in models made since end 2001, they really could have rectified it on either of the occasions it's been back to them in 2002. Ho hum.

You wanted to know about SBCL threading? Since the last diary entry, fixed stupid bug which was stopping the whole thread-local symbol access from working (creating a new thread was also setting %gs in the parent thread as well as the child, oops), then owing to not wanting to think too hard on Sunday evening about writing new VOPs for locking primitives, decided to take short break by reintroducing the control stack exhaustion checking that I'd disabled when doing the initial make-multiple-stacks work.

And also took rather longer break (Friday, Saturday, and some portion of Sunday and Monday to return suits, PA equipment and generally unwind afterwards) to do Best Man stuff for my friends' wedding. No, I did not lose the rings. No, nobody had any reasons that the two persons were not allowed to be joined in matrimony. Yes, they are now successfully married, and guests at least appeared to have enjoyed it. And laughed at (with?) my speech. But I don't write here about people with no web presence of their own, so that's all you hear about that.

Thinking about locks again, now. That's in the context of multithreaded systems, not nuptials.

As an aside I would like to insert a warning to those who identify the

Thu Dec 19 19:32:21 2002

Topics:

As an aside I would like to insert a warning to those who identify the difficulty of the programming task with the struggle against the inadequacies of our current tools, because they might conclude that, once our tools will be much more adequate, programming will no longer be a problem. Programming will remain very difficult, because once we have freed ourselves from the circumstantial cumbersomeness, we will find ourselves free to tackle the problems that are now well beyond our programming capacity.
Edsger W Dijkstra, EWD340

When he says it, it sounds credible. When I say it, it sounds like I'm whining.

Error Messages, Rule 0

Mon Dec 30 19:54:37 2002

Topics:
:; phoenix
:;

Writing Error Messages, Rule 0: failing silently is not acceptable

:; phoenix --display :0
Fontconfig error: Cannot load default config file
:;

Writing Error Messages, Rule 1: when the problem is with an external file, print the file name. Rule 2: if there's an OS error of some kind, print the errno information.
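Rules 1 and 2 cost very little to follow. A minimal Common Lisp sketch, assuming a hypothetical read-first-config-line function and message wording: the file-error condition carries the offending pathname, and printing the condition itself gives whatever detail the implementation has about the underlying OS error (how much detail that is varies between implementations).

```lisp
(defun read-first-config-line (pathname)
  "Return the first line of PATHNAME, or NIL after saying why it could not be read."
  (handler-case
      (with-open-file (in pathname)
        (read-line in nil ""))
    (file-error (condition)
      ;; Rule 1: name the file. Rule 2: include the system's own explanation.
      (format *error-output* "~&cannot load config file ~A: ~A~%"
              (file-error-pathname condition)
              condition)
      nil)))
```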

:; sudo apt-get install fontconfig
[...]
:; phoenix
.. and suddenly, it works. There's some comment to be made here also about Debian packaging, but the package was labelled experimental anyway, so I'm probably not going to be too harsh there.

First impressions:

  • it works
  • it works with emacs' browser stuff, so I can use it for hyperspec lookup
  • It has a weird unlabelled extra dialog box beside the url bar, with an icon that looks like a lollipop. Clicking on this gives a dropdown menu which allows the selection of 'Find in this Page', dmoz.org or Google, so I would guess it's intended as some kind of search box. There's no way of activating it that I can see, though - certainly, typing into it and pressing Return doesn't seem to do much. Maybe it's displaying into a hidden sidebar? From the release notes, this program appears to like sidebars. This user, however, doesn't.
  • It supports that new-fangled font stuff that looks pretty but is actually harder to read than the apps I was using five years ago

Here's a screenshot. You may need to shift-click and save it if apache mod_proxy is interfering with my content-types again. Need to get that looked at, yes.