Weekend spent on Araneida hacking, which somehow ended up being#

Mon, 01 Sep 2003 02:58:02 +0000

Weekend spent on Araneida hacking, which somehow ended up being SBCL hacking. Two motivations:

For the new version of the Local Food Directory (a CLiki application, if it wasn't immediately obvious from looking at it) we're going to be doing interesting things like tying into Streetmap for postcode searches and suchlike. It's also prettier, which is nice if you like that kind of thing.
The practical impact of tie-ins with an external server, though, is that our latency for replying to requests now depends on circumstances beyond our control. Most (all) of the Araneida services so far are written so that we have some fairly good idea of how long it'll take to answer a request: we only talk http to a localhost proxy instead of slow remote clients; if we send mail, we send it to an smtp listener on the local machine; and our database queries, on the sites that use databases, are mostly fairly well tuned. But, remote hosts. Can't do much with that. Had better try this threading stuff for real, then.
The users of Araneida, of which there seem to be an expanding number no matter what obstacles I throw up in their way (making them all install SBCL was going to be a pretty good way to make them go away, I thought, but still apparently Not Enough) are fond of pointing out that the export-server stuff is a bit weird at best. I concur. It made sense once upon a time.

In brief, we add a new class http-listener (concrete subclasses threaded-http-listener and serve-event-http-listener) which represents a single endpoint (a host/port combination) and dispatches everything that comes in on that endpoint to a handler:

(defclass http-listener ()
  ((handler :initform *root-handler* :initarg :handler
	    :accessor http-listener-handler)
   (address :initform #(0 0 0 0) :initarg :address
	    :accessor http-listener-address)
   (port :initform 80 :initarg :port :accessor http-listener-port)
   ;; ...
   ))

Many listeners may dispatch to the same handler, and the handler may be a dispatching handler if it likes. There still needs to be some provision somewhere for lying about the hostname (as in, your external server address is foo.com:80/ but your araneida is actually on :8000) and a place to hang random extra bits that would be useful for generating apache httpd.conf segments (like where on the disk to find ssl certificates, etc), but it looks good so far.
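To make the many-listeners-one-handler arrangement concrete, here is a minimal sketch. It repeats a cut-down version of the class so that it stands alone. The value given to `*root-handler*` and the `dispatch-request` function are stand-ins made up for illustration, not Araneida's actual API.

```lisp
;; Self-contained sketch. *root-handler* here is just a closure, and
;; dispatch-request is a hypothetical stand-in for the real dispatch code.
(defvar *root-handler*
  (lambda (request) (format nil "root handler saw ~A" request)))

(defclass http-listener ()
  ((handler :initform *root-handler* :initarg :handler
            :accessor http-listener-handler)
   (address :initform #(0 0 0 0) :initarg :address
            :accessor http-listener-address)
   (port :initform 80 :initarg :port :accessor http-listener-port)))

(defun dispatch-request (listener request)
  "Hand REQUEST to whatever handler LISTENER was created with."
  (funcall (http-listener-handler listener) request))

;; Two endpoints, one handler.
(defvar *public* (make-instance 'http-listener :port 80))
(defvar *internal* (make-instance 'http-listener
                                  :address #(127 0 0 1) :port 8000))

(dispatch-request *internal* "/Welcome")
```

Both listeners end up holding the same handler object, so anything registered on it is reachable from either endpoint.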

While doing this, add some dynamic guess-how-many-processes-we-need stuff to adjust the number of serving threads based on load, slap it all back together and hammer it a bit with apachebench.
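For a rough idea of what "guess how many we need" could look like, here is a sketch of one possible policy: grow by one thread when nobody is idle, shrink by one when more than half are idle. The function name, the thresholds and the limits are all assumptions for illustration. The real heuristic isn't described here.

```lisp
;; Hypothetical sizing policy, not the actual Araneida code.
(defun desired-thread-count (busy total &key (smallest 2) (largest 20))
  "Guess a new pool size from how many of TOTAL threads are BUSY."
  (let ((idle (- total busy)))
    (cond ((zerop idle)                       ; everyone is busy: grow by one
           (min largest (1+ total)))
          ((> idle (ceiling total 2))         ; mostly idle: shrink by one
           (max smallest (1- total)))
          (t total))))                        ; otherwise leave well alone
```

Moving one thread at a time keeps the pool from oscillating wildly when load is bursty, at the cost of reacting slowly to a sudden hammering.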

Oops. Not literally "Oops" in the sense that a Linux kernel hacker would know it, but EFAULT from accept() anyway, which is kind of analogous for a user program. sb-bsd-sockets:socket-accept calls accept with the contents of a suitably sized Lisp vector as the second argument (the sockaddr). To make sure this doesn't get relocated between our taking the address and calling out, we wrap it in without-gcing to disable gc for the duration. This is actually a really bad idea because all the idle threads block in select(), so we can't GC unless we're really really busy; somewhat perverse. And, just to be awkward, doesn't always work either. For some reason that I haven't yet found, it is possible to have a GC happen during without-gcing.

I shall spare you the story of my VOP-writing and GC staring-at experience on Sunday afternoon, but the upshot is that

we have two new VOPs on x86 to put pointers on the C stack and take them off again. This causes the objects pointed to to be marked dont_move at GC time, so we no longer need to without-gcing around many foreign calls: instead we can e.g.

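    (sb-ext::with-pointers-preserved (sockaddr)
      ;; Strip the lowtag bits from the vector's address, then skip the
      ;; two header words (8 bytes on x86) to reach the data itself.
      (let* ((ad (sb-sys:int-sap
		  (+ (logand (sb-kernel::get-lisp-obj-address sockaddr)
			     (lognot 7)) 8)))
	     (fd (sockint::accept (socket-file-descriptor socket)
				  ad (size-of-sockaddr socket))))
	;; ...
	))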

After making this change and finding that the wretched thing was still faulting, it occurred to me at about 2am that marking a page dont_move doesn't stop it from being write-protected (i.e. using mprotect()) if it has no pointers to earlier generations. Although we have a SIGSEGV handler which does the necessary when we write to a protected page from Lisp, that's not going to help too much with a syscall. So, one small hack to the GC later - currently rebuilding - and let's see if that helps.

It should be noted that although I started writing this entry at whatever ungodly time it says, I finished and uploaded it at around midday on Monday.

That was very nearly the right answer#

Mon, 01 Sep 2003 21:22:14 +0000

That was very nearly the right answer. In fact

there are a couple of places that write-protect pages, so both need to check the dont_move bit
GC wipes out dont_move on all pages at the start of GC, before running preserve_pointers(). preserve_pointers only looks at the 'from' space (after all, we're not moving objects anyway unless they're in fromspace), so when we see these pages again later we no longer have them specially marked, so we write-protect them as normal. Wrong. Having realised the problem, the solution is simple: only reset dont_move on fromspace pages instead of on all pages.
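The second point is easier to see in a toy model. The sketch below is not the SBCL collector: pages are plain structs, the function names are made up, and the write-protect pass is simplified to "protect anything that isn't pinned" (the real one only protects pages with no pointers to younger generations). It does show how a pinned page outside from-space loses its pin under the old reset and keeps it under the new one.

```lisp
;; Toy model of the pin/write-protect interaction. All names hypothetical.
(defstruct page
  (from-space-p nil)     ; is this page in the space being collected?
  (dont-move nil)        ; pinned: something on the C stack points here
  (write-protected nil))

(defun reset-pins (pages &key only-from-space)
  "Clear dont-move at the start of a collection.
With ONLY-FROM-SPACE nil this is the old behaviour: every pin is lost."
  (dolist (p pages)
    (when (or (not only-from-space) (page-from-space-p p))
      (setf (page-dont-move p) nil))))

(defun preserve-pointers (pinned)
  "Re-pin pages referenced from the C stack, looking only at from-space."
  (dolist (p pinned)
    (when (page-from-space-p p)
      (setf (page-dont-move p) t))))

(defun write-protect-pages (pages)
  "Simplified: protect every page that is not pinned."
  (dolist (p pages)
    (setf (page-write-protected p) (not (page-dont-move p)))))

(defun collect (pages pinned &key only-from-space)
  (reset-pins pages :only-from-space only-from-space)
  (preserve-pointers pinned)
  (write-protect-pages pages))

(defun sockaddr-page-protected-p (only-from-space)
  "Does a pinned page in an older generation end up write-protected?"
  (let ((sockaddr-page (make-page :from-space-p nil :dont-move t)))
    (collect (list sockaddr-page) (list sockaddr-page)
             :only-from-space only-from-space)
    (page-write-protected sockaddr-page)))
```

Calling `(sockaddr-page-protected-p nil)` models the old reset and returns true, which is the page accept() then faults on. With `t` the pin survives and the page stays writable.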

:; /usr/sbin/ab -c5 -n10000 http://xxxxxxxxxxxxxxxxxxx:8009/Welcome
[...]
Server Software:        Araneida/0.74
Server Hostname:        xxxxxxxxxxxxxxxxxxx
Server Port:            8009

Document Path:          /Welcome
Document Length:        2796 bytes

Concurrency Level:      5
Time taken for tests:   190.112 seconds
Complete requests:      10000
Failed requests:        0
Broken pipe errors:     0
Total transferred:      29582958 bytes
HTML transferred:       27962796 bytes
Requests per second:    52.60 [#/sec] (mean)
Time per request:       95.06 [ms] (mean)
Time per request:       19.01 [ms] (mean, across all concurrent requests)
Transfer rate:          155.61 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:      -10     0   12.0      0   848
Processing:     7    42  104.7     17  1998
Waiting:        0    41  104.1     16  1998
Total:          7    42  105.9     17  1998

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     25
  75%     34
  80%     42
  90%     66
  95%    105
  98%    285
  99%    665
 100%   1998 (last request)

10000 requests later with no problems, and I think we've got this nailed.

Owing to an accounting screwup (which may actually have been my fault,#

Wed, 03 Sep 2003 04:23:44 +0000

Owing to an accounting screwup (which may actually have been my fault, for a change), my cablemodem service stopped at around 2pm today, NTL obviously preferring to switch everything off than to e.g. attempt to contact the customer in such circumstances. Having resolved this in the space of about half an hour (most of it spent in various kinds of queues), I find that 14 hours later my service has still not been reconnected. "By the end of the day", they said. Ha.

Apparently the customer service people (who have the authority to reconnect) don't have access to the same computer systems as the technical support department (who actually have the ability) and have to send email to effect this. My guess is that the email in question is stuck behind a zillion bounces and a virus checking bottleneck, and I should plan for the return of my The Infinite Distraction That Is Internet service sometime on Friday. In the meantime I'm reduced to 56k modem on a noisy (and pay-per-minute) analogue line, so for the most part not on irc.

I think I got more work done without net than I typically (or at least, often) do with it. But on the other hand, anyone who wanted to look at cvs.telent.net couldn't. Bad luck, anyone.

So, after a fair length of time on the phone to NTL (although they've made#

Wed, 03 Sep 2003 16:57:39 +0000

So, after a fair length of time on the phone to NTL (although they've made serious improvements to their telephone hold time in recent months, for the past couple of weeks the support burden incurred by MSBlast and Sobig seems to have crippled them) I have the Intarweb back.

It feels like an achievement, somehow.

Latest hack (yesterday's): IPC using X properties and SendMessage#

Sun, 07 Sep 2003 18:07:42 +0000

Latest hack (yesterday's): IPC using X properties and SendMessage. Details here, here, and here.

In the last three#

Wed, 10 Sep 2003 16:30:23 +0000

In the last three and a bit years of self-employment, I have learned that I am a better software developer than I am a salesperson, and that the opportunities for contract SBCL development are too few and far between to make a living out of anyway. There are those who would argue that I should have known both of those things three years ago, but, well, I feel better to have found out for certain. And the (fairly interesting, but not quite my vocation) unix/web/database/etc work I've been doing in the meantime has funded some quite neat unpaid SBCL work anyway.

But, as I said, my motivation for sales is matched only by my aptitude at same, and as I don't right now have people queueing up outside the door to thrust work into my laid-back, casual, and ungrateful hands, I think it's time to draw a line under that life episode and start looking for something else to do. In other words, I've been kind of busy with agencies and similar stuff in the last few days. Will Hack For - well, sensible amounts of money, as the saying nearly goes.

The informal groups' vested interests will be sustained by the#

Thu, 11 Sep 2003 21:55:10 +0000

The informal groups' vested interests will be sustained by the informal structures that exist, and the movement will have no way of determining who shall exercise power within it. If the movement continues deliberately not to select who shall exercise power, it does not thereby abolish power. All it does is abdicate the right to demand that those who do exercise power and influence be responsible for it. If the movement continues to keep power as diffuse as possible because it knows it cannot demand responsibility from those who have it, it does prevent any group or person from totally dominating. But it simultaneously ensures that the movement is as ineffective as possible. Some middle ground between domination and ineffectiveness can and must be found.

'The Tyranny of Structurelessness', by Jo Freeman

It's actually nothing to do with the Free Software movement or the Open Source tribe(sic); it was first printed by the women's liberation movement in 1970. But, next time I see Eric Raymond's articles on ZDNet or Cnet or wherever and find myself muttering things like "I didn't vote for him", well, hmm.

As an example, let's say that Alice and Bob generate PGP Keys with GPG and hold a PGP key signing party#

Sat, 13 Sep 2003 15:24:05 +0000

As an example, let's say that Alice and Bob generate PGP Keys with GPG and hold a PGP key signing party. At the party Alice and Bob go verify each others' key information and later sign each others' keys. GPG by default automatically signs the public key of every pair it generates with the associated private key. So, Alice and Bob both now have at least two signatures validating that their keys belong to them. Alice's key was signed by Alice herself and Bob and Bob's key was signed by Bob himself and Alice. In the future Alice and Bob meet Cathy. Cathy generates a key pair and tells Alice and Bob that she will send them her key. Alice doesn't like Cathy and doesn't want Bob to exchange encrypted communications with her. Both Alice and Cathy generate PGP keys which they claim belong to Cathy. They both send them to Bob. Both keys have one signature, the signature of the associated private key. Bob does not know which key is really Cathy's. Cathy hears that Bob got two keys, and suspects Alice. Cathy, now angry, wishes to gain information that she can use against Alice. In order to acquire this information Cathy must compromise the encrypted communications between Alice and Bob. In order to do this, Cathy decides to forge an email to Bob from Alice telling him that Alice has generated new keys. In the forged email, Cathy includes Alice's "new" public key (which is in fact a fake key generated by Cathy). However, Bob knows for sure this is a trick because even though Bob now has two keys for Alice, one of the keys has been signed by multiple people (himself and Alice) verifying that it does indeed belong to Alice, while the other key - Cathy's fake key - only has its own signature.

GnuPG Keysigning Party HOWTO

I quoted this chiefly to put you in the same frame of mind as I was when I read the following paragraph, which starts

The above example is very simplified and things can get a lot more complicated than that.

Ask me again why we don't seem to be seeing pervasive use of crypto yet.

Two weird SBCL problems, neither of which I have any particularly good#

Wed, 17 Sep 2003 18:20:47 +0000

Two weird SBCL problems, neither of which I have any particularly good ideas about:

(1) On Alpha, SBCL versions built with Glibc versions older than 2.3.2 don't run on 2.3.2, complaining about invalid args to mprotect. 2.3.2 itself won't build the SBCL runtime because we use a custom linker script to force stuff under 2Gb, which needs updating for 2.3.2.
(2) On X86/threaded, there's some kind of (what I assume to be a) thread safety bug apparently connected to &rest arguments. I can't see anything obviously bad in how we create rest lists, but one iteration in every far-too-many we get messages from e.g. char-equal saying that The value 0 is not of type BASE-CHAR. Which is perfectly correct in itself, but as we're not calling char-equal with 0, seems like it should be superfluous.

Item (2) below (logically above) turns out to be a problem in the#

Thu, 18 Sep 2003 16:00:44 +0000

Item (2) below (logically above) turns out to be a problem in the garbage collector and not at all a library bug. We weren't scavenging signal contexts (and thus registers) when they were on alternate signal stacks. Most threads don't spend a lot of time in alternate signal stacks, so you have to work the GC fairly hard to see this. Fixed now, anyway.

There still seems to be some bug which I haven't found, where the GCing thread will occasionally overcount the number of threads it's signalled to stop, and then sit in a loop indefinitely waiting for a thread that doesn't exist.

I discovered last night that gpg has an alternate output mode which is intended to be#

Sun, 21 Sep 2003 02:40:52 +0000

I discovered last night that gpg has an alternate output mode which is intended to be machine-parseable, so I've just committed some exciting new breakage to SBCL's asdf-install contrib to make it a bit smarter about checking GPG signatures. It now attempts to check signatures for all packages no matter where they've come from, but there are restarts to bypass most of the checks.

A package may

have no gpg signature at all
be signed by a gpg key you don't have on your keyring
be signed by a key on your keyring but which you don't have a trust relationship with (i.e. nobody you know has signed it)
be signed by a trusted key, but not be on the list of package suppliers (after all, just because you trust someone is who they say they are, you might not want to install their lisp software)

The first two of these are presently terminal errors, the third can be ignored, and the fourth has a restart that lets you add the packager to your package supplier list. The package supplier list is stored between sessions in ~/.sbcl/trusted-uids.lisp
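As a rough illustration of sorting gpg's answers into those cases, here is a sketch that assumes the machine-parseable mode is the output of gpg's `--status-fd` option, where each line starts with `[GNUPG:] ` followed by a keyword. The keywords used (NODATA, NO_PUBKEY, GOODSIG, TRUST_UNDEFINED, TRUST_NEVER) are real gpg status keywords, but the function names are made up for this sketch, and which keywords asdf-install actually inspects is an assumption. The package-supplier check would come after this, against the saved list.

```lisp
;; Hypothetical classifier for gpg --status-fd output; not asdf-install's code.
(defun status-keywords (gpg-output)
  "Collect the keyword following each \"[GNUPG:] \" prefix in GPG-OUTPUT."
  (with-input-from-string (in gpg-output)
    (loop for line = (read-line in nil)
          while line
          when (and (> (length line) 9)
                    (string= "[GNUPG:] " line :end2 9))
            collect (subseq line 9 (position #\Space line :start 9)))))

(defun classify-signature (gpg-output)
  "Map gpg status output onto one of the cases listed above."
  (let ((keys (status-keywords gpg-output)))
    (flet ((seen (k) (member k keys :test #'string=)))
      (cond ((seen "NODATA") :no-signature)
            ((seen "NO_PUBKEY") :key-not-on-keyring)
            ((not (seen "GOODSIG")) :bad-signature)
            ((or (seen "TRUST_UNDEFINED") (seen "TRUST_NEVER"))
             :untrusted-key)
            (t :trusted-key)))))
```

Returning a keyword per case makes it straightforward to hang a different condition, and a different set of restarts, off each one.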

This is an incompatible change: cue evil laughter.

Half the known world have probably linked to this already:#

Mon, 22 Sep 2003 23:39:42 +0000

Half the known world have probably linked to this already:

Some people read it as a personal endorsement of PHP, VB, and other semi-baked programming languages. Actually my personal preference is a much darker, uglier, and more shameful secret: Common Lisp, CLOS, plus an ML-like type inferencing compiler/error checker (with some things done in a sublanguage with Haskell semantics and Lisp syntax). Common Lisp dates from around 1982 and ML from 1984.
I try to keep this preference concealed from young people who've been raised on a diet of C, Java, C#, Perl, etc. They just wouldn't find it credible that 20-year-old systems and ideas are actually better than the latest and greatest from Microsoft and Sun.

philg

To be honest, I find his protestations a little unconvincing. Just like jwz, the Unix-Haters "I wish I was still using Lisp" sentiment seems to shine through in almost everything he writes.

Freenode #lisp denizens, and some percentage of comp.lang.lisp readers#

Thu, 25 Sep 2003 03:45:20 +0000

Freenode #lisp denizens, and some percentage of comp.lang.lisp readers who read the article I mistakenly posted instead of mailed privately, will know that earlier this year I was flirting with the possibility of writing a book. 'Flirting' is not really the right term: I got as far as producing an outline and sending it to a publisher who appeared to like the idea, so if we're going to stick with this general metaphor, I got as far as a dinner date and invitation back for coffee afterwards.

Actually, let's drop that metaphor.

So anyway, today I'm feeling slightly up on hearing the news that there were more than 100 participants in #lisp again (even if two of them are me), and slightly down on hearing from the prospective publisher, who I hope won't mind being excerpted here (this is all public information anyway; I wouldn't post the sensitive stuff), that

According to sourceforge, the i386 rpms for [GNU CLISP] 2.3.1 have been downloaded 351 times since 8/31/03, and the 2.3.0 rpms have been downloaded 2,454 times since 9/02. The win32 archive has been downloaded 1,436 and 11,059 times, respectively. The other lisp projects on sourceforge have barely been downloaded at all.

(He's basically engaged in trying to figure out the size of the potential market for a book aimed at teaching Lisp to users of other high-level languages: Perl and Python and suchlike. Poor soul)

So I went to have a look, and modulo that I wouldn't trust sourceforge stats further than I can throw them, he's absolutely right. SBCL downloads seem to average about 100 a month, though with enough noise to make any kind of useful analysis worthless. That's one user for every ten developers. Of course, there are probably also people following CVS, and the Debian packages aren't counted in that and may be good for a few more, but even so. (Yes, I've looked at the popularity-contest figures. No, they don't cheer me up much)

In fairness, I think the real reasons I'm not feeling so great right now are (a) I still have no idea about this stupid threading/GC/locking/whatever-it-is bug, which moves somewhere else every time I put in any kind of code that would help me track it down, and (b) I'm still spending too much time dealing with recruiters. If I knew what I was doing on at least one of these scores, I would face with more grace and equanimity the knowledge that it's getting on for the best part of a year since I started on the current attempt to add thread support to SBCL.

Yesterday's diary entry demonstrates quite convincingly why it's#

Fri, 26 Sep 2003 01:46:03 +0000

Yesterday's diary entry demonstrates quite convincingly why it's not a good idea for me to write when too tired to keep my eyes open. "One user for every ten developers"? "Strike that, reverse it", as Willy Wonka said. Said in the film, anyway. It's decades since I read the book, but I suspect he didn't say it there.

Made a certain amount of what I think is progress on the GC issue#

Sun, 28 Sep 2003 02:38:00 +0000

Made a certain amount of what I think is progress on the GC issue.

The pseudo-atomic mechanism in SBCL is supposed to let you run short sequences of Lisp code without icky things happening if signals are received during them. What it does is set a flag at the start of the PA sequence; any signal handler is supposed to check this and say "ah, unsafe. Let's save the information that this signal went off, and the signal mask as it was before the signal (from the signal context), then block all signals and return immediately". At the end of the PA sequence we check whether a signal had been received. If so, we run a trap instruction which sends us SIGTRAP which runs sigtrap_handler, which eventually gets around to running the signal handler for the pending signal. We also hack up the sigmask in the signal context to be whatever we'd saved when the pseudo-atomic section was interrupted, so when this handler returns, the process will once again be running with the normal mask, which is probably "nothing blocked". Clear? Good. I have to think about it quite hard too.

The net result should be pretty much as it would be if we just masked signals for the duration, but in the normal case that no signal was sent, we don't have the expense of a couple of syscalls to disable/reenable. Note that we don't have room to save more than one signal this way, but that's OK, because we block them all as soon as the first one goes off.
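If that description is still hard to hold in your head, a single-threaded toy model may help. Nothing below touches real signals: the "signal mask" is a boolean, the trap at the end of the sequence is an ordinary function call, and every name is made up for the sketch.

```lisp
;; Toy model of pseudo-atomic. No real signals are involved.
(defvar *pseudo-atomic* nil "True while inside a PA sequence.")
(defvar *pending-signal* nil "The one signal we have room to defer.")
(defvar *signals-blocked* nil "Stands in for the process signal mask.")
(defvar *log* '() "Signals whose handlers actually ran, newest first.")

(defun run-handler (signal)
  (push signal *log*))

(defun deliver-signal (signal)
  "Model a signal arriving. Returns :ran, :deferred or :blocked."
  (cond (*signals-blocked* :blocked)
        (*pseudo-atomic*
         (setf *pending-signal* signal   ; remember which signal it was
               *signals-blocked* t)      ; and block everything else
         :deferred)
        (t (run-handler signal) :ran)))

(defmacro with-pseudo-atomic (&body body)
  `(progn
     (setf *pseudo-atomic* t)
     (multiple-value-prog1 (progn ,@body)
       (setf *pseudo-atomic* nil)
       (when *pending-signal*
         ;; The real thing executes a trap instruction at this point.
         (let ((signal (shiftf *pending-signal* nil)))
           (run-handler signal)
           ;; Put back the mask saved when we were interrupted.
           (setf *signals-blocked* nil))))))
```

A signal delivered inside `with-pseudo-atomic` comes back as `:deferred` and only shows up in `*log*` once the body has finished. A second one delivered inside the same sequence comes back `:blocked`, which is why one slot is enough.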

So, what have we found

One of the new signals we've added recently (in fact, I haven't even committed this yet, that's how recent) is SIG_STOP_FOR_GC (actually SIGRTMIN+n), which is half of the protocol for pausing threads when we need to vacuum underneath them. The thread that wants to GC sends SIG_STOP_FOR_GC to all the others. They have a sig_stop_for_gc_handler which does the pseudo-atomic thing, then kill(this_thread->pid,SIGSTOP). The first point of note is that this signal is not in the set that gets blocked during pseudo-atomic. So, if a thread in PA gets some other signal followed by a request to stop for GC, the STOP_FOR_GC signal might overwrite the pending signal data. So, add it to the list.
But this means we can't do any kind of likely-to-block operation with signals masked, because the GC will want to pause us while we do it, and if we ignore its signal it will sit there (looping) until we take note. Ungood. Likely-to-block operations include waiting on queues, so anything involving lock acquisition is out.
Allocation is sometimes done with signals masked. I know that allocations from C inside an error handler or similar will do this, but maybe there's lisp code that also does. Allocation is pseudo-atomic: if we allocate with signals blocked, and the allocator decides to arrange for a garbage collection to happen after we're done, the saved signal mask in the pending information will have signals blocked too, so, well, the upshot is that we will run SUB-GC (the Lisp-level GC routine that calls collect_garbage) with, guess what, blocked signals. This is bad because it contains a with-mutex form to ensure that only one thread is collecting at a time.

This is looking less and less likely to be fixed in time for reasonable testing before 0.8.4.

Anyway, we can now at least unblock before calling SUB-GC, and it makes some kind of (positive, I think) difference. There's still more to do, though. (1) I've managed one time to get a thread looping in get-mutex, (2) it's hit against my new debugging assertion that says "don't sleep on a queue with signals disabled", but in this case the only two blocked signals are SIGTRAP and SIG_DEQUEUE (actually SIGRTMIN+n'), and (3) SIGSEGV is not one of the usual blocking set. I don't know if this is because (a) it can never happen in pseudo-atomic code - though SIGSEGV as a write barrier for objects in old spaces can happen just about anywhere, as far as I can see, or (b) because it didn't occur to the authors that SIGSEGV can occur just about anywhere - the code might well predate the generational gc and not have been updated, or (c) for some other reason I still don't understand. (b) or (c) seem likely.

What would block SIGTRAP and SIG_DEQUEUE? We don't even have a signal handler for the latter: we use sigwait() for it instead.

Oh, yeah, and another thing#

Sun, 28 Sep 2003 03:30:34 +0000

Oh, yeah, and another thing. GC hooks are playing up on some platforms, including Linux 2.6. So I installed it on my desktop yesterday (which is a rant for another time; the motherboard onboard audio is now painful to listen to) to have a look at. The problem seems to be with hooks that cons. Likely fix for 0.8.4 is to remove the before-gc hooks altogether - they weren't that useful anyway - and move the after-gc hooks (which includes the code that runs finalizers for dead objects, so it's worth keeping) later, so that they don't run until the instance is in a more normal state (e.g. other threads are resumed, gc lock has been released, etc).

In what might be its most striking victory, the Alertbox ushered in#

Mon, 29 Sep 2003 17:00:43 +0000

In what might be its most striking victory, the Alertbox ushered in the decline of the glamour design agency.

Jakob Nielsen on presenting correlation as causation.

In other news, pop music radio found to be the primary cause of drug addiction in young people.

6043 calling sub-gc#

Tue, 30 Sep 2003 01:37:02 +0000

6043 calling sub-gc
6040 stopping for gc
6041 deferred stop for gc
6041 calling sub-gc
6042 stopping for gc
There was a terrible ghastly silence

Deadlock. Boom. I'm going to go to bed now, but I think in the morning, or reasonable analogue thereto, we can have this GC problem fixed.

OK, it looks like that was half the problem#

Tue, 30 Sep 2003 12:36:40 +0000

OK, it looks like that was half the problem. Or, at least, one of the two problems.

Allocation is done in pseudo-atomic sections. When alloc() decides that it's time to GC, it uses the same deferred handler mechanism as an interrupt received during pseudo-atomic to schedule a collection as soon as the allocation itself is done. Problem is that it doesn't (didn't, anyway) check whether there was already a deferred handler to run, so the message that said "stop for another thread to gc us" got whapped by a message saying "now do a gc". It wouldn't break us to call gc from two threads at once - appropriate locking mechanisms are in place - but it does hurt to not stop when people are waiting.
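Reduced to its bones, the bug is a one-slot mailbox with two writers. The sketch below models the slot as a special variable, and the function names are invented. What the allocator ought to do when it finds the slot taken is a guess: here it just leaves the stop request alone, on the theory that the thread asking us to stop is about to collect anyway.

```lisp
;; Toy model of the single deferred-handler slot. Names are hypothetical.
(defvar *deferred* nil "The one deferred-handler slot.")

(defun defer-clobbering (what)
  "The old behaviour: whatever was already pending is lost."
  (setf *deferred* what))

(defun defer-carefully (what)
  "Only take the slot if it is free. Returns true if WHAT was recorded."
  (unless *deferred*
    (setf *deferred* what)
    t))
```

With `defer-clobbering`, a `:stop-for-gc` followed by a `:do-gc` leaves only `:do-gc` in the slot, which is the "whapped" message above. With `defer-carefully` the second call returns NIL and the stop request survives.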

So, one down. The other one is that sometimes threads don't seem to wake up after gc, so after a few minutes of running, all our threads quietly come to rest waiting for a signal.

Earlier we asked "What would block SIGTRAP and SIG_DEQUEUE?". wait-on-queue blocks SIG_DEQUEUE temporarily while it frobs the waitqueue data before it can go to sleep. run_deferred_handler is called from the sigtrap_handler, and although we unblock the usual culprits before calling into Lisp, SIGTRAP is (along with SIGSEGV) not in that set.

I've added the good parts of this experimentation (without, I hope, the debugging cruft) to CVS under the tag atropos-branch. If you can deal with the sheer abhorrence of all these signals, you're welcome to take a look.
