Weekend spent on Araneida hacking, which somehow ended up being
SBCL hacking. Two motivations:
For the new version of the Local Food Directory (a
CLiki application, if it wasn't immediately obvious from looking at
it) we're going to be doing interesting things like tying into Streetmap for postcode
searches and suchlike. It's also prettier, which is nice if you like
that kind of thing.
The practical impact of tie-ins with an external server, though, is
that our latency for replying to requests now depends on circumstances
beyond our control. Most (all) of the Araneida services so far are
written so that we have some fairly good idea of how long it'll take
to answer a request: we only talk http to a localhost proxy instead of
slow remote clients; if we send mail, we send it to an smtp listener
on the local machine; and our database queries, on the sites that use
databases, are mostly fairly well tuned. But, remote hosts. Can't do
much with that. Had better try this threading stuff for real, then.
The users of Araneida, of which there seem to be an expanding number no
matter what obstacles I throw up in their way (making them all install
SBCL was going to be a pretty good way to make them go away, I
thought, but still apparently Not Enough) are fond of pointing out
that the export-server stuff is a bit weird at best. I concur.
It made sense once upon a time.
In brief, we add a new class
http-listener (concrete subclasses
threaded-http-listener and serve-event-http-listener)
which represents a single endpoint (a host/port combination) and
dispatches all stuff that comes in on that endpoint to a handler.
Many listeners may dispatch to the same handler, and the handler may
be a dispatching handler if it likes. There still needs to be some
provision somewhere for lying about the hostname (as in, your external
server address is foo.com:80/ but your araneida is actually on :8000)
and a place to hang random extra bits that would be useful for
generating apache httpd.conf segments (like where on the disk to find
ssl certificates, etc), but it looks good so far.
While doing this, add some dynamic guess-how-many-processes-we-need
stuff to adjust the number of serving threads based on load, slap it
all back together and hammer it a bit with apachebench.
Oops. Not literally "Oops" in the sense that a Linux kernel hacker
would know it, but EFAULT from accept() anyway, which is kind
of analogous for a user program.
sb-bsd-sockets:socket-accept calls accept with the
contents of a suitably sized Lisp vector as the second argument (the
sockaddr). To make sure this doesn't get relocated between our taking
the address and calling out, we wrap it in without-gcing to
disable gc for the duration. This is actually a really bad idea
because all the idle threads block in select(), so we can't GC unless
we're really really busy; somewhat perverse. And, just to be awkward,
it doesn't always work either. For some reason that I haven't yet found,
it is possible to have a GC happen during without-gcing.
I shall spare you the story of my VOP-writing and GC
staring-at experience on Sunday afternoon, but the upshot is that
we have two new VOPs on x86 to put pointers on the C stack and
take them off again. This causes the objects pointed to to be marked
dont_move at GC time, so we no longer need to wrap many foreign calls
in without-gcing: instead we can, for example, push a pointer to the
buffer onto the C stack for the duration of the call and let the GC's
conservative scan of the C stack pin it for us.
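To make the GC side of that concrete, here is a rough sketch in C of why a pointer sitting on the C stack keeps its object in place. This is emphatically not the actual gencgc code; the helpers (looks_like_heap_pointer, page_for) are invented for illustration.

/* Sketch only: conservative pinning of C-stack roots.  Invented helpers. */
struct page {
    int dont_move;             /* set => objects here may not be relocated */
    /* ... generation, write-protect flag, allocation info, etc ... */
};

extern int looks_like_heap_pointer(void *p);   /* invented predicate */
extern struct page *page_for(void *p);         /* invented page lookup */

static void pin_c_stack_roots(void **stack_lo, void **stack_hi)
{
    void **sp;
    for (sp = stack_lo; sp < stack_hi; sp++) {
        /* We can't prove this word is a real reference, so play safe:
         * if it looks like it points into the heap, the page behind it
         * must stay put for this collection. */
        if (looks_like_heap_pointer(*sp))
            page_for(*sp)->dont_move = 1;
    }
}
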
After making this change and finding that the wretched thing was
still faulting, it occurred to me at about 2am that marking a page
dont_move doesn't stop it from being write-protected (i.e. using
mprotect()) if it has no pointers to earlier generations.
Although we have a SIGSEGV handler which does the necessary when we
write to a protected page from Lisp, that's not going to help too much
with a syscall. So, one small hack to the GC later - currently
rebuilding - and let's see if that helps.
It should be noted that although I started writing this entry at
whatever ungodly time it says, I finished and uploaded it at around
midday on Monday.
There are a couple of places that write-protect pages, so both
need to check the dont_move bit. And GC wipes out dont_move on all
pages at the start of GC, before running preserve_pointers().
preserve_pointers() only looks at fromspace (after all, we're not
moving objects anyway unless they're in fromspace), so when we see
these pages again later we no longer have them specially marked, and
we write-protect them as normal. Wrong. Having realised the problem,
the solution is simple: only reset dont_move on fromspace pages
instead of on all pages.
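In sketch form, the two changes described above look roughly like this (again not the real gencgc source; the field and helper names are invented):

/* Sketch of the fix; not the actual SBCL garbage collector code. */
#include <sys/mman.h>

struct page {
    int dont_move;
    int write_protected;
    int pointers_to_younger_gen;
    /* ... */
};

extern struct page page_table[];
extern long page_count;
extern void *page_address(struct page *p);   /* invented helper */
extern int page_in_fromspace(long index);    /* invented helper */
#define GC_PAGE_SIZE 4096                    /* illustrative */

/* (1) Every place that write-protects a page must honour dont_move:
 * a syscall writing into a pinned buffer gets EFAULT rather than the
 * SIGSEGV-and-unprotect treatment that Lisp code would get. */
static void maybe_write_protect(struct page *p)
{
    if (p->dont_move)
        return;
    if (!p->pointers_to_younger_gen) {
        mprotect(page_address(p), GC_PAGE_SIZE, PROT_READ);
        p->write_protected = 1;
    }
}

/* (2) At the start of GC, clear dont_move only on fromspace pages;
 * clearing it everywhere loses the pinning on pages that
 * preserve_pointers() will never look at again. */
static void reset_dont_move_flags(void)
{
    long i;
    for (i = 0; i < page_count; i++)
        if (page_in_fromspace(i))
            page_table[i].dont_move = 0;
}
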
:; /usr/sbin/ab -c5 -n10000 http://xxxxxxxxxxxxxxxxxxx:8009/Welcome
[...]
Server Software: Araneida/0.74
Server Hostname: xxxxxxxxxxxxxxxxxxx
Server Port: 8009
Document Path: /Welcome
Document Length: 2796 bytes
Concurrency Level: 5
Time taken for tests: 190.112 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 29582958 bytes
HTML transferred: 27962796 bytes
Requests per second: 52.60 [#/sec] (mean)
Time per request: 95.06 [ms] (mean)
Time per request: 19.01 [ms] (mean, across all concurrent requests)
Transfer rate: 155.61 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: -10 0 12.0 0 848
Processing: 7 42 104.7 17 1998
Waiting: 0 41 104.1 16 1998
Total: 7 42 105.9 17 1998
Percentage of the requests served within a certain time (ms)
50% 17
66% 25
75% 34
80% 42
90% 66
95% 105
98% 285
99% 665
100% 1998 (last request)
10000 requests later with no problems, and I think we've got this
nailed.
Wed Sep 3 04:23:44 2003
Owing to an accounting screwup (which may actually have been my fault,
for a change), my cablemodem service stopped at around 2pm today, NTL
obviously preferring to switch everything off rather than, e.g., attempt to
contact the customer in such circumstances. Having resolved this in
the space of about half an hour - most of it in various kinds of
queues - 14 hours later my service has still not been reconnected.
"By the end of the day", they said. Ha.
Apparently the customer service people (who have the authority to
reconnect) don't have access to the same computer systems as the
technical support department (who actually have the ability) and have
to send email to effect this. My guess is that the email in
question is stuck behind a zillion bounces and a virus checking
bottleneck, and I should plan for the return of my The Infinite
Distraction That Is Internet service sometime on Friday. In the
meantime I'm reduced to 56k modem on a noisy (and pay-per-minute)
analogue line, so for the most part not on irc.
I think I got more work done without net than I typically
(or at least, often) do with it. But on the other hand, anyone who
wanted to look at cvs.telent.net couldn't. Bad luck, anyone.
Wed Sep 3 16:57:39 2003
So, after a fair length of time on the phone to NTL (although they've made
serious improvements in their telephone hold times in recent months, for
the past couple of weeks the support burden incurred by MSBlast and
Sobig seems to have crippled them) I have the Intarweb back.
It feels like an achievement, somehow.
Sun Sep 7 18:07:42 2003
Latest hack (yesterday's): IPC using X properties and SendMessage.
Details here, here, and here.
In the last three
and a bit years of self-employment, I have learned that I am a
better software developer than I am a salesperson, and that the
opportunities for contract SBCL development are too few and far
between to make a living out of anyway. There are those who would
argue that I should have known both of those things three years ago,
but, well, I feel better to have found out for certain. And the
(fairly interesting, but not quite my vocation) unix/web/database/etc
work I've been doing in the meantime has funded some quite neat unpaid
SBCL work anyway.
But, as I said, my motivation for sales is matched only by my aptitude
at same, and as I don't right now have people queueing up outside the
door to thrust work into my laid-back, casual, and ungrateful hands, I
think it's time to draw a line under that life episode and start
looking for something else to do. In other words, I've been kind of
busy with agencies and similar stuff in the last few days. Will Hack
For - well, sensible amounts of money, as the saying nearly goes.
Thu Sep 11 21:55:10 2003
The informal groups' vested interests will be sustained by the
informal structures that exist, and the movement will have no way of
determining who shall exercise power within it. If the movement
continues deliberately not to select who shall exercise power, it does
not thereby abolish power. All it does is abdicate the right to demand
that those who do exercise power and influence be responsible for
it. If the movement continues to keep power as diffuse as possible
because it knows it cannot demand responsibility from those who have
it, it does prevent any group or person from totally dominating. But
it simultaneously ensures that the movement is as ineffective as
possible. Some middle ground between domination and ineffectiveness
can and must be found.
It's actually nothing to do with the Free Software movement
or the Open Source tribe (sic); it was first printed by the
women's liberation movement in 1970. But, next time I see Eric
Raymond's articles on ZDNet or Cnet or wherever and find myself
muttering things like "I didn't vote for him", well, hmm.
Sat Sep 13 15:24:05 2003
As an example, let's say that Alice and Bob generate PGP keys with GPG and hold a PGP key signing party. At the party Alice and Bob verify each other's key information and later sign each other's keys. GPG by default automatically signs the public key of every pair it generates with the associated private key. So, Alice and Bob both now have at least two signatures validating that their keys belong to them: Alice's key was signed by Alice herself and by Bob, and Bob's key was signed by Bob himself and by Alice.
Later, Alice and Bob meet Cathy. Cathy generates a key pair and tells Alice and Bob that she will send them her key. Alice doesn't like Cathy and doesn't want Bob to exchange encrypted communications with her, so both Alice and Cathy generate PGP keys which they claim belong to Cathy, and both send them to Bob. Both keys have one signature, the signature of the associated private key. Bob does not know which key is really Cathy's.
Cathy hears that Bob got two keys, and suspects Alice. Cathy, now angry, wishes to gain information that she can use against Alice. To acquire this information Cathy must compromise the encrypted communications between Alice and Bob, so she decides to forge an email to Bob from Alice telling him that Alice has generated new keys. In the forged email, Cathy includes Alice's "new" public key (which is in fact a fake key generated by Cathy). However, Bob knows for sure this is a trick: even though he now has two keys for Alice, one of them has been signed by multiple people (himself and Alice), verifying that it does indeed belong to Alice, while the other - Cathy's fake key - carries only its own signature.
Two weird SBCL problems, neither of which I have any particularly good
ideas about:
(1) On Alpha, SBCL versions built with glibc versions older than
2.3.2 don't run on 2.3.2, complaining about invalid args to mprotect.
2.3.2 itself won't build the SBCL
runtime because we use a custom linker script to force stuff under
2Gb, which needs updating for 2.3.2.
(2) On x86/threaded, there's some kind of (what I assume to be a)
thread safety bug apparently connected to &rest arguments. I
can't see anything obviously bad in how we create rest lists, but one
iteration in every far-too-many we get messages from
e.g. char-equal saying that The value 0 is not of type
BASE-CHAR. Which is perfectly correct in itself, but as we're
not calling char-equal with 0, seems like it should be
superfluous.
Thu Sep 18 16:00:44 2003
Item (2) below (logically above) turns out to be a problem in the
garbage collector and not at all a library bug. We weren't scavenging
signal contexts (and thus registers) when they were on alternate
signal stacks. Most threads don't spend a lot of time in alternate
signal stacks, so you have to work the GC fairly hard to see this.
Fixed now, anyway.
There still seems to be some bug which I haven't found, where the
GCing thread will occasionally overcount the number of threads it's
signalled to stop, and then sit in a loop indefinitely waiting for a
thread that doesn't exist.
I discovered last night that gpg has an alternate output mode which is intended to be
machine-parseable, so I've just committed some exciting new breakage
to SBCL's asdf-install contrib to make it a bit smarter about checking GPG
signatures. It now attempts to check signatures for all packages no
matter where they've come from, but there are restarts to bypass most
of the checks.
A package may:
(1) have no gpg signature at all;
(2) be signed by a gpg key you don't have on your keyring;
(3) be signed by a key on your keyring but which you don't have a trust
relationship with (i.e. nobody you know has signed it);
(4) be signed by a trusted key which is not on the list of package
suppliers (after all, just because you trust that someone is who they say
they are, you might not want to install their lisp software).
The first two of these are presently terminal errors, the third can
be ignored, and the fourth has a restart that lets you add the
packager to your package supplier list. The package supplier list is
stored between sessions in ~/.sbcl/trusted-uids.lisp.
This is an incompatible change: cue evil laughter.
Half the known world have probably linked to this already:
Some people read it as a personal endorsement of PHP, VB, and other
semi-baked programming languages. Actually my personal preference is
a much darker, uglier, and more shameful secret: Common Lisp, CLOS,
plus an ML-like type inferencing compiler/error checker (with some
things done in a sublanguage with Haskell semantics and Lisp syntax).
Common Lisp dates from around 1982 and ML from 1984.
I try to keep this preference concealed from young people who've been
raised on a diet of C, Java, C#, Perl, etc. They just wouldn't find
it credible that 20-year-old systems and ideas are actually better
than the latest and greatest from Microsoft and Sun.
To be honest, I find his protestations a little unconvincing. Just
as with jwz, the Unix-Haters "I wish I was still using Lisp" sentiment
seems to shine through in almost everything he writes.
Freenode #lisp denizens, and some percentage of comp.lang.lisp readers
who read the article I mistakenly posted instead of mailed privately,
will know that earlier this year I was flirting with the possibility
of writing a book. 'Flirting' is not really the right term: I got as
far as producing an outline and sending it to a publisher who appeared
to like the idea, so if we're going to stick with this general
metaphor, I got as far as a dinner date and invitation back for coffee
afterwards.
Actually, let's drop that metaphor.
So anyway, today I'm feeling slightly up on hearing the news that
there were more than 100 participants in #lisp again (even if two of
them are me), and slightly down on hearing from the prospective
publisher, who I hope won't mind being excerpted here (this is all
public information anyway; I wouldn't post the sensitive stuff), that
According to sourceforge, the i386 rpms for [GNU CLISP] 2.3.1 have been
downloaded 351 times since 8/31/03, and the 2.3.0 rpms have been downloaded
2,454 times since 9/02. The win32 archive has been downloaded 1,436 and
11,059 times, respectively. The other lisp projects on sourceforge have
barely been downloaded at all.
(He's basically engaged in trying to figure out the size of the
potential market for a book aimed at teaching Lisp to users of other
high-level languages: Perl and Python and suchlike. Poor soul.)
So I went to have a look, and modulo that I wouldn't trust
sourceforge stats further than I can throw them, he's absolutely
right. SBCL downloads seem to average about 100 a month, though with
enough noise to make any kind of useful analysis worthless. That's
one user for every ten developers. Of course, there are probably also
people following CVS, and the Debian packages aren't counted in that
and may be good for a few more, but even so. (Yes, I've looked at the
popularity-contest figures. No, they don't cheer me up much.)
In fairness, I think the real reasons I'm not feeling so great
right now are (a) I still have no idea about this stupid
threading/GC/locking/whatever-it-is bug, which moves somewhere else
every time I put in any kind of code that would help me track it down,
and (b) I'm still spending too much time dealing with recruiters. If
I knew what I was doing on at least one of these scores, I would face
with more grace and equanimity the knowledge that it's getting on for
the best part of a year since
I started on the current attempt to add thread support to SBCL.
Yesterday's diary entry demonstrates quite convincingly why it's
not a good idea for me to write when too tired to keep my eyes open.
"One user for every ten developers"? "Strike that, reverse it", as
Willy Wonka said. Said in the film, anyway. It's decades since I
read the book, but I suspect he didn't say it there.
Made a certain amount of what I think is progress on the GC issue.
The pseudo-atomic mechanism in SBCL is supposed to let you run short
sequences of Lisp code without icky things happening if signals are
received during them. What it does is set a flag at the start of the
PA sequence; any signal handler is supposed to check this and say "ah,
unsafe. Let's save the information that this signal went off, and the
signal mask as it was before the signal (from the signal context),
then block all signals and return immediately". At the end of the PA
sequence we check whether a signal had been received. If so, we run a
trap instruction which sends us SIGTRAP which runs sigtrap_handler,
which eventually gets around to running the signal handler for the
pending signal. We also hack up the sigmask in the signal context to
be whatever we'd saved when the pseudo-atomic section was interrupted,
so when this handler returns, the process will once again be running
with the normal mask, which is probably "nothing blocked". Clear?
Good. I have to think about it quite hard too.
The net result should be pretty much as it would be if we just
masked signals for the duration, but in the normal case that no signal
was sent, we don't have the expense of a couple of syscalls to
disable/reenable. Note that we don't have room to save more than one
signal this way, but that's OK, because we block them all as soon as
the first one goes off.
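For the terminally curious, here is the shape of the mechanism in C. This is a stripped-down sketch, not SBCL's actual runtime (the real flag lives where the VOPs can poke it, and the names here are invented); it just shows the flag, the deferral, and the end-of-sequence trap.

/* Sketch of the pseudo-atomic idea; not the real SBCL runtime code. */
#include <signal.h>
#include <ucontext.h>

static volatile sig_atomic_t pa_active;       /* inside a PA sequence? */
static volatile sig_atomic_t pa_interrupted;  /* a signal arrived during it */
static int pending_signo;                     /* the deferred signal ... */
static sigset_t pending_mask;                 /* ... and the pre-signal mask */

static void begin_pseudo_atomic(void) { pa_active = 1; }

static void deferring_handler(int signo, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)info;
    if (pa_active) {
        /* Unsafe point: remember the signal and the mask as it was
         * before the signal, then return with everything blocked. */
        pending_signo = signo;
        pending_mask = uc->uc_sigmask;
        sigfillset(&uc->uc_sigmask);
        pa_interrupted = 1;
        return;
    }
    /* ... normal handling when not pseudo-atomic ... */
}

static void end_pseudo_atomic(void)
{
    pa_active = 0;
    if (pa_interrupted) {
        pa_interrupted = 0;
        /* In the real thing this is a trap instruction: SIGTRAP's
         * handler runs the handler for pending_signo and patches
         * pending_mask back into the signal context it returns to. */
        raise(SIGTRAP);
    }
}
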
So, what have we found?
One of the new signals we've added recently (in fact, I haven't
even committed this yet, that's how recent) is SIG_STOP_FOR_GC
(actually SIGRTMIN+n), which is half of the protocol for pausing
threads when we need to vacuum underneath them. The thread that wants
to GC sends SIG_STOP_FOR_GC to all the others. They have a
sig_stop_for_gc_handler which does the pseudo-atomic thing, then
kill(this_thread->pid,SIGSTOP). The first point of note is that this
signal is not in the set that gets blocked during pseudo-atomic. So,
if a thread in PA gets some other signal followed by a request to stop
for GC, the STOP_FOR_GC signal might overwrite the pending signal
data. So, add it to the list.
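A sketch of that half of the protocol, with invented names (this is not the code going into CVS; this_thread stands in for the runtime's per-thread structure, and each thread having its own pid is the LinuxThreads-era behaviour the entry assumes):

/* Sketch of SIG_STOP_FOR_GC handling; names invented for illustration. */
#include <signal.h>
#include <sys/types.h>

#define SIG_STOP_FOR_GC (SIGRTMIN + 1)   /* "actually SIGRTMIN+n" */

struct thread { pid_t pid; /* ... */ };
extern struct thread *this_thread;       /* per-thread data in the runtime */

static void sig_stop_for_gc_handler(int signo, siginfo_t *info, void *ctx)
{
    (void)signo; (void)info; (void)ctx;
    /* (If we are in a pseudo-atomic section, this gets deferred just
     * like any other signal -- see the sketch above.)  Otherwise park
     * this thread until the GCing thread resumes it. */
    kill(this_thread->pid, SIGSTOP);
}

/* The fix described above: make sure SIG_STOP_FOR_GC is in the set
 * blocked once a signal has been deferred, so a later stop request
 * can't overwrite the saved pending-signal data. */
static void add_stop_for_gc_to_blocked_set(sigset_t *blocked_during_pa)
{
    sigaddset(blocked_during_pa, SIG_STOP_FOR_GC);
}
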
But this means we can't do any kind of likely-to-block operation
with signals masked, because the GC will want to pause us while we do
it, and if we ignore its signal it will sit there (looping) until we
take note. Ungood. Likely-to-block operations include waiting on
queues, so anything involving lock acquisition is out.
Allocation is sometimes done with signals masked. I know that
allocations from C inside an error handler or similar will do this,
but maybe there's lisp code that also does. Allocation is
pseudo-atomic: if we allocate with signals blocked, and the allocator
decides to arrange for a garbage collection to happen after we're
done, the saved signal mask in the pending information will have
signals blocked too, so, well, the upshot is that we will run SUB-GC
(the Lisp-level GC routine that calls collect_garbage) with, guess
what, blocked signals. This is bad because it contains a with-mutex
form to ensure that only one thread is collecting at a time.
This is looking less and less likely to be fixed in time for
reasonable testing before 0.8.4.
Anyway, we can now at least unblock before calling SUB-GC, and it
makes some kind of (positive, I think) difference. There's still more
to do, though. (1) I've managed one time to get a thread looping in
get-mutex, (2) it's hit against my new debugging assertion that says
"don't sleep on a queue with signals disabled", but in this case the
only two blocked signals are SIGTRAP and SIG_DEQUEUE (actually
SIGRTMIN+n'), and (3) SIGSEGV is not one of the usual blocking set. I
don't know if this is because (a) it can never happen in pseudo-atomic
code - though SIGSEGV as a write barrier for objects in old spaces can
happen just about anywhere, as far as I can see, or (b) because it
didn't occur to the authors that SIGSEGV can occur just about anywhere
- the code might well predate the generational gc and not have been
updated, or (c) for some other reason I still don't understand. (b) or
(c) seem likely.
What would block SIGTRAP and SIG_DEQUEUE? We don't even have a
signal handler for the latter: we use sigwait() for it instead.
Oh, yeah, and another thing. GC hooks are playing up on some
platforms, including Linux 2.6. So I installed it on my desktop
yesterday (which is a rant for another time; the motherboard onboard
audio is now painful to listen to) to have a look at. The problem
seems to be with hooks that cons. Likely fix for 0.8.4 is to
remove the before-gc hooks altogether - they weren't that useful
anyway - and move the after-gc hooks (which include the code that
runs finalizers for dead objects, so it's worth keeping) later, so
that they don't run until the instance is in a more normal state
(e.g. other threads are resumed, gc lock has been released, etc).
Mon Sep 29 17:00:43 2003
In what might be its most striking victory, the Alertbox ushered in
the decline of the glamour design agency.
OK, it looks like that was half the problem. Or, at least, one of
the two problems.
Allocation is done in pseudo-atomic sections. When
alloc() decides that it's time to GC, it uses the same
deferred handler mechanism as an interrupt received during
pseudo-atomic to schedule a collection as soon as the allocation
itself is done. Problem is that it doesn't (didn't, anyway) check
whether there was already a deferred handler to run, so the message
that said "stop for another thread to gc us" got whapped by a message
saying "now do a gc". It wouldn't break us to call gc from two
threads at once - appropriate locking mechanisms are in place -
but it does hurt to not stop when people are waiting.
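The shape of the fix, as a sketch with invented names (the real change is in the runtime's alloc() path, and the gc_still_owed bookkeeping here is just one plausible way to remember the deferred collection):

/* Sketch of the alloc() fix; not the actual runtime code. */

extern int deferred_handler_pending;    /* e.g. "stop for another thread's GC" */
extern void schedule_deferred_gc(void); /* queue "now do a gc" for end of PA */

static int gc_still_owed;               /* remember we wanted a collection */

static void maybe_schedule_gc_after_alloc(void)
{
    if (deferred_handler_pending) {
        /* Something is already queued to run when the allocation's
         * pseudo-atomic section ends -- don't clobber it with our
         * "now do a gc" message; note the debt and let the pending
         * handler (probably a stop-for-GC request) run first. */
        gc_still_owed = 1;
        return;
    }
    schedule_deferred_gc();             /* the old, unconditional behaviour */
}
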
So, one down. The other one is that sometimes threads don't seem
to wake up after gc, so after a few minutes of running, all our
threads quietly come to rest waiting for a signal.
Earlier we asked "What would block SIGTRAP
and SIG_DEQUEUE?". wait-on-queue blocks SIG_DEQUEUE
temporarily while it frobs the waitqueue data before it can go to
sleep. run_deferred_handler is called from the sigtrap_handler, and
although we unblock the usual culprits before calling into Lisp,
SIGTRAP is (along with SIGSEGV) not in that set.
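For concreteness, the unblock-before-calling-into-Lisp step looks something like the following sketch (the set membership is invented, not the runtime's actual list); the point is simply that SIGTRAP and SIGSEGV are never added to the set being unblocked, so they stay blocked for as long as we are running Lisp code from inside sigtrap_handler.

/* Sketch only; not the SBCL runtime.  Unblock "the usual culprits"
 * before calling back into Lisp from a signal handler. */
#include <signal.h>

static void unblock_usual_culprits(void)
{
    sigset_t s;
    sigemptyset(&s);
    sigaddset(&s, SIGINT);      /* illustrative members of the set */
    sigaddset(&s, SIGALRM);
    sigaddset(&s, SIGPIPE);
    /* ... but neither SIGTRAP nor SIGSEGV is added here, so both
     * remain blocked while the deferred handler runs in Lisp. */
    pthread_sigmask(SIG_UNBLOCK, &s, 0);
}
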
I've added the good parts of this experimentation (without, I
hope, the debugging cruft) to CVS under the tag atropos-branch.
If you can deal with the sheer abhorrence of all these signals, you're
welcome to take a look.
Thu Oct 2 01:55:55 2003
It is clear why men and women have sexually dimorphic reproductive organs. But why did they evolve a sexually dimorphic digit ratio? Manning notes that it has been suggested that the male digit ratio pattern may be functional - a longer ring finger may help to stabilize the third digit (the middle finger) when throwing objects, thus increasing throwing accuracy. This implies that the throwing accuracy required for successful hunting and/or tribal warfare was of sufficient importance to drive the evolution of this sexually dimorphic trait. While gathering, ancestral women presumably did not need this extra stability for the third finger. Today, this sex difference may be seen in male superiority in throwing darts. And, it would be interesting to know if men with lower digit ratios were better dart throwers than men with higher digit ratios.