Weekend spent on Araneida hacking, which somehow ended up being
Mon, 01 Sep 2003 02:58:02 +0000
Weekend spent on Araneida hacking, which somehow ended up being
SBCL hacking. Two motivations:
- For the new version of the Local Food Directory (a
CLiki application, if it wasn't immediately obvious from looking at
it) we're going to be doing interesting things like tying into Streetmap for postcode
searches and suchlike. It's also prettier, which is nice if you like
that kind of thing.
The practical impact of tie-ins with an external server, though, is
that our latency for replying to requests now depends on circumstances
beyond our control. Most (all) of the Araneida services so far are
written so that we have some fairly good idea of how long it'll take
to answer a request: we only talk http to a localhost proxy instead of
slow remote clients; if we send mail, we send it to an smtp listener
on the local machine; and our database queries, on the sites that use
databases, are mostly fairly well tuned. But, remote hosts. Can't do
much with that. Had better try this threading stuff for real, then.
- The users of Araneida, of whom there seem to be an expanding number no
matter what obstacles I throw up in their way (making them all install
SBCL was going to be a pretty good way to make them go away, I
thought, but still apparently Not Enough) are fond of pointing out
that the export-server stuff is a bit weird at best. I concur.
It made sense once upon a time.
In brief, we add a new class
http-listener (concrete subclasses
threaded-http-listener and serve-event-http-listener)
which represents a single endpoint (a host/port combination) and
dispatches all stuff that comes in on that endpoint to a handler:
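(defclass http-listener ()
  ;; the handler that everything arriving on this endpoint is dispatched to
  ((handler :initform *root-handler* :initarg :handler
            :accessor http-listener-handler)
   ;; the endpoint itself: #(0 0 0 0) means "all local addresses"
   (address :initform #(0 0 0 0) :initarg :address
            :accessor http-listener-address)
   (port :initform 80 :initarg :port :accessor http-listener-port)
   ;; ...
   ))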
Many listeners may dispatch to the same handler, and the handler may
be a dispatching handler if it likes. There still needs to be some
provision somewhere for lying about the hostname (as in, your external
server address is foo.com:80/ but your araneida is actually on :8000)
and a place to hang random extra bits that would be useful for
generating apache httpd.conf segments (like where on the disk to find
ssl certificates, etc), but it looks good so far.
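Here is a minimal sketch in portable CLOS of what "many listeners, one handler" looks like, with one possible place to put the hostname lying. It is not Araneida's code. `toy-listener`, `toy-handler`, `handle-request`, `dispatch`, `advertised-url` and the `external-url` slot are all hypothetical names made up for illustration.

```lisp
;;; Hypothetical sketch, not Araneida's API: several listeners share one handler.
(defclass toy-handler ()
  ((name :initarg :name :reader handler-name)))

(defclass toy-listener ()
  ((handler :initarg :handler :reader listener-handler)
   (port :initarg :port :reader listener-port)
   ;; what to tell the world, e.g. "http://foo.com/" when we're really on :8000
   (external-url :initarg :external-url :initform nil
                 :reader listener-external-url)))

(defgeneric handle-request (handler path)
  (:documentation "Produce a response body for PATH."))

(defmethod handle-request ((handler toy-handler) path)
  (format nil "~A served ~A" (handler-name handler) path))

(defun dispatch (listener path)
  "Hand a request that arrived on LISTENER's endpoint to its handler."
  (handle-request (listener-handler listener) path))

(defun advertised-url (listener path)
  "The URL to use in generated links: the external one if we're lying."
  (concatenate 'string
               (or (listener-external-url listener)
                   (format nil "http://localhost:~D/" (listener-port listener)))
               path))

;; two endpoints, one handler
(defvar *root* (make-instance 'toy-handler :name "root"))
(defvar *public* (make-instance 'toy-listener :handler *root* :port 8000
                                              :external-url "http://foo.com/"))
(defvar *local* (make-instance 'toy-listener :handler *root* :port 8001))
```

Both listeners pass requests to the same root handler. Only `*public*` advertises itself as http://foo.com/, and `*local*` reports its real port.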
While doing this, add some dynamic guess-how-many-processes-we-need
stuff to adjust the number of serving threads based on load, slap it
all back together and hammer it a bit with apachebench.
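The guess-how-many-processes logic isn't shown here, so what follows is only a sketch of one plausible policy. It adds a thread when too few are idle and retires the surplus when too many are. `target-thread-count` and all of its thresholds are invented for illustration.

```lisp
;;; Hypothetical pool-sizing policy; every threshold here is invented.
(defun target-thread-count (current idle &key (min-threads 2) (max-threads 20)
                                           (min-idle 1) (max-idle 4))
  "How many serving threads we want, given CURRENT threads of which IDLE
are sitting waiting for a connection."
  (cond ((< idle min-idle)
         ;; everyone is busy: add one thread, up to the ceiling
         (min max-threads (1+ current)))
        ((> idle max-idle)
         ;; too much slack: retire the surplus idlers, down to the floor
         (max min-threads (- current (- idle max-idle))))
        (t current)))
```

Calling this periodically (say, whenever a thread finishes a request) nudges the pool size up or down in small steps, rather than replanning it from scratch.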
Oops. Not literally "Oops" in the sense that a Linux kernel hacker
would know it, but EFAULT from accept() anyway, which is kind
of analogous for a user program.
sb-bsd-sockets:socket-accept calls accept with the
contents of a suitably sized Lisp vector as the second argument (the
sockaddr). To make sure this doesn't get relocated between our taking
the address and calling out, we wrap it in without-gcing to
disable gc for the duration. This is actually a really bad idea
because all the idle threads block in select(), so we can't GC unless
we're really really busy; somewhat perverse. And, just to be awkward,
it doesn't always work either. For some reason that I haven't yet found,
it is possible to have a GC happen during without-gcing.
I shall spare you the story of my VOP-writing and GC
staring-at experience on Sunday afternoon, but the upshot is that
It should be noted that although I started writing this entry at
whatever ungodly time it says, I finished and uploaded it at around
midday on Monday.
That was very nearly the right answer
Mon, 01 Sep 2003 21:22:14 +0000
That was very nearly the right answer. In fact
- there are a couple of places that write-protect pages, so both
need to check the dont_move bit
- GC wipes out dont_move on all pages at the start of GC,
before running preserve_pointers().
preserve_pointers only looks at the 'from' space (after all,
we're not moving objects anyway unless they're in fromspace), so
when we see these pages again later we no longer have them specially
marked, so we write-protect them as normal. Wrong. Having realised
the problem, the solution is simple: only reset dont_move on
fromspace pages instead of on all pages.
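A toy model may make the dont_move bug easier to see. Pages are structs and spaces are generation numbers. `start-gc`, `preserve-pointers` and `write-protect-pages` are hypothetical stand-ins for the corresponding steps in the C collector, not its real code. The scenario assumes, as the entry implies, that a page outside fromspace can still carry a dont_move mark that has to be honoured.

```lisp
;;; Toy model of the dont_move bug; nothing here is gencgc's actual code.
(defstruct page
  (generation 0)
  (dont-move nil)          ; pinned: must be neither moved nor write-protected
  (write-protected nil))

(defun start-gc (pages from-space &key (fixed t))
  "Reset dont-move flags before pointer preservation.  With FIXED, only
from-space pages are reset; without it, every page is (the old bug)."
  (dolist (p pages)
    (when (or (not fixed) (= (page-generation p) from-space))
      (setf (page-dont-move p) nil))))

(defun preserve-pointers (pages from-space roots)
  "Pin from-space pages that an ambiguous root (here, a page index) points at.
Pages in other spaces are skipped: we aren't moving them anyway."
  (dolist (i roots)
    (let ((p (nth i pages)))
      (when (= (page-generation p) from-space)
        (setf (page-dont-move p) t)))))

(defun write-protect-pages (pages)
  "Write-protect every page that is not marked dont-move."
  (dolist (p pages)
    (unless (page-dont-move p)
      (setf (page-write-protected p) t))))

(defun simulate-dont-move (fixed)
  "Page 0 is in from-space (generation 0) and referenced from a root; page 1
is an older page that was already pinned.  Returns each page's write-protect bit."
  (let ((pages (list (make-page :generation 0)
                     (make-page :generation 2 :dont-move t))))
    (start-gc pages 0 :fixed fixed)
    (preserve-pointers pages 0 '(0))
    (write-protect-pages pages)
    (mapcar #'page-write-protected pages)))
```

`(simulate-dont-move nil)` returns `(NIL T)`, so the already-pinned older page gets write-protected. `(simulate-dont-move t)` returns `(NIL NIL)`.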
:; /usr/sbin/ab -c5 -n10000 http://xxxxxxxxxxxxxxxxxxx:8009/Welcome
[...]
Server Software: Araneida/0.74
Server Hostname: xxxxxxxxxxxxxxxxxxx
Server Port: 8009
Document Path: /Welcome
Document Length: 2796 bytes
Concurrency Level: 5
Time taken for tests: 190.112 seconds
Complete requests: 10000
Failed requests: 0
Broken pipe errors: 0
Total transferred: 29582958 bytes
HTML transferred: 27962796 bytes
Requests per second: 52.60 [#/sec] (mean)
Time per request: 95.06 [ms] (mean)
Time per request: 19.01 [ms] (mean, across all concurrent requests)
Transfer rate: 155.61 [Kbytes/sec] received
Connnection Times (ms)
min mean[+/-sd] median max
Connect: -10 0 12.0 0 848
Processing: 7 42 104.7 17 1998
Waiting: 0 41 104.1 16 1998
Total: 7 42 105.9 17 1998
Percentage of the requests served within a certain time (ms)
50% 17
66% 25
75% 34
80% 42
90% 66
95% 105
98% 285
99% 665
100% 1998 (last request)
10000 requests later with no problems, and I think we've got this
nailed.
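As a sanity check on the ab output, its headline figures can be recomputed from the raw counts. This is a sketch, and `ab-summary` is a made-up name. The assumption that this version of ab divides by 1000 rather than 1024 for "Kbytes" is inferred from the numbers above, not taken from ab's documentation.

```lisp
(defun ab-summary (requests concurrency seconds total-bytes)
  "Recompute ApacheBench's summary figures from the raw counts."
  (list :requests-per-second (/ requests seconds)
        ;; mean time one client waits for one response
        :ms-per-request (/ (* concurrency seconds 1000) requests)
        ;; wall-clock time divided evenly over all requests
        :ms-per-request-across-all (/ (* seconds 1000) requests)
        ;; assumption: "Kbytes" here means 1000 bytes
        :kbytes-per-second (/ total-bytes seconds 1000)))
```

With the run above, `(ab-summary 10000 5 190.112 29582958)` gives about 52.60 requests/s, 95.06 ms, 19.01 ms and 155.61 Kbytes/s, matching ab's report.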
Owing to an accounting screwup (which may actually have been my fault,
Wed, 03 Sep 2003 04:23:44 +0000
Owing to an accounting screwup (which may actually have been my fault,
for a change), my cablemodem service stopped at around 2pm today, NTL
obviously preferring to switch everything off rather than e.g. attempt to
contact the customer in such circumstances. I resolved this in
the space of about half an hour (most of it in various kinds of
queues), yet 14 hours later my service has still not been reconnected.
"By the end of the day", they said. Ha.
Apparently the customer service people (who have the authority to
reconnect) don't have access to the same computer systems as the
technical support department (who actually have the ability) and have
to send email to effect this. My guess is that the email in
question is stuck behind a zillion bounces and a virus checking
bottleneck, and I should plan for the return of my The Infinite
Distraction That Is Internet service sometime on Friday. In the
meantime I'm reduced to a 56k modem on a noisy (and pay-per-minute)
analogue line, so for the most part not on irc.
I think I got more work done without net than I typically
(or at least, often) do with it. But on the other hand, anyone who
wanted to look at cvs.telent.net couldn't. Bad luck, anyone.
So, after a fair length of time on the phone to NTL (although they've made
Wed, 03 Sep 2003 16:57:39 +0000
So, after a fair length of time on the phone to NTL (although they've made
serious improvements to their telephone hold time in recent months, for
the past couple of weeks the support burden incurred by MSBlast and
Sobig seems to have crippled them) I have the Intarweb back.
It feels like an achievement, somehow.
Latest hack (yesterday's): IPC using X properties and SendMessage
Sun, 07 Sep 2003 18:07:42 +0000
Latest hack (yesterday's): IPC using X properties and SendMessage.
Details here, here, and here.
In the last three
Wed, 10 Sep 2003 16:30:23 +0000
In the last three
and a bit years of self-employment, I have learned that I am a
better software developer than I am a salesperson, and that the
opportunities for contract SBCL development are too few and far
between to make a living out of anyway. There are those who would
argue that I should have known both of those things three years ago,
but, well, I feel better to have found out for certain. And the
(fairly interesting, but not quite my vocation) unix/web/database/etc
work I've been doing in the meantime has funded some quite neat unpaid
SBCL work anyway.
But, as I said, my motivation for sales is matched only by my aptitude
at same, and as I don't right now have people queueing up outside the
door to thrust work into my laid-back, casual, and ungrateful hands, I
think it's time to draw a line under that life episode and start
looking for something else to do. In other words, I've been kind of
busy with agencies and similar stuff in the last few days. Will Hack
For - well, sensible amounts of money, as the saying nearly goes.
The informal groups' vested interests will be sustained by the
Thu, 11 Sep 2003 21:55:10 +0000
The informal groups' vested interests will be sustained by the
informal structures that exist, and the movement will have no way of
determining who shall exercise power within it. If the movement
continues deliberately not to select who shall exercise power, it does
not thereby abolish power. All it does is abdicate the right to demand
that those who do exercise power and influence be responsible for
it. If the movement continues to keep power as diffuse as possible
because it knows it cannot demand responsibility from those who have
it, it does prevent any group or person from totally dominating. But
it simultaneously ensures that the movement is as ineffective as
possible. Some middle ground between domination and ineffectiveness
can and must be found.
It's actually nothing to do with the Free Software movement
or the Open Source tribe(sic); it was first printed by the
women's liberation movement in 1970. But, next time I see Eric
Raymond's articles on ZDNet or Cnet or wherever and find myself
muttering things like "I didn't vote for him", well, hmm.
As an example, let's say that Alice and Bob generate PGP Keys with GPG and hold a PGP key signing party
Sat, 13 Sep 2003 15:24:05 +0000
As an example, let's say that Alice and Bob generate PGP Keys with GPG and hold a PGP key signing party. At the party Alice and Bob go verify each others' key information and later sign each others' keys. GPG by default automatically signs the public key of every pair it generates with the associated private key. So, Alice and Bob both now have at least two signatures validating that their keys belong to them. Alice's key was signed by Alice herself and Bob and Bob's key was signed by Bob himself and Alice. In the future Alice and Bob meet Cathy. Cathy generates a key pair and tells Alice and Bob that she will send them her key. Alice doesn't like Cathy and doesn't want Bob to exchange encrypted communications with her. Both Alice and Cathy generate PGP keys which they claim belong to Cathy. They both send them to Bob. Both keys have one signature, the signature of the associated private key. Bob does not know which key is really Cathy's. Cathy hears that Bob got two keys, and suspects Alice. Cathy, now angry, wishes to gain information that she can use against Alice. In order to acquire this information Cathy must compromise the encrypted communications between Alice and Bob. In order to do this, Cathy decides to forge an email to Bob from Alice telling him that Alice has generated new keys. In the forged email, Cathy includes Alice's "new" public key (which is in fact a fake key generated by Cathy). However, Bob knows for sure this is a trick because even though Bob now has two keys for Alice, one of the keys has been signed by multiple people (himself and Alice) verifying that it does indeed belong to Alice, while the other key - Cathy's fake key - only has its own signature.
I quoted this chiefly to put you in the same frame of mind as I was
when I read the following paragraph, which starts
The above example is very simplified and things can get a lot more
complicated than that.
Ask me again why we don't seem to be seeing pervasive use of crypto
yet.
Two weird SBCL problems, neither of which I have any particularly good
Wed, 17 Sep 2003 18:20:47 +0000
Two weird SBCL problems, neither of which I have any particularly good
ideas about:
- On Alpha, SBCL versions built with Glibc versions older than
2.3.2 don't run on 2.3.2, complaining about invalid args to mprotect. On 2.3.2 itself the SBCL
runtime won't build, because we use a custom linker script to force stuff under
2Gb, and that script needs updating for 2.3.2.
- On X86/threaded, there's some kind of (what I assume to be a)
thread safety bug apparently connected to &rest arguments. I
can't see anything obviously bad in how we create rest lists, but one
iteration in every far-too-many we get messages from
e.g. char-equal saying that The value 0 is not of type
BASE-CHAR. Which is perfectly correct in itself, but as we're
not calling char-equal with 0, it seems like it should be
superfluous.
Item (2) below (logically above) turns out to be a problem in the
Thu, 18 Sep 2003 16:00:44 +0000
Item (2) below (logically above) turns out to be a problem in the
garbage collector and not at all a library bug. We weren't scavenging
signal contexts (and thus registers) when they were on alternate
signal stacks. Most threads don't spend a lot of time in alternate
signal stacks, so you have to work the GC fairly hard to see this.
Fixed now, anyway.
There still seems to be some bug which I haven't found, where the
GCing thread will occasionally overcount the number of threads it's
signalled to stop, and then sit in a loop indefinitely waiting for a
thread that doesn't exist.
I discovered last night that gpg has an alternate output mode which is intended to be
Sun, 21 Sep 2003 02:40:52 +0000
I discovered last night that gpg has an alternate output mode which is intended to be
machine-parseable, so I've just committed some exciting new breakage
to SBCL's asdf-install contrib to make it a bit smarter about checking GPG
signatures. It now attempts to check signatures for all packages no
matter where they've come from, but there are restarts to bypass most
of the checks.
A package may
- have no gpg signature at all
- be signed by a gpg key you don't have on your keyring
- be signed by a key on your keyring but which you don't have a trust
relationship with (i.e. nobody you know has signed it)
- be signed by a trusted key, but not be on the list of package
suppliers (after all, just because you trust someone is who they say they
are, you might not want to install their lisp software)
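The sketch below shows roughly how gpg's machine-readable output can be mapped onto those cases. It assumes the mode in question is what `--status-fd` produces: lines starting with `[GNUPG:]` followed by a keyword. GOODSIG, BADSIG, NO_PUBKEY, TRUST_FULLY and TRUST_ULTIMATE are real gpg status keywords, but the function names are hypothetical. "No signature file" is represented as empty input. Counting only full or ultimate trust as "trusted" is a choice made for this sketch, not necessarily what asdf-install does.

```lisp
;;; Hypothetical helpers; only the "[GNUPG:] KEYWORD ..." line format and
;;; the keyword names come from gpg.
(defparameter *gpg-status-prefix* "[GNUPG:] ")

(defun parse-gpg-status-line (line)
  "Return the status keyword of LINE and the rest of it, or NIL if LINE is
not a status line."
  (let ((n (length *gpg-status-prefix*)))
    (when (and (>= (length line) n)
               (string= *gpg-status-prefix* line :end2 n))
      (let* ((body (subseq line n))
             (sp (position #\Space body)))
        (values (subseq body 0 sp)
                (if sp (subseq body (1+ sp)) ""))))))

(defun classify-signature (status-lines)
  "Classify a package's signature from gpg's status lines (NIL means there
was no signature to check): :none, :unknown-key, :bad, :untrusted or :trusted."
  (let ((keywords (remove nil (mapcar #'parse-gpg-status-line status-lines))))
    (flet ((seen (k) (member k keywords :test #'string=)))
      (cond ((null status-lines) :none)
            ((seen "NO_PUBKEY") :unknown-key)
            ((not (seen "GOODSIG")) :bad)
            ((or (seen "TRUST_FULLY") (seen "TRUST_ULTIMATE")) :trusted)
            (t :untrusted)))))
```

A bad signature (`:bad`) isn't one of the four cases in the list, but a real checker still has to reject it.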
The first two of these are presently terminal errors, the third can
be ignored, and the fourth has a restart that lets you add the
packager to your package supplier list. The package supplier list is
stored between sessions in ~/.sbcl/trusted-uids.lisp.
This is an incompatible change: cue evil laughter.
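The restarts map naturally onto Common Lisp's condition system. The sketch below is not asdf-install's code. The condition names, `check-package-signature`, its status keywords and the restart names `skip-gpg-check` and `add-supplier` are all hypothetical. The supplier list lives in a variable here rather than in ~/.sbcl/trusted-uids.lisp.

```lisp
;;; Hypothetical sketch of the signature policy; none of these names are
;;; asdf-install's own.
(define-condition signature-problem (error)
  ((name :initarg :name :reader problem-name))
  (:report (lambda (c s)
             (format s "GPG signature problem with package ~A" (problem-name c)))))

(define-condition no-signature (signature-problem) ())
(define-condition unknown-key (signature-problem) ())
(define-condition untrusted-key (signature-problem) ())
(define-condition not-a-supplier (signature-problem)
  ((uid :initarg :uid :reader problem-uid)))

(defvar *trusted-suppliers* '()
  "UIDs we'll install Lisp code from (saved to disk in the real thing).")

(defun check-package-signature (name status &optional uid)
  "Return T if package NAME may be installed.  STATUS is one of :none,
:unknown-key, :untrusted or :trusted; UID identifies the signer."
  (ecase status
    ;; terminal: no restarts offered
    (:none (error 'no-signature :name name))
    (:unknown-key (error 'unknown-key :name name))
    ;; good signature, but nobody we trust vouches for the key
    (:untrusted
     (restart-case (error 'untrusted-key :name name)
       (skip-gpg-check ()
         :report "Install anyway, ignoring the untrusted key."
         t)))
    ;; trusted key, but is this someone we take Lisp code from?
    (:trusted
     (if (member uid *trusted-suppliers* :test #'equal)
         t
         (restart-case (error 'not-a-supplier :name name :uid uid)
           (add-supplier ()
             :report "Add this packager to the package supplier list."
             (push uid *trusted-suppliers*)
             t))))))
```

A user picks one of these restarts interactively from the debugger. Code can do the same with `invoke-restart` from inside a `handler-bind` handler.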
Half the known world have probably linked to this already:
Mon, 22 Sep 2003 23:39:42 +0000
Half the known world have probably linked to this already:
Some people read it as a personal endorsement of PHP, VB, and other
semi-baked programming languages. Actually my personal preference is
a much darker, uglier, and more shameful secret: Common Lisp, CLOS,
plus an ML-like type inferencing compiler/error checker (with some
things done in a sublanguage with Haskell semantics and Lisp syntax).
Common Lisp dates from around 1982 and ML from 1984.
I try to keep this preference concealed from young people who've been
raised on a diet of C, Java, C#, Perl, etc. They just wouldn't find
it credible that 20-year-old systems and ideas are actually better
than the latest and greatest from Microsoft and Sun.
To be honest, I find his protestations a little unconvincing. As
with jwz, the Unix-Haters "I wish I was still using Lisp" sentiment
seems to shine through in almost everything he writes.
Freenode #lisp denizens, and some percentage of comp.lang.lisp readers
Thu, 25 Sep 2003 03:45:20 +0000
Freenode #lisp denizens, and some percentage of comp.lang.lisp readers
who read the article I mistakenly posted instead of mailed privately,
will know that earlier this year I was flirting with the possibility
of writing a book. 'Flirting' is not really the right term: I got as
far as producing an outline and sending it to a publisher who appeared
to like the idea, so if we're going to stick with this general
metaphor, I got as far as a dinner date and invitation back for coffee
afterwards.
Actually, let's drop that metaphor.
So anyway, today I'm feeling slightly up on hearing the news that
there were more than 100 participants in #lisp again (even if two of
them are me), and slightly down on hearing from the prospective
publisher, who I hope won't mind being excerpted here (this is all
public information anyway; I wouldn't post the sensitive stuff), that
According to sourceforge, the i386 rpms for [GNU CLISP] 2.3.1 have been
downloaded 351 times since 8/31/03, and the 2.3.0 rpms have been downloaded
2,454 times since 9/02. The win32 archive has been downloaded 1,436 and
11,059 times, respectively. The other lisp projects on sourceforge have
barely been downloaded at all.
(He's basically engaged in trying to figure out the size of the
potential market for a book aimed at teaching Lisp to users of other
high-level languages: Perl and Python and suchlike. Poor soul)
So I went to have a look, and modulo that I wouldn't trust
sourceforge stats further than I can throw them, he's absolutely
right. SBCL downloads seem to average about 100 a month, though with
enough noise to make any kind of useful analysis worthless. That's
one user for every ten developers. Of course, there are probably also
people following CVS, and the Debian packages aren't counted in that
and may be good for a few more, but even so. (Yes, I've looked at the
popularity-contest figures. No, they don't cheer me up much)
In fairness, I think the real reasons I'm not feeling so great
right now are (a) I still have no idea about this stupid
threading/GC/locking/whatever-it-is bug, which moves somewhere else
every time I put in any kind of code that would help me track it down,
and (b) I'm still spending too much time dealing with recruiters. If
I knew what I was doing on at least one of these scores, I would face
with more grace and equanimity the knowledge that it's getting on for
the best part of a year since
I started on the current attempt to add thread support to SBCL.
Yesterday's diary entry demonstrates quite convincingly why it's
Fri, 26 Sep 2003 01:46:03 +0000
Yesterday's diary entry demonstrates quite convincingly why it's
not a good idea for me to write when too tired to keep my eyes open.
"One user for every ten developers"? "Strike that, reverse it", as
Willy Wonka said. Said in the film, anyway. It's decades since I
read the book, but I suspect he didn't say it there.
Made a certain amount of what I think is progress on the GC issue
Sun, 28 Sep 2003 02:38:00 +0000
Made a certain amount of what I think is progress on the GC issue.
The pseudo-atomic mechanism in SBCL is supposed to let you run short
sequences of Lisp code without icky things happening if signals are
received during them. What it does is set a flag at the start of the
PA sequence; any signal handler is supposed to check this and say "ah,
unsafe. Let's save the information that this signal went off, and the
signal mask as it was before the signal (from the signal context),
then block all signals and return immediately". At the end of the PA
sequence we check whether a signal had been received. If so, we run a
trap instruction which sends us SIGTRAP which runs sigtrap_handler,
which eventually gets around to running the signal handler for the
pending signal. We also hack up the sigmask in the signal context to
be whatever we'd saved when the pseudo-atomic section was interrupted,
so when this handler returns, the process will once again be running
with the normal mask, which is probably "nothing blocked". Clear?
Good. I have to think about it quite hard too.
The net result should be pretty much as it would be if we just
masked signals for the duration, but in the normal case that no signal
was sent, we don't have the expense of a couple of syscalls to
disable/reenable. Note that we don't have room to save more than one
signal this way, but that's OK, because we block them all as soon as
the first one goes off.
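A single-threaded toy model may help with the pseudo-atomic dance. "Signals" are keywords, handlers are closures and the signal mask is a list. `pa`, `deliver` and `call-pseudo-atomically` are invented names, and the real thing is C plus a trap instruction rather than a function call. The toy does show three things: the single deferred-handler slot, why blocking everything after the first deferral stops that slot from being overwritten, and the saved mask coming back once the handler has run.

```lisp
;;; Toy, single-threaded model of pseudo-atomic sections; none of these
;;; names come from SBCL itself.
(defstruct pa
  (in-pa nil)        ; the pseudo-atomic flag
  (pending nil)      ; the single deferred handler slot
  (saved-mask '())   ; mask as it was when the signal arrived
  (mask '())         ; currently blocked signals
  (log '()))         ; what happened, newest first

(defparameter *deferrable-signals* '(:sigint :sigalrm)
  "Signals we block once one of them has been deferred.")

(defun deliver (state signal handler)
  "Run HANDLER for SIGNAL now, or defer it if STATE is pseudo-atomic."
  (cond ((member signal (pa-mask state))
         ;; blocked: a real kernel would hold it pending; the toy just logs it
         (push (list :blocked signal) (pa-log state)))
        ((pa-in-pa state)
         ;; unsafe right now: save handler and old mask, then block the rest
         (setf (pa-pending state) handler
               (pa-saved-mask state) (pa-mask state)
               (pa-mask state) *deferrable-signals*)
         (push (list :deferred signal) (pa-log state)))
        (t (funcall handler state))))

(defun call-pseudo-atomically (state thunk)
  "Run THUNK with the PA flag set, then run any deferred handler."
  (setf (pa-in-pa state) t)
  (multiple-value-prog1 (funcall thunk)
    (setf (pa-in-pa state) nil)
    ;; SBCL traps here (SIGTRAP -> sigtrap_handler); the toy calls directly
    (let ((handler (pa-pending state)))
      (when handler
        (setf (pa-pending state) nil)
        (funcall handler state)
        ;; like returning from the handler with the doctored sigmask
        (setf (pa-mask state) (pa-saved-mask state))))))
```

A second signal arriving inside the section is logged as blocked instead of replacing the first one's handler, and the mask is empty again afterwards.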
So, what have we found
- One of the new signals we've added recently (in fact, I haven't
even committed this yet, that's how recent) is SIG_STOP_FOR_GC
(actually SIGRTMIN+n), which is half of the protocol for pausing
threads when we need to vacuum underneath them. The thread that wants
to GC sends SIG_STOP_FOR_GC to all the others. They have a
sig_stop_for_gc_handler which does the pseudo-atomic thing, then
kill(thisthread->pid,SIGSTOP). The first point of note is that this
signal is not in the set that gets blocked during pseudo-atomic. So,
if a thread in PA gets some other signal followed by a request to stop
for GC, the SIG_STOP_FOR_GC signal might overwrite the pending signal
data. So, add it to the list.
- But this means we can't do any kind of likely-to-block operation
with signals masked, because the GC will want to pause us while we do
it, and if we ignore its signal it will sit there (looping) until we
take note. Ungood. Likely-to-block operations include waiting on
queues, so anything involving lock acquisition is out.
- Allocation is sometimes done with signals masked. I know that
allocations from C inside an error handler or similar will do this,
but maybe there's lisp code that also does. Allocation is
pseudo-atomic: if we allocate with signals blocked, and the allocator
decides to arrange for a garbage collection to happen after we're
done, the saved signal mask in the pending information will have
signals blocked too, so, well, the upshot is that we will run SUB-GC
(the Lisp-level GC routine that calls collect_garbage) with, guess
what, blocked signals. This is bad because it contains a with-mutex
form to ensure that only one thread is collecting at a time.
This is looking less and less likely to be fixed in time for
reasonable testing before 0.8.4.
Anyway, we can now at least unblock before calling SUB-GC, and it
makes some kind of (positive, I think) difference. There's still more
to do, though. (1) I've managed one time to get a thread looping in
get-mutex, (2) it has tripped my new debugging assertion that says
"don't sleep on a queue with signals disabled", but in this case the
only two blocked signals are SIGTRAP and SIG_DEQUEUE (actually
SIGRTMIN+n'), and (3) SIGSEGV is not in the usual blocking set. I
don't know if this is because (a) it can never happen in pseudo-atomic
code - though SIGSEGV as a write barrier for objects in old spaces can
happen just about anywhere, as far as I can see, or (b) because it
didn't occur to the authors that SIGSEGV can occur just about anywhere
- the code might well predate the generational gc and not have been
updated, or (c) for some other reason I still don't understand. (b) or
(c) seem likely.
What would block SIGTRAP and SIG_DEQUEUE? We don't even have a
signal handler for the latter: we use sigwait() for it instead.
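The debugging assertion and the unblock-before-SUB-GC change can be sketched like this. A special variable stands in for the thread's signal mask, which in reality is kernel state. `toy-wait-on-queue`, `call-with-signals-unblocked` and the condition name are all hypothetical.

```lisp
;;; Hypothetical sketch: a special variable plays the part of the signal mask.
(defvar *blocked* '() "Signals this (toy) thread currently has blocked.")

(define-condition sleeping-with-signals-blocked (error)
  ((signals :initarg :signals :reader blocked-signals))
  (:report (lambda (c s)
             (format s "Refusing to sleep on a queue with ~S blocked"
                     (blocked-signals c)))))

(defun toy-wait-on-queue (queue)
  "Pretend to sleep on QUEUE.  Sleeping with signals blocked could leave a
GCing thread looping while it waits for us to notice SIG_STOP_FOR_GC."
  (when *blocked*
    (error 'sleeping-with-signals-blocked :signals *blocked*))
  (list :woken-from queue))

(defun call-with-signals-unblocked (thunk)
  "Run THUNK with the toy mask cleared, as now happens before SUB-GC."
  (let ((*blocked* '()))
    (funcall thunk)))
```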
Oh, yeah, and another thing
Sun, 28 Sep 2003 03:30:34 +0000
Oh, yeah, and another thing. GC hooks are playing up on some
platforms, including Linux 2.6. So I installed it on my desktop
yesterday (which is a rant for another time; the motherboard onboard
audio is now painful to listen to) to have a look at. The problem
seems to be with hooks that cons. Likely fix for 0.8.4 is to
remove the before-gc hooks altogether - they weren't that useful
anyway - and move the after-gc hooks (which includes the code that
runs finalizers for dead objects, so it's worth keeping) later, so
that they don't run until the instance is in a more normal state
(e.g. other threads are resumed, gc lock has been released, etc).
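The proposed reordering amounts to running after-GC hooks only once the collection is completely over. In this sketch a special variable stands in for the GC lock. `toy-sub-gc`, `*gc-in-progress*` and `*toy-after-gc-hooks*` are hypothetical names, not SBCL's.

```lisp
;;; Hypothetical sketch: a special variable stands in for the GC lock.
(defvar *gc-in-progress* nil "True while the toy collector holds its lock.")
(defvar *toy-after-gc-hooks* '() "Functions to call after a collection.")

(defun toy-sub-gc (collect)
  "Call COLLECT with the toy lock held, then run the after-GC hooks only
once it has been released."
  (let ((*gc-in-progress* t))
    (funcall collect))
  ;; by now the lock is released (and, in a threaded Lisp, other threads
  ;; would have been resumed), so hooks run in a normal state
  (mapc #'funcall *toy-after-gc-hooks*)
  t)
```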
In what might be its most striking victory, the Alertbox ushered in
Mon, 29 Sep 2003 17:00:43 +0000
In what might be its most striking victory, the Alertbox ushered in
the decline of the glamour design agency.
Jakob Nielsen on presenting
correlation as causation.
In other news, pop music radio found
to be the primary cause of drug addiction in young people.
6043 calling sub-gc
Tue, 30 Sep 2003 01:37:02 +0000
6043 calling sub-gc
6040 stopping for gc
6041 deferred stop for gc
6041 calling sub-gc
6042 stopping for gc
There was a terrible ghastly silence
Deadlock. Boom. I'm going to go to bed now, but I think in the
morning, or reasonable analogue thereto, we can have this GC problem
fixed.
OK, it looks like that was half the problem
Tue, 30 Sep 2003 12:36:40 +0000
OK, it looks like that was half the problem. Or, at least, one of
the two problems.
Allocation is done in pseudo-atomic sections. When
alloc() decides that it's time to GC, it uses the same
deferred handler mechanism as an interrupt received during
pseudo-atomic to schedule a collection as soon as the allocation
itself is done. Problem is that it doesn't (didn't, anyway) check
whether there was already a deferred handler to run, so the message
that said "stop for another thread to gc us" got whapped by a message
saying "now do a gc". It wouldn't break us to call gc from two
threads at once - appropriate locking mechanisms are in place -
but it does hurt not to stop when people are waiting.
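One way to avoid the clobbering is for the allocator to check the deferred-handler slot first. If the slot is already occupied, it just notes that a GC is wanted. The sketch below models that with a struct. `note-gc-needed`, `run-deferred` and the keyword actions are hypothetical, and this is not necessarily how the fix in CVS is written.

```lisp
;;; Toy model: one slot for a deferred handler, plus a flag so that an
;;; allocation-triggered GC request never overwrites a pending stop-for-GC.
(defstruct deferred
  (handler nil)      ; e.g. :stop-for-gc, set by a signal during pseudo-atomic
  (gc-wanted nil))   ; set by the allocator instead of clobbering HANDLER

(defun note-gc-needed (d)
  "What the allocator does when it decides it's time to collect."
  (if (deferred-handler d)
      (setf (deferred-gc-wanted d) t)          ; keep the existing request
      (setf (deferred-handler d) :collect-garbage)))

(defun run-deferred (d)
  "Return, in order, the actions to take at the end of pseudo-atomic,
and clear the slot."
  (let ((actions '()))
    (when (deferred-handler d)
      (push (deferred-handler d) actions))
    (when (deferred-gc-wanted d)
      (push :collect-garbage actions))
    (setf (deferred-handler d) nil
          (deferred-gc-wanted d) nil)
    (nreverse actions)))
```

With a stop-for-GC request already pending, the thread now stops first and collects afterwards. It no longer collects instead of stopping.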
So, one down. The other one is that sometimes threads don't seem
to wake up after gc, so after a few minutes of running, all our
threads quietly come to rest waiting for a signal.
Earlier we asked "What would block SIGTRAP
and SIG_DEQUEUE?". wait-on-queue blocks SIG_DEQUEUE
temporarily while it frobs the waitqueue data before it can go to
sleep. run_deferred_handler is called from the sigtrap_handler, and
although we unblock the usual culprits before calling into Lisp,
SIGTRAP is (along with SIGSEGV) not in that set.
I've added the good parts of this experimentation (without, I
hope, the debugging cruft) to CVS under the tag atropos-branch.
If you can deal with the sheer abhorrence of all these signals, you're
welcome to take a look.