While we are on this subject, this kernel change on the sourceforge#

Mon Jul 29 17:54:28 2002

Topics: sbcl

 While we are on this subject, this kernel change on the sourceforge
 machine meant that I had to transfer my ppc testing laboratory to Dan's
 iMac. And I found another nasty surprise waiting for me there. On the SF
 RS/6000 PPC, the floating point modifications worked as expected, giving
 the right kind of exceptions in the right circumstances. On Dan's iMac:
 
 * (/ 1.0 0.0)
 
 1.0 ; should signal DIVIDE-BY-ZERO

(Christophe's email to sbcl-devel). So, somehow, I got nominated to look at it.

I haven't actually fixed it yet. The immediate problem is that it's not sufficient to twiddle the FPSCR to enable floating point traps: you also have to set two bits in MSR (which, incidentally, you can't directly, as it's a privileged register). That's not actually a major problem, because there's a neato glibc function called feenableexcept() that does this (strace suggests that it works by installing a signal handler for SIGUSR1 that frobs the on-stack MSR, then doing kill(getpid(), SIGUSR1)). What is a major problem is getting it to stay set, because it gets reset in signal handlers, and restored when the handler returns - which punts us back into last week's problem, that half of the time, we don't return from said handlers in any conventional kind of way. Um. More details here

Did I say "restored when the handler returns"#

Mon Aug 5 11:46:57 2002

Topics: sbcl

Did I say "restored when the handler returns"? What doesn't get restored when the handler returns - no matter how the return is done, or longjmped past, or ignored completely, or whatever - are the floating point trap bits themselves. This behaviour is common to at least x86, ppc32, and ppc64, and the kernel people think it's the right thing to do. So, we may as well get used to it.

Not that I have any pressing wish to do anything about it immediately, because the MSR-frobbing aspect still needs a conventional(sic) return from the signal handler to set up right, which we don't do at present, so probably we need to do some more control stack frobbing stuff.

So, instead I turned my attention to threading. After some discussion at the LSM we decided that co-op userland threads were just going to be not nearly as exciting as "proper" threads. General consensus regarding pthreads was also fairly rapidly arrived at ("let's not go there"), so lately I've been thinking about using clone(). This presents a number of issues:

- thread safety is suddenly really important
- stopping other threads from running is going to involve signals
- without-preemption (as a general available-to-the-user construct that he can wrap around arbitrary forms) is hard, slow, messy and a bad idea
- dynamically bound symbols need something other than their current implementation if they're going to work usefully
- my evil slow handwavey mmap trick is not going to work when we (a) don't control the scheduler, and (b) are sharing the memory map (including the protections) between all threads. This linux-kernel article looks potentially interesting in that respect, though I suspect that it would still be easier to use PTRACE_ATTACH.

Remember all those antics last month with alternate signal stacks and SIGSTKSZ and the rest#

Wed Aug 7 01:43:38 2002

Topics:

Remember all those antics last month with alternate signal stacks and SIGSTKSZ and the rest of it?

       starting address and size  of  the  stack.   The  constant
       SIGSTKSZ  is defined to be large enough to cover the usual
       size requirements for an alternate signal stack,  and  the
       constant  MINSIGSTKSZ defines the minimum size required to
       execute a signal handler.

SIGSTKSZ on x86 linux with whatever version of glibc I have installed here (2.2? whatever debian unstable had in it last month) is not actually big enough to call printf. Granted, this is in general not a completely great idea anyway, as printf is probably not reentrant, but it is somewhat disconcerting to switch on the "debug cold init by spewing messages to stderr" switch and get a whole different (and rather faster) failure mode.

We do entire garbage collections inside signal handlers (admittedly not on x86, so this specific problem doesn't arise there). Why do I feel queasy about this?

Language shapes the way we think#

Sat Aug 10 14:53:00 2002

Topics: lisp

Language shapes the way we think. This is easy to believe after several days reading and writing x86 assembler.

So we're talking about the new bind vop and related bits. The issue here is dynamic variable bindings: please skip the next two paras if you feel like it.

In Common Lisp, as in most languages, variables are usually lexically bound. That is, they're declared in the current function, or in the function textually enclosing it, or in the function enclosing that, and so on out to global scope. The alternative, dynamic binding, is that when a variable is not found in the current environment, we look at the environment of the caller, and if we can't find it there we work our way up the call stack. Lexical bindings make it a whole lot easier to see statically what's going on (your function behaves the same no matter who called it) which is generally considered better for everyone concerned (humans and compilers).

Which is not to say that dynamic binding is pointless. If you want to do something like pretty-print a tree using a recursive function, you might have a whole bunch of variables describing where to draw the next subtree (left edge, right edge, scaling factor, etc etc) which change as you descend a branch and which you need to restore as you unwind back up the tree. Dynamic binding gives you this behaviour For Free (as does using lots and lots of function arguments, but that gets kind of unwieldy when you realise you need to add another argument at every call site). So, CL (like Perl, in fact) provides both kinds of variable in the language. For historical reasons, we call the dynamic variables specials, and usually we mark their names with asterisks (*like-so*) to alert the programmer to what's going on. All clear?
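To make the tree-printing case concrete, here is a minimal sketch. The names *indent*, print-node and print-tree are made up for illustration, and the tree representation (a string for a leaf, or a list of a label followed by subtrees) is an assumption chosen to keep the example short.

```lisp
;; Hypothetical names throughout: *INDENT*, PRINT-NODE and PRINT-TREE are
;; illustrations, not part of any library.
(defvar *indent* 0
  "Number of spaces to print before each node label.")

(defun print-node (label)
  ;; No indentation argument: PRINT-NODE just reads whichever binding of
  ;; *INDENT* is in effect at the time it is called.
  (format t "~a~a~%"
          (make-string *indent* :initial-element #\Space)
          label))

(defun print-tree (tree)
  "TREE is either a string (a leaf) or a list (LABEL SUBTREE...)."
  (if (atom tree)
      (print-node tree)
      (progn
        (print-node (first tree))
        ;; Rebind *INDENT* for the duration of this LET only. The previous
        ;; value is restored when the LET exits, however it exits.
        (let ((*indent* (+ *indent* 2)))
          (dolist (child (rest tree))
            (print-tree child))))))

(print-tree '("root" ("left" "a" "b") ("right" "c")))
```

Each level comes out two spaces further in than its parent, and *indent* is back at 0 once the outermost call returns, without print-node or print-tree ever taking an indentation argument.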

OK, break over. Settle down, class.

If we want to implement dynamic binding in a fashion that makes variable lookup reasonably fast, we do it with a slot in each symbol that stores the current value, and a stack of (variable -> previous value) pairs. To rebind the variable (this is what the bind vop is for) we push the current value on the stack and store the new value in the slot. To unbind, we pop the stack and set the value slot back to whatever it was. That's unbind.

For a single-thread implementation this is great. You can code setting a symbol's value as

(storew value symbol symbol-value-slot other-pointer-lowtag)

which translates to something dead simple like "store value in the location given by symbol+5" (FSVO 5 equal to symbol-value-slot*4-other-pointer-lowtag).
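For anyone who would rather see the value-slot-plus-stack scheme as code than as prose, here is a toy model of it. It is a sketch only: sym, *binding-stack*, bind and unbind are hypothetical names standing in for the real symbol layout and the real VOPs, and nothing here is taken from the SBCL sources.

```lisp
;; Toy model only. SYM stands in for a real symbol; its VALUE slot always
;; holds the current value, so lookup is a single slot read.
(defstruct sym
  name
  value)

(defvar *binding-stack* '()
  "Stack of (sym . previous-value) pairs, most recent binding first.")

(defun bind (sym new-value)
  "Save SYM's current value on the binding stack, then install NEW-VALUE."
  (push (cons sym (sym-value sym)) *binding-stack*)
  (setf (sym-value sym) new-value))

(defun unbind ()
  "Pop the most recent binding and put the saved value back in its slot."
  (destructuring-bind (sym . old-value) (pop *binding-stack*)
    (setf (sym-value sym) old-value)))

(let ((indent (make-sym :name '*indent* :value 0))
      (seen '()))
  (bind indent 2)
  (bind indent 4)
  (push (sym-value indent) seen)   ; innermost binding: 4
  (unbind)
  (push (sym-value indent) seen)   ; back to 2
  (unbind)
  (push (sym-value indent) seen)   ; back to the original 0
  (reverse seen))                  ; => (4 2 0)
```

Note that there is exactly one value slot and one stack, which is why this only works while there is exactly one thread.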

Obviously this doesn't work when you have many threads all wanting to bind the same symbols. If you have userland threads you can make the thread switch unwind and rewind the binding stack. If the kernel is doing the context switch you can't really make it do this for you, though. If the machine is SMP, there may not even be a context switch to happen: you could actually have two cpus executing lisp code simultaneously. So, you need some kind of per-thread storage area and a slot in the symbol to store an offset into this area.
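A rough sketch of what "per-thread storage area plus an offset in the symbol" might look like, again as a toy model. Everything named here (tsym, toy-thread, thread-bind, thread-symbol-value, the fixed 16-slot storage vector) is an assumption for illustration. The thread is passed in explicitly, which dodges the real question of how running code finds its own storage area cheaply, and unbinding is left out.

```lisp
;; Sketch only: all names are hypothetical, and a real implementation would
;; do this in compiled code rather than with structs and AREF.
(defvar *no-tls-value* (make-symbol "NO-TLS-VALUE")
  "Marker meaning: this thread has no binding of its own in this slot.")

(defvar *next-tls-index* 0
  "Next unused offset into the per-thread storage areas.")

(defstruct tsym
  name
  global-value        ; shared by every thread
  (tls-index nil))    ; offset into each thread's storage, or NIL if unused

(defstruct toy-thread
  ;; Fixed size purely to keep the example short.
  (storage (make-array 16 :initial-element *no-tls-value*)))

(defun ensure-tls-index (tsym)
  "Allocate an offset for TSYM the first time any thread binds it."
  (or (tsym-tls-index tsym)
      (setf (tsym-tls-index tsym)
            (prog1 *next-tls-index* (incf *next-tls-index*)))))

(defun thread-bind (thread tsym value)
  "Give THREAD a private binding of TSYM; other threads are unaffected."
  (setf (aref (toy-thread-storage thread) (ensure-tls-index tsym)) value))

(defun thread-symbol-value (thread tsym)
  "Return THREAD's own binding of TSYM if it has one, else the global value."
  (let* ((index (tsym-tls-index tsym))
         (local (if index
                    (aref (toy-thread-storage thread) index)
                    *no-tls-value*)))
    (if (eq local *no-tls-value*)
        (tsym-global-value tsym)
        local)))

(let ((a (make-toy-thread))
      (b (make-toy-thread))
      (indent (make-tsym :name '*indent* :global-value 0)))
  (thread-bind a indent 4)
  (list (thread-symbol-value a indent)     ; A sees its own binding
        (thread-symbol-value b indent)))   ; B still sees the global value
```

The final form evaluates to (4 0): one symbol, one shared offset, two different answers depending on whose storage area you index into.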

So, we have three options

Use a register to point to the bottom of a thread-local storage area, and index off it. Downside: the x86 doesn't really have any spare registers.
We know that the stack pointer (and frame pointer) are different in every frame. Use stacks aligned to some known boundary, stick the TLS base at some known offset from the stack base, then we can mask the stack pointer to get to TLS and play with that. Downside: practically every VOP which accesses the TLS now needs an extra scratch register to do the mask-and-add stuff, which is (a) a lot of typing, (b) badly integrated with the register packing (if we do it all with VOPs, anyway): every time we calculate this address it comes out to the same answer, so really we'd like to be able to reuse the register if we'd already done the calculation before.
Use a segment register. This is simply a matter of
- using modify_ldt (I think) to set up the segment register -> base address mapping on thread creation
- extending the assembler to know how to output segment override prefixes
- and the disassembler to understand them
- making each symbol reference indirect through %gs:something
Downside: see above.
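The mask-and-add arithmetic in the second option can be sketched with plain integers. The numbers are assumptions for illustration only: 1MB stacks aligned on 1MB boundaries, with the TLS base pointer kept in the word at the lowest address of each stack. The names +stack-alignment+, +tls-base-offset+, stack-base and tls-base-location are hypothetical.

```lisp
;; Assumed layout, for illustration only: every thread's stack is 1MB,
;; aligned on a 1MB boundary, with its TLS base pointer stored at offset 0.
(defconstant +stack-alignment+ #x100000)
(defconstant +tls-base-offset+ 0)

(defun stack-base (stack-pointer)
  "Clear the low bits of STACK-POINTER, giving the lowest address of its stack."
  (logandc2 stack-pointer (1- +stack-alignment+)))

(defun tls-base-location (stack-pointer)
  "Address of the word holding the TLS base pointer for this stack."
  (+ (stack-base stack-pointer) +tls-base-offset+))

;; Two stack pointers inside the same stack agree on where TLS lives;
;; a stack pointer in a neighbouring stack does not.
(list (= (tls-base-location #x40123456) (tls-base-location #x401FFFF0))
      (= (tls-base-location #x40123456) (tls-base-location #x40200010)))
```

The final form evaluates to (T NIL). The downside described above is visible even here: every access has to redo the same mask and add unless something remembers the result.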

Now I have to stop writing x86 assembler for a while, and start writing English prose or some approximation thereto, anyway. The ILC people are probably expecting a paper some time between now and Thursday.

Quiet lately#

Wed Aug 14 02:00:58 2002

Topics:

Quiet lately. Writing English prose is just not as exciting as code - especially when it's either a conference paper or a C.V. (US: résumé) update. Yes, time to find more paid stuff. (with-blatant-commercial-opportunism "If you're looking for contract programming/consultancy in a CL- or Linux-related field, send email. CV on request")

Last night it was pointed out to me that the lyric I'd been hearing for the last 4 years as "He's got an ice lolly" (Propellerheads, Velvet Pants, Decksandrumsandrockandroll) probably in fact doesn't say that at all. Hmm. Still sounds like it to me, though.

Paul Graham, A Plan for Spam#

Fri Aug 16 12:03:14 2002

Topics:

Paul Graham, A Plan for Spam.

I used spamassassin for a while, but removed it temporarily when it started eating my computer after being reintroduced to 200 emails at once when I'd been away from the net. And I haven't replaced it since, because I quite quickly realised that with an approximate 20:1 spam to real mail ratio after filtering out mailing list stuff, it's actually simpler to delete spam from the inbox by hand these days than it is to check the spam folder for false positives (which may be a couple of orders of magnitude rarer, and so much easier to miss). So, I don't have any filtering any more.

Probabilities better than scores? As raph pointed out, you can take logs of the probabilities and get scores, but I don't think that's the issue. The interesting point is how you arrive at the per-word numbers in the first place, and the advantage of the Bayesian system is that it's transparent. Assuming current styles of email communication, I doubt that you will see Paul Graham's webmail system decide that a valid signature delimiter is an indicator of potential spam.
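The logs point is easy to check numerically. The sketch below uses the combining rule from the essay, prod(p) / (prod(p) + prod(1 - p)), and compares it with adding up per-word log-odds scores. The function names and the three sample probabilities are made up for illustration.

```lisp
;; Hypothetical function names; the sample probabilities are invented.
(defun combined-probability (probs)
  "Combine per-word spam probabilities: prod(p) / (prod(p) + prod(1 - p))."
  (let ((spam (reduce #'* probs))
        (ham  (reduce #'* probs :key (lambda (p) (- 1 p)))))
    (/ spam (+ spam ham))))

(defun word-score (p)
  "Log-odds score for one word: positive leans spam, negative leans not."
  (log (/ p (- 1 p))))

(defun total-score (probs)
  "Scores simply add where the probabilities had to be multiplied."
  (reduce #'+ probs :key #'word-score))

(defun score->probability (score)
  "Turn a summed log-odds score back into a probability."
  (/ 1 (+ 1 (exp (- score)))))

(let ((probs '(0.99d0 0.9d0 0.2d0)))
  (list (combined-probability probs)
        (score->probability (total-score probs))))
```

The two numbers in the resulting list agree up to floating point rounding, so the choice between probabilities and scores is one of presentation. Where the per-word numbers come from is the part that matters.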

But on a more general note, I think that Paul's "Defining spam" appendix is a pretty good indication that we have terminology problems. What he's built is not in itself a spam filter, it's an uninteresting-mail filter - actually a far more useful tool - and if he were to refer to it as such, a lot of the borderline cases go away. Domain renewals are interesting to me: offers from Verisign for a Free E-Commerce Web Site are not. It doesn't matter if they think I've opted in or if I have an existing relationship with them: the point is that I don't want it, and I don't need to define it as spam before deciding to filter it.

I define spam as persistent or large scale sending of email in which there is no reasonable expectation that the recipients will be interested.

This is not a good definition for computers to use; they tend to choke on words like `reasonable' - but that doesn't matter. Computers on the receiving end are just filtering for interestingness anyway and don't need to care if it's spam. Computers in the network are primarily concerned with abuse of the net, so they don't need to care if it's spam either. If it's relaying through my servers, or faking its origin, that's a good enough reason to stop it no matter what the message content.

Use of automation is a characteristic of much spam, but it's not essential or even exclusive. Suppose someone at Amazon has determined from reading my web pages that I like the Propellerheads, and sent me email to say that they have the new album at half price. That's welcome news to me, and it makes no difference whether they sent the same email to a million other people (we assume that they'd determined that those other million were equally as interested). On the other hand, you can hand-letter your offer of cheap toner cartridges on vellum with a quill ten thousand times and send me six of the copies (each to slightly different mailboxes which are all too clearly routed to the same eventual destination) by courier delivery, and it still represents the large-scale sending of mail where you clearly made no effort to determine whether the recipients were interested. Spam.

For the record, I don't want to know about toner cartridges.

Yesterday was Bletchley Park#

Mon Aug 19 16:49:08 2002

Topics: lua lisp

Yesterday was Bletchley Park. 35 miles is slightly over twice as far as anywhere I've cycled in the past two or three years, so I was quite pleased to get there in around two hours twenty minutes, especially as it turned out to be 40 miles including getting lost. 18 mph average is, I think, fairly respectable.

The Bletchley Park Computer Museum was kind of neat, but they would benefit (well, I would have benefitted) from the addition of a LispM or several.

Eventually it was time to leave, and at some point around here the realization that the return journey was likely to be significantly slower hit me. Three hours twenty, for an average speed slightly less than 13mph, and it didn't help to run out of water halfway back either. On getting to the outskirts of civilization (Headington) I found an open off-licence which would sell me a bottle of water and a Twix bar: thus rehydrated the final two miles were easy. Of course, most of them were downhill too, which didn't hurt.

Yesterday evening I had planned to go to the pub, but found I was basically too tired and went to bed in fairly short order after a really odd experience where someone sent me an encoded message I couldn't break, which sounded exactly like my phone ringing. Current working hypothesis is that perhaps it was actually just the phone ringing, and my brain, had it been present, would have been telling me I was already half-asleep and should turn the lights off and so forth.

Feeling remarkably well today, anyway. Mildly sunburnt, but surprisingly at least not stiff and aching all over. Maybe that happens tomorrow.

Today is deal-with-NTL day. NTL cable modems work fine until they go wrong. When they go wrong, trying to get a human being on the telephone can take most of a day. To fully express my feelings on the matter of the NTL customer service voicemail system would require the invention of several new words, but in the meantime, imagine circular voicemail systems, "your call is valuable to us", "all our operators are busy, please ring back" (after ten minutes navigating voicemail options, lovely), and enough slightly-out-of-sync customer databases that every time I ring them I learn about the existence of another one. Today I found I was in the Cable Modem technical support database with the correct postcode, but didn't show up when they did a postcode search. Or, for that matter, a search on subscriber name. My modem was in there and showed up perfectly normally on a MAC address search, but there was no link between it and me. I thought the purpose of takeovers and mergers was supposed to be to increase profit by integrating systems, not just amassing large numbers of disparate ones?

Anyway, the internet is b0rken, has been since some time on Sunday morning, and probably will be until someone at the NTL local office (which may or may not exist, because every attempt I've made to call it has been met with "number not recognized" or forwarded into the national system) gets a message to phone me back to arrange an engineer to visit. Not holding out much hope here, it must be said. If you send me email, expect to receive answers a little more slowly than usual: I've had to dust off the old analogue modem.

Today is also, it happens, my birthday, not that I have any particular plans to celebrate being another year closer to death. But anyway, spoils of war so far: teatowels, set of torture^Wbarbecue implements, sundry cards, and volume 3 of the IA32 Intel Architecture Software Developer's Manual. I think that was sent without any particular reference to the time of year, but thank you anyway Mr Intel.

The CL pathname system is mostly pretty neat#

Tue Aug 27 00:00:23 2002

Topics: sbcl lisp

The CL pathname system is mostly pretty neat. Rather than representing pathnames as strings, we parse them into pathname objects with accessors to get at the various bits

* (defvar my-path (parse-namestring "/etc/init.d/apache"))
MY-PATH
* (describe my-path)
#P"/etc/init.d/apache"
is an instance of class #<SB-PCL::STRUCTURE-CLASS PATHNAME>.
The following slots have :INSTANCE allocation:
 HOST         #<SB-IMPL::UNIX-HOST {5010EB9}>
 DEVICE       NIL
 DIRECTORY    (:ABSOLUTE "etc" "init.d")
 NAME         "apache"
 TYPE         NIL
 VERSION      :NEWEST
* (pathname-directory my-path)
(:ABSOLUTE "etc" "init.d")

Not all of the slots are useful on all possible systems: most Unix-based Lisps don't understand about any host other than the local one, for example. device is a bit useless on Unix too. But that's ok, it's there for when you need to manipulate pathnames on VMS boxen. Plus Unix doesn't really have file types as such; the foo.bar convention really is just a convention, so it's pretty much non-obvious what (pathname-type #p"foo.bar.baz") is without referring to your implementation. But overall it's a nifty facility that easily beats doing your own tokenizing for "/" characters.
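As a small illustration of why this beats tokenizing on "/", here is a sketch that derives new pathnames from an existing one using only standard functions (make-pathname and the pathname-* accessors). The helper names sibling-file and containing-directory are made up for the example.

```lisp
;; SIBLING-FILE and CONTAINING-DIRECTORY are hypothetical helpers built on
;; the standard MAKE-PATHNAME; no string surgery is involved.
(defun sibling-file (path new-name &optional new-type)
  "Return a pathname in the same directory as PATH, with NEW-NAME and NEW-TYPE."
  (make-pathname :name new-name :type new-type :defaults path))

(defun containing-directory (path)
  "Return PATH with its name, type and version components stripped off."
  (make-pathname :name nil :type nil :version nil :defaults path))

(let ((path (parse-namestring "/etc/init.d/apache")))
  (list (pathname-name path)
        (pathname-name (sibling-file path "sshd"))
        (pathname-directory (sibling-file path "sshd"))
        ;; How this prints as a namestring is up to the implementation.
        (containing-directory path)))
```

The directory component is carried over untouched from the original pathname, so none of this code needs to know what the directory separator is.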

Problem is, flushed with their success in providing mostly-useful pathnames, the ANSI people got a bit carried away and went on to invent these things called logical pathnames. At first sight these look really useful. Logical pathnames get their own hosts, and when you try to open them they go through a pattern-matching exercise to get mapped to customizable places in the real filesystem. For example

* (translate-logical-pathname #p"cl-library:infix;infix.lisp")
#P"/usr/share/common-lisp/source/infix/infix.lisp"
* (translate-logical-pathname #p"cl-library:infix;infix.fasl")
#P"/usr/lib/common-lisp/sbcl/infix/infix.fasl"

Note how the different file types (extensions) have caused it to go to two different places. Cool, huh?
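For the curious, a host like this might be set up along the following lines. The host name and target directories are taken from the transcript above, but the translation rules themselves are a guess at how such a host could be configured, not a copy of any real setup. Rules are tried in order and the first matching pattern wins.

```lisp
;; Assumed rules, for illustration: source files go to one tree, compiled
;; files to another, selected purely by the file type in the pattern.
(setf (logical-pathname-translations "cl-library")
      '(("**;*.lisp.*" "/usr/share/common-lisp/source/**/*.lisp")
        ("**;*.fasl.*" "/usr/lib/common-lisp/sbcl/**/*.fasl")))

;; With rules like these in place, the two lookups land in different trees.
(list (translate-logical-pathname "cl-library:infix;infix.lisp")
      (translate-logical-pathname "cl-library:infix;infix.fasl"))
```

Exactly how the resulting physical pathnames are spelled depends on the implementation, so treat this as a sketch rather than something to paste into an init file.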

Actually, No. Not very cool at all, when you start trying to actually use them. Let me just explain the rules which govern when you can use logical pathnames without getting very surprised thirty minutes later:

When all of the files will be created by the same Lisp implementation and only ever accessed using that Lisp implementation
and you can name them all using only uppercase letters, digits and the hyphen (-)
and you don't care too much about how they're represented in the underlying file system

That's about it: pretend that you've got a filesystem image loopback mounted at that point that only Lisp can look inside, and your expectations will be approximately correct.

Example: the only reason that it looks like I've accessed lowercase files using this is that (a) lowercase names in LPNs are silently folded to uppercase, (b) the translation process to physical pathnames on Unix does case inversion.

* (translate-logical-pathname #p"cl-library:src;SomeJavaClass.java")
#P"/usr/lib/common-lisp/sbcl/src/somejavaclass.java"

Cool, huh?
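The case folding can be seen before any translation happens, just by parsing a logical namestring and looking at its components. In this sketch "demo" is a made-up host and /tmp/demo/ a made-up target directory.

```lisp
;; "demo" and /tmp/demo/ are invented for this example.
(setf (logical-pathname-translations "demo")
      '(("**;*.*.*" "/tmp/demo/**/*.*")))

(let ((lpn (parse-namestring "demo:src;SomeJavaClass.java")))
  (list (pathname-name lpn)   ; the mixed case is already gone at this point
        (pathname-type lpn)
        ;; What case the physical name ends up in is up to the implementation.
        (translate-logical-pathname lpn)))
```

The name component comes back as "SOMEJAVACLASS" and the type as "JAVA", because lowercase letters in a logical namestring are converted to uppercase when it is parsed. Whatever the translation step then does, the original capitalization is not there to be recovered.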
