This annoys me -
Fri, 28 Feb 2003 04:13:28 +0000
This annoys me -
68.74.156.60 - - [28/Feb/2003:02:08:44 +0000] "POST /edit/db-sockets HTTP/1.1" 200 164 "-" "websphinx.Crawler" www.cliki.net "-" 0 seconds
68.74.156.60 - - [28/Feb/2003:02:09:21 +0000] "POST /edit/Graphics%20Toolkit HTTP/1.1" 200 178 "-" "websphinx.Crawler" www.cliki.net "-" 1 seconds
68.74.156.60 - - [28/Feb/2003:02:09:48 +0000] "POST /edit/CORBA HTTP/1.1" 200 154 "-" "websphinx.Crawler" www.cliki.net "-" 1 seconds
This lame excuse for a web crawler is editing cliki pages. Automatically. Specifically, it's stripping out carriage returns from the body, which, as you can imagine, makes it difficult to see where the paragraph breaks are.
I don't know if the fault here is with WebSPHINX itself or someone's custom crawler based on it, but if people are going to write software that doesn't actually work, they should not let it out on the internet. Really.
Oh, and I might also point out that the pages in question contain
<meta NAME="ROBOTS" CONTENT="noindex,nofollow"></meta>
in the head element, so robots really have no business being there anyway.
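For what it's worth, honouring that tag is not hard. Here is a minimal sketch in Python, assuming a crawler that already holds the page's HTML as a string. The names `robots_directives` and `may_follow_links` are made up for illustration and have nothing to do with WebSPHINX's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values,
        # so NAME="ROBOTS" arrives as ("name", "ROBOTS").
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in the page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_follow_links(html):
    """True unless the page says nofollow (or none, which implies it)."""
    directives = robots_directives(html)
    return "nofollow" not in directives and "none" not in directives
```

A crawler that called something like `may_follow_links` before queueing links or submitting forms from a page would never have reached the edit forms in the first place.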