Happenings

Politeness And Agents

Politeness has long been a problem for automated agents on the web. Well, it’s not so much that politeness is a problem; agents are programmed to be impolite, whether accidentally or not. Politeness is a solved problem.

Let us be clear on what is meant. “Agents” refers to any automated system capable of talking to web servers, e.g. search engines, aggregators and download managers. “Polite” means waiting a while before checking a server again.

  1. If a download manager hammers a file on my site 50 times a minute unsuccessfully, that’s impolite,
  2. If a search agent crawls my entire site, doing so again 4 hours later is impolite. Doing so again every 4 hours is downright rude,
  3. If an aggregator agent checks my RSS feeds more than 3 times an hour, that’s impolite.

How can we solve these problems? As was said earlier, politeness is a solved problem.

However, the current solutions are poor. In all 3 situations above, the user of these agents would be banned from the site (either automatically or after manual inspection of logs). “Good riddance,” you might be thinking, “no-good bandwidth leech, I’m better off without them.”

Have you considered that the user doesn’t know any better? They don’t know they’re being impolite because so few tools actually tell them.

Download managers: they’ll let you pick any number of connections and any time interval between retries. Would it kill the implementer to bring up a dialogue saying: “This could cause a heavy load on a server, causing you to be banned. Are you sure you want to change settings?”

Search agents: Is it so hard to say: “The time interval you’ve specified between crawls is impolite, and may get you banned and blacklisted. Are you sure you want to keep new settings?”
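To show how little work such a warning is, here is a minimal sketch in Python. The function name, the setting names and the minimum intervals are all my own assumptions for illustration; the only figures taken from above are the ones described as impolite (50 retries a minute, a full recrawl after 4 hours).

```python
from typing import Optional

# Hypothetical minimum polite intervals, in seconds. The post only says what is
# impolite (50 retries a minute, a full recrawl after 4 hours); these floors
# are assumptions chosen to sit safely above those figures.
MIN_POLITE_INTERVAL = {
    "download_retry": 30,        # assumed: at most 2 retries a minute
    "site_crawl": 24 * 60 * 60,  # assumed: at most one full crawl a day
}


def politeness_warning(kind: str, interval_seconds: float) -> Optional[str]:
    """Return a warning to show the user, or None if the interval is polite."""
    minimum = MIN_POLITE_INTERVAL[kind]
    if interval_seconds >= minimum:
        return None
    return (
        f"An interval of {interval_seconds:g}s is below the polite minimum of "
        f"{minimum}s for {kind} and may get you banned. "
        "Are you sure you want to keep the new settings?"
    )
```

A tool would call something like `politeness_warning("site_crawl", 4 * 60 * 60)` when the user saves their settings and show the returned text in a confirmation dialogue. A return value of `None` means there is nothing to warn about.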

Aggregator agents: A similar time interval warning to search agents would suffice. As an aside, there is a mechanism in RSS that allows the producer of the feed to specify a recommended time interval between checks. Do any aggregators support it? Do they tell the end user if they’re in breach of it? Hmmm? Thought not.
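For the curious, here is a rough sketch of what supporting that could look like. I am assuming the mechanism in question is the RSS 2.0 `<ttl>` channel element, which gives a number of minutes; the function names are hypothetical, and a real aggregator would also need to deal with other feed formats and malformed XML.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def feed_ttl_minutes(rss_xml: str) -> Optional[int]:
    """Return the channel's <ttl> in minutes, or None if absent or invalid."""
    root = ET.fromstring(rss_xml)          # root is the <rss> element
    text = root.findtext("channel/ttl")    # assumed location: rss > channel > ttl
    if text is None:
        return None
    try:
        ttl = int(text.strip())
    except ValueError:
        return None
    return ttl if ttl > 0 else None


def breaches_ttl(rss_xml: str, check_interval_minutes: float) -> bool:
    """True if the user's check interval is shorter than the feed's ttl."""
    ttl = feed_ttl_minutes(rss_xml)
    return ttl is not None and check_interval_minutes < ttl
```

If `breaches_ttl` returns `True` for the interval the user has chosen, that is the moment to tell them so. When the feed gives no usable value, the sketch simply reports no breach.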

If a user doesn’t heed the warnings, ban them.

This puts the pressure on the tool makers to tell users of bad behaviour. And why shouldn’t it be? They should keep their users informed; it’s only polite.

Merry Christmas

There are 2 hours or so until Christmas, so I thought I’d better say Merry Christmas to everyone who has ever read this site, to everyone who has left a comment (from some of the comments, it’s apparent that there are people who comment without reading), to all of my friends and family, and those that matter most to me: Merry Christmas!

As a bonus, have fun with this Make A Snowflake game. There are some jaw-dropping designs in the gallery.

Update: Thanks to the mighty Feedster, you can feel the Christmas spirit on a bunch of sites you’ve never been to.

SWAT

It’s been a while since I left the cinema quite so happy. It wasn’t because SWAT is a good, well-written film. Quite the opposite.

It is utterly terrible; bad in so many ways as to be hilariously good.

No-one will ever beat the character names in this film: Jim Street, the top SWAT member who gets kicked off the team following an incident with his risk-taking partner, Gamble. He’s later recruited to a new “ass-kicking” SWAT team by Hondo Harrelson, along with other fantastically monikered characters such as TJ McCabe and Sanchez (the Latino).

The dialogue is straight out of an episode of McGarnagle (the fictional show in The Simpsons), with a plot and stereotyping to match, and classic lines like, “You feel me, chief”.

I loved every cheesy, pathetic minute of it.

Arial, Tolkien, And English

The second part of yesterday’s link-o-rama:

  • The Scourge Of Arial – The history of Arial, and its borrowing of Helvetica,
  • How To Spot Arial – How to tell Helvetica and Arial apart. For obsessives and typographers only,
  • 4096 Colour Wheel – A colour wheel that demonstrates the difference between websafe, websmart and pure hex colours. The difference is quite pronounced,
  • Gameboy Advance Car Tuner – Turn your Gameboy Advance into a logger for an engine tuner. People go to some stupid lengths to show off,
  • MUTE file sharing – A new file sharing program that works anonymously. Although, I very much doubt this will last,
  • Link blogs in PHP – Use link blogs provided by del.icio.us in a PHP powered site. I’ll be looking into this,
  • Enemy Of Progress – An alternative view of Lord Of The Rings, with a discussion of Romanticism versus progressive society. Kinda turns the world on its head,
  • XFN – A way of creating a semantic network of friends using standard HTML mechanisms. Way too much work, I’d rather do this implicitly,
  • Illuminopoly – Alternative rules for Monopoly that focus on mind control. Seems quite interesting. I particularly like the various winning conditions,
  • Rise Of The Spammers – They’re getting very serious. Spamming is big business these days,
  • Plain English Campaign – A guide to using English in a formal, yet accessible way.

And that’s that for a few days. I’ll be back before Christmas.

Games, Astronomy, And Arial

Since I’ve been unable to update this week, the random links have amassed into a huge pile. To make sure you’re not completely overwhelmed by them (there are around 30), I’ll prune them a bit and split it into two posts. The other half should be out either tomorrow morning or Monday night. Onwards:

  • Snowfight 3D – Following on from the classic Starcraft clone, Snowfight, comes the 3D version. Requires Shockwave, but damn good,
  • Fan And Ball – The rather tricky game of moving a ball along a track using a fan. Also requires Shockwave,
  • Simpson’s Paradox – An odd bit of maths that occurs due to weighting differences,
  • Atlas Of The Universe – A map showing the major formations within the universe, centred around our solar system,
  • RSS lightcone – Following on from the previous, get an RSS feed of the major astronomical bodies entering your lightcone. Only 13 months until HR8832 enters my lightcone and I can be blamed for stuff over there,
  • Non-Semantic Semantics – In a genuinely ironic move, the European Semantic Web Symposium’s website is pretty far from semantic. It uses tables for layout and, painfully, uses images instead of text for every single word. Via Zeldman,
  • PHP highest scripting language – Although, as Simon Willison notes, Python is highest on a search for programming language, on a Google search for scripting language it takes second place to PHP,
  • Bluestumbler – Information on the horrible insecurity found on most Bluetooth-enabled devices, including mobiles,
  • Computerman – I wish I had broadband so I could see Jack Black’s new show.

Next lot will be coming soon.