Politeness has long been a problem for automated agents on the web. Well, it’s not so much that politeness itself is the problem; it’s that agents are programmed to be impolite, whether accidentally or not. Politeness is a solved problem.
Let us be clear on what is meant. “Agents” refers to any automated system capable of talking to web servers: search engines, aggregators, download managers and so on. “Polite” means waiting a while before checking a server again.
- If a download manager hammers a file on my site 50 times a minute unsuccessfully, that’s impolite.
- If a search agent crawls my entire site, doing so again 4 hours later is impolite. Doing so again every 4 hours is downright rude.
- If an aggregator agent checks my RSS feeds more than 3 times an hour, that’s impolite.
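The three examples above translate directly into numbers an agent could check its own settings against. The following is only a sketch: the function name `is_impolite` and the three `kind` labels are made up for illustration, and the cut-offs are nothing more than the figures quoted in the list, not any agreed standard.

```python
def is_impolite(kind: str, interval_seconds: float) -> bool:
    """Return True if waiting only `interval_seconds` between requests
    matches one of the impolite examples listed above.

    `kind` is a hypothetical label: "download_retry", "site_crawl"
    or "feed_check".
    """
    if kind == "download_retry":
        # 50 attempts a minute means one attempt every 1.2 seconds.
        return interval_seconds <= 60 / 50
    if kind == "site_crawl":
        # Re-crawling the whole site 4 hours later is already impolite.
        return interval_seconds <= 4 * 3600
    if kind == "feed_check":
        # More than 3 checks an hour means less than 20 minutes apart.
        return interval_seconds < 3600 / 3
    raise ValueError(f"unknown agent kind: {kind!r}")
```

A daily crawl passes (`is_impolite("site_crawl", 24 * 3600)` is false), while a feed check every 10 minutes does not.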
How can we solve these problems? As was said earlier, politeness is a solved problem.
However, the current solutions are poor. In all 3 situations above, the user of these agents would be banned from the site (either automatically or after manual inspection of logs). “Good riddance,” you might be thinking, “no-good bandwidth leech, I’m better off without them.”
Have you considered that the user doesn’t know any better? They don’t know they’re being impolite because so few tools actually tell them.
Download managers: they’ll let you pick any number of connections and any time interval between retries. Would it kill the implementer to bring up a dialogue saying: “This could cause a heavy load on a server, causing you to be banned. Are you sure you want to change settings?”
Search agents: Is it so hard to say: “The time interval you’ve specified between crawls is impolite, and may get you banned and blacklisted. Are you sure you want to keep the new settings?”
Aggregator agents: A similar time interval warning to search agents would suffice. As an aside, there is a mechanism in RSS that allows the producer of the feed to specify a recommended time interval between checks. Do any aggregators support it? Do they tell the end user if they’re in breach of it? Hmmm? Thought not.
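To show how little work this would take, here is a rough sketch of an aggregator honouring such a hint. It assumes the mechanism in question is the RSS 2.0 `<ttl>` channel element, which gives the number of minutes a feed may be cached before it is refreshed. The function names and the wording of the warning are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def feed_ttl_minutes(rss_xml: str) -> Optional[int]:
    """Return the channel's <ttl> in minutes, or None if missing or invalid."""
    root = ET.fromstring(rss_xml)          # root is the <rss> element
    text = root.findtext("channel/ttl")    # None when there is no <ttl>
    if text is None:
        return None
    try:
        minutes = int(text.strip())
    except ValueError:
        return None
    return minutes if minutes > 0 else None


def interval_warning(rss_xml: str, interval_minutes: float) -> Optional[str]:
    """Return a warning if the user's interval is shorter than the feed's ttl."""
    ttl = feed_ttl_minutes(rss_xml)
    if ttl is not None and interval_minutes < ttl:
        return (
            f"This feed asks to be checked at most once every {ttl} minutes, "
            f"but you have chosen {interval_minutes}. Checking more often "
            "is impolite and may get you banned. Keep the new setting?"
        )
    return None  # no hint in the feed, or the setting respects it
```

An aggregator could call `interval_warning` whenever the user changes the refresh setting and show the returned text in a confirmation dialogue. A feed without a `<ttl>` simply produces no warning here, so a fallback default would still be needed.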
If a user doesn’t heed the warnings, ban them.
This puts the pressure on the tool makers to warn users about bad behaviour. And why shouldn’t it? They should keep their users informed; it’s only polite.