November 19, 2004

Bloglines Crawler

Up until fairly recently, I didn't use an RSS aggregator. Having a desktop client poll so many sites over my own connection just hammered my bandwidth, and starting and stopping the application whenever I needed that bandwidth back was a pain.

Enter Bloglines. Yes, I'm late to the party; people have been praising it for months, but I only very recently saw the light. It is an excellent tool: it lets me check sites quickly when I want to (without using much of my bandwidth), keeps my unread items in sync no matter which computer I'm using, and has made me far more productive.

Before Bloglines, I was struggling to keep up with about 50 sites regularly. I now easily keep up with around 120. Thoroughly recommended.

Now, I checked my server logs tonight for the first time in months and noticed that a single host had hit my site 2000 times this week. That's one hit every 5 minutes from a single source. A little investigation made it clear what that source was: the Bloglines crawler.

Now, Solitude is not a frequently updated site. I aim to post once a day, but it's usually more like once every two days. In the last week, there have been 3 updates (this being the 4th).

Think about that: 3 updates, 2000 checks. Three. Two thousand. Notice the ever-so-slight disparity?

Those Bloglines guys have built a very usable interface to a damn fine service, but they really need to work on the crawler's update logic. It's not that hard to adapt to a site's predictable update pattern: if a site is updating every 15 minutes, check it every 15 minutes; if it slows down and settles at once a day, a 15-minute poll is wildly inappropriate, and once an hour would be better. You don't lose any real sense of freshness, and you don't overdo the server hits.
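Something like the following sketch would do it. This is just my back-of-the-envelope Python, not anything Bloglines actually runs; the interval bounds, the halving factor, and the names are all my own assumptions.

```python
from datetime import datetime, timedelta

# A rough sketch of polite adaptive polling. The bounds, the halving
# factor, and the names are my own assumptions; this is not Bloglines'
# actual crawler logic.

MIN_INTERVAL = timedelta(minutes=15)  # never check faster than this
MAX_INTERVAL = timedelta(hours=24)    # never check slower than this

def next_poll_interval(update_times):
    """Given timestamps of a feed's recent updates (oldest first),
    return how long to wait before checking the feed again."""
    if len(update_times) < 2:
        # No history yet: start out checking frequently.
        return MIN_INTERVAL
    # Average gap between the observed updates.
    gaps = [later - earlier for earlier, later in zip(update_times, update_times[1:])]
    avg_gap = sum(gaps, timedelta()) / len(gaps)
    # Check a bit more often than the feed updates, clamped to the bounds.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, avg_gap / 2))

# A feed like this one, with 3 updates in a week, backs off to a daily
# check instead of one every 5 minutes.
history = [datetime(2004, 11, 13), datetime(2004, 11, 15), datetime(2004, 11, 18)]
print(next_poll_interval(history))  # 1 day, 0:00:00
```

Run against this site's own history, that settles at the daily ceiling: 7 checks a week instead of 2000, and nobody misses a post.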

Common sense and the polite thing to do.