Happenings

The Bot Wars

There are a few discussions going on just now about boycotting the new MSN crawler, MSN bot. There are those who say that we have to stop Microsoft from implementing a reasonable search engine (for those who don’t know, blogs are incredibly powerful tools in influencing search engines, for various reasons that I won’t go into now), and there are those who say that such an effort is futile and childish.

I haven’t picked a side yet, but I’m swayed towards the blocking crowd. After Microsoft’s recent announcement that it would cease all stand-alone development of IE, web developers are now very much in a locked-in situation: we have to continue supporting a browser with terrible standards support and a limited feature set.

Essentially, Microsoft threw its weight behind the browser market, won the war there and then refused to make any progress, screwing a lot of people over.

Now, Microsoft is throwing its weight behind the search market. In typical style for the corporation, we can expect to see their search engine embedded in every program they can squeeze it into. This could be a good thing; integrated search that works could bring a lot of consistency to the search sphere.

Most likely, it’ll be a bad thing. They’ll implement a reasonable search engine, use various tactics to corner the market, lock it up, and then refuse to progress (again). It’s not that hard to imagine, and it would screw a lot of people over (seeing a theme yet?).

I’ve been doing a lot of research on search recently, and I know there are a lot of great ideas out there that deserve to be built. Giving the search sphere to Microsoft would cripple those ideas.

You know what, in the process of writing this I have convinced myself which side I’m on.

Referring For Refer

This is just a quick nod to Dean Allen for his great work on Refer. A new version is out now, and well worth getting if your host doesn’t provide adequate stats.

Writing Link Text Well

I’d like to talk about writing link text well, specifically in the context of a blog. Most blogs are centred around links: blog A links to blog B, blog B links to blog C, and so on. Without that important little bit of navigation, the blogosphere would fall apart. It is, in essence, our glue.

The problem is that so many people write link text so badly that it’s difficult for the user to know where they will end up. If usability studies have shown us anything, it’s that the average user is fundamentally scared of the unknown. They will not use anything that they don’t have a decent understanding of, whether it be a command in a menu or a link on the web. They just won’t do it.

Now, if people aren’t following your links around the web (which is the whole point of providing a link) then something has gone horribly wrong. They’re either not interested or so confused by the text that they don’t want to follow it. The former you can do little about, but the latter is another matter.

The first thing you need to know is how to evaluate link text to see if it is good enough. The best way of doing this is to remove it from the context you provide. Copy and paste your link text into another file (just the text, not the link itself). If you were provided with just this text and not the rest of the document, would you know what to expect upon clicking it? If you can honestly say yes, then your link text is good enough. Congratulations.
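The copy-the-text-out check can be roughed out in a few lines of Python. This is only a sketch under some assumptions: the names `LinkTextCollector`, `link_texts`, `is_vague` and the `VAGUE_PHRASES` list are invented for illustration, and a short blacklist of phrases is no substitute for actually reading each link text on its own and asking the question above.

```python
from html.parser import HTMLParser

# Hypothetical list of phrases that say nothing out of context; extend to taste.
VAGUE_PHRASES = {"here", "click here", "this", "link", "more", "read more"}


class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.link_texts = []  # finished link texts, in document order
        self._parts = None    # text fragments of the link being read, or None

    def handle_starttag(self, tag, attrs):
        # Only anchors that actually link somewhere are of interest.
        if tag == "a" and dict(attrs).get("href"):
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._parts is not None:
            # Collapse runs of whitespace so the text reads as one line.
            self.link_texts.append(" ".join("".join(self._parts).split()))
            self._parts = None


def link_texts(html):
    """Return the link texts found in an HTML fragment, stripped of context."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.link_texts


def is_vague(text):
    """Flag link text that matches one of the known-vague phrases."""
    return text.strip().lower() in VAGUE_PHRASES


post = 'the link text <a href="http://example.com">here</a> is terrible'
for text in link_texts(post):
    print(repr(text), "is vague" if is_vague(text) else "passes the crude check")
```

Running it over a post gives you exactly the list you would otherwise build by hand: every link text, with nothing around it. Anything the crude check lets through still needs reading with your own eyes.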

If the honest answer is ever no, then you have a problem: your link text is not clear enough. Worry not, though, there are some very easy solutions.

First of all is the widely used technical solution: add a title attribute to your links. The text included in the title should say exactly what the link refers to. If this ends up as vague as your link text, then it’s useless. In fact, it’s distracting; you’re just giving your users more unknowns and that, as we know, will stop them clicking. An example:

the link text <a href="http://example.com" title="a site containing examples of bad link text">here</a> is terrible

Here we have a very poor choice of link text: the word “here”. What does “here” mean to anyone? It’s subjective, temporal and provides a plethora of unknown possibilities. We have, however, added a title attribute that says what the user should expect when they click. This helps a lot. The user might just click after all.

OK, that last example is at the extreme end of bad text, but it does happen (a lot).

The second solution to the problem (and one that should be considered far more than it currently is) would be to rewrite the link text and, if necessary, the surrounding text. Now, I can hear a hundred writers complain, “But these are my words! I can’t change them to suit linking.” Well, why the hell not? Links are blogging glue. If changing a whole paragraph is necessary to let the links make sense out of context, then so be it. They are just as important as the commentary that surrounds them, if not more so.

Think about it this way: the commentary would be utterly pointless without the links. We are generally commenting on the link content, so if the link content isn’t being viewed because of bad link text then it throws the context and focus of the commentary way out.

Link text is important. We need it. Make your text count.

List Of Stuff To Do

I’ve spent a little bit of tonight thinking about all of the stuff that I need to do to bring my CMS to a reasonable level. I’ve spent the past while doing lots of minor updates to avoid thinking about the big parts. I’ve created a list of all the things that need done (in no particular order).

Most of this will be of absolutely no interest to anyone, but if I have this in public and update it frequently, I’ll probably work on it more often. So, without further ado, here it is:

  • Comments
  • Trackback
  • Pingback
  • New data structure (FinData)
  • Alpha indexing
  • Categories (pointers)
  • Search
  • Caching archives
  • New permalinks (/Archives/Title)
  • Genericised RSS module
  • Resolve relative links
  • Modularising archive code
  • Event handlers
  • Disentangle Add/edit components from ContentManager
  • Add API support (primarily bloggerAPI)
  • Change archive code to use a single DateString variable (rather than day/month/year)
  • Let every page (or listings thereof) be available as RSS or other formats
  • Data URI permalinks
  • Add an email form
  • Auto-convert URLs to links
  • Update meta-links
  • User Management (add users, user levels)
  • Move no. of items per page to config
  • Redo content manager
  • Get edit and delete listings to actually work beyond first page
  • Get acronymit definitions done externally

Like I said, of very little interest or sense to anyone but me, but motivational nonetheless. Go about your business.

Searching Done Right

Tim Bray has started a series of articles about search. So far he’s covered the background of searching, what users want from search engines, and the basics of building a search engine. All great articles, with more to come.

I mention this because one of my summer projects is to build a search engine for this site. I was going to use a fast, dynamic form of Latent Semantic Indexing with no real index.

Basically, I was going to generate a vector space based on the query words alone and not the overall dataset (the normal approach is to take all the words in every document, create a vector space and then project a query into it). For a small data set, I think my proposed method would work better, providing better spatial efficiency (for sure) and time efficiency (as long as the query and data set were reasonably small).
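To make the query-only vector space a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a finished design: the names `tokenize` and `rank` are made up, documents are assumed to be a plain mapping of post title to text, scoring is ordinary cosine similarity on raw word counts, and the dimensionality-reduction step that puts the "latent" in Latent Semantic Indexing is left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Rank documents by cosine similarity in a space spanned only by the query's words.

    `documents` maps a (hypothetical) post title to its text. Returns a list
    of (title, score) pairs, best first; posts sharing no word with the
    query are left out.
    """
    query_counts = Counter(tokenize(query))
    terms = sorted(query_counts)  # one dimension per distinct query word
    query_vec = [query_counts[t] for t in terms]
    query_norm = math.sqrt(sum(x * x for x in query_vec))

    results = []
    for title, text in documents.items():
        doc_counts = Counter(tokenize(text))
        # Words that are not in the query never get a dimension at all.
        doc_vec = [doc_counts[t] for t in terms]
        doc_norm = math.sqrt(sum(x * x for x in doc_vec))
        if query_norm == 0 or doc_norm == 0:
            continue  # empty query, or no overlap with this post
        dot = sum(q * d for q, d in zip(query_vec, doc_vec))
        results.append((title, dot / (query_norm * doc_norm)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


posts = {
    "The Bot Wars": "boycotting the new MSN crawler",
    "Searching Done Right": "building a search engine, search done right",
}
print(rank("search engine", posts))
```

Nothing is stored between queries, which is where the space saving comes from, but every post is re-read on every query, so the time cost grows with the size of the site. Note also that a one-word query gives every matching post the same score, since each vector then has a single dimension.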

Of course, this site is still growing. I wouldn’t want to build a search engine only to have to rebuild it in 6 months because the number of posts had grown substantially. So I’m going to wait until I’ve finished reading this series before going ahead with the project.