Tim Bray has started a series of articles about search. So far he’s covered the background of searching, what users want from search engines, and the basics of building a search engine. All great articles, with more to come.
I mention this because one of my summer projects is to build a search engine for this site. I was going to use a fast, dynamic form of Latent Semantic Indexing with no real index.
Basically, I was going to generate a vector space from the query words alone rather than from the whole data set. (The usual approach is to take all the words in every document, build a vector space from them, and then project the query into it.) For a small data set, I think my proposed method would work better: it would certainly use less space, and it should also be faster, as long as the query and data set were reasonably small.
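To make the query-only idea a bit more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the planned implementation: it uses raw word counts and cosine similarity, it leaves out the SVD step that real Latent Semantic Indexing relies on, and every name in it (`tokenize`, `query_space_search`, the `documents` dict) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_space_search(query, documents):
    """Rank documents in a vector space whose axes are only the query's words.

    `documents` is assumed to be a dict mapping a document id to its text.
    Nothing is indexed ahead of time: every document is scanned per query.
    """
    query_counts = Counter(tokenize(query))
    terms = sorted(query_counts)  # one axis per distinct query word
    if not terms:
        return []

    query_vec = [query_counts[t] for t in terms]
    query_norm = math.sqrt(sum(q * q for q in query_vec))

    results = []
    for doc_id, text in documents.items():
        doc_counts = Counter(tokenize(text))
        doc_vec = [doc_counts[t] for t in terms]
        doc_norm = math.sqrt(sum(d * d for d in doc_vec))
        if doc_norm == 0:
            continue  # the document contains none of the query words
        dot = sum(q * d for q, d in zip(query_vec, doc_vec))
        results.append((doc_id, dot / (query_norm * doc_norm)))

    # Best match first; ties are broken by document id for stable output.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


if __name__ == "__main__":
    posts = {
        "a": "Search engines index documents",
        "b": "Cooking pasta at home",
        "c": "Search, search engine",
    }
    for doc_id, score in query_space_search("search index", posts):
        print(doc_id, round(score, 3))
```

The vectors only ever have as many dimensions as the query has distinct words, which is where the space saving comes from. The cost is a full scan of every document on each query, so the time advantage only holds while the data set stays small.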
Of course, this site is still growing. I wouldn’t want to build a search engine only to have to rebuild it in 6 months because the number of posts had grown substantially. So I’m going to wait until I’ve finished reading this series before going ahead with the project.