Latent Semantic Indexing

Traditionally, a search engine returned a set of documents containing your exact query words, with little attempt made to find related pages. So a search for ‘car insurance’ wouldn’t return pages containing the words ‘motor insurance’.

Enter Latent Semantic Indexing (LSI). LSI attempts to map the relationships between the words and phrases in a collection of documents, based on how often they occur together statistically. In our ‘car insurance’ and ‘motor insurance’ example, LSI would pick up that both phrases often appear on the same page, or that pages about ‘car insurance’ share much of their surrounding vocabulary with pages about ‘motor insurance’, and vice versa. It also picks up that ‘car’ and ‘insurance’ occur together. So it’s fairly straightforward for LSI to find pages that are talking about the same subject but using different wording.
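To make that concrete, here is a minimal sketch of the idea rather than how any real search engine does it. LSI is usually described as a truncated singular value decomposition of a term-document count matrix; to keep the example dependency-free, the sketch gets the same top components by running power iteration on the term-by-term matrix. The six tiny ‘pages’, the choice of two latent dimensions and every function name here are assumptions made up for illustration.

```python
import math
import random

# Made-up toy corpus (an assumption): each string stands in for one page.
docs = [
    "car insurance quote",
    "motor insurance quote",
    "car insurance claim",
    "motor insurance claim",
    "banana bread recipe",
    "banana cake recipe",
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

def top_eigenpairs(sym, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric matrix: power iteration plus deflation."""
    rng = random.Random(seed)
    n = len(sym)
    m = [row[:] for row in sym]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # positive start vector
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next pass finds a new one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

# Term-document counts: one row per word, one column per page.
vocab = sorted({w for d in docs for w in d.split()})
tokens = [d.split() for d in docs]
counts = {t: [float(doc.count(t)) for doc in tokens] for t in vocab}

# Term-by-term matrix; its top eigenvectors are the left singular vectors
# of the term-document matrix, and its eigenvalues are the squared singular values.
gram = [[dot(counts[a], counts[b]) for b in vocab] for a in vocab]
pairs = top_eigenpairs(gram, k=2)

# Each word's position in the 2-dimensional latent space.
latent = {
    t: [math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
    for i, t in enumerate(vocab)
}

raw_sim = cosine(counts["car"], counts["motor"])          # exact-match view
latent_sim = cosine(latent["car"], latent["motor"])       # LSI view
unrelated_sim = cosine(latent["car"], latent["banana"])   # different topic

print("car vs motor, raw counts:   ", round(raw_sim, 3))
print("car vs motor, latent space: ", round(latent_sim, 3))
print("car vs banana, latent space:", round(unrelated_sim, 3))
```

In this toy corpus ‘car’ and ‘motor’ never appear on the same page, so comparing their raw counts shows no similarity at all. In the latent space they should land very close together, because both keep company with ‘insurance’, ‘quote’ and ‘claim’, while ‘banana’ stays far away from both.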

LSI is an extremely powerful method when applied to the Web because, unlike techniques where relationships between words and phrases are mapped according to their dictionary definitions, LSI simply maps relationships based on how the words are actually used in the collection of documents in the index.

To the user, it means your query language can be more generic, making it easier to find a wider set of documents matching your query: more choice.

It also means that, as far as a search engine using LSI is concerned, the relationships between words are no longer defined by their dictionary meanings but by their common usage on the web. Some believe this is a better way to derive meaning, since language is dynamic and usage changes over time.
