Traditionally, a search engine returned a set of documents containing your exact query words, with little attempt made to find related pages. So a search for 'car insurance' wouldn't return pages containing the words 'motor insurance'.
Enter – Latent Semantic Indexing. LSI attempts to map the relationships between every word or phrase in the collection of documents, based on how often they occur near each other and how the documents are interconnected. In our 'car insurance' and 'motor insurance' example, LSI would know that these two phrases often appear on the same page, and that pages about 'car insurance' are often linked to pages about 'motor insurance' and vice versa. It also knows that 'car' and 'insurance' occur together. So it's fairly straightforward for LSI to find pages that are talking about the same subject but using different language.
LSI is an extremely powerful method when applied to the Web because, unlike other techniques where relationships between words and phrases are mapped according to their dictionary definitions, LSI maps relationships based purely on how the words appear across the collection of documents in the index.
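The statistical mapping described above is usually done with a truncated singular value decomposition of a term-document matrix. Here is a minimal sketch of the idea; the tiny matrix and term list are invented for illustration, not taken from any real index.

```python
# A minimal LSI sketch, assuming a tiny invented term-document matrix:
# rows are terms, columns are documents, 1 = the term appears in that document.
import numpy as np

terms = ["car", "motor", "insurance", "premium", "banana"]
A = np.array([
    [1, 1, 0, 1, 0, 0],  # car
    [1, 0, 1, 1, 0, 0],  # motor
    [1, 1, 1, 1, 0, 0],  # insurance
    [0, 1, 1, 0, 0, 0],  # premium
    [0, 0, 0, 0, 1, 1],  # banana
], dtype=float)

# A truncated SVD keeps only the k strongest statistical patterns
# ("concepts") in the collection and discards the rest as noise.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # each term as a point in concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'car' and 'motor' share no dictionary definition, but they occur in the
# same documents, so they land close together in concept space; 'banana'
# never co-occurs with them and stays far away.
sim_car_motor = cosine(term_vecs[0], term_vecs[1])
sim_car_banana = cosine(term_vecs[0], term_vecs[4])
```

With the toy data above, `sim_car_motor` comes out close to 1 while `sim_car_banana` stays near 0: the relationship is derived entirely from usage patterns, exactly as the text describes.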
To the user –
It means your query language can be more generic, making it easier to find a wider set of documents matching your query – more choice.
It also means that, as far as a search engine using LSI is concerned, the relationships between words are no longer defined by their dictionary meanings but by their common usage on the web. Some believe this is a better way to derive meaning, since language should be dynamic.
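To make the benefit to the searcher concrete, a query can be "folded in" to the same concept space and scored against the documents. The toy collection and the fold-in step below are assumptions for illustration, not any real engine's implementation.

```python
# A sketch of LSI retrieval on an invented toy collection: fold a
# bag-of-words query into concept space, then rank documents by cosine
# similarity there rather than by exact keyword match.
import numpy as np

terms = ["car", "motor", "insurance", "banana", "fruit"]
docs = ["car insurance", "motor insurance", "car motor", "banana fruit"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # motor
    [1, 1, 0, 0],  # insurance
    [0, 0, 0, 1],  # banana
    [0, 0, 0, 1],  # fruit
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Dk = U[:, :k], s[:k], Vt[:k].T  # rows of Dk = document coordinates

def fold_in(query_terms):
    # Standard LSI fold-in: project a bag-of-words query into concept space.
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    return (q @ Uk) / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q_hat = fold_in({"motor", "insurance"})
scores = {doc: cosine(q_hat, Dk[j]) for j, doc in enumerate(docs)}
# The "car insurance" page scores highly for the query "motor insurance"
# even though it never contains the word "motor", while the unrelated
# "banana fruit" page scores near zero.
```

The exact-match engine from the opening paragraph would have skipped the "car insurance" page entirely for this query; in the latent space it scores almost as well as the literal match, which is the wider choice the text promises.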