A good explanation of query likelihood can be found in Manning, Raghavan, and Schütze's Introduction to Information Retrieval. I used the example from the "Language models for information retrieval" chapter. Here is an excerpt from that chapter:
In the query likelihood model, we construct from each document d in the collection a language model Md. We rank documents by P(d | q), where the probability of a document is interpreted as the likelihood that it is relevant to the query. Using Bayes' rule, we have:
P(d | q) = P(q | d) P(d) / P(q)
P(q) is the same for all documents, and so can be ignored. The prior probability of a document P(d) is often treated as uniform across all d and so it can also be ignored.
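So ranking reduces to scoring each document by P(q | d). Under a unigram language model, P(q | d) is the product of the maximum-likelihood probabilities of the query terms in the document. A minimal sketch (the documents and query here are my own illustrative examples, not from the book or from LangModel):

```python
from collections import Counter

def query_likelihood(query, doc):
    """Score a document by P(q | d) under an unsmoothed unigram MLE model."""
    tokens = doc.lower().split()
    counts = Counter(tokens)
    n = len(tokens)
    score = 1.0
    for term in query.lower().split():
        # MLE estimate: term frequency in the document / document length.
        # This is zero whenever the term does not occur in the document.
        score *= counts[term] / n
    return score

docs = ["the quick brown fox", "the lazy dog"]
scores = [query_likelihood("quick fox", d) for d in docs]
# The first document contains both query terms; the second contains neither,
# so its score collapses to zero.
```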
In this model, zero probabilities of words can be a problem: if even one query term is absent from a document, P(q | d) collapses to zero. This can be solved by smoothing. Smoothing not only avoids zero probabilities but also provides a term-weighting component. More information about smoothing can be found in Manning's book.
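One common approach, linear interpolation (Jelinek-Mercer) smoothing, mixes the document model with a collection-wide model so unseen terms still get a small nonzero probability. A sketch, with the mixing weight `lam` and the toy corpus being my own assumptions for illustration:

```python
from collections import Counter

def smoothed_score(query, doc, collection, lam=0.5):
    """P(q | d) with Jelinek-Mercer smoothing:
    P(t | d) = lam * P_mle(t | Md) + (1 - lam) * P_mle(t | Mc)."""
    d_tokens = doc.lower().split()
    d_counts = Counter(d_tokens)
    c_tokens = " ".join(collection).lower().split()
    c_counts = Counter(c_tokens)
    score = 1.0
    for t in query.lower().split():
        p_doc = d_counts[t] / len(d_tokens)        # document model
        p_col = c_counts[t] / len(c_tokens)        # collection model
        score *= lam * p_doc + (1 - lam) * p_col   # interpolated estimate
    return score

collection = ["the quick brown fox", "the lazy dog"]
# "dog" is absent from the first document, yet its smoothed score stays > 0.
s1 = smoothed_score("dog", collection[0], collection)
s2 = smoothed_score("dog", collection[1], collection)
```

The document that actually contains the term still ranks higher, which is exactly the term-weighting effect the excerpt mentions.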
I implemented the query likelihood model in Python. I used BeautifulSoup to parse the crawled URLs. You can download the Python code from here. Once you have downloaded the code, you can use it as follows:
1. import LangModel
2. Define a list of urls you want to search:
u = ['http://cnn.com', 'http://techcrunch.com']
3. Call the crawl function to crawl the list:
LangModel.crawl(u)
4. Type the query you want to search and press the return key.
5. The output will show you a ranked list of URLs and the score of each according to the query likelihood model.
6. You can search the list again by using the search function:
LangModel.search('your query')
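The internals of LangModel aren't shown in this post, but the crawl/search interface above might look roughly like the following sketch. Everything here is my own guess at a minimal implementation: I use the standard library's `html.parser` and `urllib` in place of BeautifulSoup so the sketch is self-contained, and the in-memory `index` dict is a hypothetical stand-in for whatever storage the real code uses.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Minimal stand-in for BeautifulSoup's text extraction."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)

# Hypothetical in-memory index: url -> term counts for that page.
index = {}

def crawl(urls):
    for url in urls:
        html = urlopen(url).read().decode("utf-8", errors="ignore")
        parser = TextExtractor()
        parser.feed(html)
        index[url] = Counter(" ".join(parser.parts).lower().split())

def search(query):
    """Rank indexed URLs by unsmoothed query likelihood P(q | d)."""
    ranked = []
    for url, counts in index.items():
        n = sum(counts.values())
        score = 1.0
        for term in query.lower().split():
            score *= counts[term] / n
        ranked.append((url, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

A real implementation would add smoothing as discussed above, plus error handling for unreachable URLs.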
That is it. Another application of Bayes' Theorem!
Stay tuned for more python code :)