A good explanation of the query likelihood model can be found in Manning's Introduction to Information Retrieval book. I used the example from the "Language models for information retrieval" chapter. Here is an excerpt from that chapter:
P(q) is the same for all documents, and so can be ignored. The prior probability of a document P(d) is often treated as uniform across all d and so it can also be ignored.
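In symbols, the reduction the excerpt describes follows from Bayes' rule: with a uniform prior P(d) and a query-independent P(q), ranking documents by P(d|q) is the same as ranking them by P(q|d):

```latex
P(d \mid q) \;=\; \frac{P(q \mid d)\,P(d)}{P(q)} \;\propto\; P(q \mid d)
```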
In this model, zero probabilities of words can be a problem: a single query term that never occurs in a document drives that document's whole score to zero. This is solved by smoothing. Smoothing not only avoids zero probabilities but also provides a term-weighting component. More information about smoothing can be found in Manning's book.
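To make the smoothing concrete, here is a minimal sketch of the mixture model from that chapter, scored on the book's "revenue down" worked example. The function name is mine, and the choice of linear-interpolation (Jelinek-Mercer) smoothing with λ = 0.5 follows the book's example; the downloadable code may use a different smoothing method.

```python
from collections import Counter

def query_likelihood(query, doc_tokens, collection_tokens, lam=0.5):
    """Smoothed query likelihood: P(q|d) = prod_t [lam*P(t|d) + (1-lam)*P(t|C)].

    Linear-interpolation (Jelinek-Mercer) smoothing mixes the document
    language model with the collection language model, so query terms
    missing from the document no longer force the whole score to zero.
    (A sketch based on the book's example; the downloadable code may differ.)
    """
    doc_tf = Counter(doc_tokens)
    col_tf = Counter(collection_tokens)
    d_len, c_len = len(doc_tokens), len(collection_tokens)
    score = 1.0
    for t in query.lower().split():
        score *= lam * (doc_tf[t] / d_len) + (1 - lam) * (col_tf[t] / c_len)
    return score

# The book's example: two one-sentence documents and the query "revenue down".
d1 = "xyzzy reports a profit but revenue is down".split()
d2 = "quorus narrows quarter loss but revenue decreases further".split()
collection = d1 + d2
print(query_likelihood("revenue down", d1, collection))  # 3/256 = 0.01171875
print(query_likelihood("revenue down", d2, collection))  # 1/256 = 0.00390625
```

Note how the collection term acts as a weighting component: "down" occurs only in d1, so d1 outranks d2 even though both contain "revenue".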
I implemented the query likelihood model in Python. I used BeautifulSoup to parse the crawled URLs. You can download the Python code from here. Once you have downloaded the code, you can use it as follows:
1. import LangModel
2. Define a list of URLs you want to search:
3. Call the crawl function to crawl the list:
4. Type the query you want to search and press the return key
5. The output will show you a ranked list of URLs and the score of each according to the query likelihood model.
6. You can search the list again by using the search function:
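Put together, the crawl-then-search flow above looks roughly like this. To keep the sketch self-contained and runnable, the "crawled" pages are in-memory strings rather than URLs fetched and parsed with BeautifulSoup, and the names here are illustrative, not the actual LangModel API:

```python
from collections import Counter

# Stand-ins for crawled pages; the real code would fetch each URL
# and extract its text with BeautifulSoup.
pages = {
    "http://example.com/a": "click go the shears boys click click click",
    "http://example.com/b": "metal shears are on sale",
}

# "Crawl": tokenize each page and build the collection statistics.
index = {url: text.lower().split() for url, text in pages.items()}
collection = [t for tokens in index.values() for t in tokens]
col_tf, c_len = Counter(collection), len(collection)

def search(query, lam=0.5):
    """Rank the crawled pages by smoothed query likelihood, highest first."""
    results = []
    for url, tokens in index.items():
        tf, n = Counter(tokens), len(tokens)
        score = 1.0
        for t in query.lower().split():
            score *= lam * (tf[t] / n) + (1 - lam) * (col_tf[t] / c_len)
        results.append((score, url))
    return sorted(results, reverse=True)

for score, url in search("shears"):
    print(f"{score:.6f}  {url}")
```

Because search() only rescores the in-memory index, you can call it again with a new query without re-crawling, which mirrors step 6 above.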
That is it. Another application of Bayes' Theorem!
Stay tuned for more Python code :)