Mining the Web

Tuesday, November 16, 2010

Using Language Models for Information Retrieval (Python)

Continuing my regard for Bayes' Theorem, I decided to write a small Python program that will crawl a list of URLs and then will allow t...
193 comments:
Sunday, November 14, 2010

Bayes' theorem : A Love Story

A few days ago, I saw the video of Hilary Mason presenting the history of machine learning. As far as I know, she has covered the signifi...
13 comments:
Sunday, October 17, 2010

Text Classification using Naive Bayes Classifier

I received some emails about my spam filter post. Some of them asked me to share the code for it. A very simple implementation o...
2 comments:
Sunday, September 26, 2010

DT-Tree: A Semantic Representation of Scientific Papers

This year I have been really busy working on some research projects, hence the delay in blog posts. Recently, one of my research works got a...
1 comment:
Saturday, March 20, 2010

Creating Spam Filter using Naive Bayes Classifier

A few months ago, I gave a lecture to CS students about data mining. I decided to show how a spam filter can be built using simple data mining ...
80 comments:
Friday, January 29, 2010

Parsing Robots.txt File

Crawling is an essential part of search engine development. The more sites a search engine crawls, the bigger its index will be. However, a ...
8 comments:
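As a rough illustration of the robots.txt check a polite crawler performs before fetching a page, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not the code from the post: the robots.txt content, the `ExampleBot` user agent, and the `example.com` URLs are all made up for the example, and the rules are parsed from an inline string so that nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and the URLs below are placeholders, not real crawler settings.
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/data.html")

print("index.html allowed:", allowed)
print("private/data.html allowed:", blocked)
```

A real crawler would usually call `set_url()` and `read()` on the parser to download the site's actual robots.txt, and cache the result per host.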
Monday, November 23, 2009

Duplicate Detection using MD5 and Jaccard Coefficient in C#

Duplicate documents are a significant issue in the context of the Web. Detecting near-duplicate documents in a large data set, like the Web, is a ...
6 comments:
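The two techniques named in the title can be sketched roughly as follows. The original post is in C#; this is a Python sketch under my own assumptions, not the post's code. The function names, the whitespace and case normalization, the 3-word shingles, and the 0.5 threshold are all illustrative choices. An MD5 digest catches exact duplicates, while the Jaccard coefficient (size of the intersection divided by size of the union) over word shingles gives a similarity score for near duplicates.

```python
import hashlib


def md5_fingerprint(text):
    """MD5 hex digest of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of k-word shingles (as tuples) from the text; k=3 is an assumed default."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def is_exact_duplicate(doc_a, doc_b):
    """True when both documents have the same MD5 fingerprint."""
    return md5_fingerprint(doc_a) == md5_fingerprint(doc_b)


def is_near_duplicate(doc_a, doc_b, threshold=0.5):
    """True when shingle similarity reaches the (assumed) threshold."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


doc_a = "the quick brown fox jumps"
doc_b = "the quick brown fox leaps"
print(is_exact_duplicate(doc_a, doc_b))
print(jaccard(shingles(doc_a), shingles(doc_b)))
```

Checking the cheap MD5 fingerprint first and computing shingle similarity only for the remaining pairs keeps the expensive comparison off exact copies.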

About Me

Syed Rizvi