I implemented the model in C# using the example described in Mi Islita. The program will display elements of a dictionary object that hold the name of each document and its similarity value.
You can use any sorting algorithm to sort the dictionary by the similarity values. The program also assumes that there is a string array "docs" which contains the query's (at zero index) and the documents' text.
Here is the code:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections;
using System.Linq;
namespace VectorSpaceModel
{
class Program
{
static Hashtable DTVector = new Hashtable(); //Hashtable to hold Document Term Vector
static List
static Dictionary
static string[] docs ={"gold silver truck", //Query
"shipment of gold damaged in a fire",//Doc 1
"delivery of silver arrived in a silver truck", //Doc 2
"shipment of gold arrived in a truck"}; //Doc 3
static void Main(string[] args)
{
createWordList();
createVector();
classify();
var dict = sortedList;
foreach (var x in dict.Reverse())
{
Console.WriteLine("{0} -> Doc{1}", x.Key, x.Value);
}
Console.ReadLine();
}
public static void createWordList()
{
foreach (string doc in docs)
{
wordlist = getWordList(wordlist, doc);
}
}
public static List
{
Regex exp = new Regex("\\w+", RegexOptions.IgnoreCase);
MatchCollection MCollection = exp.Matches(query);
foreach (Match match in MCollection)
{
if (!wordlist.Contains(match.Value))
{
wordlist.Add(match.Value);
}
}
return wordlist;
}
public static void createVector()
{
double[] queryvector;
for (int j = 0; j < docs.Length; j++)
{
queryvector = new double[wordlist.Count];
for (int i = 0; i < wordlist.Count; i++)
{
double tfIDF = getTF(docs[j], wordlist[i]) * getIDF(wordlist[i]);
queryvector[i] = tfIDF;
}
if (j == 0) //is it a query?
{
DTVector.Add("Query", queryvector);
}
else
{
DTVector.Add(j.ToString(), queryvector);
}
}
}
public static void classify()
{
double temp = 0.0;
IDictionaryEnumerator _enumerator = DTVector.GetEnumerator();
double[] queryvector = new double[wordlist.Count];
Array.Copy((double[])DTVector["Query"], queryvector, wordlist.Count);
while (_enumerator.MoveNext())
{
if (_enumerator.Key.ToString() != "Query")
{
temp = cosinetheta(queryvector, (double[])_enumerator.Value);
sortedList.Add(temp, _enumerator.Key.ToString());
}
}
}
public static double dotproduct(double[] v1, double[] v2)
{
double product = 0.0;
if (v1.Length == v2.Length)
{
for (int i = 0; i < v1.Length; i++)
{
product += v1[i] * v2[i];
}
}
return product;
}
public static double vectorlength(double[] vector)
{
double length = 0.0;
for (int i = 0; i < vector.Length; i++)
{
length += Math.Pow(vector[i], 2);
}
return Math.Sqrt(length);
}
private static double getTF(string document, string term)
{
string[] queryTerms = Regex.Split(document, "\\s");
double count = 0;
foreach (string t in queryTerms)
{
if (t == term)
{
count++;
}
}
return count;
}
private static double getIDF(string term)
{
double df = 0.0;
//get term frequency of all of the sentences except for the query
for (int i = 1; i < docs.Length; i++)
{
if (docs[i].Contains(term))
{
df++;
}
}
//Get sentence count
double D = docs.Length - 1; //excluding the query
double IDF = 0.0;
if (df > 0)
{
IDF = Math.Log(D / df);
}
return IDF;
}
public static double cosinetheta(double[] v1, double[] v2)
{
double lengthV1 = vectorlength(v1);
double lengthV2 = vectorlength(v2);
double dotprod = dotproduct(v1, v2);
return dotprod / (lengthV1 * lengthV2);
}
}
}
17 comments:
Hey There. I discovered your weblog using msn. That is a very smartly written article.
I'll make sure to bookmark it and return to learn more of your helpful info. Thanks for the post. I'll definitely return.
Also visit my web site :: ideal waist to hip ratio
' Vegetables ѕhould make uρ a large part of yоur diet
if yߋu are trүing tο lose weight. The scientific community fօund that Garcinia Cambogia Һad ɑll thesе fat-busting qualities.
Sߋme drugs prevent absorption οf fat and glucose fгom intestine,
ѕo that some fat is excreted.
Мy blog ... วิธีลดน้ําหนักเร่งด่วน
People do not do enough to retain their hair healthy. Nonetheless, they're close enough to chickens and a number
of the images appear to be chickens. You'll have it as a topic
for the birthday party.
Review my web blog :: coloring pages xmas tree
The news breaking bad supporters have been waiting on for months
has finally been uncovered. We would like benefit; great effects, such as for instance, great storytelling, and identity advancement.
my homepage; http://movie2k-movie4k.webstarts.com/
This is the perfect wallet for "clubbing" or hiking when you don't want to hassle with a purse.
The Apple boot emblem will translate into a Skull emblem
during boot. It can be downloaded by opening Cydia and lookup for the Winterboard.
Also visit my page: ifunbox ()
Making money is not guaranteed with any opportunity, but it is
possible. Brainstorm ways to make your life experience work for you.
When it comes to ways to earn extra money
online you best believe that there are ways out there.
My site: ways to make extra money
That way you can choose the ones that can't get to you when you need
them. Quite often they are manufactured by the original manufacturer.
Other times, body shop mechanics will ensure you that they only
use OEM body parts, but you still need to be diligent and examine the final work.
Feel free to surf to my web blog; advance auto parts
eхcellent issues altogether, you smply received a brand nnew readеr.
What might you recommend in egards to your submit that you mde some days in the past?
Any certain?
Here iѕ my paցe :: Pain relief
Its lіke you read my mind! You ѕeеm tto know sso much аbout thiѕ, like you wrote
the book in it oг something.I think that you can do with
a few pics to drivе the meѕsаge home a little bit, but instead of that,
this is excellent blog. A great read. I wipl certainly be
back.
Review my homepage; living with Rheumatoid Arthritis
Nice!!!! Many thanks for providing this important information
cctv camera in Jaipur
شركة مكافحة حشرات بالدمام
شركة تسليك مجاري بالدمام
شركة تنظيف بالدمام
شركة تنظيف واجهات زجاجية بالدمام
شركة كشف تسربات المياه بالدمام
شركة تنظيف خزانات بالدمام
شركة تنظيف كنب بالدمام
شركة مكافحة حشرات بالدمام
شركة تسليك مجاري بالدمام
شركة تنظيف بالدمام
شركة تنظيف واجهات زجاجية بالدمام
شركة كشف تسربات المياه بالدمام
شركة تنظيف خزانات بالدمام
شركة تنظيف كنب بالدمام
United Guard is top security services in mumbai
شلركة المثالى بروكر لخدمات المكافحة
شركة رش مبيدات بالدمام
شركة رش مبيدات بالقطيف
شركة مكافحة حشرات بالخبر
شركة مكافحة حشرات بالجبيل
شركة مكافحة حشرات بالاحساء
شركة مكافحة حشرات بالقطيف
cheap jordans
retro jordans
cheap jordans
lebron 16
jordan 13
yeezy boost 350 v2
air yeezy
stephen curry shoes
fila
jordan shoes
زيت الحشيش الافغاني
ماهي رائحة زيت الحشيش الاصلي
زيت الحشيش للشنب
تجربتي زيت الحشيش للذقن
زيت الحشيش الاصلي للشعر
كم سعر زيت الحشيش
سعر زيت الحشيش
زيت الحشيش الافغاني الاصلي
لون زيت الحشيش الاصلي
تجارب زيت الحشيش
arabic wedding planner in dubai
destination wedding planner in dubai
شركة تنظيف في الفجيرة
شركة مكافحة الصراصير دبي
شركة تنظيف في عجمان
شركة تنظيف في ابوظبي
شركة تنظيف خزانات ابوظبي
شركة تنظيف منازل راس الخيمة
شركة تنظيف منازل الشارقة
Post a Comment