Monday, September 7, 2009

Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) in C#

LSI is used in a variety of information retrieval and text processing applications. It uses uses a mathematical technique called Singular Value Decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.

Tutorials for LSI and SVD can be found online. I will use Bluebit .Net Matrix Library to calculate SVD and other matrix and vector operations needed in LSI.

I implemented LSI in C# using the example described in Mi Ishlita. The program will display the name of each document and its similarity value. You can use any sorting algorithm to sort the results by the similarity values. Like in the previous example, the program assumes that there is a string array "docs" which contains the query's (at zero index) and the documents' text.

To use the code (below), you need to download and install .NET Matrix Library (NML™) 4.3 and then reference .NET Matrix Library from within your project using the following instructions:
  1. Open the Visual Studio a and open a new Visual Basic or C# project.
  2. Select Project | Add Reference...
  3. Select the .NET tab and find Bluebit .NET Matrix Library listed, double click on it and then press OK.
Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Bluebit.MatrixLibrary;
using System.Collections;
using System.Text.RegularExpressions;

namespace LSI
{
class Program
{
static double[,] A; //term-document Array
static double[] q; //term-query Array
static List wordlist = new List(); //List of terms found in documents
static SortedDictionary sortedList = new SortedDictionary(); //Documents ranked by VSM with angle value
static string[] docs ={"gold silver truck", //Query
"shipment of gold damaged in a fire",//Doc 1
"delivery of silver arrived in a silver truck", //Doc 2
"shipment of gold arrived in a truck"}; //Doc 3

static void Main(string[] args)
{
createWordList();
createVector();
LatentSemanticIndexing();


foreach (KeyValuePair kvp in sortedList)
{

Console.WriteLine(kvp.Value + " " + kvp.Key);
}
Console.ReadLine();
}


public static void createWordList()
{
foreach (string doc in docs)
{
wordlist = getWordList(wordlist, doc);
}
wordlist.Sort(); //sort the wordlist alphabetically
}

public static List getWordList(List wordlist, string query)
{
Regex exp = new Regex("\\w+", RegexOptions.IgnoreCase);
MatchCollection MCollection = exp.Matches(query);

foreach (Match match in MCollection)
{
if (!wordlist.Contains(match.Value))
{
wordlist.Add(match.Value);
}
}


return wordlist;
}

public static void createVector()
{
double[] queryvector;
q = new double[wordlist.Count];
A = new double[wordlist.Count, docs.Length-1];
for (int j = 0; j < docs.Length; j++)
{
queryvector = new double[wordlist.Count];

for (int i = 0; i < wordlist.Count; i++)
{
//calculate Term Frequency
double tf = getTF(docs[j], wordlist[i]);

if (j == 0) //if Query term then add it to query array
{
q[i] = tf;
}
else //if document term then add it to document array
{
A[i, j-1] = tf;
}
}

}

}

private static void LatentSemanticIndexing()
{
//Singular Value Decomposition
Matrix docMatrix = new Matrix(A);
SVD svd = new SVD(docMatrix);

//A = U S VT
Matrix U = svd.U;
Matrix S = svd.S;
Matrix V = svd.V;
Matrix VT = svd.VH;

//Dimensionality Reduction: Computing Uk, Sk, Vk and VkT
Matrix Uk = new Matrix(U.ToArray(), U.Rows, U.Cols - 1);
Matrix Sk = new Matrix(S.ToArray(), S.Rows - 1, S.Cols - 1);
Matrix Vk = new Matrix(V.ToArray(), V.Rows, V.Cols - 1);
Matrix VkT = Vk.Transpose();


//q = qT Uk Sk-1
Matrix queryMatrix = new Matrix(q,q.Length,1);
queryMatrix = queryMatrix.Transpose() * Uk * Sk.Inverse();

//sim(q, d) = sim(qT Uk Sk-1, dT Uk Sk-1) using cosine similarities
for (int i = 0; i < V.Rows; i++)
{
Vector docVector = Vk.RowVector(i);
Vector queryVector = queryMatrix.RowVector(0);
double sim = Vector.DotProduct(docVector, queryVector) / (docVector.Length * queryVector.Length);

Console.WriteLine("Doc " + (i + 1).ToString() + " :" + sim);
}



}
private static double getTF(string document, string term)
{
string[] queryTerms = Regex.Split(document, "\\s");
double count = 0;


foreach (string t in queryTerms)
{
if (t == term)
{
count++;
}
}
return count;

}


}
}

After you have sorted the results in descending order using the similarity values, your results will be as follows:
d2>d3>d2

40 comments:

sush said...

Hello Mr.Rizvi. I am doing a small project on LSI and came across your post at http://raza-rizvi.blogspot.com/2009/09/singular-value-decomposition-svd-and.html. I started my application on your lines but while using the bluebitlibrary and i am facing problems. I have installed and referenced the dll in my application. When my program reaches where it has to call that dll it gives me an exception which says

"Could not load file or assembly 'Bluebit.MatrixLibrary, Version=4.3.64.0, Culture=neutral, PublicKeyToken=c915370950fbbd9f' or one of its dependencies. An attempt was made to load a program with an incorrect format."

Did you encounter this problem. Please help me in using that dll if you can.

Syed Rizvi said...

Hi Sushma,
There are 2 versions of the BlueBit Matrix Library: 32-bit and 84 bit. I used the 32 bit version of the library and never encountered the problem that you have mentioned.

You are trying to use the 64 bit version of the Bluebit Matrix Library. Try the 32 bit version of the library.

Hope this helps. Let me know if it works.

Syed Rizvi said...

You can get the project file from here

Anonymous said...

Your blog keeps getting better and better! Your older articles are not as good as newer ones you have a lot more creativity and originality now keep it up!

Maqi said...

i am facing some problems
1: if i change your doc's i.e
a: contain no query word
b: contain only 1 query word
c: contain all query words
it gives a negative value to doc 2
will you plz tell me why it give negative value

2: if i give doc's containing no keywords it give me exception in the following line
queryMatrix = queryMatrix.Transpose() * Uk * Sk.Inverse();

Exception:This operation requires a non-singular matrix.

Can u help me in this regard.
thanks in advance

Anonymous said...

hi everybody


just registered and put on my todo list


hopefully this is just what im looking for looks like i have a lot to read.

Anonymous said...

hey


Just saying hello while I read through the posts


hopefully this is just what im looking for looks like i have a lot to read.

Anonymous said...

Unquestionably, I like it, significant and well-founded ideas. I highly recommend you write more fascinating content in your.

Anonymous said...

click [URL=http://jacket-dresses.net/]moncler[/URL] with low price YQnXrzCK [URL=http://jacket-dresses.net/ ] http://jacket-dresses.net/ [/URL]

Anonymous said...

Groom him on the floor or restrain him if he's on a table or in a sink I prefer to believe the Mystics It was extremely difficult to see her lying there anesthetized[url=http://fans-chicago.com/]Brian Urlacher Youth Jersey[/url]
looking like road kill[url=http://giantsofficialjersey.com/]Victor Cruz Authentic Jersey[/url]
so soon after holding Cali in my arms while she was put down
Hades is their "temporary" place of torment until that event happens2 As the cloud passes over it will begin to regroup and come back together without changing the over all regional weather

Anonymous said...


[url=http://shenenmaoyiss.soup.io/][b]sac longchamp[/b][/url]
[url=http://shenenmaoyie.tumblr.com/][b]sac longchamp[/b][/url]
[url=http://shenenmaoyi.livejournal.com/][b]sac longchamp[/b][/url]
[url=http://www.zimbio.com/Shoes+And+Fashion/articles/5BeMlTZZcvm/Articles+l+int+rieur+de+votre+sac+Avez?add=True][b]sac longchamp[/b][/url]
[url=http://shenenmaoyi.bcz.com/2012/12/27/les-nouvelles-tendances-de-sac-a-main/][b]sac longchamp[/b][/url]

Anonymous said...

click to view RAWnFJId [URL=http://www.replica-handbags2013.com/]designer inspired handbags[/URL] with confident edKKnwyg [URL=http://www.replica-handbags2013.com/ ] http://www.replica-handbags2013.com/ [/URL]

Anonymous said...

check KOkcACxn [URL=http://www.louis--vuitton--online--shop.org/]louis vuitton shopping online[/URL] for gift aIDtdkla [URL=http://www.louis--vuitton--online--shop.org/ ] http://www.louis--vuitton--online--shop.org/ [/URL]

Anonymous said...

sell WacqoqHR [URL=http://www.uggs-outlet2013.com/]uggs outlet[/URL] , just clicks away zXZOvVGN [URL=http://www.uggs-outlet2013.com/ ] http://www.uggs-outlet2013.com/ [/URL]

Anonymous said...

We [url=http://www.casinobonus.gd]free casino bonus[/url] have a large library of unqualifiedly freed casino games championing you to play opportunely here in your browser. Whether you pine for to training a provisions encounter master plan or just attempt exposed a some modern slots once playing seeking genuine money, we be undergoing you covered. These are the claim still and all games that you can play at true online casinos and you can part of them all quest of free.

Anonymous said...

top [url=http://www.c-online-casino.co.uk/]uk casino[/url] check the latest [url=http://www.casinolasvegass.com/]online casino[/url] autonomous no store hand-out at the best [url=http://www.baywatchcasino.com/]free gratuity casino
[/url].

Anonymous said...

cheap tramadol tramadol drug interactions - tramadol hcl 60 mg

Anonymous said...

xanax online xanax 1mg get high - 2mg xanax and 2 beers

Anonymous said...

buy tramadol online tramadol with acetaminophen dosage - tramadol 50 mg contraindications

Anonymous said...

generic xanax xanax overdose torn - drug stronger xanax valium

Anonymous said...

tramadol online overnight pet meds tramadol no prescription - tramadol 50 mg 100

Anonymous said...

tramadol generic tramadol legal to order online - buy tramadol online no prescription

Anonymous said...

buy tramadol online tramadol 50 mg 4 times a day - order tramadol from usa

Anonymous said...

cheap tramadol online buy tramadol online us - buy-tramadol-overnight

Anonymous said...

cheap xanax no prescription pill report for xanax - xanax bars song

Anonymous said...

buy tramadol online 300 mg tramadol overdose - tramadol for dogs best price

Anonymous said...

buy cialis online safe place buy cialis online - cialis testimonials

Anonymous said...

cialis online cialis coupon voucher - buy cialis professional 20 mg

Anonymous said...

cialis online cheap cialis prices us - cialis daily new zealand

Anonymous said...

buy tramadol tramadol dosage 150 mg - tramadol and addiction

Anonymous said...

tramadol online tramadol online american express - where to buy tramadol cheap

Anonymous said...

buy tramadol overnight cod tramadol high like vicodin - buy tramadol rx online

Anonymous said...

http://landvoicelearning.com/#51438 tramadol normal dosage - tramadol hcl 50 mg ingredients

Anonymous said...

http://buytramadolonlinecool.com/#63102 getting off tramadol withdrawal - tramadol extended release cost

Anonymous said...

buy tramadol online buy tramadol online usa - tramadol overdose antidote

Anonymous said...

http://ranchodelastortugas.com/#72895 1mg of xanax high - xanax urine test how long

Anonymous said...

buy tramadol online order tramadol online with mastercard - tramadol buy online usa

Anonymous said...

http://www.achildsplace.org/banners/tramadolonline/#2003 buy tramadol online no prescription cheap - buy tramadol cod online

yanmaneee said...

curry 4
adidas nmd r1
mbt shoes
fitflops sale clearance
canada goose
air max 270
air force 1
nike air max 270
hermes
nike air max 270

shasoo said...

buy replica bags online h93 f6s40h7t55 high quality designer replica n20 r4g87h3k48 best replica bags online v35 i6r37t7y88