Monday, September 7, 2009

Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) in C#

LSI is used in a variety of information retrieval and text processing applications. It uses uses a mathematical technique called Singular Value Decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.

Tutorials for LSI and SVD can be found online. I will use Bluebit .Net Matrix Library to calculate SVD and other matrix and vector operations needed in LSI.

I implemented LSI in C# using the example described in Mi Ishlita. The program will display the name of each document and its similarity value. You can use any sorting algorithm to sort the results by the similarity values. Like in the previous example, the program assumes that there is a string array "docs" which contains the query's (at zero index) and the documents' text.

To use the code (below), you need to download and install .NET Matrix Library (NML™) 4.3 and then reference .NET Matrix Library from within your project using the following instructions:
  1. Open the Visual Studio a and open a new Visual Basic or C# project.
  2. Select Project | Add Reference...
  3. Select the .NET tab and find Bluebit .NET Matrix Library listed, double click on it and then press OK.
Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Bluebit.MatrixLibrary;
using System.Collections;
using System.Text.RegularExpressions;

namespace LSI
{
class Program
{
static double[,] A; //term-document Array
static double[] q; //term-query Array
static List wordlist = new List(); //List of terms found in documents
static SortedDictionary sortedList = new SortedDictionary(); //Documents ranked by VSM with angle value
static string[] docs ={"gold silver truck", //Query
"shipment of gold damaged in a fire",//Doc 1
"delivery of silver arrived in a silver truck", //Doc 2
"shipment of gold arrived in a truck"}; //Doc 3

static void Main(string[] args)
{
createWordList();
createVector();
LatentSemanticIndexing();


foreach (KeyValuePair kvp in sortedList)
{

Console.WriteLine(kvp.Value + " " + kvp.Key);
}
Console.ReadLine();
}


public static void createWordList()
{
foreach (string doc in docs)
{
wordlist = getWordList(wordlist, doc);
}
wordlist.Sort(); //sort the wordlist alphabetically
}

public static List getWordList(List wordlist, string query)
{
Regex exp = new Regex("\\w+", RegexOptions.IgnoreCase);
MatchCollection MCollection = exp.Matches(query);

foreach (Match match in MCollection)
{
if (!wordlist.Contains(match.Value))
{
wordlist.Add(match.Value);
}
}


return wordlist;
}

public static void createVector()
{
double[] queryvector;
q = new double[wordlist.Count];
A = new double[wordlist.Count, docs.Length-1];
for (int j = 0; j < docs.Length; j++)
{
queryvector = new double[wordlist.Count];

for (int i = 0; i < wordlist.Count; i++)
{
//calculate Term Frequency
double tf = getTF(docs[j], wordlist[i]);

if (j == 0) //if Query term then add it to query array
{
q[i] = tf;
}
else //if document term then add it to document array
{
A[i, j-1] = tf;
}
}

}

}

private static void LatentSemanticIndexing()
{
//Singular Value Decomposition
Matrix docMatrix = new Matrix(A);
SVD svd = new SVD(docMatrix);

//A = U S VT
Matrix U = svd.U;
Matrix S = svd.S;
Matrix V = svd.V;
Matrix VT = svd.VH;

//Dimensionality Reduction: Computing Uk, Sk, Vk and VkT
Matrix Uk = new Matrix(U.ToArray(), U.Rows, U.Cols - 1);
Matrix Sk = new Matrix(S.ToArray(), S.Rows - 1, S.Cols - 1);
Matrix Vk = new Matrix(V.ToArray(), V.Rows, V.Cols - 1);
Matrix VkT = Vk.Transpose();


//q = qT Uk Sk-1
Matrix queryMatrix = new Matrix(q,q.Length,1);
queryMatrix = queryMatrix.Transpose() * Uk * Sk.Inverse();

//sim(q, d) = sim(qT Uk Sk-1, dT Uk Sk-1) using cosine similarities
for (int i = 0; i < V.Rows; i++)
{
Vector docVector = Vk.RowVector(i);
Vector queryVector = queryMatrix.RowVector(0);
double sim = Vector.DotProduct(docVector, queryVector) / (docVector.Length * queryVector.Length);

Console.WriteLine("Doc " + (i + 1).ToString() + " :" + sim);
}



}
private static double getTF(string document, string term)
{
string[] queryTerms = Regex.Split(document, "\\s");
double count = 0;


foreach (string t in queryTerms)
{
if (t == term)
{
count++;
}
}
return count;

}


}
}

After you have sorted the results in descending order using the similarity values, your results will be as follows:
d2>d3>d2

40 comments:

  1. Hello Mr.Rizvi. I am doing a small project on LSI and came across your post at http://raza-rizvi.blogspot.com/2009/09/singular-value-decomposition-svd-and.html. I started my application on your lines but while using the bluebitlibrary and i am facing problems. I have installed and referenced the dll in my application. When my program reaches where it has to call that dll it gives me an exception which says

    "Could not load file or assembly 'Bluebit.MatrixLibrary, Version=4.3.64.0, Culture=neutral, PublicKeyToken=c915370950fbbd9f' or one of its dependencies. An attempt was made to load a program with an incorrect format."

    Did you encounter this problem. Please help me in using that dll if you can.

    ReplyDelete
  2. Hi Sushma,
    There are 2 versions of the BlueBit Matrix Library: 32-bit and 84 bit. I used the 32 bit version of the library and never encountered the problem that you have mentioned.

    You are trying to use the 64 bit version of the Bluebit Matrix Library. Try the 32 bit version of the library.

    Hope this helps. Let me know if it works.

    ReplyDelete
  3. Your blog keeps getting better and better! Your older articles are not as good as newer ones you have a lot more creativity and originality now keep it up!

    ReplyDelete
  4. i am facing some problems
    1: if i change your doc's i.e
    a: contain no query word
    b: contain only 1 query word
    c: contain all query words
    it gives a negative value to doc 2
    will you plz tell me why it give negative value

    2: if i give doc's containing no keywords it give me exception in the following line
    queryMatrix = queryMatrix.Transpose() * Uk * Sk.Inverse();

    Exception:This operation requires a non-singular matrix.

    Can u help me in this regard.
    thanks in advance

    ReplyDelete
  5. hi everybody


    just registered and put on my todo list


    hopefully this is just what im looking for looks like i have a lot to read.

    ReplyDelete
  6. hey


    Just saying hello while I read through the posts


    hopefully this is just what im looking for looks like i have a lot to read.

    ReplyDelete
  7. Unquestionably, I like it, significant and well-founded ideas. I highly recommend you write more fascinating content in your.

    ReplyDelete
  8. click [URL=http://jacket-dresses.net/]moncler[/URL] with low price YQnXrzCK [URL=http://jacket-dresses.net/ ] http://jacket-dresses.net/ [/URL]

    ReplyDelete
  9. Groom him on the floor or restrain him if he's on a table or in a sink I prefer to believe the Mystics It was extremely difficult to see her lying there anesthetized[url=http://fans-chicago.com/]Brian Urlacher Youth Jersey[/url]
    looking like road kill[url=http://giantsofficialjersey.com/]Victor Cruz Authentic Jersey[/url]
    so soon after holding Cali in my arms while she was put down
    Hades is their "temporary" place of torment until that event happens2 As the cloud passes over it will begin to regroup and come back together without changing the over all regional weather

    ReplyDelete

  10. [url=http://shenenmaoyiss.soup.io/][b]sac longchamp[/b][/url]
    [url=http://shenenmaoyie.tumblr.com/][b]sac longchamp[/b][/url]
    [url=http://shenenmaoyi.livejournal.com/][b]sac longchamp[/b][/url]
    [url=http://www.zimbio.com/Shoes+And+Fashion/articles/5BeMlTZZcvm/Articles+l+int+rieur+de+votre+sac+Avez?add=True][b]sac longchamp[/b][/url]
    [url=http://shenenmaoyi.bcz.com/2012/12/27/les-nouvelles-tendances-de-sac-a-main/][b]sac longchamp[/b][/url]

    ReplyDelete
  11. click to view RAWnFJId [URL=http://www.replica-handbags2013.com/]designer inspired handbags[/URL] with confident edKKnwyg [URL=http://www.replica-handbags2013.com/ ] http://www.replica-handbags2013.com/ [/URL]

    ReplyDelete
  12. check KOkcACxn [URL=http://www.louis--vuitton--online--shop.org/]louis vuitton shopping online[/URL] for gift aIDtdkla [URL=http://www.louis--vuitton--online--shop.org/ ] http://www.louis--vuitton--online--shop.org/ [/URL]

    ReplyDelete
  13. sell WacqoqHR [URL=http://www.uggs-outlet2013.com/]uggs outlet[/URL] , just clicks away zXZOvVGN [URL=http://www.uggs-outlet2013.com/ ] http://www.uggs-outlet2013.com/ [/URL]

    ReplyDelete
  14. We [url=http://www.casinobonus.gd]free casino bonus[/url] have a large library of unqualifiedly freed casino games championing you to play opportunely here in your browser. Whether you pine for to training a provisions encounter master plan or just attempt exposed a some modern slots once playing seeking genuine money, we be undergoing you covered. These are the claim still and all games that you can play at true online casinos and you can part of them all quest of free.

    ReplyDelete
  15. top [url=http://www.c-online-casino.co.uk/]uk casino[/url] check the latest [url=http://www.casinolasvegass.com/]online casino[/url] autonomous no store hand-out at the best [url=http://www.baywatchcasino.com/]free gratuity casino
    [/url].

    ReplyDelete
  16. cheap tramadol tramadol drug interactions - tramadol hcl 60 mg

    ReplyDelete
  17. xanax online xanax 1mg get high - 2mg xanax and 2 beers

    ReplyDelete
  18. buy tramadol online tramadol with acetaminophen dosage - tramadol 50 mg contraindications

    ReplyDelete
  19. generic xanax xanax overdose torn - drug stronger xanax valium

    ReplyDelete
  20. tramadol online overnight pet meds tramadol no prescription - tramadol 50 mg 100

    ReplyDelete
  21. tramadol generic tramadol legal to order online - buy tramadol online no prescription

    ReplyDelete
  22. buy tramadol online tramadol 50 mg 4 times a day - order tramadol from usa

    ReplyDelete
  23. cheap tramadol online buy tramadol online us - buy-tramadol-overnight

    ReplyDelete
  24. cheap xanax no prescription pill report for xanax - xanax bars song

    ReplyDelete
  25. buy tramadol online 300 mg tramadol overdose - tramadol for dogs best price

    ReplyDelete
  26. buy cialis online safe place buy cialis online - cialis testimonials

    ReplyDelete
  27. cialis online cialis coupon voucher - buy cialis professional 20 mg

    ReplyDelete
  28. cialis online cheap cialis prices us - cialis daily new zealand

    ReplyDelete
  29. buy tramadol tramadol dosage 150 mg - tramadol and addiction

    ReplyDelete
  30. tramadol online tramadol online american express - where to buy tramadol cheap

    ReplyDelete
  31. buy tramadol overnight cod tramadol high like vicodin - buy tramadol rx online

    ReplyDelete
  32. http://landvoicelearning.com/#51438 tramadol normal dosage - tramadol hcl 50 mg ingredients

    ReplyDelete
  33. http://buytramadolonlinecool.com/#63102 getting off tramadol withdrawal - tramadol extended release cost

    ReplyDelete
  34. buy tramadol online buy tramadol online usa - tramadol overdose antidote

    ReplyDelete
  35. http://ranchodelastortugas.com/#72895 1mg of xanax high - xanax urine test how long

    ReplyDelete
  36. buy tramadol online order tramadol online with mastercard - tramadol buy online usa

    ReplyDelete
  37. http://www.achildsplace.org/banners/tramadolonline/#2003 buy tramadol online no prescription cheap - buy tramadol cod online

    ReplyDelete