- 03-21-2011, 07:17 AM panny
cosine similarity in search engine

I want to know how to implement cosine similarity for a search engine that I am developing in Java/JSP.

- 03-21-2011, 08:01 AM JosAH
- 03-21-2011, 12:43 PM panny
How to calculate the similarity between a query and documents in a Java program

I have a set of documents, and I have already calculated all of the following:

1) Term frequency (TF)

2) Inverse document frequency (IDF)

3) TF-IDF (the product of the two)

Now I need to calculate the similarity between a specific query and each document. The result should be a score that ranks the documents from the highest to the lowest similarity to the query.

Can anyone guide me? I just need to know how to proceed from where I am now.

Thanks

- 03-21-2011, 01:17 PM JosAH
- 03-21-2011, 02:03 PM panny
public class tf_idf {

    public static int numDocs = 0;      // number of documents in the collection
    public static int numTerms = 0;     // number of distinct terms
    public static int[][] termFreq;     // termFreq[term][doc]: occurrences of term in doc
    public static int[] maxTermFreq;    // maxTermFreq[doc]: value the counts of doc are divided by
    public static int[] docFreq;        // docFreq[term]: number of documents that contain term
    public static float[][] termWeight; // termWeight[term][doc]: tf * idf

    // Fills termWeight with the tf * idf weight of every (term, document) pair.
    public static void TermWeight() {
        // Allocate the table first; writing into it while it is still null
        // would throw a NullPointerException.
        termWeight = new float[numTerms][numDocs];
        for (int i = 0; i < numTerms; i++) {
            for (int j = 0; j < numDocs; j++) {
                termWeight[i][j] = ComputeTermWeight(i, j);
            }
        }
    }

    public static float GetTermFrequency(int term, int doc) {
        int freq = termFreq[term][doc]; // number of occurrences of the term in the document, e.g. 2
        int maxfreq = maxTermFreq[doc]; // total number of words in the document, e.g. 3
        // Guard against an empty document, where freq / maxfreq would be NaN.
        float tf = (maxfreq == 0) ? 0f : (float) freq / (float) maxfreq; // e.g. 2/3
        System.out.println("Term Frequency: " + tf);
        return tf;
    }

    public static float GetInverseDocumentFrequency(int term) {
        int df = docFreq[term];
        // A term that occurs in no document gets weight 0 instead of log(numDocs / 0).
        float idf = (df == 0) ? 0f : Log((float) numDocs / (float) df);
        System.out.println("Inverse Document Frequency: " + idf);
        return idf; // e.g. numDocs=3, df=1, idf=ln(3/1)
    }

    // Natural logarithm: ln(num).
    public static float Log(float num) {
        return (float) Math.log(num);
    }

    public static float ComputeTermWeight(int term, int doc) {
        System.out.println("term: " + term + " doc: " + doc);
        float tf = GetTermFrequency(term, doc);
        float idf = GetInverseDocumentFrequency(term);
        System.out.println("total weight: " + tf * idf);
        return tf * idf;
    }

}

>> I really don't have any idea how to proceed from here and solve this. I hope someone can help me. :-p
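
One possible way to proceed from the weights above is to treat the query as one more vector of TF-IDF weights over the same term indexes, and to score each document by the cosine of the angle between the two vectors: cos(q, d) = (q . d) / (|q| * |d|). The following is only a minimal sketch under these assumptions: the class CosineSimilarity and its methods cosine and rank are hypothetical names that do not appear in the thread, the weights array is assumed to have the same [term][doc] layout as termWeight, the query vector is assumed to be built already, and Java 8 or later is assumed for the lambda.

```java
import java.util.Arrays;

// Hypothetical helper class; none of these names come from the thread.
final class CosineSimilarity {

    // Cosine of the angle between the query vector and one document column
    // of weights[term][doc] (assumed to match the termWeight layout).
    static double cosine(float[] query, float[][] weights, int doc) {
        double dot = 0.0;
        double queryNorm = 0.0;
        double docNorm = 0.0;
        for (int term = 0; term < query.length; term++) {
            double q = query[term];
            double d = weights[term][doc];
            dot += q * d;
            queryNorm += q * q;
            docNorm += d * d;
        }
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // an all-zero vector has no direction, so score it as 0
        }
        return dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
    }

    // Returns the document indexes ordered from highest to lowest similarity.
    static Integer[] rank(float[] query, float[][] weights, int numDocs) {
        final double[] scores = new double[numDocs];
        Integer[] order = new Integer[numDocs];
        for (int doc = 0; doc < numDocs; doc++) {
            scores[doc] = cosine(query, weights, doc);
            order[doc] = doc;
        }
        // Sort the indexes by descending score.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }
}
```

With the class from the post, a call might look like CosineSimilarity.rank(queryWeights, tf_idf.termWeight, tf_idf.numDocs), where queryWeights is a hypothetical float array holding one weight per term for the query. Because the weights are never negative, the scores fall between 0 (no shared terms) and 1 (same direction), so long and short documents are compared on direction rather than on length.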