I want to know how to program cosine similarity for a search engine I am developing with Java/JSP. :cool:
How to calculate the similarity between a query and documents using a Java program
I have a set of documents, and I have already calculated the following for them:
1) Term frequency (TF)
2) Inverse document frequency (IDF)
3) TF-IDF weight (TF * IDF)
Now I need to calculate the similarity between a specific query and each document. This should produce a score that I can use to rank the documents from the highest to the lowest similarity to the query.
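For reference, the score usually used for this kind of ranking is the cosine of the angle between the query's weight vector and the document's weight vector. The notation here is an assumption, not something from my code: w_{i,q} is the TF-IDF weight of term i in the query, w_{i,j} is the weight of term i in document j, and n is the number of terms.

```latex
\[
\mathrm{sim}(q, d_j) =
  \frac{\sum_{i=1}^{n} w_{i,q}\, w_{i,j}}
       {\sqrt{\sum_{i=1}^{n} w_{i,q}^{2}}\;\sqrt{\sum_{i=1}^{n} w_{i,j}^{2}}}
\]
```

With non-negative weights the score lies between 0 (no shared terms) and 1 (same direction), so sorting the documents by it in descending order gives the ranking.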
Can anyone guide me? I just need to know how to proceed from where I am now.
Thanks.
public class tf_idf {
    public static int numDocs = 0;
    public static int numTerms = 0;
    public static int[][] termFreq;      // termFreq[term][doc]
    public static int[] maxTermFreq;     // maxTermFreq[doc]
    public static int[] docFreq;         // docFreq[term]
    public static float[][] termWeight;  // termWeight[term][doc]

    // Fills termWeight with tf * idf for every (term, doc) pair.
    public static void TermWeight() {
        for (int i = 0; i < numTerms; i++) {
            for (int j = 0; j < numDocs; j++) {
                termWeight[i][j] = ComputeTermWeight(i, j);
            }
        }
    }

    public static float GetTermFrequency(int term, int doc) {
        int freq = termFreq[term][doc]; // number of times the term occurs in the document, e.g. 2
        int maxfreq = maxTermFreq[doc]; // total number of words in the document, e.g. 3
        // Guard against division by zero for an empty document.
        float tf = (maxfreq == 0) ? 0f : (float) freq / (float) maxfreq; // e.g. 2/3
        System.out.println("Term Frequency: " + tf);
        return tf;
    }

    public static float GetInverseDocumentFrequency(int term) {
        int df = docFreq[term];
        // Guard against division by zero for a term that occurs in no document.
        // e.g. numDocs = 3, df = 1 gives idf = ln(3/1)
        float idf = (df == 0) ? 0f : Log((float) numDocs / (float) df);
        System.out.println("Inverse Document Frequency: " + idf);
        return idf;
    }

    // Natural logarithm: ln(num)
    public static float Log(float num) {
        return (float) Math.log(num);
    }

    public static float ComputeTermWeight(int term, int doc) {
        System.out.println("term: " + term + " doc: " + doc);
        float tf = GetTermFrequency(term, doc);
        float idf = GetInverseDocumentFrequency(term);
        System.out.println("total weight: " + tf * idf);
        return tf * idf;
    }
}
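One possible next step from the class above is to score each document by cosine similarity against the query and then sort by that score. The following is only a minimal sketch under some assumptions: CosineRanker, cosine, rank, and queryWeight are hypothetical names that do not exist in the code above, and queryWeight is assumed to hold the query's tf * idf weights over the same term indices as termWeight (computed the same way as for a document).

```java
import java.util.Arrays;

// Hypothetical helper class; not part of the original tf_idf class.
class CosineRanker {

    // Cosine similarity between the query vector and column `doc` of a
    // [term][doc] weight matrix (the same layout as tf_idf.termWeight).
    static double cosine(float[] queryWeight, float[][] termWeight, int doc) {
        double dot = 0.0, queryNorm = 0.0, docNorm = 0.0;
        for (int t = 0; t < queryWeight.length; t++) {
            double q = queryWeight[t];
            double d = termWeight[t][doc];
            dot += q * d;        // numerator: sum of products
            queryNorm += q * q;  // squared length of the query vector
            docNorm += d * d;    // squared length of the document vector
        }
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // an all-zero vector is treated as matching nothing
        }
        return dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
    }

    // Returns document indices ordered from highest to lowest similarity.
    static Integer[] rank(float[] queryWeight, float[][] termWeight, int numDocs) {
        final double[] score = new double[numDocs];
        Integer[] order = new Integer[numDocs];
        for (int j = 0; j < numDocs; j++) {
            score[j] = cosine(queryWeight, termWeight, j);
            order[j] = j;
        }
        // Descending by score; ties keep their original index order.
        Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        return order;
    }

    public static void main(String[] args) {
        // Made-up weights for 2 terms and 3 documents, for illustration only.
        float[][] termWeight = {
            {0.5f, 0.0f, 0.2f},
            {0.0f, 0.4f, 0.2f}
        };
        float[] queryWeight = {1.0f, 0.0f}; // the query contains only term 0
        System.out.println(Arrays.toString(rank(queryWeight, termWeight, 3)));
        // prints [0, 2, 1]
    }
}
```

With the real data, the call would be something like CosineRanker.rank(queryWeight, tf_idf.termWeight, tf_idf.numDocs) after TermWeight() has filled the matrix. The accumulators are double rather than float only to reduce rounding error in the sums.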
>>I really don't have any idea how to proceed and solve it. I hope someone can help me :-p