- 03-21-2011, 07:17 AM #1
- 03-21-2011, 08:01 AM #2
- 03-21-2011, 12:43 PM #3
How to calculate the similarity between a query and documents using a Java program
I have a set of documents and I have already calculated:
1) term frequency (TF)
2) inverse document frequency (IDF)
3) TF-IDF
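For reference, here are the three quantities as the code in post #5 computes them: TF normalized by the count of the most frequent term in the document, and IDF using the natural logarithm. In this notation f_{t,d} is the count of term t in document d, N is the number of documents and df_t is the number of documents containing t. Other TF-IDF variants (raw counts, log base 10, smoothed IDF) are also in common use, so treat this as one possible scheme.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\max_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}_t}, \qquad
w_{t,d} = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)

% Worked example: f_{t,d} = 2, \max_{t'} f_{t',d} = 3, N = 3, \mathrm{df}_t = 1
w_{t,d} = \tfrac{2}{3}\cdot\ln 3 \approx 0.732
```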
Now I need to calculate the similarity between a specific query and each document, producing a score that ranks the documents from most similar to least similar to the query.
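One common way to turn TF-IDF weights into a query-document score is cosine similarity in the vector space model: represent the query and each document as weight vectors over the same vocabulary and compute cos(q, d) = (q · d) / (|q| |d|), then sort documents by that score. The sketch below assumes the vectors are already dense double arrays indexed by the same term ids; the class and method names (CosineRanker, cosine, rank) are hypothetical and not taken from any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not from the thread): ranks documents by cosine similarity.
class CosineRanker {

    // Cosine of the angle between two weight vectors over the same vocabulary.
    // Returns 0 when either vector is all zeros, since the angle is undefined.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Document indices ordered from most to least similar to the query.
    static List<Integer> rank(double[] query, double[][] docs) {
        double[] scores = new double[docs.length];
        List<Integer> order = new ArrayList<>();
        for (int d = 0; d < docs.length; d++) {
            scores[d] = cosine(query, docs[d]);
            order.add(d);
        }
        // List.sort is stable, so documents with equal scores keep their original order.
        order.sort(Comparator.comparingDouble((Integer d) -> scores[d]).reversed());
        return order;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0, 1.0};
        double[][] docs = {
            {0.0, 1.0, 0.0}, // shares no terms with the query
            {1.0, 0.0, 1.0}, // same direction as the query
            {1.0, 1.0, 0.0}  // partial overlap
        };
        System.out.println(rank(query, docs)); // prints [1, 2, 0]
    }
}
```

Cosine similarity divides out vector length, so a long document does not outscore a short one merely by repeating terms.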
Can anyone guide me? I just need to know how to proceed from here.
Thanks
- 03-21-2011, 01:17 PM #4
- 03-21-2011, 02:03 PM #5
public class tf_idf {
    public static int numDocs = 0;      // number of documents in the collection
    public static int numTerms = 0;     // number of distinct terms (vocabulary size)
    public static int[][] termFreq;     // termFreq[term][doc]: raw count of term in doc
    public static int[] maxTermFreq;    // maxTermFreq[doc]: highest raw count of any term in doc
    public static int[] docFreq;        // docFreq[term]: number of documents containing term
    public static float[][] termWeight; // termWeight[term][doc]: tf * idf

    // Fills termWeight with tf * idf for every (term, document) pair.
    public static void TermWeight() {
        termWeight = new float[numTerms][numDocs]; // allocate before filling
        for (int i = 0; i < numTerms; i++) {
            for (int j = 0; j < numDocs; j++) {
                termWeight[i][j] = ComputeTermWeight(i, j);
            }
        }
    }

    // Normalized term frequency: count of term in doc divided by the largest
    // count of any term in that doc. E.g. freq = 2, maxfreq = 3 gives 2/3.
    public static float GetTermFrequency(int term, int doc) {
        int freq = termFreq[term][doc];
        int maxfreq = maxTermFreq[doc];
        if (maxfreq == 0) {
            return 0f; // empty document: avoid division by zero
        }
        return (float) freq / (float) maxfreq;
    }

    // Inverse document frequency: ln(numDocs / df). E.g. numDocs = 3, df = 1 gives ln(3).
    public static float GetInverseDocumentFrequency(int term) {
        int df = docFreq[term];
        if (df == 0) {
            return 0f; // term occurs in no document: avoid division by zero (and 0 * Infinity = NaN)
        }
        return Log((float) numDocs / (float) df);
    }

    // Natural logarithm: ln(num) = log_e(num).
    public static float Log(float num) {
        return (float) Math.log(num);
    }

    // Weight of one term in one document: tf * idf.
    public static float ComputeTermWeight(int term, int doc) {
        float tf = GetTermFrequency(term, doc);
        float idf = GetInverseDocumentFrequency(term);
        return tf * idf;
    }
}
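A possible next step from the class above, sketched under assumptions: weight the query with the same scheme as the documents (query term count divided by the query's largest count, times ln(numDocs / df)), then score each document column of the termWeight[term][doc] matrix by cosine similarity and sort. The query is assumed to arrive as raw counts indexed by the same term ids as termFreq, and query terms that occur in no document get weight 0. QueryScorer and its methods are hypothetical names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the original tf_idf class.
class QueryScorer {

    // Weights the query like the documents: (count / largest count in the query) * ln(numDocs / df).
    // queryCounts[t] is the raw count of term t in the query, using the same term ids as termFreq.
    static float[] queryWeights(int[] queryCounts, int[] docFreq, int numDocs) {
        int max = 0;
        for (int c : queryCounts) {
            max = Math.max(max, c);
        }
        float[] w = new float[queryCounts.length];
        for (int t = 0; t < queryCounts.length; t++) {
            // Terms absent from every document (df == 0) keep weight 0.
            if (max > 0 && docFreq[t] > 0) {
                float tf = (float) queryCounts[t] / max;
                float idf = (float) Math.log((float) numDocs / docFreq[t]);
                w[t] = tf * idf;
            }
        }
        return w;
    }

    // Cosine similarity between the query vector and column `doc` of termWeight[term][doc].
    static double score(float[] query, float[][] termWeight, int doc) {
        double dot = 0, qNorm = 0, dNorm = 0;
        for (int t = 0; t < query.length; t++) {
            double q = query[t];
            double d = termWeight[t][doc];
            dot += q * d;
            qNorm += q * q;
            dNorm += d * d;
        }
        if (qNorm == 0 || dNorm == 0) {
            return 0;
        }
        return dot / (Math.sqrt(qNorm) * Math.sqrt(dNorm));
    }

    // Document indices sorted from highest to lowest similarity to the query.
    static List<Integer> rank(float[] query, float[][] termWeight, int numDocs) {
        double[] scores = new double[numDocs];
        List<Integer> order = new ArrayList<>();
        for (int doc = 0; doc < numDocs; doc++) {
            scores[doc] = score(query, termWeight, doc);
            order.add(doc);
        }
        order.sort(Comparator.comparingDouble((Integer doc) -> scores[doc]).reversed());
        return order;
    }
}
```

Once termWeight has been filled, a call such as QueryScorer.rank(QueryScorer.queryWeights(counts, tf_idf.docFreq, tf_idf.numDocs), tf_idf.termWeight, tf_idf.numDocs) would return document indices best match first, where counts is a hypothetical int array of the query's term counts. Note that a term appearing in every document gets idf = ln(1) = 0 and so contributes nothing to any score.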
I really don't have any idea how to proceed from here. I hope someone can help me :-p
Similar Threads
- is Cosine Similarity the Default Similarity in Lucene? By sethu.iit@gmail.com in forum Lucene. Replies: 0. Last Post: 06-30-2010, 09:49 AM
- Search Engine on JSP Page. By samanthamaryhorgan in forum Advanced Java. Replies: 0. Last Post: 02-13-2010, 12:40 PM
- simple search engine. By semoche in forum Enterprise JavaBeans (EJB). Replies: 3. Last Post: 12-07-2009, 08:41 AM
- Search Engine , Web Crawler. By sahil.ansari in forum Advanced Java. Replies: 5. Last Post: 07-21-2008, 01:53 AM
- Search Engine. By SSam Varghese in forum Java Servlet. Replies: 5. Last Post: 01-05-2008, 08:26 AM

