Hi, I am pretty new to Lucene. I want to find the similarity between two documents using Lucene. At the moment I pass the second document as the search string and then rank the results. When I look at the term frequencies, I notice that they are not normalized. If that is how Lucene works, then large documents will score highly simply because they are long.
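To make my concern concrete, here is a minimal sketch in plain Java. It does not use any Lucene API and is not meant to describe how Lucene scores; the class and method names (`TermFrequencyDemo`, `rawTermFrequencies`, `dot`, `cosine`, `repeatText`) are made up for illustration. It compares a score built from raw term counts with a cosine score, where document length cancels out.

```java
import java.util.HashMap;
import java.util.Map;

public class TermFrequencyDemo {

    // Count raw term occurrences in lower-cased, whitespace-tokenized text.
    static Map<String, Double> rawTermFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    // Unnormalized score: sum of products of raw counts for shared terms.
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Cosine similarity: the dot product divided by both vector lengths.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot(a, b) / (normA * normB);
    }

    // Hypothetical helper: build a longer document by repeating a text.
    static String repeatText(String text, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(text).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> query = rawTermFrequencies("apple banana");
        Map<String, Double> shortDoc = rawTermFrequencies("apple banana");
        Map<String, Double> longDoc = rawTermFrequencies(repeatText("apple banana", 10));

        // Raw counts: the long document scores ten times higher.
        System.out.println("raw short = " + dot(query, shortDoc));
        System.out.println("raw long  = " + dot(query, longDoc));

        // Cosine: both documents score about the same.
        System.out.println("cos short = " + cosine(query, shortDoc));
        System.out.println("cos long  = " + cosine(query, longDoc));
    }
}
```

With raw counts, the document that repeats the same two words ten times gets ten times the score, even though its content is identical. That is the effect I am worried about.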

Please correct me if I am wrong. Is it true that Lucene does not normalize term frequencies? Also, please suggest any other good ways to find the similarity between two documents using Lucene.

Please help me out!