Thread: Lucene score - confidence
- 12-01-2010, 12:03 PM #1Member
- Join Date
- Dec 2010
I'm using Lucene to retrieve relevant segments of a corpus for a given
query. Every segment is indexed as a Document. Once the relevant
segments are retrieved, I search them with a regex to capture the
requested information. It can happen that the regex matches in two or
more of the documents retrieved by Lucene, so I would like to show a
confidence score for each captured match. Put simply, if the regex is
found in the first retrieved document and in the fourth, the match from
the first document is more likely to be relevant, since its document
was scored higher than the other one.
To show a confidence score, I tried to use the scores returned by
Lucene. But Lucene scores are arbitrary and not normalized, so they are
not meaningful as a confidence value. I wonder how a score can be
greater than 1 when Lucene's default scoring is based on the cosine
measure. In the simple case, the cosine score between two document
vectors (obtained by tf-idf) is between 0 and 1.
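To make the cosine claim concrete, here is a minimal sketch in plain Java. It makes no Lucene calls; the class name CosineSketch, the method cosine, and the example weights are all made up for illustration. It only shows that the cosine of two vectors with non-negative weights (as tf-idf weights are) stays between 0 and 1. It does not reproduce how Lucene actually computes its score.

```java
public class CosineSketch {

    // Cosine similarity between two equal-length weight vectors.
    // With non-negative weights the result lies in [0, 1].
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // An all-zero vector has no direction; treat the similarity as 0.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical tf-idf weights over a three-term vocabulary.
        double[] query = {1.2, 0.0, 0.7};
        double[] document = {0.9, 0.4, 0.0};
        System.out.println("cosine = " + cosine(query, document));
    }
}
```

Identical vectors give 1 and vectors with no terms in common give 0, so a raw score above 1 suggests that something other than a plain cosine is being reported.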
Since the Lucene score is arbitrary and not normalized, if I want to
check which query is more relevant for retrieving a document, how can I
compare the scores of the top-ranked document for each query? Is there
any way to give each retrieved document a confidence score that makes
it possible to compare the results of different searches using
different queries?
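One heuristic that could be tried for the confidence value described above is to turn the raw scores of a single result list into shares of the total score mass. The sketch below is plain Java and assumes the raw scores have already been copied into an array. The names ScoreShareSketch and toShares and the sample scores are hypothetical, and nothing here is part of the Lucene API. The shares are relative weights within one result list, not calibrated probabilities, so comparing them across queries remains a rough indication at best.

```java
public class ScoreShareSketch {

    // Converts raw, non-negative scores from one result list into
    // shares that sum to 1 (or all zeros if every score is 0).
    static double[] toShares(double[] rawScores) {
        double sum = 0.0;
        for (double score : rawScores) {
            if (score < 0.0) {
                throw new IllegalArgumentException("scores are assumed to be non-negative");
            }
            sum += score;
        }
        double[] shares = new double[rawScores.length];
        if (sum == 0.0) {
            return shares; // nothing matched with a positive score
        }
        for (int i = 0; i < rawScores.length; i++) {
            shares[i] = rawScores[i] / sum;
        }
        return shares;
    }

    public static void main(String[] args) {
        // Hypothetical raw scores of four retrieved documents, best first.
        double[] rawScores = {4.2, 2.1, 1.4, 0.7};
        double[] shares = toShares(rawScores);
        // Confidence attached to a regex match found in document 1 and in document 4.
        System.out.println("match in document 1: " + shares[0]);
        System.out.println("match in document 4: " + shares[3]);
    }
}
```

With this scheme a match found in the first document always gets at least as much weight as one found in the fourth. A top share close to 1 means one document dominates the result list for that query.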
Many thanks.