  1. #1
    siddarth

    Application relevant sort order of results

    Hi,
    I have Lucene up and running, with results sorted by relevance based on the document scores that Lucene generates.

    Now here's what I am trying to achieve.
    When searching for the text 'The brown fox', I would like the search results to be sorted as follows:

    1) All docs which exactly match the search string i.e. the docs containing 'The brown fox'
    2) All docs with the words 'The brown'
    3) All docs with the words 'brown fox'
    4) All docs with the words 'The fox'
    5) All docs with the words 'The'
    6) All docs with the words 'brown'
    7) All docs with the words 'fox'

    The number of permutations and combinations increases as the number of words in the search string increases.
    How should I go about implementing this?
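
    For reference, the priority list above can be generated mechanically. The sketch below is plain Java, not Lucene API; the class name QueryTiers and the ordering rules (more words first, then adjacent words before gapped ones, then left to right) are assumptions read off the seven-item list. It produces 2^n - 1 sub-queries for n words, which is the growth mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: lists the sub-queries of a search string, most important first. */
final class QueryTiers {

    /** True when the set bits of mask form one unbroken run, i.e. the chosen words are adjacent. */
    static boolean isContiguous(int mask) {
        int x = mask >>> Integer.numberOfTrailingZeros(mask);
        return (x & (x + 1)) == 0;
    }

    /** Returns every non-empty word subset, kept in original word order, best tier first. */
    static List<String> tiers(String query) {
        List<String> result = new ArrayList<>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) return result;
        String[] words = trimmed.split("\\s+");
        int n = words.length;
        // Assumption: exhaustive enumeration is only sensible for short queries.
        if (n > 16) throw new IllegalArgumentException("too many words for exhaustive tiers");

        // Bit i of a mask means "word i is part of this sub-query".
        List<Integer> masks = new ArrayList<>();
        for (int m = 1; m < (1 << n); m++) masks.add(m);

        masks.sort(Comparator
                .comparingInt((Integer m) -> -Integer.bitCount(m))        // more words first
                .thenComparingInt(m -> isContiguous(m) ? 0 : 1)           // adjacent before gapped
                .thenComparingInt(m -> Integer.numberOfTrailingZeros(m))  // leftmost start first
                .thenComparingInt(m -> m));                               // deterministic tie-break

        for (int mask : masks) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(words[i]);
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        tiers("The brown fox").forEach(System.out::println);
    }
}
```

    For 'The brown fox' this yields the seven entries in the order listed above. With more than a handful of words the list gets impractically long, so coarser tiers are probably the more realistic route.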

    I'm still new to Lucene, so any and all help will be greatly appreciated!
    Thanks in advance.

    Sidd

  2. #2
    gimbal2

    Are you sure that Lucene isn't already doing this for you?
    "Syntactic sugar causes cancer of the semicolon." -- Alan Perlis

  3. #3
    siddarth

    Originally posted by gimbal2:
        Are you sure that Lucene isn't already doing this for you?
    Not exactly.
    To make my issue a little clearer, consider the following example:

    I have 7 MS Word files with the following content:
    Doc 1 - The brown fox
    Doc 2 - The brown
    Doc 3 - The fox
    Doc 4 - brown fox
    Doc 5 - The
    Doc 6 - brown
    Doc 7 - fox
    Also, keep in mind that my order of indexing is 1 to 7


    Now, I search for the string 'The fox'.
    Expected result for my requirement:

    Doc 3 (exact match of the search string)
    Doc 1 (contains all the words in the search string, but not contiguously)
    Doc 2,4,5,7 (partial matches, in any order depending on the Lucene score or Lucene docId)


    Actual Result:
    Doc 1
    Doc 3
    Doc 2,4,5,7 (Any order will do)


    I also know why Lucene gives me this result.
    When I search for 'The fox', Lucene searches for file content matching 'The' OR 'fox' and gives each matching file a score.
    Since both words occur in Doc 1 and Doc 3, the score for both files is the same.
    When Lucene can't order results by score, it falls back to the order of index insertion.
    As mentioned earlier, the index insertion order was Doc 1 to Doc 7, hence Doc 1 comes before Doc 3 in the result.
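
    One way to get the expected order, sketched under assumptions: keep Lucene's hit list as it is and re-sort it in application code by how well each document matches. The class TieredRanker and its tiers are hypothetical, the whitespace tokenizer is only a stand-in for the real analyzer, and the document text is assumed to be available to the application (for example from a stored field). The sort is stable, so documents in the same tier keep the order Lucene returned them in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: re-ranks already-retrieved document texts by match tier. */
final class TieredRanker {

    /** Lower-cases and splits on whitespace; a stand-in for a real analyzer. */
    static List<String> tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) return new ArrayList<>();
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /**
     * 0 = document equals the query, 1 = query occurs as a contiguous phrase,
     * 2 = all query words present, 3 = some query words present, 4 = none.
     */
    static int tier(List<String> query, List<String> doc) {
        if (doc.equals(query)) return 0;
        if (Collections.indexOfSubList(doc, query) >= 0) return 1;
        if (doc.containsAll(query)) return 2;
        for (String word : query) {
            if (doc.contains(word)) return 3;
        }
        return 4;
    }

    /** Returns a re-sorted copy; ties keep their incoming (Lucene) order. */
    static List<String> rank(String query, List<String> hitsInLuceneOrder) {
        List<String> q = tokens(query);
        List<String> out = new ArrayList<>(hitsInLuceneOrder);
        out.sort(Comparator.comparingInt((String d) -> tier(q, tokens(d))));
        return out;
    }

    public static void main(String[] args) {
        // Hits for 'The fox' in the order reported above: Doc 1, 3, 2, 4, 5, 7.
        List<String> hits = Arrays.asList(
                "The brown fox", "The fox", "The brown", "brown fox", "The", "fox");
        rank("The fox", hits).forEach(System.out::println);
    }
}
```

    With the hits above and the query 'The fox', this puts Doc 3 first and Doc 1 second, followed by Docs 2, 4, 5 and 7 in their original order. Re-sorting only sees the hits that were actually fetched, so it suits small result sets better than deep paging.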



    What I have no clue about is how to get the result I need! :)

    Sid

  4. #4
    gimbal2

    Neither do I, to be honest. I use Lucene to give me results FAST in order of relevance, so I am not going to limit its complex internal magic by forcing it to return results in a specific order I want; I let its internal relevance algorithms do the work as best they can.

    Are you sure that Lucene is in fact indexing the word "the", though? If you use the defaults, it may have a filter activated that prevents it from indexing very common, short words. I must admit that when implementing the Lucene search I have, I had to dive into the actual source code to see the real truth; the online documentation is a little too shallow for my taste. For example, I had the problem that it was using a dot in a word as a separator character, and I had to fine-tune such matters quite a bit with a custom analyzer.
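
    To make the two analyzer effects above concrete, here is a toy tokenizer in plain Java. It is not Lucene's analyzer code: the class StopWordDemo, its stop list, and the split rule are assumptions chosen to mimic a stop-word filter and a tokenizer that treats a dot as a separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical toy analyzer; a real one ships its own rules and stop list. */
final class StopWordDemo {

    // Assumed stop list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    /** Lower-cases, splits on anything that is not a letter or digit, optionally drops stop words. */
    static List<String> analyze(String text, boolean removeStopWords) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;                              // leading separator
            if (removeStopWords && STOP_WORDS.contains(token)) continue;
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The brown fox", true));   // "the" is filtered out
        System.out.println(analyze("The brown fox", false));  // "the" is kept
        System.out.println(analyze("report.final", false));   // the dot splits the word
    }
}
```

    If the index was built with a filter like this, a search for 'The' has nothing to match, so checking which analyzer is used at both index time and query time is a sensible first step.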

  5. #5
    siddarth

    It does index the word 'The'.
    This is a helpful short tutorial on getting started with Lucene:
    The Lucene search engine: Powerful, flexible, and free | JavaWorld
    In it, the author mentions that Lucene keeps a dictionary in which it indexes each unique word it comes across.

    And the reason I'm fiddling around with the internal workings of Lucene is that it's a client requirement.
    Thanks for the response anyway!
