  1. #1
    javastu is offline Member
    Join Date
    Jul 2016
    Posts
    2
    Rep Power
    0

    Problems Refactoring a Lucene Index

    As our software goes through its lifecycle, we sometimes have to alter existing Lucene indexes. The way I have done that in the past is to open the existing index for reading, read each Document, modify it and write the Document to a new index. At the end of the process, I delete the old index and rename the new index to the old name.
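    The final delete-and-rename step of that process can be sketched with plain java.nio, independent of any Lucene version. This is only an illustration: the class name IndexSwap and the method replaceIndex are hypothetical, and it assumes both index directories sit on the same file system and that every reader and writer on them has already been closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: swaps a freshly built index directory into the place of the old one. */
final class IndexSwap {

    /** Deletes oldIndex recursively (if present), then renames newIndex to oldIndex. */
    static void replaceIndex(Path oldIndex, Path newIndex) throws IOException {
        if (Files.exists(oldIndex)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(oldIndex)) {
                // Reverse order puts children before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        // A plain rename; assumes both paths are on the same file system.
        Files.move(newIndex, oldIndex);
    }
}
```

    The old index is only deleted after the new one has been fully written, so a failed rebuild leaves the original untouched.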

    Full disclosure: I do not do any tokenizing and use no analyzers.

    I recently upgraded from Lucene 3.x to 4.10.4. Now I have the following problem: Suppose the existing document has 10 fields in it and there's one I have to modify. I remove that field and re-add it with the new settings. Then I add the Document to the new index. I get exceptions thrown for fields I don't even touch. The cause is that their FieldType comes back with tokenized set to true, and indexing them fails because I am using no analyzers.

    The curious thing is that when I created the original document, none of the FieldTypes for StringFields had tokenized set to true. Yet when I read it back it is set to true. Why is this?

    I tried a fix whereby I set each FieldType's tokenized flag to false. But that doesn't work either, because I have a LongField, and when LongFields are read back their FieldTypes are marked as 'frozen' and cannot be changed.

    Am I not understanding something here? This is not very usable in the newer versions of Lucene. What can I do to fix this? Is this a Lucene bug?

  2. #2
    javastu is offline Member

    Re: Problems Refactoring a Lucene Index

    I am going to add one more piece of information and partially answer my own question.

    The scenario is worse than I first described. For some fields that were indexed, the metadata comes back saying they were not indexed. As a result, fields that were indexed in the old index are no longer indexed in my refactored index, so they are no longer searchable. This is a show stopper for me.

    Here's my partial answer: The javadoc for IndexReader.document has a comment saying that field metadata (i.e. whether the field is indexed, tokenized, etc.) is not returned with the Field. I am almost positive that it never used to be that way! In fact, the disclaimer about the lack of metadata is not present in the 3.x javadoc of the same method!

    There is an API called FieldInfos. I tried that route as well and it also returns the same inaccurate metadata. If this is not a bug, then how do you have a class called FieldInfos that contains wrong information?!
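    If the metadata on read-back fields cannot be trusted, one possible workaround is to stop relying on it: keep the intended settings for every field in the application and rebuild each field from its stored value. The sketch below shows only that idea in plain Java with no Lucene dependency. FieldSpec, StoredValue, RebuiltField and SchemaRebuilder are all hypothetical names invented for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: the settings the application wants for a field, kept outside the index. */
final class FieldSpec {
    final boolean indexed;
    final boolean tokenized;
    final boolean numeric;

    FieldSpec(boolean indexed, boolean tokenized, boolean numeric) {
        this.indexed = indexed;
        this.tokenized = tokenized;
        this.numeric = numeric;
    }
}

/** Hypothetical: what is read back from the old index, name and content only. */
final class StoredValue {
    final String name;
    final String value;

    StoredValue(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical: a field ready to be written, with settings taken from the schema. */
final class RebuiltField {
    final String name;
    final String value;
    final FieldSpec spec;

    RebuiltField(String name, String value, FieldSpec spec) {
        this.name = name;
        this.value = value;
        this.spec = spec;
    }
}

/** Rebuilds every field from the application-held schema instead of read-back metadata. */
final class SchemaRebuilder {
    private final Map<String, FieldSpec> schema = new LinkedHashMap<>();

    SchemaRebuilder declare(String name, FieldSpec spec) {
        schema.put(name, spec);
        return this;
    }

    List<RebuiltField> rebuild(List<StoredValue> stored) {
        List<RebuiltField> out = new ArrayList<>();
        for (StoredValue sv : stored) {
            FieldSpec spec = schema.get(sv.name);
            if (spec == null) {
                // Fail loudly rather than silently writing a field with guessed settings.
                throw new IllegalStateException("No schema entry for field: " + sv.name);
            }
            out.add(new RebuiltField(sv.name, sv.value, spec));
        }
        return out;
    }
}
```

    In real Lucene 4.x code, each RebuiltField would presumably become a fresh field object, for example new StringField(name, value, Field.Store.YES) for an untokenized string or new LongField(name, Long.parseLong(value), Field.Store.YES) for a numeric one, so that no FieldType from the old document is reused. This only covers fields whose values were stored; content that was indexed but not stored cannot be recovered this way.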
