We are trying to build a semantic index of documents and already have a tool that extracts all semantic contexts from a text document. Our next step is to build an index from these contexts (words), and Lucene came to mind. I have never used Lucene or any other indexer, hence my question:

How hard is it to swap out Lucene's analyzer and replace it with our program, and where should we start? We can easily pre-process the words representing the contexts into any format needed.

Thanks for any advice!