I have been searching the Internet for several days without finding an answer to my question. Maybe I just can't formulate the right Google query.
Let's get to the point...
The usual scenario in which Lucene is used is the following:
1) The system indexes the content of documents
2) The user searches the documents using phrases
3) Lucene returns the documents that contain the input phrases
Can Lucene be used for the reverse scenario, where:
1) the system indexes phrases,
2) the user provides a text,
3) Lucene returns all indexed phrases that the text contains?
Indexed phrases: "like", "kids", "table". User inputs: "I like children and kids". Lucene returns: "like", "kids".
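To make the requested behaviour concrete, here is a minimal plain-Java sketch of it. This is not Lucene code: the class `PhraseMatcher` and its methods are made up for illustration, and it assumes that case-insensitive matching on whole words is what is wanted. It keeps the phrases in a hash set and checks every run of words in the text against it, so the cost depends on the length of the text rather than on the number of stored phrases.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: finds which stored phrases occur in a text.
final class PhraseMatcher {
    private final Set<String> phrases = new HashSet<>();
    private int maxWords = 1; // length of the longest stored phrase, in words

    // Lower-case the input and split on anything that is not a letter or digit.
    private static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stores a phrase in normalized form (lower-case words joined by single spaces).
    void addPhrase(String phrase) {
        List<String> words = tokenize(phrase);
        if (words.isEmpty()) {
            return;
        }
        phrases.add(String.join(" ", words));
        maxWords = Math.max(maxWords, words.size());
    }

    // Returns the stored phrases found in the text, in order of first occurrence.
    List<String> match(String text) {
        List<String> words = tokenize(text);
        Set<String> found = new LinkedHashSet<>();
        for (int start = 0; start < words.size(); start++) {
            StringBuilder candidate = new StringBuilder();
            // Grow the candidate one word at a time, up to the longest phrase length.
            for (int len = 1; len <= maxWords && start + len <= words.size(); len++) {
                if (len > 1) {
                    candidate.append(' ');
                }
                candidate.append(words.get(start + len - 1));
                String key = candidate.toString();
                if (phrases.contains(key)) {
                    found.add(key);
                }
            }
        }
        return new ArrayList<>(found);
    }
}
```

With "like", "kids" and "table" added via `addPhrase`, calling `match("I like children and kids")` returns "like" and "kids". Note that the sketch returns phrases in their normalized lower-case form and does no stemming or synonym handling, which a Lucene analyzer could provide.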
Because of the large number of phrases (~14000), a regex approach does not work.
Please let me know whether this scenario can be implemented using Lucene.
I think this is a common problem in search. I would suggest trying something like this:
1. Store the phrases in a field in the index.
2. This may require you to store many phrases per document.
3. Also store a field with the document location, so that you can link to the document in the search results.
4. Also store the line number of each phrase in the index; this way you can point your user directly to a specific line if needed.
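The layout described in the steps above could look roughly like the following. This is only an in-memory stand-in written in plain Java, not Lucene's API; `PhraseLocationIndex`, `Occurrence` and the field names are hypothetical and chosen just to show which pieces of data would be stored together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the suggested index layout.
final class PhraseLocationIndex {

    // One stored occurrence of a phrase: which document and which line.
    static final class Occurrence {
        final String documentLocation;
        final int lineNumber;

        Occurrence(String documentLocation, int lineNumber) {
            this.documentLocation = documentLocation;
            this.lineNumber = lineNumber;
        }
    }

    // A phrase can occur in many documents and on many lines.
    private final Map<String, List<Occurrence>> byPhrase = new HashMap<>();

    // Records that the phrase appears at the given location and line.
    void add(String phrase, String documentLocation, int lineNumber) {
        byPhrase.computeIfAbsent(phrase, k -> new ArrayList<>())
                .add(new Occurrence(documentLocation, lineNumber));
    }

    // Returns every recorded occurrence of the phrase, or an empty list.
    List<Occurrence> lookup(String phrase) {
        return byPhrase.getOrDefault(phrase, Collections.emptyList());
    }
}
```

In a real Lucene index the same three values would presumably become fields of a document, with the phrase field being the searchable one.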
Running the same regex approach against an index of phrases should give you more relevant results than running it against the full document content.
Let me know if you have any questions.