- 12-11-2009, 06:03 PM #1
Lucene Analyzer that can handle C++ vs C#
Can someone please point me in the right direction?
We are creating an application that needs to be able to search on "C++" and get
back only docs that contain C++. The StandardAnalyzer does not seem to index
the "+", so a search for "C++" brings back docs that contain C++, C,
C#, etc. The WhitespaceAnalyzer does index the "+", but if C++ is at the end
of a sentence, the term is indexed as "C++." (with the trailing period), so a
search for "C++" does not return that doc. I have heard that a custom Analyzer
might help; however, it seems like there would actually need to be a
custom TokenFilter or Tokenizer. I looked at:
- StandardAnalyzer.java
- StandardFilter.java
- StandardTokenizer.java
- StandardTokenizerImpl.java
- StandardTokenizerImpl.jflex
I would guess that the StandardTokenizer is where the changes would need to
be made to allow the "+" character, but I am unclear as to how.
Any and all help is greatly appreciated.
Going through all the documents and replacing "+" with the word "plus" is not really an option for us.
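To make the desired behavior concrete, here is a minimal sketch in plain Java of the token cleanup described above: split on whitespace like the WhitespaceAnalyzer, then trim surrounding punctuation while keeping a trailing "+" or "#". The class `TechTermTokenizer` and its methods are hypothetical names used for illustration, not part of Lucene, and the set of characters to keep is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Lucene class: shows the token cleanup only.
final class TechTermTokenizer {

    // Assumption: only '+' and '#' need to survive at the end of a token
    // (enough for "C++" and "C#"); extend this set as needed.
    private static final String KEEP_AT_END = "+#";

    // Strips leading non-alphanumerics and trailing punctuation,
    // but stops trimming at the end when it reaches '+' or '#'.
    static String trimPunctuation(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && !Character.isLetterOrDigit(token.charAt(start))) {
            start++;
        }
        while (end > start) {
            char c = token.charAt(end - 1);
            if (Character.isLetterOrDigit(c) || KEEP_AT_END.indexOf(c) >= 0) {
                break;
            }
            end--;
        }
        return token.substring(start, end);
    }

    // Splits on whitespace, cleans each piece, lowercases it,
    // and drops pieces that end up empty.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String cleaned = trimPunctuation(raw).toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }
}
```

With this, `TechTermTokenizer.tokenize("We use C++. Not C#, and not C.")` yields the tokens `we, use, c++, not, c#, and, not, c`, so "C++.", "C++" and "C" stay distinct terms. In Lucene the same trimming could presumably go into a custom TokenFilter placed after a whitespace tokenizer, with the same analyzer used at both index and query time. Note that "+" is also an operator in the QueryParser syntax, so it would likely need escaping in query strings.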