- 11-18-2011, 04:18 AM #1
Bug in Spanish Analyzer with (some) accented words?
I'm using Lucene 3.4 with the SpanishAnalyzer class, and it is behaving strangely. When I analyze some accented words, such as "comunicación" or "también", I get different results than for the same words without accents ("comunicacion", "tambien").
If I analyze "comunicación", the stem I get is "comun". If I analyze "comunicacion" instead, the result is "comunicacion", unchanged.
Something similar happens with "también" and "tambien". The first produces no token at all, since it is a stop word. The second is not treated as a stop word, and Lucene produces "tambi".
This doesn't happen with most accented Spanish words, so I'm wondering why it happens with these two, and whether it is a bug in the SpanishAnalyzer class.
So, is it a bug? Is it recommended to strip all accents from the text first, or is there another way to get consistent behavior?
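To make the accent-stripping idea concrete, here is a minimal sketch of what I mean, using only the JDK's java.text.Normalizer. The AccentFolder class and its fold method are hypothetical names of my own, not part of Lucene, and I'm assuming the text would be folded before it reaches the analyzer.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: strips diacritics from text
// before it is handed to an analyzer.
final class AccentFolder {

    private AccentFolder() {}

    // Decompose to NFD so each accented letter becomes a base letter followed
    // by a combining mark, then drop the combining marks (Unicode category M).
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of file encoding.
        System.out.println(fold("comunicaci\u00f3n"));
        System.out.println(fold("tambi\u00e9n"));
    }
}
```

Folding this way would at least make the accented and unaccented spellings analyze identically, but judging from the results above it would also stop "también" from being recognized as a stop word and "comunicación" from being stemmed to "comun". It also turns "ñ" into "n", which may not be desirable for Spanish.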
Thank you very much in advance.