Bug in Spanish Analyzer with (some) accented words?
Hi everyone!
I'm using Lucene 3.4 with the SpanishAnalyzer class, and it is behaving strangely. When I analyze some accented words, such as "comunicación" or "también", I get different results than for the same words without accents ("comunicacion", "tambien").
If I parse "comunicación", the stem I get is "comun". If I instead parse "comunicacion", the result is "comunicacion".
Something similar happens with "también" and "tambien". The first produces no output, since it is a stop word. The second is not treated as a stop word, and Lucene produces "tambi".
This doesn't happen with most accented Spanish words, so I'm wondering why it happens with these two, and whether it is a bug in the SpanishAnalyzer class.
So, is it a bug? Is it recommended to strip all accents from the words first, or is there another way to get consistent behavior?
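To make the accent-removal idea concrete, here is a minimal sketch of what I have in mind, using only java.text.Normalizer from the JDK. The class and method names (AccentStripper, stripAccents) are hypothetical, just for illustration, and I'm assuming the stripping would be applied to the text before it reaches the analyzer. I'm not sure this is the right approach.

```java
import java.text.Normalizer;

public class AccentStripper {

    // Hypothetical helper (not part of Lucene): decompose each character
    // into base letter + combining marks (NFD), then drop the marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid source-encoding issues:
        // \u00f3 is 'ó', \u00e9 is 'é'.
        System.out.println(stripAccents("comunicaci\u00f3n")); // comunicacion
        System.out.println(stripAccents("tambi\u00e9n"));      // tambien
    }
}
```

One drawback I can already see: this also turns "ñ" into "n", so "año" would become "ano", and the unaccented forms are exactly the ones that give me the odd results above.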
Thank you very much in advance,
Sabbia