Currently, "empêchaient" (the verb "empêcher" conjugated in the imperfect tense) is indexed as "empêch" instead of "empech".
I'm not familiar with Snowball (http://snowball.tartarus.org/) or stemming algorithms, but according to http://snowball.tartarus.org/algorithms/french/stemmer.html this is the expected behavior.
For instance, "maître" produces "maîtr", not "maitr". I find this odd, because most of the time French users don't type accented letters when searching (it's quicker, and most search engines fold accented letters anyway).
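One possible workaround is to fold accents on the stemmer's output, applying the same step at both index time and query time, so that "empêch" and "empech" end up as the same term. The Python sketch below assumes the search library lets you post-process stems; `normalize_stem` is a hypothetical hook, not part of Snowball or any particular library, and the stems are the examples quoted above rather than a call to a real stemmer.

```python
import unicodedata


def fold_accents(term: str) -> str:
    """Strip diacritics: decompose to NFD, drop combining marks, recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def normalize_stem(stem: str) -> str:
    """Hypothetical post-processing step for a Snowball French stem.

    Must be applied identically when indexing and when querying.
    """
    return fold_accents(stem.lower())


# Stems produced by the Snowball French stemmer, taken from the examples above.
snowball_stems = {"empêchaient": "empêch", "maître": "maîtr"}

for word, stem in snowball_stems.items():
    # e.g. prints "empêchaient -> empêch -> empech"
    print(word, "->", stem, "->", normalize_stem(stem))
```

Folding is done after stemming rather than before because Snowball's French rules treat accented letters such as é, è and ê as vowels and match some accented suffixes (for example "é" and "ée"), so stripping accents first could change which suffixes get removed.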
For reference, here's the Lucene implementation: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java