Accented letter ê should be replaced by e in the french stemmer #68

ggrossetie · 2020-10-28T10:04:31Z

Currently, "empêchaient" (the verb "empêcher" conjugated in the imperfect tense) is indexed as "empêch" instead of "empech".

I'm not familiar with http://snowball.tartarus.org/ or with stemming algorithms, but according to http://snowball.tartarus.org/algorithms/french/stemmer.html this is the expected behavior.
For instance, "maître" will produce "maîtr", not "maitr". I find this odd, because most of the time French people will not type accented letters when searching (it's quicker to type, and most search engines replace accented letters anyway).

For reference, here's the Lucene implementation: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java


dhdaines · 2024-07-06T16:14:44Z

Hi, you can handle this separately with Unicode folding, as detailed here: https://github.com/dhdaines/lunr.py/blob/fix_skip_docs/docs/languages.md#folding-to-ascii

Or by using lunr-folding
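
To illustrate the folding idea, here is a minimal sketch using only Python's standard `unicodedata` module. The helper name `fold_accents` is hypothetical and is not part of lunr.py or lunr-folding. The sketch assumes the folding is applied to tokens on both the indexing side and the query side; how to register such a step in a lunr pipeline is described in the linked documentation and is not shown here.

```python
import unicodedata


def fold_accents(token: str) -> str:
    """Remove combining accent marks from a token (hypothetical helper)."""
    # NFD splits "ê" into the base letter "e" plus a combining circumflex.
    decomposed = unicodedata.normalize("NFD", token)
    # Drop the combining marks and keep only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Stems quoted in this issue, before and after folding.
for stem in ("empêch", "maîtr", "équipement"):
    print(stem, "->", fold_accents(stem))
```

This only strips combining marks; characters that have no canonical decomposition (for example the ligature "œ") pass through unchanged and would need an explicit mapping.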

DavidBruant mentioned this issue on Mar 23, 2021, in #71 (open): lunr-languages/lunr.fr.js fails to find common words like "équipement"
