Let's say I have an index created. The Spanish word "Respiración" is stemmed as:
"respir"
That's correct.
Now I run a search, but the user leaves out the accent mark and types "respiracion" (no accent on the last "o"). lunr won't stem that word and will leave it as "respiracion", so no matches will be found.
I know that stemming assumes the word is spelled correctly, BUT since almost no user types accents correctly when searching, this makes lunr useless for many words.
I made a workaround: removing accents before the stemmer in the pipeline (I remove accents using normalize-strings).
But this also throws away much of the benefit of stemming, because words whose stemming rules depend on the accent will never be stemmed.
var lunr = require('lunr');
var normalize = require('normalize-strings');

// Plugin: strip accents from every token before it reaches the stemmer.
// Used as builder.use(normalizeLunrPlugin, stemmer), where `stemmer` is the
// stemmer function already present in both pipelines.
var normalizeLunrPlugin = function(builder, stemmer) {
  var pipelineFunction = function(token) {
    return token.update(function(word) {
      return normalize(word);
    });
  };

  // Register the pipeline function so the index can be serialised
  lunr.Pipeline.registerFunction(pipelineFunction, 'normalizeLunrPlugin');

  // Add the pipeline function to both the indexing pipeline and the
  // searching pipeline
  builder.pipeline.before(stemmer, pipelineFunction);
  builder.searchPipeline.before(stemmer, pipelineFunction);
};
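For anyone who wants to try the accent-removal step without the extra dependency, here is a minimal sketch that relies only on the standard `String.prototype.normalize`. The helper name `stripAccents` is mine, not part of lunr or normalize-strings, and it may not behave identically to normalize-strings for every character.

```javascript
// Hypothetical helper (not part of lunr): decompose each character into
// base letter + combining marks (NFD), then drop the combining marks.
function stripAccents(word) {
  return word.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripAccents('Respiración')); // "Respiracion"
```

Note that this also turns "ñ" into "n" and "ü" into "u", which may or may not be acceptable for a Spanish index.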
My suggestion is to run two stemmers in the pipeline, one for accented words and one for unaccented words, so that "respiracion" without the accent, which the first stemmer leaves intact, is picked up by the second one and stemmed correctly.
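To make the suggestion concrete, here is a rough sketch of the two-pass idea in plain JavaScript. It does not use the real Spanish stemmer: `toyStem` and its two-entry suffix list are made up purely for illustration, and `stemWithFallback` is a hypothetical name. The point is only the control flow: try the accented rules first, and fall back to accent-free rules when the first pass leaves the word unchanged.

```javascript
// Hypothetical helper: remove combining diacritical marks.
function stripAccents(word) {
  return word.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Deliberately tiny, made-up suffix list (longest first). The real Spanish
// stemmer has many more rules; this is only an assumption for the demo.
const ACCENTED_SUFFIXES = ['aciones', 'ación'].map((s) => s.normalize('NFC'));
const PLAIN_SUFFIXES = ACCENTED_SUFFIXES.map(stripAccents);

// Toy stemmer: drop the first matching suffix, else return the word as is.
function toyStem(word, suffixes) {
  const suffix = suffixes.find((s) => word.endsWith(s) && word.length > s.length);
  return suffix ? word.slice(0, -suffix.length) : word;
}

// First pass uses the accented rules. The second pass, with accent-free
// rules, runs only if the first pass left the word intact.
function stemWithFallback(word) {
  const normalized = word.normalize('NFC').toLowerCase();
  const first = toyStem(normalized, ACCENTED_SUFFIXES);
  if (first !== normalized) return stripAccents(first);
  return toyStem(stripAccents(normalized), PLAIN_SUFFIXES);
}

console.log(stemWithFallback('Respiración')); // "respir"
console.log(stemWithFallback('respiracion')); // "respir"
```

In lunr, something like this would presumably be wrapped in a pipeline function via `token.update` and added to both `builder.pipeline` and `builder.searchPipeline`, the same way the plugin above does it.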