Problem in Spanish: doesn't work if a word is typed without its accent mark. #59

jigarzon · 2019-10-07T16:07:05Z

Let's say I have created an index. The Spanish word "Respiración" is stemmed as:
"respir"

That's correct.

Now I run a search, but the user doesn't type the accent mark and enters "respiracion" (without the accent on the last "o"). lunr doesn't stem that word, because the Spanish stemmer's suffix rules expect the accented "-ación", so it leaves the token as "respiracion" and no matches are found.

I know that stemming assumes the word is spelled correctly, BUT since almost no users type accents correctly when searching, this makes lunr useless for many words.
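To see why the unaccented query term misses, here is a toy suffix stripper. It is a hypothetical stand-in, NOT the Snowball Spanish stemmer that lunr-languages uses: it only knows two suffixes taken from that algorithm's step-1 list (`aciones`, `ación`) and ignores its R1/R2 region rules. Like the real stemmer, it matches suffixes literally, accent included.

```javascript
// Toy stand-in for a Spanish stemmer (hypothetical, NOT Snowball):
// strips the first matching suffix, compared literally, accent included.
const SUFFIXES = ["aciones", "aci\u00f3n"]; // "aciones", "ación"; longest first

function toyStem(word) {
  // NFC so an input with a decomposed "o + combining acute" still matches "ó"
  const w = word.normalize("NFC").toLowerCase();
  const suffix = SUFFIXES.find((s) => w.endsWith(s));
  return suffix ? w.slice(0, -suffix.length) : w;
}

console.log(toyStem("Respiración"));   // respir
console.log(toyStem("respiraciones")); // respir
console.log(toyStem("respiracion"));   // respiracion (no suffix matches without the accent)
```

The indexed token is "respir" while the query token stays "respiracion", so the lookup finds nothing.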

jigarzon · 2019-10-07T16:21:47Z

I made a workaround that removes accents before the stemmer runs in the pipeline (I remove accents using normalize-strings).

But this also removes a lot of the benefits of stemming, because accent-stripped words such as "respiracion" no longer match the stemmer's accented suffix rules, so they are never stemmed.
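The cost of the workaround can be sketched with the same kind of toy stemmer (hypothetical; only the two Snowball step-1 suffixes `aciones` and `ación`). Accents are stripped here with `String.prototype.normalize("NFD")` followed by removal of combining marks, a dependency-free stand-in for normalize-strings whose exact character mapping may differ. Once the accent is gone, the singular no longer matches "-ación", so it stops conflating with its plural.

```javascript
// Strip diacritics: decompose (NFD), then drop combining marks U+0300..U+036F.
// Note: this also maps "ñ" to "n" (e.g. "año" -> "ano").
function stripAccents(word) {
  return word.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Toy stemmer (hypothetical, NOT Snowball): two suffixes from Snowball's
// Spanish step-1 list, matched literally.
const SUFFIXES = ["aciones", "aci\u00f3n"];

function toyStem(word) {
  const w = word.normalize("NFC").toLowerCase();
  const suffix = SUFFIXES.find((s) => w.endsWith(s));
  return suffix ? w.slice(0, -suffix.length) : w;
}

// Workaround pipeline order: normalize first, then stem.
const workaround = (word) => toyStem(stripAccents(word));

for (const word of ["respiración", "respiracion", "respiraciones"]) {
  console.log(word, "->", workaround(word));
}
```

With this order, "respiración" and "respiracion" both end up as "respiracion" (so they match each other), but "respiraciones" still becomes "respir" and no longer matches the singular.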

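var lunr = require('lunr');
var normalize = require('normalize-strings');

// Pipeline function that strips accents from each token,
// e.g. "respiración" -> "respiracion"
var pipelineFunction = function(token) {
  return token.update(function(word) {
    return normalize(word);
  });
};

// Register the pipeline function once, at module level, so the index can be
// serialised (registering inside the plugin warns on every re-registration)
lunr.Pipeline.registerFunction(pipelineFunction, 'normalizeLunrPlugin');

// Usage, after this.use(lunr.es): this.use(normalizeLunrPlugin, lunr.es.stemmer)
var normalizeLunrPlugin = function(builder, stemmer) {
  // Add the pipeline function to both the indexing pipeline and the
  // searching pipeline, right before the stemmer
  builder.pipeline.before(stemmer, pipelineFunction);
  builder.searchPipeline.before(stemmer, pipelineFunction);
};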

jigarzon · 2019-10-07T16:23:37Z

My suggestion is to run two stemmers in the pipeline, one for accented words and one for unaccented words, so that a word like "respiracion" (typed without the accent), which the first stemmer leaves intact, is picked up by the second one and stemmed correctly.
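A minimal sketch of this suggestion, assuming the second stemmer can be derived from the first by stripping the accents from its suffix list. `makeStemmer`, `stemWithFallback` and the two-suffix list are hypothetical toys, not lunr-languages code and not the full Snowball algorithm. The accented stemmer runs first; the accent-free variant is tried only if the first one left the token untouched.

```javascript
// Strip diacritics via NFD + removal of combining marks (U+0300..U+036F).
const stripAccents = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

// Hypothetical toy stemmer factory: strips the first matching suffix.
function makeStemmer(suffixes) {
  return (word) => {
    const w = word.normalize("NFC").toLowerCase();
    const suffix = suffixes.find((s) => w.endsWith(s));
    return suffix ? w.slice(0, -suffix.length) : w;
  };
}

// Two suffixes from Snowball's Spanish step-1 list, longest first.
const ACCENTED = ["aciones", "aci\u00f3n"];
const stemAccented = makeStemmer(ACCENTED);
// Second stemmer: same rules, but with accent-free suffixes ("aciones", "acion").
const stemUnaccented = makeStemmer(ACCENTED.map((s) => stripAccents(s)));

// Run the accented stemmer; fall back to the accent-free one only when
// the first left the word unchanged.
function stemWithFallback(word) {
  const lower = word.normalize("NFC").toLowerCase();
  const stemmed = stemAccented(lower);
  return stemmed !== lower ? stemmed : stemUnaccented(lower);
}

console.log(stemWithFallback("Respiración")); // respir
console.log(stemWithFallback("respiracion")); // respir
```

Unlike the normalize-first workaround, correctly accented words still go through the original rules unchanged. The accent-free rules can, however, fire on words that are legitimately spelled without an accent, so some over-stemming is possible.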
