Indexing and search pipelines are mismatched with language support #149
For (1) I can just extract them from the Node code, it's quite easy to do.

For (2), it seems like this might be on purpose: https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/lunr.py#L66 Can you explain why? Bug-compatibility with lunr.js? (EDIT: yes, bug-compatibility, it appears)
After digging a bit more, it appears this is due to the difficulty of registering the necessary trimmers and stopword filters when the serialized index is reloaded? Only the stemmers are registered: https://github.com/yeraydiazdiaz/lunr.py/blob/master/lunr/languages/__init__.py#L99

The workaround I found is to explicitly add them:

```python
for funcname in ("lunr-multi-trimmer-fr", "stopWordFilter-fr"):
    builder.search_pipeline.before(
        builder.search_pipeline.registered_functions["stemmer-fr"],
        builder.search_pipeline.registered_functions[funcname],
    )
...
get_nltk_builder(["fr"])
index = Index.load(...)
```
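The workaround above hinges on `Pipeline.before()` inserting a function immediately ahead of an existing one. Here is a toy sketch of that ordering behavior (illustrative stand-ins, not lunr.py's actual classes or pipeline functions):

```python
class Pipeline:
    """Toy sketch of a lunr-style pipeline: functions run over tokens in order."""

    def __init__(self):
        self._stack = []

    def add(self, fn):
        self._stack.append(fn)

    def before(self, existing_fn, new_fn):
        # Insert new_fn immediately ahead of existing_fn, as the workaround does.
        self._stack.insert(self._stack.index(existing_fn), new_fn)

    def run(self, tokens):
        for fn in self._stack:
            tokens = [fn(t) for t in tokens]
        return tokens


def toy_trimmer(token):
    # Strips punctuation; stands in for the French trimmer.
    return token.strip(".,")


def toy_stemmer(token):
    # Drops an "-er" ending; stands in for stemmer-fr.
    return token[:-2] if token.endswith("er") else token


pipeline = Pipeline()
pipeline.add(toy_stemmer)
# Without the trimmer, "manger," keeps its comma and the stemmer misses it.
pipeline.before(toy_stemmer, toy_trimmer)
print(pipeline.run(["manger,"]))  # ['mang']
```

With only the stemmer registered, the trailing comma defeats the `endswith("er")` check; inserting the trimmer before it restores the expected token.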
(2) is addressed in #151 now
I've submitted a PR to lunr-languages to fix the problem with the trimmer missing important characters (it wasn't passing its own test suite): MihaiValentin/lunr-languages#115 I think that we can re-use the same JS code that generates the lunr-languages trimmers, stemmers, and stopword filters to generate Python code for lunr.py. I hope to make a new PR to address this issue which does that soon!
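A hypothetical sketch of what such generated Python trimmer modules could look like (the template and names are illustrative assumptions, not the actual lunr-languages generator):

```python
# Render a standalone Python trimmer module from a language's
# word-character class, mirroring what the lunr-languages JS generator
# does for its trimmers. Template and names are illustrative only.
TEMPLATE = '''\
import re

WORD_CHARACTERS = {word_chars!r}
_START = re.compile("^[^" + WORD_CHARACTERS + "]+")
_END = re.compile("[^" + WORD_CHARACTERS + "]+$")

def trimmer(token):
    """Strip leading/trailing characters outside the word-character class."""
    return _END.sub("", _START.sub("", token))
'''


def generate_trimmer_module(word_chars):
    # Return Python source for one language's trimmer.
    return TEMPLATE.format(word_chars=word_chars)


source = generate_trimmer_module("a-zàâçéèêëîïôùûüÿ")
namespace = {}
exec(source, namespace)
print(namespace["trimmer"](",ôter,"))  # ôter
```

Generating both the JS and Python trimmers from one word-character table would keep the two pipelines in sync by construction.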
I notice that when using language support, some words cannot be searched.

This would seem to be due to the missing trimmer in the search pipeline.

Not sure really why, but it seems the trimmer thinks `ô` should be trimmed.

So, there are really two problems:
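To make the failure concrete, here is a self-contained sketch (with hypothetical character classes, not the real lunr-languages French trimmer) of how a trimmer whose word-character class omits `ô` strips it from the edge of a token:

```python
import re

def make_trimmer(word_chars):
    # lunr-style trimmer: strip leading and trailing characters that fall
    # outside the language's word-character class.
    start = re.compile("^[^" + word_chars + "]+")
    end = re.compile("[^" + word_chars + "]+$")
    def trim(token):
        return end.sub("", start.sub("", token))
    return trim

# Hypothetical French word-character classes, for illustration only.
broken = make_trimmer("a-zàâçéèêëîïùûüÿ")   # "ô" missing from the class
fixed = make_trimmer("a-zàâçéèêëîïôùûüÿ")   # "ô" included

print(broken("ôter"))  # ter   -- leading "ô" wrongly stripped, word unfindable
print(fixed("ôter"))   # ôter  -- preserved
```

A token that begins or ends with the missing character loses it before indexing or search, so queries for that word can never match.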