feat: extract wordchars from lunr-languages #150

Closed
wants to merge 2 commits into from



dhdaines commented Jul 4, 2024

See #149 (doesn't fix the whole thing)


dhdaines commented Jul 4, 2024

Note that you could also just add {r'\w'} to all_word_characters, in the same way as is done for the default pipeline.


dhdaines commented Jul 4, 2024

In fact, we should add \w to the extracted word characters, because otherwise the trimmers will strip numbers from the end of search terms, which is almost certainly not what you want in many applications. However, this is not bug-compatible with lunr-languages, so it may just need a documented workaround.
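
To illustrate the trailing-number problem, here is a minimal standalone sketch using only Python's `re` module. It does not use the project's actual trimmer code: `make_trimmer` and `LETTERS_ONLY` are hypothetical names, and the letters-only class is just a stand-in for a language-specific word-character string.

```python
import re


def make_trimmer(word_characters: str):
    """Build a function that strips leading and trailing characters
    that are NOT in the given regex character class (hypothetical helper)."""
    start = re.compile(f"^[^{word_characters}]+")
    end = re.compile(f"[^{word_characters}]+$")

    def trim(token: str) -> str:
        # Strip non-word characters from the start, then from the end.
        return end.sub("", start.sub("", token))

    return trim


# Assumed stand-in for a letters-only word-character class.
LETTERS_ONLY = "A-Za-z"

trim_letters = make_trimmer(LETTERS_ONLY)
trim_with_w = make_trimmer(LETTERS_ONLY + r"\w")

print(trim_letters("python3"))   # python  (trailing digit is lost)
print(trim_with_w("python3"))    # python3 (digit is kept)
print(trim_with_w("python3!"))   # python3 (punctuation is still trimmed)
```

With the letters-only class, a search for "python3" would effectively become a search for "python"; adding `\w` keeps the digit while still trimming punctuation.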


dhdaines commented Jul 6, 2024

You may not really want to do this: it seems the trimmers in lunr-languages are full of weird junk (see MihaiValentin/lunr-languages#66).


dhdaines commented Jul 6, 2024

Hmm. It turns out that the lunr-languages code is itself generated programmatically, so it doesn't make much sense to parse it to create these. I'm closing this PR and will come up with a better way to do this.

dhdaines closed this Jul 6, 2024