Hello, I would like to add support for the Esperanto language, as several downstream projects I use depend on lunr-languages. It is a constructed language invented in 1887 by Dr. L. L. Zamenhof, and has over 2 million speakers worldwide.
Fortunately, due to the extreme regularity of the language (it only has 16 rules), implementing this should be a lot easier than for other languages.
Advice Needed:
I don't normally work with JavaScript, so I was wondering if anyone involved with the project can help me out with a few things:
Does the stop-words function run before the stemmer? It would greatly reduce the burden if stop-words are filtered out before they reach the stemmer. Otherwise, I will end up having to reimplement the stop-words list inside the stemmer, as most of the stop-words are grammatical words (prepositions and the like) that have irregular endings.
Many other languages have very complicated hundred-line stemmer functions, but in Esperanto, once you filter out the special grammatical words, every word ends with one of: -is, -as, -os, -us, -u, -e, -en, -a, -an, -aj, -ajn, -o, -on, -oj, or -ojn. With that said, my stemmer function can be as simple as returning the string with the ending cut off (this always results in a valid word root). I wasn't sure whether I needed to use the SnowballFunction or not.
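To make both questions concrete, here is a rough sketch of what I have in mind. It is plain JavaScript and does not use any lunr or lunr-languages API: the names `ESPERANTO_ENDINGS`, `STOP_WORDS`, `stemEsperanto`, and `filterThenStem` are hypothetical, the stop-word list is a tiny illustrative sample, and the minimum root length of two characters is my own assumption to avoid stripping very short words down to nothing.

```javascript
// Grammatical endings from the list above, longest first so that
// "-ojn" is tried before "-oj", "-on", and "-o".
const ESPERANTO_ENDINGS = [
  "ojn", "ajn",
  "oj", "aj", "on", "an", "en", "is", "as", "os", "us",
  "o", "a", "e", "u",
];

// Tiny illustrative sample only; a real list would be much longer.
const STOP_WORDS = new Set(["la", "kaj", "de", "en", "al"]);

// Cut the first matching ending off the word.
// Assumption: keep at least two characters of root.
function stemEsperanto(word) {
  const lower = word.toLowerCase();
  for (const ending of ESPERANTO_ENDINGS) {
    if (lower.endsWith(ending) && lower.length - ending.length >= 2) {
      return lower.slice(0, -ending.length);
    }
  }
  return lower;
}

// The ordering I am hoping for: drop stop-words first, then stem
// whatever is left, so the stemmer never sees the irregular words.
function filterThenStem(tokens) {
  return tokens
    .filter((token) => !STOP_WORDS.has(token.toLowerCase()))
    .map(stemEsperanto);
}
```

For example, `filterThenStem(["La", "hundo", "kaj", "katoj"])` would drop "La" and "kaj" and reduce the remaining words to the roots "hund" and "kat". If stop-words are not filtered first, the stemmer itself would need a check like the `STOP_WORDS` lookup.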
I'm currently working on Esperanto support on my fork, if anyone has any advice or wants to point out any obvious JS flaws I missed.