Common names interpreted as verbs #24
If I do:
>>> sentence = greynir.parse_single('Forstjórinn heitir Örn.')
>>> sentence.terminals
[Terminal(text='Forstjórinn', lemma='forstjóri', category='no', variants=['et', 'gr', 'kk', 'nf'], index=0), Terminal(text='heitir', lemma='heita', category='so', variants=['1', 'nf', 'et', 'fh', 'gm', 'nt', 'p3'], index=1), Terminal(text='Örn.', lemma='Örn.', category='no', variants=['et', 'hk', 'nf'], index=2)]
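A minimal sketch of checking which tokens ended up tagged as verbs. It does not call Greynir; it uses a local Terminal stand-in with the same fields as the output above (text, lemma, category, variants, index), where category 'so' marks a verb and 'no' a noun. The helper name verb_tokens is hypothetical.

```python
from typing import List, NamedTuple


class Terminal(NamedTuple):
    # Local stand-in mirroring the fields printed by sentence.terminals above;
    # this is not Greynir's own class.
    text: str
    lemma: str
    category: str
    variants: List[str]
    index: int


def verb_tokens(terminals: List[Terminal]) -> List[str]:
    """Return the surface text of every terminal tagged as a verb ('so')."""
    return [t.text for t in terminals if t.category == "so"]


# The terminals from the parse of 'Forstjórinn heitir Örn.' shown above.
terminals = [
    Terminal("Forstjórinn", "forstjóri", "no", ["et", "gr", "kk", "nf"], 0),
    Terminal("heitir", "heita", "so", ["1", "nf", "et", "fh", "gm", "nt", "p3"], 1),
    Terminal("Örn.", "Örn.", "no", ["et", "hk", "nf"], 2),
]
print(verb_tokens(terminals))  # ['heitir']
```

In this parse only the actual verb heitir is flagged; the name is read as a noun.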
This is not really a surprise, as Greynir prefers to recognize sentences (with verbs) over noun phrases when both readings are possible. But for this use case, I would recommend using …
Having said that, the …
Could this be related to "örn." being an abbreviation recognised by the tokenizer?
I'm parsing search queries. I've resorted to searching for both the lemma and the original query.
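A minimal sketch of that workaround, assuming the lemmas have already been extracted from the parse (for example from the lemma field of each terminal). The function name search_terms is hypothetical; it returns the original query plus a lemmatized version, dropping the second when the two are identical.

```python
from typing import List


def search_terms(query: str, lemmas: List[str]) -> List[str]:
    """Return the original query and, if different, the query rebuilt from lemmas."""
    lemma_query = " ".join(lemmas)
    # Avoid searching the same string twice when lemmatization changes nothing.
    return [query] if lemma_query == query else [query, lemma_query]


# Lemmas taken from the terminals of 'Forstjórinn heitir Örn.' above.
print(search_terms("Forstjórinn heitir Örn.", ["forstjóri", "heita", "Örn."]))
# ['Forstjórinn heitir Örn.', 'forstjóri heita Örn.']
```

Searching both strings means a name misread as a verb still matches through the original query, at the cost of a second lookup.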
I ran the 100 most common first names in Iceland through greynir.parse. No female names are interpreted as verbs, but a few male ones are. See this gist for the code: https://gist.github.com/jokull/2c1048bbc845feb46c717ac7c77e0cc5
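A minimal sketch of such a batch check, not the code from the gist. The parser is injected as a callable returning (text, category) pairs so the loop can run without Greynir; with Greynir it could be an adapter over parse_single(name).terminals, assuming the parse succeeds. The names names_parsed_as_verbs and stub_parse, and the placeholder names "Alpha" and "Beta", are hypothetical and not real results.

```python
from typing import Callable, List, Tuple

# Hypothetical parser signature: text in, (token text, category) pairs out.
Parser = Callable[[str], List[Tuple[str, str]]]


def names_parsed_as_verbs(names: List[str], parse: Parser) -> List[str]:
    """Return the names whose parse contains at least one verb ('so') token."""
    flagged = []
    for name in names:
        categories = [category for _, category in parse(name)]
        if "so" in categories:
            flagged.append(name)
    return flagged


def stub_parse(text: str) -> List[Tuple[str, str]]:
    # Demonstration stub only: pretends the placeholder "Beta" is read as a verb.
    return [(text, "so" if text == "Beta" else "no")]


print(names_parsed_as_verbs(["Alpha", "Beta"], stub_parse))  # ['Beta']
```

Injecting the parser keeps the counting logic testable and makes it easy to swap greynir.parse for parse_single or a different tagger.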