Support inflection of nouns #19

Open
grhoten opened this issue Mar 14, 2024 · 4 comments
Labels
discuss Discussion item

Comments

@grhoten
Member

grhoten commented Mar 14, 2024

We should be able to inflect common nouns and proper nouns. In many languages, this typically means being able to change the grammatical gender, grammatical number and grammatical case.

What English expresses with prepositions is expressed through grammatical case in many other languages, typically as suffixes on nouns. That makes this related to issue #17.

English possessive/genitive forms of nouns typically need to add 's or just ', but that algorithmic logic is a lot harder in other languages, like German, Danish, Dutch, Russian and so forth.
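As a minimal sketch of the English rule, assuming the caller already knows whether the noun is plural (a singular noun such as "bus" also ends in "s", so the spelling alone is not enough). The function name `english_possessive` and its `plural` parameter are hypothetical and are not part of any proposed API in this project.

```python
def english_possessive(noun: str, plural: bool = False) -> str:
    """Return a possessive form using the simple English rule.

    Assumption: plural nouns ending in "s" take a bare apostrophe,
    and every other noun takes "'s". Style-guide exceptions are ignored.
    """
    if plural and noun.endswith("s"):
        return noun + "'"
    return noun + "'s"


if __name__ == "__main__":
    print(english_possessive("city"))                   # singular noun
    print(english_possessive("cities", plural=True))    # regular plural
    print(english_possessive("children", plural=True))  # irregular plural
```

The whole rule fits in two branches, which is what makes English the easy case compared with languages that have full case systems.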

For example, you should be able to turn "city" into "city's" or turn "cities" into "cities'". For a language like Russian, you can look at кот for an example.

Here's a more compact declension table for looking at such information.

cat (кот) singular plural
nominative кот коты
genitive кота котов
dative коту котам
accusative кота котов
instrumental котом котами
prepositional коте котах
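
For a language like this, the forms cannot be produced by appending one fixed suffix, so one simple option is a lexicon lookup keyed by case and number. The sketch below is only an illustration: the names `KOT_FORMS` and `decline_kot` are hypothetical. It uses the standard declension of кот, where the dative is коту/котам and the accusative of an animate masculine noun matches the genitive (кота/котов).

```python
# Hypothetical lexicon entry for the noun "кот" (cat).
# Keys are (case, number) pairs; values are the inflected surface forms.
KOT_FORMS = {
    ("nominative", "singular"): "кот",
    ("nominative", "plural"): "коты",
    ("genitive", "singular"): "кота",
    ("genitive", "plural"): "котов",
    ("dative", "singular"): "коту",
    ("dative", "plural"): "котам",
    ("accusative", "singular"): "кота",
    ("accusative", "plural"): "котов",
    ("instrumental", "singular"): "котом",
    ("instrumental", "plural"): "котами",
    ("prepositional", "singular"): "коте",
    ("prepositional", "plural"): "котах",
}


def decline_kot(case: str, number: str) -> str:
    """Look up the form of "кот" for a grammatical case and number.

    Raises KeyError for an unknown case or number.
    """
    return KOT_FORMS[(case, number)]


if __name__ == "__main__":
    print(decline_kot("instrumental", "plural"))
```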
@grhoten grhoten added the discuss Discussion item label Mar 14, 2024
@nciric
Contributor

nciric commented Mar 15, 2024

+1

I think this was a main use case when we started discussing this project, as most of the placeholders in messages are nouns (proper or common).

The solution will probably range from simple/complex algorithms + lexicon exceptions, to potentially ML models for some languages. I feel this is the first problem we should tackle, as it intersects well with common needs.

@grhoten
Member Author

grhoten commented Mar 15, 2024

I'd say that for single words or very few words, ML is undesirable. From experience, it's very resource intensive, which makes it a poor fit for resource-constrained environments. There are many languages where a traditional algorithmic solution for out-of-vocabulary words is cheaper, faster, smaller, quicker to implement and more accurate than an ML solution. I have some horror stories around this topic.

If you start handling many words or a whole sentence, ML starts looking more appealing because such solutions thrive on context.

I'd say the only exceptions to this rule are agglutinative languages, like Finnish and perhaps Turkish. In general, an ML approach is more likely to be accurate in such languages. That requires a lengthier overview and education session on the topic.

Choosing between ML and rule-based approaches will probably require a discussion to find the right balance.

@nciric
Contributor

nciric commented Mar 15, 2024

> I'd say the only exceptions to this rule are agglutinative languages, like Finnish and perhaps Turkish. In general, an ML approach is more likely to be accurate in such languages. That requires a lengthier overview and education session on the topic.

I expect most languages will be fine with the algorithmic + lexicon approach (and we should focus on those first). I would use ML only when necessary, as you mentioned for Finnish/Turkish. So this is not a decision we need to make ahead of time, just a reminder that we need to organize our code to allow different implementations.
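
One way to leave room for different implementations is a shared interface with a lexicon-first, rule-second fallback chain. The sketch below only illustrates that idea and is not the project's design: `Inflector`, `LexiconInflector`, `NaiveEnglishPluralRule` and `ChainInflector` are hypothetical names, and the plural rule is deliberately naive. An ML-backed class could implement the same `inflect` method and be added to the chain for the languages that need it.

```python
from typing import Dict, FrozenSet, Optional, Protocol, Tuple

Features = FrozenSet[str]


class Inflector(Protocol):
    """Hypothetical common interface: return a form, or None if unknown."""

    def inflect(self, word: str, features: Features) -> Optional[str]: ...


class LexiconInflector:
    """Looks up exceptions stored as (word, features) -> form."""

    def __init__(self, entries: Dict[Tuple[str, Features], str]) -> None:
        self._entries = entries

    def inflect(self, word: str, features: Features) -> Optional[str]:
        return self._entries.get((word, features))


class NaiveEnglishPluralRule:
    """Deliberately naive rule: append "s" for the plural, nothing else."""

    def inflect(self, word: str, features: Features) -> Optional[str]:
        if features == frozenset({"plural"}):
            return word + "s"
        return None


class ChainInflector:
    """Tries each inflector in order and returns the first answer."""

    def __init__(self, *inflectors: Inflector) -> None:
        self._inflectors = inflectors

    def inflect(self, word: str, features: Features) -> Optional[str]:
        for inflector in self._inflectors:
            result = inflector.inflect(word, features)
            if result is not None:
                return result
        return None


if __name__ == "__main__":
    plural = frozenset({"plural"})
    lexicon = LexiconInflector({("child", plural): "children"})
    chain = ChainInflector(lexicon, NaiveEnglishPluralRule())
    print(chain.inflect("child", plural))  # answered by the lexicon
    print(chain.inflect("cat", plural))    # answered by the rule
```

Because callers depend only on `inflect`, swapping or reordering implementations per language does not change calling code.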

grhoten pushed a commit that referenced this issue Oct 30, 2024
…CENSE.txt for copyright and permission details.

This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19
This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25
This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
grhoten added a commit that referenced this issue Oct 30, 2024
…CENSE.txt for copyright and permission details.

This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19
This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25
This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
grhoten added a commit that referenced this issue Nov 30, 2024
This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19
This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25
This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
nciric pushed a commit that referenced this issue Dec 10, 2024
This contribution should resolve the following issues: #5, #6, #7, #11, #12, #13, #15, #17, #18, #19
This contribution is also related to the following issues without fully resolving the issues: 3, 4, 8, 10, 21, 23, 24, 25
This contribution also has an implementation that addresses these CLDR issues: 13025, 13563
@grhoten
Member Author

grhoten commented Dec 10, 2024

I'd like to nominate this to be resolved with pull request #35.

Projects
Status: In Progress

2 participants