Implementation considerations #4

Open
nciric opened this issue Feb 28, 2024 · 1 comment
Labels
discuss Discussion item

Comments

@nciric
Contributor

nciric commented Feb 28, 2024

While we aim for a unified API solution in #3, we should also recognize that different companies may have partial or proprietary solutions they would like to reuse.

ICU already has prior art for this problem, e.g. the transliterator and the break iterator.

There are two cases we should consider:

  1. The user is fine with the defaults and uses the code and lexicons provided by the inflection project.
  2. The user has a better implementation for a set of languages and overrides the defaults for those languages only. The rest falls back on the inflection defaults. User solutions can range from pure lexicon lookup and heuristics to ML models for more complex cases.

Inflection code shouldn't depend on user libraries, but it should provide registration APIs where they can hook up their implementation to be used with our APIs.
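
As a rough illustration of case 2, here is a minimal sketch of a registration API with per-language overrides and a fallback to the defaults. Every name in it (`InflectorRegistry`, `register`, `default_inflector`, the `Inflector` signature) is hypothetical and made up for this example; it is not the project's API, and the "default" engine is a toy stand-in.

```python
from typing import Callable, Dict

# Hypothetical signature: (word, grammatical constraints) -> inflected word.
Inflector = Callable[[str, Dict[str, str]], str]


def default_inflector(word: str, constraints: Dict[str, str]) -> str:
    """Toy stand-in for the project's built-in engine (naive plural rule)."""
    if constraints.get("number") == "plural":
        return word + "s"
    return word


class InflectorRegistry:
    """Maps language codes to user overrides; other languages use the default."""

    def __init__(self, default: Inflector) -> None:
        self._default = default
        self._overrides: Dict[str, Inflector] = {}

    def register(self, language: str, inflector: Inflector) -> None:
        # The registry only stores a callable, so the inflection code
        # never has to import or link against the user's library.
        self._overrides[language] = inflector

    def unregister(self, language: str) -> None:
        self._overrides.pop(language, None)

    def inflect(self, language: str, word: str,
                constraints: Dict[str, str]) -> str:
        # Use the override if one is registered, else fall back.
        engine = self._overrides.get(language, self._default)
        return engine(word, constraints)


# A user-supplied override: pure lexicon lookup for one language.
SERBIAN_PLURALS = {"grad": "gradovi"}


def serbian_lexicon_inflector(word: str, constraints: Dict[str, str]) -> str:
    if constraints.get("number") == "plural":
        return SERBIAN_PLURALS.get(word, word)
    return word


registry = InflectorRegistry(default_inflector)
registry.register("sr", serbian_lexicon_inflector)
```

With this shape, `registry.inflect("sr", ...)` goes through the user's lexicon, while any language without an override still uses the default engine. The override could just as well wrap a heuristic or an ML model, since the registry only sees a callable.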

@nciric added the discuss (Discussion item) label Feb 28, 2024
@grhoten
Member

grhoten commented Dec 10, 2024

The SemanticFeatureModel in the code of pull request #35 does allow replacing an inflection engine on a per-instance basis. It doesn't do a global registration. Replacing the default globally requires considerations for memory ownership, framework reuse, data reloading, thread safety, and competing implementations in the same process space.

From experience, the transliterator model is nice when you own everything in a single process space. As soon as you try global registration and other frameworks in the same process want neither your default behavior nor for it to stick around after a language change, this model becomes hard to coordinate.
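
To make the contrast concrete, here is a minimal sketch of per-instance replacement, where each model object carries its own engine and nothing is registered process-wide. The names (`FeatureModel`, `builtin_engine`, `irregular_engine`) are hypothetical and only loosely inspired by the idea above; this is not the SemanticFeatureModel API from the pull request.

```python
from typing import Callable, Optional

# Hypothetical engine signature: word -> plural form.
Engine = Callable[[str], str]


def builtin_engine(word: str) -> str:
    """Toy stand-in for the default engine (naive plural rule)."""
    return word + "s"


class FeatureModel:
    """Holds its own engine, so a replacement never leaks to other instances."""

    def __init__(self, language: str, engine: Optional[Engine] = None) -> None:
        self.language = language
        self._engine = engine if engine is not None else builtin_engine

    def pluralize(self, word: str) -> str:
        return self._engine(word)


def irregular_engine(word: str) -> str:
    # Custom engine: small exception lexicon, then the built-in rule.
    return {"mouse": "mice"}.get(word, builtin_engine(word))


# Two frameworks in the same process, each with its own instance.
framework_a = FeatureModel("en")                           # wants the defaults
framework_b = FeatureModel("en", engine=irregular_engine)  # wants its own engine
```

Here `framework_a` keeps the default behavior even though `framework_b` replaced its engine. A process-wide registry would change the behavior for both, which is where the ownership, thread-safety, and coordination questions come from.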

When implementing these considerations, security around who is able to intercept this content should be considered too. There may be sensitive data that goes through the inflection process, so how to handle logging of that data, or even a person-in-the-middle attack, should be considered.

Typically, simple customizations should be handled with a SemanticConcept. That would address case 1, and that's a common case. Case 2 is more complex and requires much more thought.
