Replies: 8 comments 2 replies
-
Let's start from the use case of supporting Message Format 2.0 as the next localization standard. That doesn't take us far from Apple's AttributedString. Both formats annotate the parts of a string that need grammatical support (plurals, gender, etc.), just in technically different ways, and both require runtime information to be passed in, e.g. names and values. From the link above, the usage pattern for MF2.0 in Java is:
// 1. Build the immutable MF2.0 formatter with a message and locale provided.
MessageFormatter mf2 = MessageFormatter.builder()
    .setPattern("{Hello {$name :inflect case=vocative number=singular},"
        + "your card expires on {$exp :datetime skeleton=yMMMdE}!}")
    .setLocale(Locale.forLanguageTag("en-GB")) // setLocale takes a java.util.Locale
    .build();
// 2. Get the runtime arguments.
Map<String, Object> arguments = new HashMap<>();
arguments.put("name", "John");
arguments.put("exp", new Date(1679971371000L));
// 3. Resolve arguments and format the string for UI rendering.
assertEquals(
    "Hello John, your card expires on Mon, 27 Mar 2023!",
    mf2.formatToString(arguments));
The The default parameters for Since My proposal:
-
I want to make the following less-than-ideal existing choices very clear. They have stuck around for backwards compatibility and stability reasons.
The code contribution uses "count" instead of "number" for the grammatical category called grammatical number, which is similar to, but not exactly the same as, grammatical count. Grammatical number is what is really meant.
Morphology.GrammaticalNumber made the unfortunate choice of conflating CLDR plural rules with grammatical number. In a language like Arabic or Russian, the distinction becomes very clear. I recommend that "number" be used to refer to grammatical number, and that it not refer to the CLDR plural rules. CLDR plural rules should be referenced completely separately. Those CLDR plural rules can reference combinations of case, animacy, number and so forth, but not the other way around. To highlight the importance of this distinction, please see the table in this comment in CLDR-11981.
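To make the distinction concrete, here is a minimal sketch for Russian, assuming a counted noun phrase in nominative position. All type and method names (`PluralCategory`, `RussianCountAgreement`, `requiredForm`, and so on) are hypothetical and are not part of the API under review. The plural rule is hand-written for non-negative integers only rather than read from CLDR data. It shows the CLDR plural category selecting a combination of case and grammatical number, so the two concepts cannot share one enum.

```java
import java.util.Map;

// Hypothetical names throughout; this is an illustration, not the reviewed API.
enum PluralCategory { ONE, FEW, MANY, OTHER }
enum GrammaticalNumber { SINGULAR, PLURAL }
enum GrammaticalCase { NOMINATIVE, GENITIVE }

final class RussianCountAgreement {
    /** The grammemes the counted noun has to carry. */
    record NounForm(GrammaticalCase grammaticalCase, GrammaticalNumber number) {}

    /** Russian cardinal plural rule, restricted to non-negative integers. */
    static PluralCategory pluralCategory(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        if (mod10 == 1 && mod100 != 11) return PluralCategory.ONE;
        if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return PluralCategory.FEW;
        return PluralCategory.MANY; // OTHER only applies to fractions
    }

    /** The plural category picks case AND number, not the other way around. */
    static NounForm requiredForm(PluralCategory category) {
        return switch (category) {
            case ONE -> new NounForm(GrammaticalCase.NOMINATIVE, GrammaticalNumber.SINGULAR);
            case FEW, OTHER -> new NounForm(GrammaticalCase.GENITIVE, GrammaticalNumber.SINGULAR);
            case MANY -> new NounForm(GrammaticalCase.GENITIVE, GrammaticalNumber.PLURAL);
        };
    }

    // Three cells of the declension table for "книга" (book).
    private static final Map<NounForm, String> BOOK = Map.of(
        new NounForm(GrammaticalCase.NOMINATIVE, GrammaticalNumber.SINGULAR), "книга",
        new NounForm(GrammaticalCase.GENITIVE, GrammaticalNumber.SINGULAR), "книги",
        new NounForm(GrammaticalCase.GENITIVE, GrammaticalNumber.PLURAL), "книг");

    static String format(long n) {
        return n + " " + BOOK.get(requiredForm(pluralCategory(n)));
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 5, 11, 21}) {
            System.out.println(format(n));
        }
    }
}
```

Note that 2 and 21 both refer to more than one book, yet the noun is grammatically singular in both cases (genitive singular and nominative singular respectively). A single "count" value that mixes the two ideas cannot express that.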
-
Another design choice that differs in this API is a concept called lemmaless inflection. This public usage requires it. The older implementation used lemma-based inflection, where you had to specify all of the grammemes to switch the lemma to the desired surface form. With lemmaless inflection, you start with any surface form (any cell in a declension table for a lemma) and modify only the relevant grammemes. So if you're using a word in a sentence, you may want to keep the case the same but change the gender or number of the word.
This is helpful when each surface form is unique in spelling and easy to deduce. It becomes harder when there is ambiguity. In such cases, you have to guide the inflection by stating what the current form is and what the desired form is. Usually, this helps keep the translation simpler: a translator who knows their language well, but may not know the English name for a given case, doesn't have to know all of the grammatical cases.
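As a rough sketch of the idea, assuming a hand-written declension table for the German noun "Kind" instead of real lexicon data: the caller passes a surface form, optional hints about its current grammemes, and only the grammemes to change. `LemmalessInflector`, `Category`, and the method names are hypothetical and do not reflect the actual API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical names; an illustration of the concept, not the reviewed API.
enum Category { CASE, NUMBER }

final class LemmalessInflector {
    /** One cell of a declension table: a surface form plus its grammemes. */
    record Cell(String surface, Map<Category, String> grammemes) {}

    private final List<Cell> table;

    LemmalessInflector(List<Cell> table) { this.table = table; }

    private static Cell cell(String surface, String grammaticalCase, String number) {
        return new Cell(surface, Map.of(Category.CASE, grammaticalCase, Category.NUMBER, number));
    }

    /** Declension table for the German noun "Kind" (child). */
    static LemmalessInflector germanKind() {
        return new LemmalessInflector(List.of(
            cell("Kind", "nominative", "singular"), cell("Kindes", "genitive", "singular"),
            cell("Kind", "dative", "singular"), cell("Kind", "accusative", "singular"),
            cell("Kinder", "nominative", "plural"), cell("Kinder", "genitive", "plural"),
            cell("Kindern", "dative", "plural"), cell("Kinder", "accusative", "plural")));
    }

    /**
     * Starts from every cell that matches the surface form and the hints, overrides
     * only the desired grammemes, and succeeds if all candidates agree on one result.
     */
    String inflect(String surface, Map<Category, String> known, Map<Category, String> desired) {
        Set<String> results = new LinkedHashSet<>();
        for (Cell start : table) {
            if (!start.surface().equals(surface)) continue;
            if (!start.grammemes().entrySet().containsAll(known.entrySet())) continue;
            Map<Category, String> target = new HashMap<>(start.grammemes());
            target.putAll(desired); // keep everything else as it was
            for (Cell end : table) {
                if (end.grammemes().equals(target)) results.add(end.surface());
            }
        }
        if (results.size() != 1) {
            throw new IllegalStateException("Cannot inflect '" + surface + "' unambiguously: " + results);
        }
        return results.iterator().next();
    }

    public static void main(String[] args) {
        LemmalessInflector kind = germanKind();
        Map<Category, String> plural = Map.of(Category.NUMBER, "plural");
        // Unique surface form: no hints needed.
        System.out.println(kind.inflect("Kindes", Map.of(), plural)); // Kinder
        // "Kind" is nominative, dative or accusative, so a hint is required.
        System.out.println(kind.inflect("Kind", Map.of(Category.CASE, "dative"), plural)); // Kindern
    }
}
```

Without the case hint, inflecting "Kind" to plural yields both "Kinder" and "Kindern", so the sketch throws instead of guessing. That is the ambiguity described above.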
-
A design choice that remains in transition is the notion of semantic features. For example, you may want to add a definite article or a preposition to a word. These semantic features do not scale: you can't chain them, and each one is tied to specific prepositions and articles. Grammatical categories scale much better: you can chain them, and you don't have to hand-craft rules for each semantic feature. That doesn't mean there is no need for semantic features, but I'd discourage exposing them.
-
A design choice that remains incompatible with both MF2 and this public API is the notion of the spoken form. You have print and speak strings. When you have a quantity, like "1 kilometer" and "2 kilometers", you need to be able to disambiguate the pronunciation of the number in numerous languages, so you need both a spoken and a printed form. This is important for a voice user interface (VUI). It's not as important in a GUI, where the reader can infer the pronunciation while reading. This code contribution can handle this situation, but it's not exposed in the API being referenced. MF2 can discard this information, but I'd prefer to keep that functionality around.
This functionality relies heavily on ICU and CLDR RBNF. When you get to numbers, a data resource using Wikidata becomes unscalable, and you have to use RBNF. Based on previous discussions, Google and Apple initially differed on how to name rules for RBNF. It might be good to note this non-obvious point of difference.
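A minimal sketch of carrying both forms side by side, using German in the nominative as the example. The tiny spell-out table below is only a stand-in for RBNF, and `SpokenQuantity`, `Rendered`, and `Unit` are hypothetical names, not part of the code contribution or of MF2.

```java
// Hypothetical names; the spell-out table is a toy stand-in for ICU/CLDR RBNF.
final class SpokenQuantity {
    enum Gender { MASCULINE, FEMININE, NEUTER }

    /** A unit noun with the gender the spoken numeral has to agree with. */
    record Unit(String singular, String plural, Gender gender) {}

    /** The same message rendered for display and for speech. */
    record Rendered(String print, String speak) {}

    /** Spells out a small German cardinal in the nominative case. */
    static String spellOutGerman(int n, Gender gender) {
        return switch (n) {
            case 1 -> gender == Gender.FEMININE ? "eine" : "ein";
            case 2 -> "zwei";
            case 3 -> "drei";
            default -> throw new IllegalArgumentException("Not covered by this sketch: " + n);
        };
    }

    static Rendered format(int n, Unit unit) {
        String noun = (n == 1) ? unit.singular() : unit.plural();
        // The printed form keeps the digits; only the spoken form needs agreement.
        return new Rendered(n + " " + noun, spellOutGerman(n, unit.gender()) + " " + noun);
    }

    public static void main(String[] args) {
        Unit kilometer = new Unit("Kilometer", "Kilometer", Gender.MASCULINE);
        Unit mile = new Unit("Meile", "Meilen", Gender.FEMININE);
        System.out.println(format(1, kilometer)); // print "1 Kilometer", speak "ein Kilometer"
        System.out.println(format(1, mile));      // print "1 Meile", speak "eine Meile"
    }
}
```

The printed strings look alike apart from the noun, while the spoken numeral changes with the gender of the unit. A GUI reader resolves that silently; a VUI has to get it right explicitly.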
-
@mihnita could you confirm my expectation on how custom functions and parameters work in MF2.0 or point me to the docs?
-
Thanks for the clarifications, George. A few questions.
- The lemmaless inflections sound good. I'm curious how the disambiguation would work. If X could be the genitive of word X or the dative of word Y, how would the software disambiguate?
- For the definiteness, can your API treat that as an option to the inflection function, e.g. (with MF2 syntax) I see {$toy :inflect definiteness=definite count=other}. => "I see the teddy bears."
- The GUI vs VUI distinction is interesting. Is the issue that "1" needs to be inflected differently for speech? E.g., in German, where it would be pronounced as "ein" vs "eine" (or other forms in oblique cases).
On Tue, Jun 18, 2024 at 10:43 AM Nebojša Ćirić wrote:
@mihnita could you confirm my expectation on how custom functions and parameters work in MF2.0 or point me to the docs?
-
Although the spec proper does not mention grammatical inflections, they were always part of our use cases, and were often used as examples and theoretical tests for the design. I strongly believe that the existing extensibility mechanisms should be able to handle most inflections.
We also played with the idea of agreement between placeholders (one placeholder accessing the grammatical properties of another placeholder). There was some opposition to that, because it would mean that a placeholder (in the example
So this is not in the spec. But it is not explicitly forbidden either, so it should be possible to add it later without going against the spec.
One of the unit tests for MF2 in ICU4J implements (a very naive) inflection for Romanian names:
But the inflection algorithm is not the important part, of course.
-
This is the surface for the review (from George's email to the group):
Here are previous presentations that involve this wrapper code:
UTW Automatic Grammar Agreement in Message Formatting
WWDC23: Unlock the power of grammatical agreement | Apple
I'll add my input to this thread as I go through the APIs.