Match numbers numerically #842

Closed

eemeli
Collaborator

@eemeli eemeli commented Jul 29, 2024

Closes #675

The current text depends on "the JSON string representation of the numeric value of resolvedSelector", which is rather fuzzy: there is no single JSON string representation of a number, as each of the following represents the same numeric value:

  • 1
  • 1.0
  • 1.00
  • 1e0
  • 1e+0
  • 1e-0
  • 10e-1
  • 1.0e0
  • ...

So rather than stringifying a number and testing that against the keys, let's instead parse the key, and compare that to the resolved value, numerically.

This also makes it clear that comparing the equality of fractional values is pretty risky in a binary world, no matter how you do it.
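
To make the proposal concrete, here is a minimal sketch of "parse the key, then compare numerically". It is not taken from the PR or from any MF2 implementation: the helper name `key_matches` is hypothetical, and using Python's `decimal.Decimal` as the numeric type is an assumption made only for illustration.

```python
from decimal import Decimal

# Spellings from the list above that follow the JSON number grammar.
SPELLINGS = ["1", "1.0", "1.00", "1e0", "1e+0", "1e-0", "1.0e0"]


def key_matches(key: str, value: Decimal) -> bool:
    """Hypothetical helper: parse the key and compare it numerically."""
    return Decimal(key) == value


resolved = Decimal(1)
print(all(key_matches(k, resolved) for k in SPELLINGS))   # True

# Binary floating point makes equality of fractional values fragile.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

The last two lines show why comparing fractional values is risky with binary floats, whichever side of the comparison is parsed or stringified.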

This choice does mean that a message like

.match {$foo :number maximumFractionDigits=$bar}
1.0 {{=1.0}}
1 {{=1}}
* {{other}}

will never format as =1, even for { foo: 1, bar: 0 }, because the 1.0 key is listed before the 1 key.

In the rare case that a selection as above is necessary, it can be done like this:

.match {$bar :integer} {$foo :number maximumFractionDigits=$bar}
1 1 {{=1.0}}
0 1 {{=1}}
* * {{other}}
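
The ordering effect in the single-selector message can be sketched as follows. This is a simplified model, assuming that the first listed key that is numerically equal to the value wins and that every key other than `*` is numeric; `select_variant` is a hypothetical name, not part of the spec or of any implementation.

```python
from decimal import Decimal


def select_variant(value: Decimal, variants):
    """Return the pattern of the first key numerically equal to value.

    Simplified model: keys are numeric literals or the catch-all "*".
    """
    fallback = None
    for key, pattern in variants:
        if key == "*":
            fallback = pattern
        elif Decimal(key) == value:
            return pattern
    return fallback


variants = [("1.0", "=1.0"), ("1", "=1"), ("*", "other")]
print(select_variant(Decimal(1), variants))  # =1.0
print(select_variant(Decimal(2), variants))  # other
```

Because `1.0` and `1` parse to the same number, the `1` variant is unreachable in this model, which is why the two-selector form is needed to tell the two cases apart.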

@eemeli eemeli added the registry Issue pertains to the function registry label Jul 29, 2024
Member

@aphillips aphillips left a comment


I think, as I noted in my comment, that there can be cases in which fractional matching might be something users desire or expect. I have occasionally found a use for fractional matching--by occasionally, here I mean "pretty rarely"--and it remains the one place where the heavily deprecated ChoiceFormat of MF1 fame is useful.

We can forbid it in :number/:integer (especially the latter).

A virtue of the existing text is that it avoids, or tries to avoid, the problem of there being many kinds of number types (byte, short, int, long, float, double, ...) and the vagaries of matching operands of these various types in a type-less templating language.

In any case, I think the first step is to decide about normative text about numeric key matching.

@aphillips aphillips added normative Issue affects normative text in the specification LDML46 LDML46 Release (Tech Preview - October 2024) labels Jul 29, 2024
@eemeli
Collaborator Author

eemeli commented Jul 29, 2024

One alternative that could also work is being more exact about how the JSON string is generated from the resolved numerical value, i.e. something like "decimal form, with no trailing zeros in the fractional part", and then keeping the string comparison.

@aphillips
Member

Yes, that could work. On the other hand, a virtue of your proposal is that it avoids making implementers create a JSON numeric serializer if they don't already have one--they can use more readily available numeric parsers they already have. If we also chose to forbid fractional keys in :number/:integer, that would go a long way towards simplifying the problem.

@eemeli
Collaborator Author

eemeli commented Jul 29, 2024

If we also chose to forbid fractional keys in :number/:integer, that would go a long way towards simplifying the problem.

I'd be fine with that. We'd need to find a replacement for the current "key matches the production number-literal" language, as we don't have a handy ABNF rule for integers.

@aphillips
Member

We'd need to find a replacement for the current "key matches the production number-literal" language, as we don't have a handy ABNF rule for integers.

That text is in the numeric selection part of the registry bit. We would certainly change that if we forbid fractional keys in :number/:integer selection. The ABNF rule seems straightforward--it's just the integer part of the number-literal production:

["-"] (%x30 / (%x31-39 *DIGIT))

Note that we do not want to eliminate number-literal for key, since other selectors might need it.
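
As an illustration, the integer rule above translates directly into a regular expression. This is only a sketch: the names `INTEGER_KEY` and `is_integer_key` are hypothetical, and `[0-9]` is used instead of `\d` so that only ASCII digits match, as `DIGIT` does in ABNF.

```python
import re

# ["-"] (%x30 / (%x31-39 *DIGIT))
INTEGER_KEY = re.compile(r"-?(?:0|[1-9][0-9]*)")


def is_integer_key(key: str) -> bool:
    """True if the whole key matches the integer rule quoted above."""
    return INTEGER_KEY.fullmatch(key) is not None


for key in ["0", "-1", "42", "01", "1.0", "1e0", "+1"]:
    print(key, is_integer_key(key))
```

Note that the rule as written accepts `-0` and rejects leading zeros, a leading `+`, fractions, and exponents.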

@macchiati
Member

macchiati commented Jul 30, 2024 via email

@aphillips
Member

@macchiati I agree with you, except: this is talking specifically about the :number and :integer functions, not about any other possible number consuming ones (we should make clear, if it isn't already, that we only mean these functions). Every implementation must exactly implement these functions and their options and matching behavior.

I support the idea of allowing matches in these functions, if we can figure out a way to do so reliably. That means defining how a key like 1.5 is compared to a number value to see if they match. Consider:

.local $aNum = {1.5003 :number maximumFractionDigits=1}
.match {$aNum}
1.5    {{Does this match? The annotation suggests it should.}}
1.5003 {{Does this match?}}
one    {{Would this match if the annotation were {$aNum :integer}?}}
*      {{...}}

@eemeli
Collaborator Author

eemeli commented Jul 30, 2024

I support the idea of allowing matches in these functions, if we can figure out a way to do so reliably. That means defining how a key like 1.5 is compared to a number value to see if they match. Consider:

.local $aNum = {1.5003 :number maximumFractionDigits=1}
.match {$aNum}
1.5    {{Does this match? The annotation suggests it should.}}
1.5003 {{Does this match?}}
one    {{Would this match if the annotation were {$aNum :integer}?}}
*      {{...}}

With the currently proposed language, the second variant would be chosen. Note the first step of the algorithm:

  1. Let value be the numeric value of resolvedSelector.

This explicitly does not consider the formatting options, only the numeric value. In this case that's some floating-point approximation of 1.5003, which we can presume is parsed to the exact same value as the 1.5003 key.

The one variant would not be selected by an :integer annotation, as the value would be rounded up to 2, which does not match that category in English.

@macchiati
Member

macchiati commented Jul 30, 2024 via email

@eemeli
Collaborator Author

eemeli commented Jul 30, 2024

The selection process must not use the resolvedSelector, if that value is
1.5003 while the formatted value is "1.5" (or equivalent in native format).
It must instead match a value that takes into account the fractions that
will appear. For example, for numbers, 1.0 selects differently from 1. An
example is English, where you say "1 book per person" but "1.0 books per
person". There are many other languages that have the same behavior. (This
is a long-standing requirement.)

If we want to have the 1.5 literal key match a number value 1.5003 when it is annotated with :number maximumFractionDigits=1 and we care about having literal keys 1 and 1.0 be treated differently, then we probably need to define exactly what the following options mean, and how they interact with each other:

  • notation
  • minimumFractionDigits
  • maximumFractionDigits
  • minimumSignificantDigits
  • maximumSignificantDigits

Currently, we don't define these because we don't need to, but if we want to have the value 1.5 somehow available during the selection, then we rather need to. Note that this isn't limited to fractional numbers either, as

{1234 :integer maximumSignificantDigits=2}

formats as 1,200 in English and so by the same logic should not match a literal key 1234. This means that it's not quite enough to only match on integral keys.

And then of course there are the interactions, i.e. what to do when both minimumFractionDigits and minimumSignificantDigits are set.

I think that's way too much work for us to do realistically, and in a way that does not add restrictions on implementations, given that we have thus far not required that number formatting works exactly as defined by LDML Part 3. Especially as in the real world selection on fractional numbers is really rare.

The two other alternatives that leaves us with are:

  1. Leave exact matching as implementation-defined behaviour at least when any of the above options are set.
  2. Ignore the options when matching exact keys.

This PR proposes that we choose the latter option of the above, just like MF1 does with its offset being ignored when comparing values against exact keys.

That is why the selection process needs to be left up to the :number
function itself, so that it can assess whether a literal key — or plural
category — matches a value and what the preferential order is, taking into
account all of the formatting options that could have an effect on that
matching.

This is what we're doing, though? This PR is about defining the :number and :integer selection behaviour, and how they assess matches against literal keys.

@aphillips
Member

@eemeli I agree with @macchiati about what is wanted (hence my example). The resolved value should not be altered, since it might be needed later. I don't agree with your conclusion to ignore options when matching exact keys, because that doesn't match what authors want/need.

I fully agree that this is complicated! In addition to the items you call out, note that we do not want some options involved in value matching, e.g. useGrouping. We don't want the key literals in numbers to themselves be localized or depend on locale data!!

I also agree that we should not exactly specify how it is implemented. But in order to work and be interoperable (the same results appear on all implementations and thus can be authored portably--which is a goal of ours) we need to describe what literal to use to match a given value.

So it needs to be as complicated as necessary--no more and no less. The resulting key would use rules something like the following (I am certain to be overlooking some details, but feel confident that I am not overlooking so many as to make the project hopeless):

  • use a sign only if negative and then only if signDisplay is not never
  • use the maximum number of significant digits
  • use scientific notation (exponents) only if notation is scientific or for values above/below certain sizes (0.1e-6 seems pretty common on the small side??)
  • use a decimal point and fraction digits only if the formatted number of fraction digits is greater than zero

I think this gives us:

Selector                                                                  Key
{-1 :number}                                                              -1
{-1.0 :number}                                                            -1.0
{-1.234 :integer}                                                         -1
{-1.234 :number maximumFractionDigits=2}                                  -1.23
{0.002 :number}                                                           0.002
{0.000002 :number}                                                        2e-6
{1234 :number maximumSignificantDigits=2}                                 1200
{1234.5678 :number maximumSignificantDigits=2 minimumFractionDigits=2}    1200.00
{123e19 :number}                                                          1.23e21

Because the key MUST be a string, the current specification, i.e. "the JSON string representation of the numeric value of resolvedSelector", with the addition of certain details about the serialization, makes sense to me. My goal above was to describe what a JSON serializer would do by default, with some tweaks for the convenience of MF2 users.

(Probably someone has done what I'm reconstructing above and done it better)

One thing I like about what we have (noting that, as above, it needs some work) is that we need to specify "the format of the key" in addition to what it matches. Matching can be implemented by parsing the key into an appropriate numeric representation and comparing values, or, as we describe, by computing the key string from the value--so long as the matching works.
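
A rough sketch of "computing the key string from the value" might look like the following. Everything here is an assumption made for illustration: the function name `selection_key`, the Python-style option names, the choice of half-even rounding, and the order in which the options are applied. It handles only finite values in plain decimal notation, so the scientific-notation and signDisplay rules are not covered.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN


def selection_key(literal: str, *, max_fraction_digits=None,
                  min_fraction_digits=0, max_significant_digits=None) -> str:
    """Hypothetical: ASCII key for an operand under a few digit options."""
    value = Decimal(literal)
    if max_significant_digits is not None:
        # Round to the requested number of significant digits.
        ctx = Context(prec=max_significant_digits, rounding=ROUND_HALF_EVEN)
        value = ctx.plus(value)
    fraction_digits = -value.as_tuple().exponent
    if max_fraction_digits is not None and fraction_digits > max_fraction_digits:
        value = value.quantize(Decimal(1).scaleb(-max_fraction_digits),
                               rounding=ROUND_HALF_EVEN)
    if -value.as_tuple().exponent < min_fraction_digits:
        # Pad with zeros (also expands a rounded 1.2E+3 back to 1200).
        value = value.quantize(Decimal(1).scaleb(-min_fraction_digits))
    return format(value, "f")


print(selection_key("-1.0"))                                # -1.0
print(selection_key("-1.234", max_fraction_digits=0))       # -1
print(selection_key("-1.234", max_fraction_digits=2))       # -1.23
print(selection_key("1234", max_significant_digits=2))      # 1200
print(selection_key("1234.5678", max_significant_digits=2,
                    min_fraction_digits=2))                 # 1200.00
```

Passing `max_fraction_digits=0` stands in for `:integer` here. A real implementation would also have to settle how the fraction-digit and significant-digit options interact, which this sketch sidesteps by applying them in a fixed order.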

@eemeli
Collaborator Author

eemeli commented Jul 31, 2024

@aphillips I think we're arguing about corner cases that don't actually exist. Of the keys in the table you show, the only ones I can believe will ever show up in real-world messages are the -1 ones, and even those are a far stretch -- and supported by the current as well as the proposed language.

I do not believe we need to actually care about messages that provide special variants for values that format as -1.23 or 1200. But as before, I'm happy to be proven wrong: What would such a message look like? Can we show that there is a localization user story for the complexity required for serving these edge cases?

@aphillips
Member

?? I'm trying to think about the problem of describing an exact key match serialization for a wide variety of numbers. A technical specification should color all the way into the corners or have big enough limits to present no technical obstacle.

I agree that there are a bunch of corner cases that we need not optimize for. Modern numeric types allow a huge number of digits, such that, presumably, one could have keys like 11111111111111111111111.11111111111111111111 which our serialization scheme might not reliably handle. However, most values that you can type as number literals in code should work in a selector (because we don't want people to write code instead of using a selector).

I have seen a reasonably large number of real world message cases in which locally non-arbitrary numbers have special significance. Product managers and user experience designers seem to delight in planting such Easter eggs 🫨 into messages. What's more, we need to enable developers and translators to understand the "theory of exact selection" so that they can write "normal" messages--especially with limits that we didn't think of.

Anyway, you asked for an example. Driving directions are always good for arbitrarily chosen numbers bigger than 10:

.input {$distanceRemaining :unit unit=meter}
.local $distanceThreshold = {$distanceRemaining :number maximumSignificantDigits=2}
.match {$distanceThreshold} {$distanceRemaining}
1200 *   {{You are coming up on your destination in about: {$distanceRemaining}!}}
*    -50 {{You missed it! Recalculating route...}}
*    one {{You have {$distanceRemaining} to go.}}
*    *   {{You have {$distanceRemaining} to go.}}
*    0   {{You have arrived!}}

I can probably think of cases where I used fractions with three or four digits in a selector (three isn't a problem: there are currencies with three fraction digits). Maybe I'll come back later and add one.

@macchiati
Member

> we need to describe what literal to use to match a given value.

I agree. We don't have to specify exactly how :number is implemented, just how the matching works. However, I think we can describe a logical process that produces the right results for matching. That is, I think we can describe it as relating to the integer and fraction digits that result, instead of focusing too much on exactly the options used to produce the result.

Just to emphasize, what is important is the actual visible formatted integer/fraction digits, discounting any grouping separators, and variations in the characters used for those digits and the decimal.

Suppose that the source number is the following, and it is formatted for locale X, which has plural rules PR. (I'm using a large number just to include some grouping separators.)

123456.789

and this formats as

१२’३४५६•७९०

Those digits correspond to the following (removing grouping separators), and that is what needs to be considered when matching.

123456.790

Notes:

  • I say "correspond to", because the :number function doesn't have to generate ASCII digits with a decimal internally, it just has to behave as if that is what is used for matching.
  • It is rounded to 2 decimals, but shows a final zero, according to some bag of options (which could be extended in the future).

When matching, that is what the :number function must be using to match against the keys. And which keys match and what the preferential order is will depend on the plural rules. So the final zero is significant in some locales (because of their plural rules) and not significant in other locales.

Note: if the :number function's locale specifies a non-decimal number format (which can happen, although it is rare), then I think we should specify that the :number function matches keys according to the numeric value of the formatted result.

operand: 1.414
formatted result: "1⅖"
Match status, in preferential order:
1.4 ==> matches
other ==> matches
1.414 ==> no match
one ==> no match

@eemeli
Collaborator Author

eemeli commented Aug 1, 2024

Anyway, you asked for an example. Driving directions are always good for arbitrarily chosen numbers bigger than 10: [...]

Ugh, that's ugly, I did not consider that maximumSignificantDigits could be used as a hacky way to do choiceformat. But sure, I'll grant you that's a possible use case for selecting on a formatted rather than exact numerical value. But we probably should not advertise it.

> we need to describe what literal to use to match a given value.

I agree. We don't have to specify exactly how :number is implemented, just how the matching works. However, I think we can describe a logical process that produces the right results for matching. That is, I think we can describe it as relating to the integer and fraction digits that result, instead of focusing too much on exactly the options used to produce the result.

Just to emphasize, what is important is the actual visible formatted integer/fraction digits, discounting any grouping separators, and variations in the characters used for those digits and the decimal.

[...] the :number function doesn't have to generate ASCII digits with a decimal internally, it just has to behave as if that is what is used for matching.

One aspect we should keep in mind here is that in many implementations, a number formatter does not provide access to any intermediate results, and so to get at them, a second separate formatter instance needs to be built with some subset of the input options, and something like a formatted-parts result produced in order to determine what the ASCII digits would look like. That's a lot of work.

As we already have the matching process well defined in Pattern Selection, and if we are indeed fine with not defining exactly how exact selection works, then we could use something like the current "JSON string representation" language, as long as we note that implementations are free to determine whether and how to take any formatting options into account when generating it.

@macchiati
Member

a number formatter does not provide access to any intermediate results, and so to get at them, a second separate formatter instance needs to be built with some subset of the input options, and something like a formatted-parts result produced in order to determine what the ASCII digits would look like.

Such an implementation would fail with plural categories anyway; and certainly would give results that are incompatible with number formatters that correctly handled plural categories. Unless you want the results of numeric selection to be ambiguous across implementations, that's not the way to go.

What you are talking about is building plural selection on top of a number formatter NF that either doesn't do plural selection, or doesn't do it correctly. If you have to use that code for your MF2 implementation, you could hack a shim on top of it. Provided that NF produces decimal numbers that are not compact decimals, you would need to find out the following:

  • which numberSystem is used (Latin digits, Devanagari digits, etc.)
  • what the decimal separator is

You would then quickly scan the formatted string (e.g. "१२’३४५६•७९०") and convert it to the Latin numberSystem equivalent (123456.790). Not pretty, but it gets the job done.
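
The scan described above could be sketched like this. The function name `to_ascii_decimal` is hypothetical, and the sketch assumes the caller already knows the decimal separator and that the formatted string is a plain (non-compact) decimal. It relies on Python's `str.isdecimal` and `unicodedata.decimal` to map digits from any decimal numbering system to ASCII.

```python
import unicodedata


def to_ascii_decimal(formatted: str, decimal_separator: str) -> str:
    """Hypothetical shim: map a localized decimal string to ASCII digits."""
    out = []
    for ch in formatted:
        if ch.isdecimal():
            out.append(str(unicodedata.decimal(ch)))
        elif ch == decimal_separator:
            out.append(".")
        elif ch in "-\u2212":  # ASCII hyphen-minus or U+2212 MINUS SIGN
            out.append("-")
        # Anything else (grouping separators, spaces) is dropped.
    return "".join(out)


print(to_ascii_decimal("१२’३४५६•७९०", "•"))  # 123456.790
print(to_ascii_decimal("١٬٢٣٤٫٥٦", "٫"))      # 1234.56
```

The trailing zero survives the conversion, which matters because the visible fraction digits are what the plural rules look at.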

Compact decimals would take a bit more work, but it is unlikely that such a basic number formatter would implement that anyway.

@eemeli
Collaborator Author

eemeli commented Aug 2, 2024

One example of the type of situation I'm describing is JavaScript, which provides Intl.NumberFormat and Intl.PluralRules as two separate APIs. Together, they allow for proper number formatting as well as proper plural category selection, but do not provide access to a decimal representation of the number being formatted or matched to a plural category.

Also note that even if exact number selection is left as implementation-defined in this spec, the JS Intl.MessageFormat spec will need to define that exactly. Right now, it stringifies numbers in step 16.b. of this algorithm as follows:

Let str be ? ToString(input).

where input is a Number or a BigInt. I would very much prefer the MF2 spec not require changing this one line into a sequence of steps that involve constructing a separate Intl.NumberFormat instance with a custom options bag and using that to format a number to use for these comparisons, which in nearly all cases will only feature single-digit integer keys that are already matched by the current implementation.

@mihnita
Collaborator

mihnita commented Aug 5, 2024

We should not do this "because JSON".
If there are other arguments, maybe (I would like to hear them).

If we go this way, where do we stop? (Yes, the slippery slope argument)
But will we do something like this because of yaml?
As in, "20:80" is time, but 80:80 is a map entry? And "True", "Yes", "On", "y" in any values because they are unified to a boolean?


If this tries to deal with a serialization to json, then one can do "|1.00|" or "=1.00"

The current text depends on "the JSON string representation of the numeric value of resolvedSelector",

Then we should change the text.
The MF2 spec should not depend on JSON in any way.

@mihnita
Collaborator

mihnita commented Aug 5, 2024

Also note that a similar proposal was previously rejected:
#712

@macchiati
Member

Right now, it stringifies numbers in step 16.b. of this algorithm as follows:

What we are concerned with is not the stringification of numbers, but rather how the number selector function does matches. I agree that most cases will not have decimals, but we have to get them right if they are allowed as selection keys.

So we have a few choices:

  1. Specify that matching correctly handles decimals according to plural rules
  2. Forbid decimals in selection (at least for now).
    1. Add number-key-literal = ["-"] (%x30 / (%x31-39 *DIGIT)),
    2. Promulgate that through so key = name / number-key-literal
  3. Specify that the matching of decimal literals is implementation-defined.

@mihnita
Collaborator

mihnita commented Aug 9, 2024

What do you think though about differentiating between |123| and 123 ?

One resolves to a string, compares as a string, serializes as a string.
And the second resolves / compares / serializes as a number?

And then we can describe matching of decimal literals and not have it implementation-defined.

@aphillips
Member

What do you think though about differentiating between |123| and 123 ?

We specifically forbid differentiating between these. And we should not differentiate these representations. The literal quotes are optional and we did a lot of work to permit unquoted literals where reasonable. Implementations and users should be free to quote any literal for whatever reasons suits them and to unquote any literal that fits our syntax's requirements.

I prefer this of @macchiati's options:

Specify that matching correctly handles decimals according to plural rules

This doesn't seem that hard to me. It does mean, to @eemeli's point, that we need enough information to format the number (so that we can trim significant digits and ensure including the correct number of fraction digits). We might need to permit/require that implementations support both regular and scientific notation in keys (at least for some values). The key can be checked to know if scientific is required.

I agree that it might be inconvenient for the implementer, but that inconvenience is in service of making message author and translator lives easier. That seems much more important to me than avoiding the need for a selector to compute the serialization of the digits as formatted.

@mihnita
Collaborator

mihnita commented Aug 12, 2024

We specifically forbid differentiating between these.

And we should not differentiate these representations.
Opinion?

The literal quotes are optional and we did a lot of work to permit unquoted literals where reasonable.
The fact that we did a lot of work to do something didn't prevent us from changing it.

Note that what I suggested is only a change in behavior.
Does not change the syntax.
And the values are interpreted as numbers anyway.
When one says "...{$val :number minFractionalDigits=2" that option is a number, not a string.

I think Eemeli's change proposes that we only do that string -> number conversion earlier, in the resolve phase.
So that we have numbers there in the data model.


About plurals: I don't think there is a need to match against 1.00 (with zeros).
The plural rules already tell us numbers like that match against other.

It is a bit like specifying an explicit 21 in ordinals.
It is not needed, and it only adds a case that might not even be relevant in other languages.
We put it there only because we think in terms of English.


I agree that it might be inconvenient for the implementer, but that inconvenience is in service of making message author and translator lives easier. That seems much more important to me than avoiding the need for a selector to compute the serialization of the digits as formatted.

I don't think there is a need to do that.
The match can easily be done just by looking at the flags of the formatter.

Note that we should not think of this as "format to string and then compare".

Because the formatted "1.234,56" and "١٬٢٣٤٫٥٦" don't match "1234.56" in the selector.
So we must format with ASCII digits, no thousand separator, dot as decimal separator in order to match.

making message author and translator lives easier

There is no reason that I can think of for someone to add a 1.00 match case.
It is already covered by the plural rules of the language.
And it is locale dependent.


Note: I am split about this change.
But comparing strings is highly problematic to implement, and useless for a real use case.

@echeran
Collaborator

echeran commented Aug 12, 2024

+1 to what @macchiati and @aphillips said. We have to support correct i18n, and thus we should support CLDR/LDML plural rules on numbers. This also requires formatting fixed decimals, for which @macchiati gave good examples, and this is important to users, as @aphillips said.

We've talked as a group at least 5-6 times at this point about the need for plural selection in message format to format a number before selection. This point shouldn't be surprising to us.

Both ICU4C and ICU4J have their own implementation of fixed decimals because C++ and Java either don't have it, or it's buggy, or it's slow, or some combination. ICU4C vendored in (copied in) 3rd party open source code to support fixed decimals, while ICU4J's newer number formatter wrote an implementation from scratch because of shortcomings of usability & speed in Java's BigDecimal.

@aphillips
Member

@mihnita noted:

But comparing strings is highly problematic to implement, and useless for a real use case.

Comparing strings isn't that problematic, as long as the format is clearly established. Also, note well: the actual comparison does not have to be on strings. It only has to produce equivalent results. Not every language or runtime environment has fixed decimal types (as noted by @echeran), so we cannot require that. IEEE formats have inherent limits too. And MF2 is typeless.

At the same time, users' intentions are generally pretty clear, as long as we give them the right tools. Exact matching of numbers will generally be for reasonable values that don't test numeric type boundary conditions (matching the value of pi or e is probably not the right use case).

Anyway, our call is starting so won't finish this thought.......

@echeran
Collaborator

echeran commented Aug 12, 2024

Comparing strings isn't that problematic, as long as the format is clearly established.

The LDML spec on plural operands has a syntax that includes a format for the values, and how to interpret the different parts of a formatted number (aka "plural rule operands") for the purpose of looking through the plural rules of a locale to return the matching plural category. It also includes how to represent numbers in scientific notation and compact notation.

To put a fine point on it, I think this is the format we should be using. I assume that this is the format that @macchiati and @aphillips are referring to, as well.

Also, note well: the actual comparison does not have to be on strings. It only has to produce equivalent results.

Interesting, I suppose we could string compare, but it does require that each implementation be able to format to string correctly (correct = according to the agreed upon format (see above)).

Not every language or runtime environment has fixed decimal types (as noted by @echeran), so we cannot require that. IEEE formats have inherent limits too. And MF2 is typeless.

The unstated other part of my point is that, if necessary, implementations could add support for fixed decimals, even when their language doesn't support them, or doesn't support them well.

However, the question remains whether an implementation needs to support fixed decimals internally. I think the answer is "technically, no, but you should care if you want to provide a good number formatter". It looks like ECMA-402 has BigInt (arbitrary precision integers), not a "BigDecimal" that provides an arbitrary precision fixed decimal, but instead of a "BigDecimal" it allows a string input that conveys that fixed decimal instead. However, that affordance via a string is not very usable for users -- it's a slight chicken-and-egg problem to turn a number into a proper string in the first place, and regardless you can't perform arithmetic on it like you would a proper number type.

In short, if implementations can support the matching correctly based on the LDML format for values, then let them choose whatever implementation approach works, but it needs to be consistently correct.

@aphillips
Member

We seem to not be making progress using this PR.

There is a design doc, but it's just the text in the specification, plus some requirements and such. I'll make a PR to refine the requirements/use cases to try to capture what the problems are and begin to document alternatives.

aphillips added a commit that referenced this pull request Aug 13, 2024
This is to build up and capture technical considerations for how to address the issues raised by @eemeli's PR #842.
@mihnita
Collaborator

mihnita commented Aug 16, 2024

I did some more thinking about this.

The matches against number-like keys are currently used for exact plurals.

So that one can do things like "1 => You won the gold medal" or "this is the last day"
as opposed to "You ended in the #st place" or "You have # more days"

The "1 dollar" vs "1.00 dollars" is already handled by the standard CLDR plural rules, and does not need an exact match,
or compare against a string.

In fact the ICU4J implementation of plural right now does not need to do any formatting.
The plural rules inform how many zeros we will have without formatting (one = i = 1 and v = 0)
And the decision is one or other (* in MF2).
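
For illustration, the i and v operands and the English cardinal rule quoted above (one = i = 1 and v = 0) can be computed from the digits alone. This is a sketch, not ICU4J's code: the function names are hypothetical, the input is assumed to be a plain ASCII decimal string without an exponent, and only the English rule is modelled.

```python
def plural_operands(number: str):
    """Return (i, v): the integer digits as an int, and the number of
    visible fraction digits, for a plain ASCII decimal string."""
    integer_part, _, fraction_part = number.lstrip("-").partition(".")
    return int(integer_part), len(fraction_part)


def english_cardinal_category(number: str) -> str:
    """English cardinal rule: 'one' when i = 1 and v = 0, else 'other'."""
    i, v = plural_operands(number)
    return "one" if i == 1 and v == 0 else "other"


for s in ["1", "1.0", "1.00", "2"]:
    print(s, plural_operands(s), english_cardinal_category(s))
```

Under this rule "1" selects one, while "1.0" and "1.00" select other, which is the "1 dollar" versus "1.00 dollars" distinction without any exact key.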

So I don't really see a compelling use case for plurals as we have them today.

If a formatter in the future (or even plurals) finds a good use case to compare exact against strings, then that selector can decide to match against |=1.00| (instead of 1.00). Or some other syntax wrapped in |...|.

The keys used for matching and how matching is done are a selector function decision, not part of the MF2 spec.
That would also serialize properly to JSON (and other formats).

TLDR: I support this change.

@mihnita
Collaborator

mihnita commented Aug 16, 2024

> Comparing strings isn't that problematic, as long as the format is clearly established

Comparing is not the problematic part; formatting the value to a string is.

An implementation would have to create 2 formatters.

Taking 1234567.98765 as an example, both formatters would have to honor the options (min/max digits, min/max fractional digits, etc.).

One, used for selection, would have to generate 1234567.98.

The other would generate the locale-sensitive form to use for rendering, which we can't compare against the key.
It may have thousands separators, a different decimal separator, or even different digits ("1.234.567,98" or "١٬٢٣٤٬٥٦٧٫٩٨").

That is expensive.
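
A small Python sketch of the two-formatter situation, under loud assumptions: the helper names are hypothetical, the "German-style" output is hand-rolled rather than produced by ICU or any real locale data, and the sample value is chosen so that no rounding is needed (the rounding mode is therefore not a factor here).

```python
from decimal import Decimal


def selection_form(value: Decimal, fraction_digits: int) -> str:
    """Locale-neutral form: ASCII digits, '.' decimal separator, no grouping."""
    return format(value, f".{fraction_digits}f")


def display_form_de(value: Decimal, fraction_digits: int) -> str:
    """Hand-rolled German-style form: '.' for grouping, ',' as decimal separator."""
    grouped = format(value, f",.{fraction_digits}f")
    return grouped.translate(str.maketrans({",": ".", ".": ","}))


value = Decimal("1234567.5")
print(selection_form(value, 2))   # could be compared against a key
print(display_form_de(value, 2))  # meant for rendering only
```

Both helpers have to apply the same digit options, but only the first produces a string that could be compared against a key, which is why string-based selection implies a second formatting pass.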

@mihnita
Collaborator

mihnita commented Aug 16, 2024


There is also an implementation friction:
since many formatters / selectors can take "options that look like numbers" (which are now strings in the data model), they will have to parse them into numbers to do anything useful with them
(think maxFractionalDigits and so on).

For a 3rd party the rules would have to be:

  • If you take numeric parameters, they MUST follow the MF2 syntax.
  • You will get them in a string format, and IT IS ON YOU to convert them to numbers.
  • IT IS UNSAFE to use your own platform APIs to convert strings to numbers, because there is no guarantee that the result will be the same as what MF2 intends: "as long as the format is clearly established" applies to MF2, and that "clearly established" format might not match what the hosting platform does.

This means:

  • all implementations will have to provide some kind of public API for string-to-number parsing
  • and all 3rd party formatters / selectors will have to be careful to use that method to do the parsing

That is error-prone, and an extra burden on implementations (extra public APIs) and users.

If the implementation already parses the numbers and they are stored in the data model as numbers, then the parsing part does not have to be public, and developers writing custom formatters / selectors don't have to do anything special.
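
The mismatch between platform parsing and MF2 parsing can be sketched as follows. The regular expression is an assumption about the shape of an MF2 number literal (optional "-", integer part without leading zeros, optional fraction, optional exponent), not a copy of the normative grammar, and `parse_number_option` is a hypothetical helper of the kind an implementation would have to expose.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed shape of an MF2 number literal: optional "-", an integer part
# without leading zeros, an optional fraction, and an optional exponent.
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def parse_number_option(source: str) -> Optional[Decimal]:
    """Parse an option value as a number, or return None if it is not a literal."""
    if NUMBER_LITERAL.fullmatch(source) is None:
        return None
    return Decimal(source)


# Strings that the platform's float() accepts but the assumed grammar rejects.
for candidate in ["1_000", " 2 ", "+3", ".5", "inf"]:
    print(repr(candidate), float(candidate), parse_number_option(candidate))
```

Every third-party function would need to call something like `parse_number_option` instead of the platform's own `float()`, which is the extra public API surface described above.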


And that also affects the JSON serialization of the data model:

"parts": [
   {"text":"The price is "},
   {"type":"placeholder", "function":"number", "options": {"minFractionalDigits":2}}
]

Note the "minFractionalDigits":2 option.
That has the same problem (2.00 and 2 will parse the same from JSON).
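
A quick check of that claim, as a sketch using Python's standard `json` module (other JSON parsers may differ in detail): with default number parsing the two spellings compare equal, and the written zeros only survive if the parser is told to keep a decimal type or if the value is stored as a string.

```python
import json
from decimal import Decimal

# With the default number parsing, "2.00" and "2" are numerically identical.
plain_a = json.loads('{"minFractionalDigits": 2.00}')
plain_b = json.loads('{"minFractionalDigits": 2}')
print(plain_a["minFractionalDigits"] == plain_b["minFractionalDigits"])

# Parsing fraction-bearing numbers as Decimal keeps the written zeros...
kept = json.loads('{"minFractionalDigits": 2.00}', parse_float=Decimal)
print(kept["minFractionalDigits"])

# ...and storing the value as a string keeps them in any JSON parser.
as_string = json.loads('{"minFractionalDigits": "2.00"}')
print(as_string["minFractionalDigits"])
```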


TLDR: I would argue that not only should the selection be done on numbers, but the data model should also store things as numbers.

@mihnita mihnita added the blocker-candidate The submitter thinks this might be a block for the Technology Preview label Aug 28, 2024
@aphillips aphillips removed blocker-candidate The submitter thinks this might be a block for the Technology Preview LDML46 LDML46 Release (Tech Preview - October 2024) labels Sep 17, 2024
@catamorphism
Collaborator

In the example in Eemeli's initial description:

.match {$foo :number maximumFractionDigits=$bar}
1.0 {{=1.0}}
1 {{=1}}
* {{other}}

wouldn't it be consistent with the changes in this PR to also make this message emit a "Duplicate Variant Key" error? If the selector is compared numerically against the numeric values of the keys, then the keys 1.0 and 1 are effectively duplicates.

(Sorry if this was already asked, I haven't caught up with all the comments.)

@@ -421,6 +421,11 @@ numeric selectors perform as described below.
> Implementations are not required to implement this exactly as written.
> However, the observed behavior must be consistent with what is described here.

> [!IMPORTANT]
> The binary representation of floating point numbers is not always exact.
> Users should avoid using keys with fractional values,
Collaborator

Should this be a "SHOULD"?

Member

No, we should fix numeric matching as in #859 and abandon this PR 😉

@eemeli
Collaborator Author

eemeli commented Oct 12, 2024

> wouldn't it be consistent with the changes in this PR to also make this message emit a "Duplicate Variant Key" error? If the selector is compared numerically against the numeric values of the keys, then the keys 1.0 and 1 are effectively duplicates.

No, because the equation of 1.0 with 1 is internal to :number, and Duplicate Variant Key is a data model error that can't depend on functions.
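
The layering can be sketched in Python as two separate checks. This is a simplified illustration with hypothetical function names: it handles a single selector, ignores key normalization, and uses `Decimal` for the numeric comparison, none of which is prescribed by the PR.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def has_duplicate_keys(keys: list[str]) -> bool:
    """Data-model-level check: keys are compared as literal strings."""
    return len(set(keys)) != len(keys)


def select_numeric(value: Decimal, keys: list[str]) -> Optional[str]:
    """Function-level matching: the first key that is numerically equal wins."""
    for key in keys:
        try:
            if Decimal(key) == value:
                return key
        except InvalidOperation:
            continue  # non-numeric keys such as "*" never match exactly
    return None


keys = ["1.0", "1", "*"]
print(has_duplicate_keys(keys))            # literal comparison sees three distinct keys
print(select_numeric(Decimal("1"), keys))  # numeric comparison stops at the first equal key
```

The data model check knows nothing about numbers, so it reports no duplicate, while the numeric matcher always stops at the 1.0 key; the 1 variant is shadowed rather than rejected.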

aphillips added a commit that referenced this pull request Nov 4, 2024
* [DESIGN] Number selection design refinements

This is to build up and capture technical considerations for how to address the issues raised by @eemeli's PR #842.

* Update examples to match changes to syntax

Also responds to the long discussion with @eemeli about significant digits by removing from the example.

* Address 2024-09-16 call comments

This changes the status to "Re-Opened" and adds a link to the PR. Expect to merge this imminently, although discussion on number selection remains.

* Update exploration/number-selection.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update from main (#914)

* Create notes-2024-08-19.md

* Accept attributes design & remove spec note (#845)

* Accept attributes design & remove spec note

* Disallow duplicate attribute names (closes #756)

* Add link to contextual options PR

* Add more prose to tag example text

Co-authored-by: Addison Phillips <[email protected]>

* Mention attribute validity condition in the **_valid_** definition

---------

Co-authored-by: Addison Phillips <[email protected]>

* Update selection-declaration design doc based on mtg / issue discussion (#867)

* Add tests for pattern selection (#863)

* Add tests for pattern selection

* Add missing errors

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>

* Add Duplicate Variant to table in test/README.md (#861)

* Add new selection-declaration alternative: Require annotation of selector variables in placeholders (#860)

* Add new selection-declaration alternative: Require annotation of selector variables in placeholders

* Improve examples

* Switch example order

* Update the stability policy (#834)

* Update the stability policy

Based on discussion in the 2024-07-22 call and in PR #829, update the stability policy.

* A deeper, more thorough rewrite

- Standardizes the phrasing completely.
- Moves all potential future changes (which are not, after all, stability policies) to an "important" block
- Removes duplication
- Separates functions, options, and option values into separate guarantees
- Clarifies the note about formatting changing over time

* Update spec/README.md

Co-authored-by: Tim Chevalier <[email protected]>

* Update spec/README.md

Co-authored-by: Eemeli Aro <[email protected]>

* remove well-formed

* Update spec/README.md

---------

Co-authored-by: Tim Chevalier <[email protected]>
Co-authored-by: Eemeli Aro <[email protected]>

* Refine error handling text (#816)

* Refine error handling text

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Update fallback text

* Turn bullet point list into paragraphs

* Be more mighty

Co-authored-by: Addison Phillips <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>

* Create notes-2024-08-26.md

* Select "Match on variables instead of expressions" for selection-declarations (#824)

* Select "Match on variables instead of expressions" for selection-declarations

* Add hybrid option to selection-declaration.md (#870)

* Add hybrid option to selection-declaration.md

* Update selection-declaration.md

fixed glitch in original edit

* Update selection-declaration.md

* Apply suggestions from code review

Fixing typos

Co-authored-by: Addison Phillips <[email protected]>

* Update selection-declaration.md

* Update exploration/selection-declaration.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update exploration/selection-declaration.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update exploration/selection-declaration.md

Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>
Co-authored-by: Eemeli Aro <[email protected]>

* Update selection-declaration.md

---------

Co-authored-by: Mark Davis <[email protected]>
Co-authored-by: Addison Phillips <[email protected]>

* Fix "Allow immutable input declarative selectors" example (#874)

* Update README.md (#875)

* Update README.md

* Update README.md

* [DESIGN] Update bidi design document to show proposed design (#871)

* [DESIGN] Update bidi design document to show proposed design

The design I actually think we should adopt is the "hybrid approaches" one. This is a necessary first step on the highway to UAX31 compliance and I think is responsibly contained/managed. It is a hybrid approach, in that it permits testable strict implementations to be created (particularly for message serialization).

This PR consists of moving text around. I added one "pro" to one option also.

* Address comments

* Miscellaneous test fixes (#862)

* Add missing expected bad-selector errors

* Fix expected parts for unsupported-statement test

* Add a few new tests for leading-whitespace and duplicate-variant

* Add tests for escaped-char changes made in #743

* Fix tests for attributes with variable values

* Update contributing and joining info (#876)

* Update contributing and joining info

* Update README.md

* Update CONTRIBUTING.md

* Restore CLA copy

* Clarify error & fallback handling (#879)

* Clarify error & fallback handling

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Select last rather than first attribute

* Drop mention of "starting with Pattern Selection"

* Attributes can't change the formatted output

* Use "nor" instead of "or" regarding attribute restrictions

---------

Co-authored-by: Addison Phillips <[email protected]>

* Clarify rule selection (#878)

* Clarify rule selection

Fixes #868 

This adds normative SHOULD language to using CLDR plural and ordinal data, which was intended originally.

- clarifies that keyword selection follows exact match
- clarifies the purpose of rule-based selection
- makes non-CLDR-based implementation permitted

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>

* [DESIGN] Maintaining the Standard, Optional and Unicode Namespace Function Sets (#634)

* Design doc to capture registry maintenance

* Update maintaining-registry.md

* Update exploration/maintaining-registry.md

Co-authored-by: Tim Chevalier <[email protected]>

* Update exploration/maintaining-registry.md

Co-authored-by: Tim Chevalier <[email protected]>

* Add user stories, small updates to RGI

* Update exploration/maintaining-registry.md

* Adding additional detail

* Remove machine readable registry; update prose

* Update maintaining-registry.md

* Further development work

* Update to change format and naming

Per the 2024-08-19 call, we decided to switch towards a specification-per-function model, with statuses. This commit includes the initial set of changes to try and implement this.

* Address some comments.

---------

Co-authored-by: Tim Chevalier <[email protected]>

* Create notes-2024-09-09.md

* Fix a typo in an example (#880)

The upcoming work to implement resolved value might make this patch unnecessary or obsolete, but fixing the typo (missing `{`/`}` around the variable in the pattern) just in case

* Remove forward-compatibility promise and all reserved & private syntax (#883)

* Remove forwards compatibility from stability guarantee

* Drop reserved statements and expressions

* Drop private-use annotations

* Update tests

* Clarify that deprecation is not removal

* Match on variables instead of expressions (#877)

* Match on variables instead of expressions

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Apply suggestions from code review

* Add missing test changes noticed during implementation

* Empty commit to re-trigger CLA check

---------

Co-authored-by: Addison Phillips <[email protected]>

* Create notes-2024-09-10.md

* Add bidi support and address UAX31/UTS55 requirements (#884)

* Add bidi support and address UAX31/UTS55 requirements

Adds the bidi strong marks ALM, RLM, and LRM plus the bidi isolate controls LRI, RLI, FSI, and PDI to the syntax.

Formally defines optional vs. non-optional whitespace.

Non-optional whitespace must include at least one whitespace character. Optional whitespace may contain only bidi marks (which are invisible)

* Update syntax.md including text from previous PR

* Repair the guidance on strongly directional marks

Include ALM and better specify how to use the marks.

* Fix formatting of the "important"

* Add bidi characters to description of whitespace.

* Permit bidi in a few more places

Add optional whitespace at the start of `variant`

Add optional whitespace around `quoted-pattern`

These changes result in allowing bidi around keys and quoted patterns as intended.

* Update syntax.md ABNF

* Update formatting.md

- Add a note about the difference between formatting and message syntax.
- Clarify the sentence about message directionality.

* Address comment about name/identifier

* Address comments related to bidi in `name`

* Fix variable's location

* Address comment about the list of LRI/PDI targets

* One character typo :-P

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

* Address comments about rule R3a-1

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

* Address comment about U+061C

* Change [o]wsp => `o` or `s`

* Match syntax spec to abnf

* Remove *

* Update syntax.md

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/message.abnf

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/message.abnf

Co-authored-by: Eemeli Aro <[email protected]>

* Update syntax.md

* Update spec/message.abnf

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>

* Specify `bad-option` for bad digit size option values (#882)

* Specify `bad-option` for bad digit size option values

Fixes #739

* adopt 'non-negative integer'

* Create notes-2024-09-16.md

* Address name and literal equality (#885)

* Address name and literal equality

This change defines equality as discussed in the 2024-09-09 teleconference in the following ways:

- It defines _name_ equality as being under NFC
- It defines _literal_ equality as explicitly **not** under NFC
- It moves _name_ before _identifier_ in that section of text to avoid a forward definition.

Note that this deviates from discussion in 2024-09-09's call in that we didn't discuss literals at length. It also doesn't discuss non-name/non-literal values, which I'll point out are limited to ASCII sequences such as keywords.

* Typo fix

* Add a note about not requiring implementations to actually normalize

* Implement changes discussed in 2024-09-16 call.

- Make _key_ require NFC for uniqueness/comparison
- Add a note about NFC
- Make _literal_ **_not_** define equality
- Make text in _name_ identical to that in _key_ for consistency

* Update formatting.md to include keys in NFC

* Address comments

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/syntax.md

Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>

* Update list of normative changes during the LDML45 period (#890)

* Fix typos in data-model-errors tests (#892)

Fix #886

* Update note on exact numeric match for v46 (#891)

Addresses #887 

Non-normative changes to the notes specifically part of LDML46

* Fix attribute value to be literal (#894)

Fixes #893

* Create notes-2024-09-30.md

* Add Resolved Values and Function Handler sections to formatting (#728)

* Add Resolved Values section to formatting

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Tim Chevalier <[email protected]>

* Linkify "resolved value"

* Add some examples & explicitly allow wrapping input values

* No throw, only emit

Co-authored-by: Tim Chevalier <[email protected]>

* Add section on Function Handlers, defining the term

* Apply suggestions from code review

* Rephrase initial resolved value definition

* Update spec/formatting.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update resolved value definition again

Co-authored-by: Addison Phillips <[email protected]>

---------

Co-authored-by: Tim Chevalier <[email protected]>
Co-authored-by: Addison Phillips <[email protected]>

* Define function composition for :number and :integer values (#823)

* Define function composition for :number and :integer values

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Add operand option priority example

* Add apostrophes'

Co-authored-by: Tim Chevalier <[email protected]>

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>
Co-authored-by: Tim Chevalier <[email protected]>

* Create notes-2024-10-07.md

* Apply NFC normalization during :string key comparison (#905)

* Apply NFC normalization during :string key comparison

* Add link to UAX#15

Co-authored-by: Addison Phillips <[email protected]>

---------

Co-authored-by: Addison Phillips <[email protected]>

* Add tests for changes due to bidi/whitespace (#902)

* Add tests for changes due to bidi/whitespace

* Correct output

* Make erroneous test a syntax error

* Define function composition for date/time values (#814)

* Define function composition for date/time values

* Apply suggestions from code review

Co-authored-by: Stanisław Małolepszy <[email protected]>

* Drop the "only"

* Update spec/registry.md

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Update spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

* Make :date and :time composition implementation-defined

---------

Co-authored-by: Stanisław Małolepszy <[email protected]>
Co-authored-by: Addison Phillips <[email protected]>

* DESIGN: Add alternative designs to the design doc on function composition (#806)

* DESIGN: Add a sequel to the design doc on function composition

This document sketches out some alternatives for the machinery
provided to enable function composition.

The goal is to provide an exhaustive list of alternatives.

* Remove 'part 2' document and move contents to the end of part 1

* Revise introduction to reflect the changed goal

* Edited for conciseness

* Further edits for conciseness

* Give a name to InputType and use it

* Refer to motivating examples

* Update function-composition-part-1.md status

Per 2024-10-14 telecon

* Create notes-2024-10-14.md

* Add test for :integer and :number composition (#907)

* Fix `:integer` option `useGrouping` values (#912)

I noticed that `:integer` does not include the "never" value for the option `useGrouping`. This is a bug.

* Drop syntax note on additional bidi changes (#910)

Drop syntax note on addition bidi changes

* Add tests for changes due to #885 (name/literal equality) (#904)

* Add tests for changes due to #885 (name/literal equality)

* Update test/tests/functions/string.json

Co-authored-by: Eemeli Aro <[email protected]>

* Update test/tests/syntax.json

Co-authored-by: Eemeli Aro <[email protected]>

* Update test/tests/functions/string.json

Co-authored-by: Eemeli Aro <[email protected]>

* Added tests for reordering and special case mapping

* Add another selection test

---------

Co-authored-by: Eemeli Aro <[email protected]>

* Add u: options namespace (#846)

* Move spec/registry.md -> spec/registry/default.md

* Add Unicode Registry definition

* Refer to BCP47, add note about only requiring normal tags

* Call it a namespace

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Fix test file reference

Co-authored-by: Tim Chevalier <[email protected]>

* Apply suggestions from code review

* Update spec/u-namespace.md

Co-authored-by: Eemeli Aro <[email protected]>

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Add mention of functions to namespace description

---------

Co-authored-by: Addison Phillips <[email protected]>
Co-authored-by: Tim Chevalier <[email protected]>

* Define function composition for :string values (#798)

* Define function composition for :string values

* Update spec/registry.md as suggested by @stasm in #814

* Drop the "only"

* Update text following code review comments

---------

Co-authored-by: Addison Phillips <[email protected]>

* Drop data model request for feedback on "name" (#909)

* Allow surrogates in content, issue #895 (#906)

* Allow surrogates in content, issue #895

* Grammar and typos, linkify terms, make into a note, and fix 2119 keywords

Thanks Addison!

Co-authored-by: Addison Phillips <[email protected]>

* Not using "localizable elements"

Co-authored-by: Addison Phillips <[email protected]>

* Keep syntax.md in sync with message.abnf

* Added note about surrogates to quoted literals

* Moved the note about surrogates from Security Considerations to The Message

* Update spec/syntax.md

* Update spec/syntax.md

* Italicize  in a couple of places

* Implemented more (all?) feedback from review

---------

Co-authored-by: Addison Phillips <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Elango Cheran <[email protected]>
Co-authored-by: Tim Chevalier <[email protected]>
Co-authored-by: Mark Davis <[email protected]>
Co-authored-by: Danny Gleckler <[email protected]>
Co-authored-by: Steven R. Loomis <[email protected]>
Co-authored-by: Stanisław Małolepszy <[email protected]>
Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Mihai Nita <[email protected]>

* Add serialization proposal

* Revert "Add serialization proposal"

This reverts commit 17af553.

* Revert "Update from main (#914)"

This reverts commit da9377b.

* Add serialization proposal

---------

Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Elango Cheran <[email protected]>
Co-authored-by: Tim Chevalier <[email protected]>
Co-authored-by: Mark Davis <[email protected]>
Co-authored-by: Danny Gleckler <[email protected]>
Co-authored-by: Steven R. Loomis <[email protected]>
Co-authored-by: Stanisław Małolepszy <[email protected]>
Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Mihai Nita <[email protected]>
@aphillips aphillips closed this Nov 11, 2024
Labels
normative Issue affects normative text in the specification registry Issue pertains to the function registry
Development

Successfully merging this pull request may close these issues.

Review non-integral exact number selection algorithm
6 participants