Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-15954 Update spec for unit preferences #3576

Merged

Conversation

macchiati
Copy link
Member

@macchiati macchiati commented Mar 26, 2024

CLDR-15954

Clarify the algorithm for unit preferences to catch edge cases.

  1. What to do with unitpreferences if there is no quantity for a unit, or no preferences for a quantity. Emphasizes the use of likelysubtags. It more explicit about edge cases when comparing input values to geq values (negative numbers, infinity, NaN). Provides more of a step-by-step approach to handling the subtags that have an impact (no change in effect, but much clearer). Moves examples to better places.
  2. Edge cases (but important ones) for mixed units: handling negatives and rounding. One change is to be less prescriptive about the way that precision is handled, giving us a chance to work that our more in the future (because the current advice really doesn't work well.
  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@macchiati macchiati requested review from sffc and pedberg-icu March 26, 2024 06:29
@macchiati
Copy link
Member Author

Still in draft, but if you have any feedback...

docs/ldml/tr35-info.md Outdated Show resolved Hide resolved
Comment on lines 1129 to 1133
If a maximum/minimum number of decimals or signficant digits is desired in formatting,
those values should should be adapted to the mixed radix used, starting with the largest unit,
and discarding smaller units where they would be overly accurate.
For example, if the goal is 1 part in 100, then "6 miles 1 foot" is overly precise,
while "6 foot 1 inch" has approximately the right accuracy.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be much clearer and easier to implement if the decimal formatting options were applied to the smallest unit, with larger units always displayed as an integer.

This appears to be what you write farther down in the doc:

  • If the unit is mixed (eg foot-and-inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers.

Also consider the case "1.9999999... ft" formatted with foot-and-inch. This should result in "2 feet", not "1 foot 12 inches".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, when I wrote "while "6 foot 1 inch" has approximately the right accuracy." I had flubbed the example. So disregard that bit.

I think it would be much clearer and easier to implement if the decimal formatting options were applied to the smallest unit, with larger units always displayed as an integer.

I agree it is simpler for us to implement. And we're in agreement on the higher units always showing as integers.
But it gives less satisfactory results. Take the following example.

The programmer supplies: source unit is meters, and amount is 1616, and desired precision is 3 significant digits.

For meters, that turns into "1.62 kilometers". That reflects internal values 1.62±0.005 km, or within 5 meters.

In miles, that turns into 1.004135847 decimal. If we take the floor*, we get 1 mile, with 21.83727034 feet remaining. If we apply 3 sig. digits to the last field, then we get "1 mile 21.8 feet". That accuracy is excessive: 21.8±0.05 feet is within 0.015 meters. What we really want in that case is something that reflects the original accuracy: a rounded number that is within 5 meters of the original amount (=~16 feet), which is something like "1 mile 20 feet", not "1 mile 22 feet" or "1 mile 21.8 feet".

* As you point out, if we get satisfactory precision out of a higher unit, then we don't want to include the lower unit at all. And if that is the case, we would want to use round(x) and not floor(x).

What I was trying to express with the "should" was more of a warning: (a) there is an issue with mixed units, and (b) we don't have a definitive algorithm (yet) for what to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the desired behavior, but a hand-wavy algorithm can't be implemented. I would rather offer a suggestion of a mechanism that can be revised later than try to hitchhike on an existing mechanism (maximum significant and fractional digits) which are already hard to implement correctly.

Implementations may offer mechanisms to limit the precision of the formatted mixed unit. For example, "1.62143 meters" with three significant figures could be rendered in meter-and-millimeter as "1 meter, 620 millimeters". This mechanism should be distinct from decimal digit options, such as those provided in a number formatting skeleton, which should be applied to only the smallest unit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I don't like the handwaviness either. I'll try some wording, but because implementations and (and IMO should) have other mechanisms, we need to not require that the precision be applied to the final unit.

Copy link
Member Author

@macchiati macchiati Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leery of having yet another precision type for mixed units. Like in other cases, we want to remove the burden of knowing everything about every language from the programmer as much as possible. And having a set of options in number formatting like mixed units where the 2nd subunit is less than 1/10 of the top unit use X precision ...

BTW, what I was thinking of having is something like the following (wording is currently clumsy). I'm not going to include this wording, because we don't have time to work out any kinks, but just to give you a sense of what I'm thinking.

  1. The original source (say meters) is first converted to the topmost unit, eg miles.
  2. The precision is applied to that topmost unit to find an "accuracy" value.
  3. Eg, 12.345678 miles with 3 sig. digits gives 12.3 miles, so the accuracy is ±0.05 miles.
  4. As you convert to successively smaller units, stop when the value could be rounded to be within that range.
  5. The first unit as an integer is not within that accuracy, so we use the integer value 12 miles, and the remainder to carry over is 0.345678 miles
  6. The carry-over to feet is 1825.17984 feet, and accuracy is ±264. Since the accuracy is > 1, we know we are going to stop at this value, rounding to some amount, and not go on to inches.
  7. Rounding to 1 digit (2000 feet) would be within the accuracy, so we stop, and round to that value, producing "12 miles 2000 feet".

This is not by any means complete, just to give you a sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at the current text now, @sffc and @younies .

docs/ldml/tr35-info.md Show resolved Hide resolved
docs/ldml/tr35-info.md Show resolved Hide resolved
@macchiati macchiati requested review from sffc and younies March 27, 2024 01:10
@macchiati macchiati marked this pull request as ready for review March 27, 2024 16:55
@macchiati
Copy link
Member Author

We're going to branch soon, so I'd appreciate if we can get it approved today or tomorrow.

Copy link
Member

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Seems to address comments.

@srl295 srl295 merged commit f117648 into unicode-org:main Mar 29, 2024
7 checks passed
@macchiati macchiati deleted the CLDR-15954-Update-spec-for-unit-preferences branch March 31, 2024 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants