CLDR-15954 Update spec for unit preferences #3576

macchiati · 2024-03-26T01:20:34Z

CLDR-15954

Clarify the algorithm for unit preferences to catch edge cases.

What to do with unitpreferences if there is no quantity for a unit, or no preferences for a quantity. Emphasizes the use of likelysubtags. It more explicit about edge cases when comparing input values to geq values (negative numbers, infinity, NaN). Provides more of a step-by-step approach to handling the subtags that have an impact (no change in effect, but much clearer). Moves examples to better places.
Edge cases (but important ones) for mixed units: handling negatives and rounding. One change is to be less prescriptive about the way that precision is handled, giving us a chance to work that our more in the future (because the current advice really doesn't work well.

This PR completes the ticket.

ALLOW_MANY_COMMITS=true

macchiati · 2024-03-26T06:29:40Z

Still in draft, but if you have any feedback...

docs/ldml/tr35-info.md

sffc · 2024-03-26T17:12:14Z

docs/ldml/tr35-info.md

+If a maximum/minimum number of decimals or signficant digits is desired in formatting, 
+those values should should be adapted to the mixed radix used, starting with the largest unit, 
+and discarding smaller units where they would be overly accurate. 
+For example, if the goal is 1 part in 100, then "6 miles 1 foot" is overly precise, 
+while "6 foot 1 inch" has approximately the right accuracy.


I think it would be much clearer and easier to implement if the decimal formatting options were applied to the smallest unit, with larger units always displayed as an integer.

This appears to be what you write farther down in the doc:

If the unit is mixed (eg foot-and-inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers.

Also consider the case "1.9999999... ft" formatted with foot-and-inch. This should result in "2 feet", not "1 foot 12 inches".

First, when I wrote "while "6 foot 1 inch" has approximately the right accuracy." I had flubbed the example. So disregard that bit.

I think it would be much clearer and easier to implement if the decimal formatting options were applied to the smallest unit, with larger units always displayed as an integer.

I agree it is simpler for us to implement. And we're in agreement on the higher units always showing as integers.
But it gives less satisfactory results. Take the following example.

The programmer supplies: source unit is meters, and amount is 1616, and desired precision is 3 significant digits.

For meters, that turns into "1.62 kilometers". That reflects internal values 1.62±0.005 km, or within 5 meters.

In miles, that turns into 1.004135847 decimal. If we take the floor*, we get 1 mile, with 21.83727034 feet remaining. If we apply 3 sig. digits to the last field, then we get "1 mile 21.8 feet". That accuracy is excessive: 21.8±0.05 feet is within 0.015 meters. What we really want in that case is something that reflects the original accuracy: a rounded number that is within 5 meters of the original amount (=~16 feet), which is something like "1 mile 20 feet", not "1 mile 22 feet" or "1 mile 21.8 feet".

* As you point out, if we get satisfactory precision out of a higher unit, then we don't want to include the lower unit at all. And if that is the case, we would want to use round(x) and not floor(x).

What I was trying to express with the "should" was more of a warning: (a) there is an issue with mixed units, and (b) we don't have a definitive algorithm (yet) for what to do.

I understand the desired behavior, but a hand-wavy algorithm can't be implemented. I would rather offer a suggestion of a mechanism that can be revised later than try to hitchhike on an existing mechanism (maximum significant and fractional digits) which are already hard to implement correctly.

Implementations may offer mechanisms to limit the precision of the formatted mixed unit. For example, "1.62143 meters" with three significant figures could be rendered in meter-and-millimeter as "1 meter, 620 millimeters". This mechanism should be distinct from decimal digit options, such as those provided in a number formatting skeleton, which should be applied to only the smallest unit.

Yah, I don't like the handwaviness either. I'll try some wording, but because implementations and (and IMO should) have other mechanisms, we need to not require that the precision be applied to the final unit.

I'm leery of having yet another precision type for mixed units. Like in other cases, we want to remove the burden of knowing everything about every language from the programmer as much as possible. And having a set of options in number formatting like mixed units where the 2nd subunit is less than 1/10 of the top unit use X precision ...

BTW, what I was thinking of having is something like the following (wording is currently clumsy). I'm not going to include this wording, because we don't have time to work out any kinks, but just to give you a sense of what I'm thinking.

The original source (say meters) is first converted to the topmost unit, eg miles.

The precision is applied to that topmost unit to find an "accuracy" value.

Eg, 12.345678 miles with 3 sig. digits gives 12.3 miles, so the accuracy is ±0.05 miles.

As you convert to successively smaller units, stop when the value could be rounded to be within that range.

The first unit as an integer is not within that accuracy, so we use the integer value 12 miles, and the remainder to carry over is 0.345678 miles

The carry-over to feet is 1825.17984 feet, and accuracy is ±264. Since the accuracy is > 1, we know we are going to stop at this value, rounding to some amount, and not go on to inches.

Rounding to 1 digit (2000 feet) would be within the accuracy, so we stop, and round to that value, producing "12 miles 2000 feet".

This is not by any means complete, just to give you a sense.

Please take a look at the current text now, @sffc and @younies .

docs/ldml/tr35-info.md

macchiati · 2024-03-27T16:59:38Z

We're going to branch soon, so I'd appreciate if we can get it approved today or tomorrow.

…nit.

srl295

Generally LGTM. Seems to address comments.

macchiati added 3 commits March 25, 2024 18:19

CLDR-15954 Update spec for unit preferences

244b5ed

CLDR-15954 touchups

ebecc37

CLDR-15954 Additional tweaks, Modifications section

53acac2

macchiati requested review from sffc and pedberg-icu March 26, 2024 06:29

younies reviewed Mar 26, 2024

View reviewed changes

docs/ldml/tr35-info.md Outdated Show resolved Hide resolved

sffc reviewed Mar 26, 2024

View reviewed changes

macchiati added 3 commits March 26, 2024 17:17

CLDR-15954 Refine based on comments from Shane and Younies

9ac8bbf

CLDR-15954 Copyediting

ba8ed94

CLDR-15954 Copyediting

c25c417

macchiati requested review from sffc and younies March 27, 2024 01:10

macchiati marked this pull request as ready for review March 27, 2024 16:55

CLDR-15954 Add an example for using the precision on the last mixed u…

c35de6f

…nit.

srl295 approved these changes Mar 29, 2024

View reviewed changes

srl295 merged commit f117648 into unicode-org:main Mar 29, 2024
7 checks passed

macchiati deleted the CLDR-15954-Update-spec-for-unit-preferences branch March 31, 2024 17:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLDR-15954 Update spec for unit preferences #3576

CLDR-15954 Update spec for unit preferences #3576

macchiati commented Mar 26, 2024 •

edited

Loading

macchiati commented Mar 26, 2024

sffc Mar 26, 2024

macchiati Mar 26, 2024

sffc Mar 26, 2024

macchiati Mar 26, 2024

macchiati Mar 26, 2024 •

edited

Loading

macchiati Mar 27, 2024

macchiati commented Mar 27, 2024

srl295 left a comment

CLDR-15954 Update spec for unit preferences #3576

CLDR-15954 Update spec for unit preferences #3576

Conversation

macchiati commented Mar 26, 2024 • edited Loading

macchiati commented Mar 26, 2024

sffc Mar 26, 2024

Choose a reason for hiding this comment

macchiati Mar 26, 2024

Choose a reason for hiding this comment

sffc Mar 26, 2024

Choose a reason for hiding this comment

macchiati Mar 26, 2024

Choose a reason for hiding this comment

macchiati Mar 26, 2024 • edited Loading

Choose a reason for hiding this comment

macchiati Mar 27, 2024

Choose a reason for hiding this comment

macchiati commented Mar 27, 2024

srl295 left a comment

Choose a reason for hiding this comment

macchiati commented Mar 26, 2024 •

edited

Loading

macchiati Mar 26, 2024 •

edited

Loading