diff --git a/docs/ldml/tr35-info.md b/docs/ldml/tr35-info.md index 85499613a32..5470524df45 100644 --- a/docs/ldml/tr35-info.md +++ b/docs/ldml/tr35-info.md @@ -1116,15 +1116,48 @@ The examples in #4 are due to the following ordering of the `unitQuantity` eleme ## Mixed Units -Mixed units, or unit sequences, are units with the same base unit which are listed in sequence. Common examples are feet and inches, meters and centimeters, and hours, minutes, and seconds. Mixed unit identifiers are expressed using the "-and-" infix, as in "foot-and-inch", "meter-and-centimeter", and "hour-and-minute-and-second". - -Scalar values for mixed units are expressed in the largest unit, according to the sort order discussed above in "Normalization". For example, numbers for "foot-and-inch" are expressed in feet. - -Mixed units are expected to be rendered in the order of the tokens in the identifier. For example, the value 1.25 with the identifier "foot-and-inch" should be rendered as "1 foot and 3 inches" and 1.25 inch-and-foot should be rendered as “3 inches and 1 foot". **NOTE:** the correct application of this may require adding locales to the regions attribute set. +Mixed units, or unit sequences, are units with the same base unit which are listed in sequence. +Common examples are feet and inches; meters and centimeters; hours, minutes, and seconds; degrees, minutes, and seconds. +Mixed unit identifiers are expressed using the "-and-" infix, as in "foot-and-inch", "meter-and-centimeter", "hour-and-minute-and-second", and "degree-and-arc-minute-and-arc-second". + +Scalar values for mixed units are expressed in the largest unit, according to the sort order discussed above in "Normalization". +For example, numbers for "foot-and-inch" are expressed in feet. + +Mixed unit identifiers should be ordered from the highest unit to the lowest (eg foot-and-inch instead of inch-and-foot), and that order is reflected in the display. 
+If it turns out that some locales present certain mixed units in a different order, additional structure will be needed in CLDR. + +Only the lowest unit can have decimal fractions; the higher units will be integers, so no "3.5 feet 3 inches". +If a number is negative, then only the highest unit shows the minus sign: eg, "-3 hours 27 minutes". +If one of the units is zero, then it is normally omitted: eg, "3 feet" instead of "3 feet 0 inches". +However, when all of the units would be omitted, then the highest unit is shown with zero: eg "0 feet". + +Implementations may offer mechanisms to control the precision of the formatted mixed unit. Examples include, but are not limited to: +* An implementation could apply the precision of a number formatter to the final unit. + However, this mechanism has a couple of disadvantages, such as the difficulty of matching precision across user preferences. For example, suppose the input amount is 1.5254 and the precision is 2 decimals. + * Locale A uses decimal degrees and gets 1.53°. + * Locale B uses degrees, minutes, seconds, and gets 1° 31′ 31.44″. + * Locale B has an unnecessarily precise result: the equivalent of 1.52540 in precision. +* An implementation could allow a percentage precision; + thus 1612 meters with ±1% precision would be represented by **1 mile** rather than **1 mile 9 feet**. + +The default behavior is to round the lowest unit to the nearest integer. +Thus 1.99959 degree-and-arc-minute-and-arc-second would be (before rounding) **1 degree 59 minutes 58.524 seconds**. +After rounding it would be **1 degree 59 minutes 59 seconds**. + +If the lowest unit would round to zero, or round up to the size of the next higher unit, then the next higher unit is rounded instead, recursively. +Thus 1.999862 degree-and-arc-minute-and-arc-second would be (before rounding) **1 degree 59 minutes 59.5032 seconds**. 
+After rounding the last unit it would be **1 degree 59 minutes 60 seconds**, which rounds up to **1 degree 60 minutes**, which rounds up to **2 degrees**. +This behavior can be determined before having to compute the lower units: +for example, where rounding to the second, if the remainder in degrees is below 1/7200 degrees or above 7199/7200 degrees (that is, within half an arc-second of a whole degree), then the degrees can be rounded without computing the minutes or seconds. ## Testing -The [unitsTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitsTest.txt) file supplies a list of all the CLDR units with conversions, for testing implementations. Instructions for use are supplied in the header of the file. +The files in the directory [cldr/common/testData/units/](https://github.com/unicode-org/cldr/tree/main/common/testData/units) are provided for testing implementations. +1. The [unitsTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitsTest.txt) file supplies a list of all the CLDR units with conversions. +2. The [unitPreferencesTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitPreferencesTest.txt) file supplies tests for user preferences. +3. The [unitLocalePreferencesTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitLocalePreferencesTest.txt) file provides examples for testing the interactions between locale identifiers and unit preferences. + +Instructions for use are supplied in the header of each file. ## Unit Preferences @@ -1132,22 +1165,53 @@ Different locales have different preferences for which unit or combination of un ### Unit Preferences Overrides -The determination of preferred units depends on the locale identifer: the keys mu, ms, rg, their values, the base locale (language, script, region) and the user preferences data. +The determination of preferred units uses the user preference data together with the **input unit**, the **input usage**, and the **input locale identifier**. 
+Within the locale identifier, the subtags that can affect the result are: + * the values of the keys mu, ms, and rg + * the region in the locale identifier (if there is one) + * and otherwise the likely region subtag for the locale identifier -The strongest is the mu key, then the ms key, then the rg key. Beyond that the region of the locale identifer is used, and if not present, the likely-subtag region. For example: +The strongest priority is the mu key, then the ms key, then the rg key. +Beyond that the region of the locale identifier is used, and if not present, the likely-subtag region. +For example: | | Locale | Result | Comment | |---|---------------------------------------|------------|--------------------------------------------------------------------| | 1 | en-u-rg-uszzzz-ms-ussystem-mu-celsius | Celsius | despite the rg and ms settings for US, and the likely region of US | | 2 | en-u-rg-uszzzz-ms-metric | Celsius | despite the rg setting for US, and the likely region of US | -| 3 | en-u-rg-dezzzz. | Celsius | despite the likely region of US | -| 4 | en | Fahrenheit | because the likely region for en with no region is US | +| 3 | en-u-rg-dezzzz | Celsius | despite the likely region of US | +| 4 | en-DE | Celsius | because the explicit region is DE | +| 5 | en | Fahrenheit | because the likely region for en with no region is US | + +If any key-values are invalid, then they are ignored. Thus the following constructs are ignored: + +| subtags | reason | +| --- | --- | +| -mu-smoot | invalid unit | +| -ms-stanford | invalid unit system | +| -rg-aazzzz | invalid region 'AA' ‡| +| -AA | invalid region 'AA'| + +‡ Only the region portion is currently used, so in -rg-usabcdef the "abcdef" is ignored, whether or not it is valid. -The **ms** value is used in the following way. +The following algorithm is used to compute the override units, regions, and category. +The latter two items are used in the [Unit Preferences Data](#Unit_Preferences_Data). -1. 
Find the corresponding Key-Value row in the table below. -2. Get the unit preferences for the **locale**, **category**, and **usage**. -3. If any of the units in that set have a measurement system that doesn’t match the -u-ms- value, get unit preferences again, but using the fallback region instead of the locale's region. +#### Compute override units +If there is a valid -mu value then let the **output unit** be that value, and return it. +This terminates the algorithm; there is no need to use the unit preferences information. + +#### Compute regions +If there is no valid -mu value, the following steps are used to determine a region R from the **input locale identifier** +(and optionally a Unit Systems Match (USM)): + +1. If there is a valid -ms value then let USM be the corresponding value in column 2 of the table below. +Otherwise no USM is used. In either case continue with step 2. +2. If there is a valid -rg region, let R be that region, and go to Compute the category. +3. If there is a valid region in the locale, let R be that region, and go to Compute the category. +4. Otherwise, compute the likely subtags for the locale. + 1. If there is a likely region, then let R be that region, and go to Compute the category. + 2. Otherwise, let R be 001, and go to Compute the category. | Key-Value | Unit Systems Match | Fallback Region for Unit Preferences | |-------------|-----------------------------|--------------------------------------| @@ -1155,49 +1219,31 @@ The **ms** value is used in the following way. | ms-ussystem | ussystem | US | | ms-uksystem | uksystem | UK | -**Example A: xx-SE-u-ms-metric, length, road** -1. Fetch the data from `` for xx-SE -``` -mile-scandinavian -kilometer -meter -meter -meter -``` -2. Meter is **metric**, mile-scandinavian is **metric_adjacent** so they both match the key-value ms-**metric**, so no change is made. - -**Example B: xx-GB-u-ms-ussystem, volume, fluid** -1. 
Fetch the data from `` for xx-GB -``` -gallon-imperial -fluid-ounce-imperial -``` -2. At least one of {gallon-imperial, fluid-ounce-imperial} does not match ms-**ussystem** so the locale is shifted to xx-**US**, and uses the following: -``` -gallon -quart -pint -cup -fluid-ounce -tablespoon -teaspoon -``` - -APIs should clearly allow for both the use of unit preferences with the above process, and for the _invariant use_ of a unit measure. -That is, while an application will usually want to obey the preferences for the locale or in the locale ID, there will definitely be instances where it will want to not use them. -For example, in showing the weather, an application may want to show: +#### Compute the category -High today: 68°F (20°C) +A **category** is determined as follows from the input unit: -To do that, the application needs to show the first value with the locale information, and then (a) query what the alternative is, and show the temperature in that. -As an example, ICU only uses the unit preferences (with rg, ms, and/or mu and the likely region) in formatting units when a usage parameter is set. +1. From the input unit, use the conversion data in [baseUnit](tr35-info.html#Unit_Conversion) and let the **input base unit** be the baseUnit attribute value. + * eg, for `pound-force` the baseUnit is `kilogram-meter-per-square-second`. +2. If there is no such base unit (such as for an unusual unit like `ampere-pound-per-foot-square-minute`), + convert the input unit to a combination of base units, reduce to lowest terms, and normalize. + Let the **input base unit** be that value. + * eg, `ampere-pound-per-foot-square-minute` ⇒ `kilogram-ampere-per-meter-square-second` +3. If the **input base unit** has a unitQuantity element, then let the **category** be the quantity attribute value. + * eg, `force` from `` +4. If the **input base unit** does not have a unitQuantity, let the **output unit** be the input base unit. 
+ An implementation may also set it to an equivalent metric/SI unit, as in the example below. + This terminates the algorithm; there is no need to use the unit preferences information. + * For example, for `ampere-pound-per-foot-square-minute` an implementation could return `kilogram-ampere-per-meter-square-second` or `pascal-ampere`. + * That is, an implementation can use shorter metric/SI units as long as the combination is equivalent in value. ### Unit Preferences Data The CLDR data is intended to map from a particular usage — e.g. measuring the height of a person or the fuel consumption of an automobile — to the unit or combination of units typically used for that usage in a given region. Considerations for such a mapping include: -* The list of possible usages large and open-ended. The intent here is to start with a small set for which there is an urgent need, and expand as necessary. -* Even for a given usage such a measuring a road distance, there are multiple ranges in use. For example, one set of units may be used for indicating the distance to the next city (kilometers or miles), while another may be used for indicating the distance to the next exit (meters, yards, or feet). +* The list of possible usages is large and open-ended, and will be extended in the future. +* Even for a given usage such as measuring a road distance, there are different choices of units based on the particular distance. + For example, one set of units may be used for indicating the distance to the next city (kilometers or miles), while another may be used for indicating the distance to the next exit (meters, yards, or feet). * There are also differences between more formal usage (official signage, medical records) and more informal usage (conversation, texting). * For some usages, the measurement may be expressed using a sequence of units, such as “1 meter, 78 centimeters” or “12 stone, 2 pounds”. @@ -1216,17 +1262,20 @@ The DTD structure is as follows: ``` - - - - - - - 
categoryA unit quantity, such as “area” or “length”. See Unit Conversion
usageA type of usage, such as person-height.
regionsOne or more region identifiers (macroregions or regions), subdivision identifiers, or language identifiers, such as 001, US, usca, and de-CH.
geqA threshold value, in a unit determined by the unitPreference element value. The unitPreference element is only used for values higher than this value (and lower than any higher value).
The value must be non-negative. For picking negative units (-3 meters), use the absolute value to pick the unit.
skeletonA skeleton in the ICU number format syntax, that can be used to format unit
+| Term | Description | +|---|---| +| category | A unit quantity, such as “area” or “length”. See [Unit Conversion](#Unit_Conversion) | +| usage | A type of usage, such as person-height. | +| regions | One or more region identifiers (macroregions or regions), such as 001, US. (Note that this field may be extended in the future to also include subdivision identifiers and/or language identifiers, such as usca, and de-CH.) | +| geq | A threshold value, in a unit determined by the unitPreference element value. The unitPreference element is only used for values higher than this value (and lower than any higher value).
The value must be non-negative. For picking negative units (-3 meters), use the absolute value to pick the unit. | +| skeleton | A skeleton in the ICU number format syntax, that is to be used to format the output unit amount. | + + +Logically, the unit preferences data is a map from categories to a map of usages to a map of regions to a list of ranked units and optional formats. **Note:** As of CLDR 37, the `` `geq` attribute replaces the now-deprecated `` `scope` attribute. -Example: +#### Examples: ```xml @@ -1257,75 +1306,101 @@ The above information says that for default usage, in the US people use mile, fo ``` -The intended usage is to take the measure to be formatted, and the desired category, usage, and region and find the best match as follows. +The following is the algorithm for computing the preferred output unit from the category, usage, region, and USM. + +#### Compute the preferred output unit + +1. Let category preferences be the result of a lookup of **category** in the unit preferences. + 1. If the lookup fails, let the **output unit** be the input base unit or an equivalent metric/SI unit, and return. This terminates the algorithm. +2. Let category-usage preferences be the result of a lookup of **input usage** in the category preferences. + 1. If the lookup fails, let the **input usage** be its containing usage, and repeat. (This will always terminate because there is always a 'default' usage for each category.) + 2. The containing usage is the result of truncating the last '-' and following text, if there is a '-', and otherwise 'default'. + * For example, land-agriculture-grain ⊂ land-agriculture ⊂ land ⊂ default +3. Let ranked units be the result of a lookup of R in the category-usage preferences. There may be both region values and [containment regions](https://www.unicode.org/cldr/charts/latest/supplemental/territory_containment_un_m_49.html). + 1. If the lookup of R fails, set R to its containing region and repeat. 
(This will always terminate because 001 is always present.) + * For example, CH (Switzerland) ⊂ 155 (Western Europe) ⊂ 150 (Europe) ⊂ 001 (World). + * This loop can be optimized to only include containing regions that occur in the data (eg, only 001 in LDML 45). +4. If there is a USM, and the corresponding Fallback Region is different than R, and any of the units in the ranked list don't match the USM, then let the ranked units be the result of a lookup of the Fallback Region in the category-usage preferences. -* First, see if there is an exact match, producing a list of one or more `unitPreference` elements. For example, length/road/GB has a match above, giving +#### Search the ranked units +The ranked units will be of the following form: ```xml mile yard yard ``` - -* If there is no match for the category, then the data is not available. -* Otherwise, given the category: - * If there is an exact match for the usage, but not for the region, try region="001". -* The specification allows for [containment regions](https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html), [region subdivisions](https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_subdivisions.html). -* While in version 37 only 001 is used, in the future the data may contain others. -* The fallback is: subdivision2 ⇒ subdivision1 ⇒ region/country ⇒ subcontinent ⇒ continent ⇒ world -* Example: - - | Region/subdivision | Code | - | ------------------ | ----- | - | Blackpool | gbbpl | - | England | gbeng | - | United Kingdom | GB | - | Northern Europe | 154 | - | Europe | 150 | - | World | 001 | - -* If there is an exact match for the region, but not for the usage, - * If the usage has multiple parts (eg land-agriculture-grain) drop the last part (eg land-agriculture) - * Repeat dropping the last part and trying the result (eg land) - * If you eliminate all of them, try usage="default". 
- * If there is no exact match for either one, try usage="default", region="001". That will always match. - -Once you have a list of `unitPreference` elements, find the applicable unitPreference. For a given category, usage, and set of regions (eg “US GB”), the units are ordered from largest to smallest. - + * The geq item gives the value for the unit in the element value (or for the largest unit for mixed units). For example, - * `...geq="0.5">mile<...` means 0.9 kilometers - * `...geq="100.0">foot:inch<...` means 100 feet + * `...geq="0.5">mile<...` is ≥ 0.5 miles + * `...geq="100.0">foot-and-inch<...` is ≥ 100 feet * If there is no `geq` attribute, then the implicit value is 1.0. * Implementations will probably convert the values into the base units, so that the comparison is fast. Thus the above would be converted internally to something like: * ≥ 804.672 meters ⇒ mile - * ≥ 30.48 meters ⇒ foot:inch -* Search for the first matching unitPreference for the input measure. If there is no match (eg < 100 feet in the above example), take the last unitPreference. That is, the last unitPreference is effectively geq="0" + * ≥ 30.48 meters ⇒ foot-and-inch + +1. Search for the first matching unitPreference for the absolute value of the input measure. If there is no match (eg < 100 feet in the above example), take the last unitPreference. That is, the last unitPreference is effectively geq="0". In the above example, `yard` is equivalent to `yard` For completeness, when comparing doubles to the geq values: -* Negative numbers are treated as if they were positive. -* _infinity_ is treated as being the largest possible value. -* NaN is treated as the smallest possible value. +* Negative numbers are treated as if they were positive, so in the above example -804.672 meters will format as "-0.5 mile". +* _infinity_, NaN, and -_infinity_ match the largest possible value. Thus -∞ meters will format as "-∞ miles", not "-∞ yards". 
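
Reviewer sketch (not part of the patch): the search over ranked units described in the added lines above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the CLDR or ICU implementation. The names `UnitPreference`, `TO_METERS`, `threshold_in_meters`, `pick_unit_preference`, and the sample list `RANKED` are all hypothetical; the sample thresholds are illustrative rather than actual CLDR data. The sketch assumes the input amount has already been converted to the base unit (meters), and that `geq` for a mixed unit is expressed in its largest (first) component.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Conversion factors to the base unit (meters). Only the units used in
# the illustrative list below are included.
TO_METERS = {"mile": 1609.344, "yard": 0.9144, "foot": 0.3048}


@dataclass
class UnitPreference:
    """Hypothetical stand-in for one unitPreference element."""
    unit: str
    geq: float = 1.0               # implicit value when the attribute is absent
    skeleton: Optional[str] = None


def threshold_in_meters(pref: UnitPreference) -> float:
    # For a mixed unit such as foot-and-inch, geq is in the largest unit,
    # assumed here to be the first component of the identifier.
    largest = pref.unit.split("-and-")[0]
    return pref.geq * TO_METERS[largest]


def pick_unit_preference(amount_in_meters: float,
                         ranked: List[UnitPreference]) -> UnitPreference:
    # Negative amounts are compared by absolute value; NaN and the
    # infinities are treated as the largest possible value.
    magnitude = abs(amount_in_meters)
    if math.isnan(magnitude):
        magnitude = math.inf
    for pref in ranked:
        if magnitude >= threshold_in_meters(pref):
            return pref
    # No threshold matched: the last entry is effectively geq="0".
    return ranked[-1]


# Illustrative ranked list in descending order of threshold (not CLDR data).
RANKED = [
    UnitPreference("mile", geq=0.5),             # >= 804.672 meters
    UnitPreference("foot-and-inch", geq=100.0),  # >= 30.48 meters
    UnitPreference("yard"),                      # implicit geq of 1.0
]
```

Precomputing the thresholds in base units, as the added text suggests, would avoid repeating the multiplication on every lookup; the sketch recomputes them only to stay short.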
-Once a matching `unitPreference` element is found: +2. Once a matching `unitPreference` element is found: * The unit is the element value * The skeleton (if there is one) supplies formatting information for the unit. API settings may allow that to be overridden. * The syntax and semantics for the skeleton value are defined by the [ICU Number Skeletons](https://unicode-org.github.io/icu/userguide/format_parse/numbers/skeletons.html) document. -* If the unit is mixed (eg foot:inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers. * If the skeleton is missing, the default is skeleton="**precision-integer/@@\***". However, the client can also override or tune the number formatting. +* If the unit is mixed (eg foot-and-inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers. ### Constraints * For a given category, there is always a “default” usage. -* For a given category, and usage: +* For a given category and usage: * There is always a 001 region. * None of the sets of regions can overlap. That is, you can’t have “US” on one line and “US GB” on another. You _can_ have two lines with “US”, for different sizes of units. * For a given category, usage, and region-set * The unitPreferences are in descending order. -### Caveats +#### Examples -The extended unit support is still being developed further. See the Known Issues on the release page for futher information. +**Example A: xx-SE-u-ms-metric, length, road** +1. Fetch the data from `` for xx-SE +``` +mile-scandinavian +kilometer +meter +meter +meter +``` +2. Meter is **metric**, mile-scandinavian is **metric_adjacent** so they both match the key-value ms-**metric**, so no change is made. + +**Example B: xx-GB-u-ms-ussystem, volume, fluid** +1. Fetch the data from `` for xx-GB +``` +gallon-imperial +fluid-ounce-imperial +``` +2. 
At least one of {gallon-imperial, fluid-ounce-imperial} does not match ms-**ussystem** so the locale is shifted to xx-**US**, and uses the following: +``` +gallon +quart +pint +cup +fluid-ounce +tablespoon +teaspoon +``` + +## Unit APIs +APIs should clearly allow for both the use of unit preferences with the above process, and for the _invariant use_ of a unit measure. +That is, while an application will usually want to obey the preferences for the locale or in the locale ID, there will definitely be instances where it will want to not use them. +For example, in showing the weather, an application may want to show: + +High today: 68°F (20°C) + +To do that, the application needs to show the first value with the locale information, and then (a) query what the alternative is, and show the temperature in that. +As an example, ICU only uses the unit preferences (with rg, ms, and/or mu and the likely region) in formatting units when a **usage** parameter is set. * * * diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md index f1c38a22b52..7841587c9b9 100644 --- a/docs/ldml/tr35.md +++ b/docs/ldml/tr35.md @@ -4006,7 +4006,13 @@ Other contributors to CLDR are listed on the [CLDR Project Page](https://www.uni * Part 3: [Numbers](tr35-numbers.md#Contents) * In [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data), for the `currency` element, added attributes `tz` and `to-tz` to clarify the `from` and `to` dates. -* Part 6: [Supplemental][Supplemental](tr35-info.md#Contents) +* Part 6: [Supplemental](tr35-info.md#Contents) + * In [Mixed Units](tr35-info.html#mixed-units), clarified many aspects of mixed units (such as foot-and-inch), + including how to handle rounding and precision. + * In [Testing](tr35-info.html#testing), listed the additional test files. + * In [Unit Preferences Overrides](tr35-info.html#Unit_Preferences_Overrides), added handling of edge cases, + such as where there is no quantity for a unit, or no preference data for a quantity. 
+ Also clarified how to handle invalid subtags, and the usage of each of the subtags that affect unit preferences. * In [Conversion Data](tr35-info.md#conversion-data), added the `special` attribute for `convertUnit`, used for handling beaufort. * Part 7: [Keyboards]