CLDR-17251 Fixes to likely subtags spec
macchiati committed Apr 2, 2024
1 parent e09948e commit a0ec058
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions docs/ldml/tr35.md
@@ -2236,7 +2236,13 @@ This operation is performed in the following way.

1. **Canonicalize.**
1. Make sure the input locale is in canonical form: uses the right separator, and has the right casing.
2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists. Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists.
Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
* There are certain exceptions to this: some implementations still use three obsolete language subtags: iw, in, and ji.
The likely subtags data currently supports those implementations by providing elements that handle them,
with the deprecated code on both sides: `<likelySubtag from="iw" to="iw_Hebr_IL"/>`.
Such implementations may refrain from replacing those deprecated tags.
3. If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see `<variable id="$grandfathered" type="choice">` in the supplemental data), then return it.
4. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
5. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ and _region<sub>s</sub>_), plus any variants and extensions.
@@ -2249,7 +2255,7 @@ This operation is performed in the following way.
3. **Return**
1. If there is no match, signal an error and stop.
2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor a macroregion, and x<sub>m</sub> otherwise.
3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise.
4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.
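
The alias rule in the Canonicalize step and the field-by-field merge in the Return step can be sketched in Python. This is a minimal illustration under stated assumptions, not CLDR's implementation: `Tag`, `ALIASES`, `replace_deprecated_language`, `strip_unknown`, and `combine` are hypothetical names, the alias table holds only the two replacements quoted above, and the match from Step 2 is passed in rather than looked up.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """A simplified locale tag; empty strings stand for missing fields."""
    language: str
    script: str = ""
    region: str = ""


# Hypothetical excerpt of the <alias> data: only the two examples from the text.
ALIASES = {
    "sh": Tag("sr", "Latn"),
    "mo": Tag("ro", "", "MD"),
}


def replace_deprecated_language(tag: Tag) -> Tag:
    """Replace a deprecated language, retaining any original script or region."""
    alias = ALIASES.get(tag.language)
    if alias is None:
        return tag
    return Tag(
        alias.language,
        tag.script or alias.script,   # the original script wins if present
        tag.region or alias.region,   # the original region wins if present
    )


def strip_unknown(tag: Tag) -> Tag:
    """Remove the script code 'Zzzz' and the region code 'ZZ' if they occur."""
    return Tag(
        tag.language,
        "" if tag.script == "Zzzz" else tag.script,
        "" if tag.region == "ZZ" else tag.region,
    )


def combine(source: Tag, match: Tag) -> Tag:
    """Return step: x_r = x_s if x_s is neither empty nor 'und', else x_m."""
    def pick(s: str, m: str) -> str:
        return s if s not in ("", "und") else m

    return Tag(*(pick(s, m) for s, m in zip(source, match)))
```

With these helpers, `replace_deprecated_language(Tag("sh", "Arab", "AQ"))` keeps the original script and region, so the result is `sr_Arab_AQ` rather than `sr_Latn_AQ`, as the text requires.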

Signalling an error can be done in various ways, depending on the most consistent approach for APIs in the module. For example:
@@ -2259,6 +2265,9 @@ Signalling an error can be done in various ways, depending on the most consistent
4. return the input, but with "Zzzz" and/or "ZZ" substituted for empty fields.
5. "und"

One by-product of this algorithm is that an element such as `<likelySubtag from="fr_IR" to="en_Arab"/>` would be misleading: the 'fr' can never be replaced by 'en'.
The only subtags that can be replaced are deprecated ones, empty ones, 'und', 'Zzzz', and 'ZZ'.
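
One way to check likely-subtags data for such misleading elements is sketched below in Python. It assumes simplified tags of the form `language[_Script][_REGION]` with no variants or extensions; `split_tag`, `REPLACEABLE`, and `is_misleading` are hypothetical names for illustration, and the set of deprecated subtags is supplied by the caller.

```python
# Field values that the algorithm is allowed to replace.
REPLACEABLE = {"", "und", "Zzzz", "ZZ"}


def split_tag(tag: str) -> tuple[str, str, str]:
    """Split 'language[_Script][_REGION]' into (language, script, region)."""
    parts = tag.split("_")
    language, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part      # four letters: a script code
        else:
            region = part      # otherwise: a region code
    return language, script, region


def is_misleading(from_tag: str, to_tag: str, deprecated=frozenset()) -> bool:
    """True if 'to' changes a field of 'from' that can never be replaced."""
    for f, t in zip(split_tag(from_tag), split_tag(to_tag)):
        if f != t and f not in REPLACEABLE and f not in deprecated:
            return True
    return False
```

Here `is_misleading("fr_IR", "en_Arab")` is true because 'fr' is neither empty, 'und', nor deprecated, while `is_misleading("iw", "iw_Hebr_IL")` is false because only empty fields are filled in.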

The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
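
A minimal Python sketch of that optimization, assuming Step 2 yields an ordered list of candidate lookup keys (the list itself is not shown here, and `unique_candidates` is a hypothetical helper):

```python
def unique_candidates(candidates: list[str]) -> list[str]:
    """Drop repeated lookup keys while preserving the original test order."""
    seen = set()
    result = []
    for key in candidates:
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

When the source tag has no script or region, several candidates collapse to the same key, so only one lookup is needed for them.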

_Example1:_