CLDR-17251 Fixes to likely subtags spec
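
A minimal sketch, in Python, of the revised Return step 3 from this change ("x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise"). The names `merge_field` and `compose_result` are hypothetical and not part of CLDR or ICU. The sketch assumes the source tag has already been canonicalized (so 'Zzzz' and 'ZZ' are already removed), and it ignores variants and extensions.

```python
from typing import Tuple

# (language, script, region); an absent field is the empty string.
Fields = Tuple[str, str, str]


def merge_field(source: str, match: str) -> str:
    """Keep the source field unless it is empty or 'und'; otherwise use the match."""
    return match if source in ("", "und") else source


def compose_result(source: Fields, match: Fields) -> str:
    """Combine source and match field by field, then join the non-empty fields."""
    merged = [merge_field(s, m) for s, m in zip(source, match)]
    return "_".join(field for field in merged if field)


if __name__ == "__main__":
    # An 'und' language and empty fields are filled in from the match.
    print(compose_result(("und", "", ""), ("en", "Latn", "US")))
    # A real language such as 'fr' is never overwritten by the match.
    print(compose_result(("fr", "", "IR"), ("en", "Arab", "")))
```

Under these assumptions, a source of `fr_IR` matched against `en_Arab` keeps 'fr' and 'IR' and only gains the script, which may help show why an element mapping `fr_IR` to `en_Arab` would be misleading.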

unicode-org · Apr 2, 2024 · a0ec058
1 parent e09948e
commit a0ec058
Showing 1 changed file with 11 additions and 2 deletions.
diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
@@ -2236,7 +2236,13 @@ This operation is performed in the following way.
 
 1. **Canonicalize.**
    1. Make sure the input locale is in canonical form: uses the right separator, and has the right casing.
-   2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists. Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
+   2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists.
+      Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
+      one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
+      * There are certain exceptions to this: some implementations still use three obsolete language subtags (iw, in, and ji).
+        The likely subtags data currently supports those implementations by providing elements that handle them,
+        with the deprecated code on both sides: `<likelySubtag from="iw" to="iw_Hebr_IL"/>`.
+        Such implementations may refrain from replacing those deprecated tags.
    3. If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see `<variable id="$grandfathered" type="choice">` in the supplemental data), then return it.
    4. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
    5. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ and _region<sub>s</sub>_), plus any variants and extensions.
@@ -2249,7 +2255,7 @@ This operation is performed in the following way.
 3. **Return**
    1. If there is no match, signal an error and stop.
    2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
-   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor a macroregion, and x<sub>m</sub> otherwise.
+   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise.
    4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.
 
 Signalling an error can be done in various ways, depending on the most consistent approach for APIs in the module. For example:
@@ -2259,6 +2265,9 @@ Signalling an error can be done in various ways, depending on the most consisten
    4. return the input, but "Zzzz", and/or "ZZ" substituted for empty fields.
    5. "und"
 
+One by-product of this algorithm is that an element such as `<likelySubtag from="fr_IR" to="en_Arab"/>` would be misleading: the 'fr' can never be replaced by 'en'.
+The only subtags that can be replaced are deprecated ones, empty ones, 'und', 'Zzzz', and 'ZZ'.
+
 The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
 
 _Example1:_