diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index 810652c2180..6253ab1bd28 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -2236,7 +2236,13 @@ This operation is performed in the following way.
 
 1. **Canonicalize.**
     1. Make sure the input locale is in canonical form: uses the right separator, and has the right casing.
-    2. Replace any deprecated subtags with their canonical values using the `` data in supplemental metadata. Use the first value in the replacement list, if it exists. Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
+    2. Replace any deprecated subtags with their canonical values using the `` data in supplemental metadata. Use the first value in the replacement list, if it exists.
+       Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
+       one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
+       * There are certain exceptions to this: some implementations still use three obsolete language subtags: iw, in, and yi.
+         The likely subtags data currently supports those implementations by providing elements that handle them,
+         with the deprecated code on both sides: ``
+         Such implementations may refrain from replacing those deprecated tags.
     3. If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see `` in the supplemental data), then return it.
     4. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
     5. Get the components of the cleaned-up source tag _(languages, scripts,_ and _regions_), plus any variants and extensions.
@@ -2249,7 +2255,7 @@ This operation is performed in the following way.
 3. **Return**
     1. If there is no match, signal an error and stop.
     2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
-    3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor a macroregion, and x<sub>m</sub> otherwise.
+    3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise.
     4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.
 
 Signalling an error can be done in various ways, depending on the most consistent approach for APIs in the module. For example:
@@ -2259,6 +2265,9 @@ Signalling an error can be done in various ways, depending on the most consisten
 4. return the input, but "Zzzz", and/or "ZZ" substituted for empty fields.
 5. "und"
 
+One by-product of this algorithm is that an element such as `` would be misleading: the 'fr' can never be replaced by 'en'.
+The only subtags that can be replaced are deprecated ones, empty, und, Zzzz, and ZZ.
+
 The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
 
 _Example1:_