CLDR-00000 Automated Build of Pages

unicode-org · Apr 2, 2024 · 8926d84
1 parent e140952

Showing 6 changed files with 27 additions and 18 deletions.
diff --git a/ldml/tr35-dates.html b/ldml/tr35-dates.html
@@ -1971,6 +1971,7 @@
     <tr><td>3</td><td>Abbreviated (e.g. MMM)</td></tr>
     <tr><td>4</td><td colspan="2">Wide / Long / Full (e.g. MMMM, EEEE)</td></tr>
     <tr><td>5</td><td colspan="2">Narrow (e.g. MMMMM, EEEEE)<br>(The counter-intuitive use of 5 letters for this is forced by backwards compatibility)</td></tr>
+    <tr><td>&gt;16</td><td colspan="2">Private Use<br>(Reserved for use by implementations using CLDR; will never be otherwise used by CLDR.)</td></tr>
 </tbody></table><p>Notes for the table below:</p><ul>
 <li>Any sequence of pattern characters other than those listed below is invalid. Invalid pattern fields should be handled for formatting and parsing as described in <a href="tr35.html#Invalid_Patterns">Handling Invalid Patterns</a>.</li>
 <li>The examples in the table below are merely illustrative and may not reflect current actual data.</li>
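
Note on the field-length row added in this commit (the same row appears in tr35-dates.html and tr35-dates.md): lengths greater than 16 are reserved for private use by implementations. A minimal Python sketch of how an implementation might classify field lengths follows. It covers only the rows visible in this diff (3, 4, 5, and >16). The function names `classify_field_length` and `field_runs` are hypothetical, and the pattern scan is a simplification that ignores quoted literal text.

```python
from itertools import groupby

# Only the rows visible in this diff; other lengths are left unclassified here.
WIDTHS = {3: "abbreviated", 4: "wide", 5: "narrow"}


def classify_field_length(length):
    """Return a label for a pattern field length, or None if not covered here."""
    if length > 16:
        # Reserved for implementations using CLDR; CLDR itself will not assign it.
        return "private use"
    return WIDTHS.get(length)


def field_runs(pattern):
    """Split a pattern into (letter, run length) pairs.

    Simplification: quoted literals are not handled, and only ASCII
    letters are treated as pattern characters.
    """
    return [
        (ch, len(list(group)))
        for ch, group in groupby(pattern)
        if ch.isascii() and ch.isalpha()
    ]


if __name__ == "__main__":
    for letter, length in field_runs("EEEE, MMMMM d"):
        print(letter, length, classify_field_length(length))
```

Note that a length of exactly 16 is not private use under this row, since the table says ">16".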

diff --git a/ldml/tr35-dates.md b/ldml/tr35-dates.md
@@ -2002,6 +2002,7 @@ The Date Field Symbol Table below shows the pattern characters (Sym.) and associ
     <tr><td>3</td><td>Abbreviated (e.g. MMM)</td></tr>
     <tr><td>4</td><td colspan="2">Wide / Long / Full (e.g. MMMM, EEEE)</td></tr>
     <tr><td>5</td><td colspan="2">Narrow (e.g. MMMMM, EEEEE)<br/>(The counter-intuitive use of 5 letters for this is forced by backwards compatibility)</td></tr>
+    <tr><td>&gt;16</td><td colspan="2">Private Use<br/>(Reserved for use by implementations using CLDR; will never be otherwise used by CLDR.)</td></tr>
 </table>
 
 Notes for the table below:

diff --git a/ldml/tr35-personNames.html b/ldml/tr35-personNames.html
@@ -345,8 +345,6 @@
 
 
 
-
-
 
 
 
@@ -515,11 +513,7 @@
 </li>
 </ul>
 </li>
-</ul><h3 id="api-implementation">API Implementation</h3><p>A draft API for formatting personal names was first included in ICU4J 73 and has been updated for ICU4J 74 to reflect updates in this specification and associated data. (“Draft” means that the full functionality is present, but the API might be refined before it is stabilized.) The implementation can be found at the following:</p><ul>
-<li><a href="https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/PersonName.java">PersonName.java</a></li>
-<li><a href="https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/PersonNameFormatter.java">PersonNameFormatter.java</a></li>
-<li><a href="https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/SimplePersonName.java">SimplePersonName.java</a></li>
-</ul><p>In addition to the settings in this document, it is recommended that implementations provide some additional features in their APIs to allow more control for clients, notably:</p><ol>
+</ul><h3 id="api-implementation">API Implementation</h3><p>In addition to the settings in this document, it is recommended that implementations provide some additional features in their APIs to allow more control for clients, notably:</p><ol>
 <li>forceGivenFirst — no matter what the values are in nameOrderLocales or in the NameObject, display the name as givenFirst.</li>
 <li>forceSurnameFirst — no matter what the values are in nameOrderLocales or in the NameObject, display the name as surnameFirst.</li>
 <li>forceNativeOrdering — no matter what the values are in nameOrderLocales or in the NameObject, display the name with the same ordering as the native locale.</li>

diff --git a/ldml/tr35-personNames.md b/ldml/tr35-personNames.md
@@ -138,12 +138,6 @@ The following features are currently out of scope for Person Names formating:
 
 ### API Implementation
 
-A draft API for formatting personal names was first included in ICU4J 73 and has been updated for ICU4J 74 to reflect updates in this specification and associated data. (“Draft” means that the full functionality is present, but the API might be refined before it is stabilized.) The implementation can be found at the following:
-
-* [PersonName.java](https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/PersonName.java)
-* [PersonNameFormatter.java](https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/PersonNameFormatter.java)
-* [SimplePersonName.java](https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/SimplePersonName.java)
-
 In addition to the settings in this document, it is recommended that implementations provide some additional features in their APIs to allow more control for clients, notably:
 
 1. forceGivenFirst — no matter what the values are in nameOrderLocales or in the NameObject, display the name as givenFirst.

diff --git a/ldml/tr35.html b/ldml/tr35.html
@@ -975,6 +975,7 @@
 
 
 
+
 
 
 <script src="./js/anchor.min.js"></script><div class="header"><table class="header" cellpadding="0" cellspacing="0" width="100%">
@@ -3292,7 +3293,15 @@ <h5 id="PRIVATE_USE"><a name="private_use" href="#PRIVATE_USE">PRIVATE_USE</a></
 Like other CLDR operations, these operations can also be used with language tags having [<a href="#BCP47">BCP47</a>] syntax, with the appropriate changes to the data.</p><p>An implementation may choose to exclude language tags with the language subtag "und" from the following operation. In such a case, only the canonicalization is done. An implementation can declare that it is doing the exclusion, or can take a parameter that controls whether or not to do it.</p><p><em><strong>Add Likely Subtags:</strong></em> <em>Given a source locale X, to return a locale Y where the empty subtags have been filled in by the most likely subtags.</em> This is written as X ⇒ Y ("X maximizes to Y").</p><p>A subtag is called <em>empty</em> if it is a missing script or region subtag, or it is a base language subtag with the value "und". In the description below, a subscript on a subtag <em>x</em> indicates which tag it is from: <em>xs</em> is in the source, <em>xm</em> is in a match, and <em>xr</em> is in the final result.</p><p>This operation is performed in the following way.</p><ol>
 <li><strong>Canonicalize.</strong><ol>
 <li>Make sure the input locale is in canonical form: uses the right separator, and has the right casing.</li>
-<li>Replace any deprecated subtags with their canonical values using the <code>&lt;alias&gt;</code> data in supplemental metadata. Use the first value in the replacement list, if it exists. Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".</li>
+<li>Replace any deprecated subtags with their canonical values using the <code>&lt;alias&gt;</code> data in supplemental metadata. Use the first value in the replacement list, if it exists.
+Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
+one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".<ul>
+<li>There are certain exceptions to this: some implementations still use three obsolete language subtags: iw, in, and yi.
+The likely subtags data currently supports those implementations by providing elements that handle them, 
+with the deprecated code on both sides: <code>&lt;likelySubtag from="iw"to="iw_Hebr_IL"/&gt;</code>
+Such implementations may refrain from replacing those deprecated tags.</li>
+</ul>
+</li>
 <li>If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see <code>&lt;variable id="$grandfathered" type="choice"&gt;</code> in the supplemental data), then return it.</li>
 <li>Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.</li>
 <li>Get the components of the cleaned-up source tag <em>(language<sub>s</sub>, script<sub>s</sub>,</em> and <em>region<sub>s</sub></em>), plus any variants and extensions.</li>
@@ -3309,7 +3318,7 @@ <h5 id="PRIVATE_USE"><a name="private_use" href="#PRIVATE_USE">PRIVATE_USE</a></
 <li><strong>Return</strong><ol>
 <li>If there is no match, signal an error and stop.</li>
 <li>Otherwise there is a match = <em>language<sub>m</sub>_script<sub>m</sub>_region<sub>m</sub></em></li>
-<li>Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor a macroregion, and x<sub>m</sub> otherwise.</li>
+<li>Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise.</li>
 <li>Return the language tag composed of <em>language<sub>r</sub>_script<sub>r</sub>_region<sub>r</sub></em> + variants + extensions.</li>
 </ol>
 </li>
@@ -3319,7 +3328,8 @@ <h5 id="PRIVATE_USE"><a name="private_use" href="#PRIVATE_USE">PRIVATE_USE</a></
 <li>return the input (with missing fields)</li>
 <li>return the input, but "Zzzz", and/or "ZZ" substituted for empty fields.</li>
 <li>"und"</li>
-</ol><p>The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.</p><p><em>Example1:</em></p><ul>
+</ol><p>One by-product of this algorithm is that an element such as <code>&lt;likelySubtag from="fr_IR "to="en_Arab"/&gt;</code> would be misleading: the 'fr' can never be replaced by 'en'.
+The only subtags that can be replaced are deprecated ones, empty, und, Zzzz, and ZZ. </p><p>The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.</p><p><em>Example1:</em></p><ul>
 <li>Input is ZH-ZZZZ-SG.</li>
 <li>Normalize to zh_SG.</li>
 <li>Look up in table. No match.</li>
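
Note on the revised Return step and the new by-product paragraph: the rule "x_r = x_s if x_s is neither empty nor 'und', and x_m otherwise" can be sketched in Python as below. This is an illustration under stated assumptions, not the CLDR or ICU implementation. The names `is_empty` and `fill_from_match` are hypothetical, locales are modeled as plain dicts, and the input is assumed to be already canonicalized (so 'Zzzz' and 'ZZ' have been removed).

```python
FIELDS = ("language", "script", "region")


def is_empty(field, value):
    """A missing script or region is empty; so is a language of 'und'."""
    if value is None or value == "":
        return True
    return field == "language" and value == "und"


def fill_from_match(source, match):
    """Keep each non-empty source subtag; otherwise take it from the match."""
    result = {}
    for field in FIELDS:
        value = source.get(field)
        result[field] = match.get(field) if is_empty(field, value) else value
    return result


if __name__ == "__main__":
    # The misleading element from the text: from="fr_IR", to="en_Arab".
    # Only the empty script is filled; 'fr' is never replaced by 'en'.
    print(fill_from_match({"language": "fr", "region": "IR"},
                          {"language": "en", "script": "Arab"}))

    # The exception element from the text: from="iw", to="iw_Hebr_IL".
    print(fill_from_match({"language": "iw"},
                          {"language": "iw", "script": "Hebr", "region": "IL"}))
```

With the first input, the sketch yields fr_Arab_IR, which is why an element mapping fr_IR to en_Arab would be misleading: a non-empty source language always survives.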

diff --git a/ldml/tr35.md b/ldml/tr35.md
@@ -2237,7 +2237,13 @@ This operation is performed in the following way.
 
 1. **Canonicalize.**
    1. Make sure the input locale is in canonical form: uses the right separator, and has the right casing.
-   2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists. Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
+   2. Replace any deprecated subtags with their canonical values using the `<alias>` data in supplemental metadata. Use the first value in the replacement list, if it exists.
+      Language tag replacements may have multiple parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such a case, the original script and/or region are retained if there is
+      one. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not "sr_Latn_AQ".
+      * There are certain exceptions to this: some implementations still use three obsolete language subtags: iw, in, and yi.
+        The likely subtags data currently supports those implementations by providing elements that handle them, 
+        with the deprecated code on both sides: `<likelySubtag from="iw"to="iw_Hebr_IL"/>`
+        Such implementations may refrain from replacing those deprecated tags.
    3. If the tag is a legacy language tag (marked as “Type: grandfathered” in BCP 47; see `<variable id="$grandfathered" type="choice">` in the supplemental data), then return it.
    4. Remove the script code 'Zzzz' and the region code 'ZZ' if they occur.
    5. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ and _region<sub>s</sub>_), plus any variants and extensions.
@@ -2250,7 +2256,7 @@ This operation is performed in the following way.
 3. **Return**
    1. If there is no match, signal an error and stop.
    2. Otherwise there is a match = _language<sub>m</sub>\_script<sub>m</sub>\_region<sub>m</sub>_
-   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor a macroregion, and x<sub>m</sub> otherwise.
+   3. Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is neither empty nor 'und', and x<sub>m</sub> otherwise.
    4. Return the language tag composed of _language<sub>r</sub>\_script<sub>r</sub>\_region<sub>r</sub>_ + variants + extensions.
 
 Signalling an error can be done in various ways, depending on the most consistent approach for APIs in the module. For example:
@@ -2260,6 +2266,9 @@ Signalling an error can be done in various ways, depending on the most consisten
    4. return the input, but "Zzzz", and/or "ZZ" substituted for empty fields.
    5. "und"
 
+One by-product of this algorithm is that an element such as `<likelySubtag from="fr_IR "to="en_Arab"/>` would be misleading: the 'fr' can never be replaced by 'en'.
+The only subtags that can be replaced are deprecated ones, empty, und, Zzzz, and ZZ. 
+
 The lookup can be optimized. For example, if any of the tags in Step 2 are the same as previous ones in that list, they do not need to be tested.
 
 _Example1:_