CLDR-16999 Change Annex C Preprocessing (#3579)

unicode-org · Mar 28, 2024 · dd4be0a
1 parent c7e39f1
Showing 1 changed file with 53 additions and 20 deletions.
diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
@@ -3847,30 +3847,63 @@ The data from supplementalMetadata is (logically) preprocessed as follows.
    2. `<territoryAlias type="und-AAA" replacement="und-AA" reason="overlong" />`
 4. Change the _type_ and _replacement_ values in the remaining rules into multimap rules, as per _Definition 1. Multimap Interpretation_.
    1. Note that the “und” value disappears.
-5. Order the set of rules by the following levels
-   1. First order by the size of the union of all field value sets, with larger sizes before smaller sizes.
-     * So V={hepburn, heploc}} is before {R={CA}}
-	 * V={hepburn, heploc}} and {L={en}, R={GB}} are not ordered at this level
-   2. And then order by field, where L < S < R < V. Thus L is first and V is last.
-     * So {L={fr}, R={CA}} is before {V={fonipa, heploc}}.
-	 * V={hepburn, heploc}} and {V={hepburn, heploc}} are not ordered at this level
-     * After this point we are guaranteed to have the same set of fields, with possibly different field value sets.
-   3. And then order by field value sets, traversing also in the order of their fields L < S < R < V.
-     * To determine the ordering between a field value set A and B, traverse each in parallel
-     * If the corresponding field value sets for A and B are identical, then the next pair of field value sets is processed
-     * Otherwise at the first pair of differing field values, A is before B if its field value is alphabetically less, otherwise B is before.
+5. Order the set of rules using the following comparison logic:
+   1. For each rule, count the number of items in each field value set (L, S, R, V) and sum the four counts.
+      If two rules have differing sums, order the rule with the greater sum before the rule with the smaller sum.
+        * For example:
+        * {V={hepburn,heploc}} is tied with 
+        * {L={en}, R={GB}} (because both have 2 total field value items) and both precede 
+        * {R={CA}} (which has 1).
+   2. For rule pairs that are not differentiated by the previous step, consider the value set for each field in the order L, then S, then R, then V.
+      If one rule has a non-empty value set for that field and the other rule does not, 
+      then order the rule with the non-empty value set for that field before the other rule and disregard all later fields. 
+      Otherwise, consider the next field.
+        * For example:
+        * {L={zh}, S={Hant}, R={CN}} is tied with 
+        * {L={en}, S={Latn}, R={GB}} (because both have non-empty sets for L, S, and R but not for V),
+          and both precede
+        * {L={zh}, S={Hans}, V={pinyin}} (because it lacks values for R), 
+          which precedes 
+        * {L={en}, R={GB}, V={scouse}} (because it lacks values for S), 
+          which precedes 
+        * {V={fonipa,hepburn,heploc}} (because it lacks values for L),
+          which is tied with 
+        * {V={hepburn,heploc,simple}} (because both have non-empty sets for V but not for L, S, or R).
+   3. For rule pairs that are not differentiated by the previous step,
+      treat the value set for each field, in the order L, then S, then R, then V, as a sequence of subtags.
+      If the sequences for the same field of the two rules differ,
+      then find the first position at which they differ, order the rules by the code-point order
+      of the field values at that position, and disregard all later fields.
+      Otherwise, consider the next field.
+        * For example:
+        * {L={ja}, V={hepburn, heploc}} precedes 
+        * {L={zh}, V={1996, pinyin}} 
+          (because it has a different field value set for L and "ja" precedes "zh" at the first position of difference),
+          which precedes 
+        * {L={zh}, V={hepburn, heploc}}
+          (because it has the same field value set for L and a different field value set for V in which "1996" precedes "hepburn" at the first position of difference),
+          which precedes
+        * {L={zh}, V={hepburn, simple}}
+          (because it has the same field value set for L and a different field value set for V in which "heploc" precedes "simple" at the first position of difference).
 6. The result is the set of **Alias Rules**
 
 So using the examples above, we get the following order:
 
-| languageId            | i. size of union | ii. field order | iii. field value sets |
-| --------------------- | ---------------- | --------------- | --------------------- |
-| {L={en}, R={GB}}      | 2                | n/a             |                       |
-| {L={fr}, R={CA}}      | 2                | n/a             | en < fr               |
-| {V={fonipa, heploc}}  | 2                | L < V           |                       |
-| {V={hepburn, heploc}} | 2                | n/a             | fonipa < hepburn      |
-| {R={CA}}              | 1                | n/a             |                       |
-
+| languageId | 5.1 total field value set item count | 5.2 non-empty field value set | 5.3 field value set items |
+| --- | --- | --- | --- |
+| {L={en}, S={Latn}, R={GB}} | 3 | n/a | n/a |
+| {L={zh}, S={Hant}, R={CN}} | 3 | match (L, S, R) | in L, “en” before “zh” |
+| {L={zh}, S={Hans}, V={pinyin}} | 3 | (L, S, R, …) before (L, S, V) |  |
+| {L={en}, R={GB}, V={scouse}} | 3 | (L, S, …) before (L, R, …) |  |
+| {L={ja}, V={hepburn,heploc}} | 3 | (L, R, …) before (L, V) |  |
+| {L={zh}, V={1996,pinyin}} | 3 | match (L, V) | in L, “ja” before “zh” |
+| {L={zh}, V={hepburn,heploc}} | 3 | match (L, V) | in V, “1996” before “hepburn” |
+| {L={zh}, V={hepburn,simple}} | 3 | match (L, V) | in V, “heploc” before “simple” |
+| {V={fonipa,hepburn,heploc}} | 3 | (L, …) before (V) |  |
+| {V={hepburn,heploc,simple}} | 3 | match (V) | in V, “fonipa” before “hepburn” |
+| {L={en}, R={GB}} | 2 |  |  |
+| {V={hepburn,heploc}} | 2 | (L, …) before (V) |  |
+| {R={CA}} | 1 |  |  |
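
The three comparison levels in step 5 can be read as a single composite sort key. The following is a minimal Python sketch of that reading, not part of the commit or the specification. It assumes each rule is a plain dict mapping the field names `L`, `S`, `R`, `V` to lists of subtags, and that each value set is compared in sorted code-point order. The names `FIELDS`, `alias_rule_sort_key`, and `order_alias_rules` are hypothetical. The case where one sequence is a strict prefix of another is not described in the text; here the shorter sequence simply sorts first.

```python
# Field order used by steps 5.2 and 5.3.
FIELDS = ("L", "S", "R", "V")


def alias_rule_sort_key(rule):
    """Build a composite key mirroring steps 5.1, 5.2 and 5.3.

    `rule` is a dict such as {"L": ["zh"], "V": ["1996", "pinyin"]}.
    Absent fields are treated as empty value sets.
    """
    # Assumption: each value set is viewed as a sequence in code-point order.
    values = tuple(tuple(sorted(rule.get(f, ()))) for f in FIELDS)
    # 5.1: a greater total item count sorts first, hence the negation.
    total = sum(len(v) for v in values)
    # 5.2: a non-empty set (0) sorts before an empty one (1), field by field.
    missing = tuple(0 if v else 1 for v in values)
    # 5.3: Python compares str by code point, and tuples position by position.
    return (-total, missing, values)


def order_alias_rules(rules):
    """Return the rules in the order described by step 5."""
    return sorted(rules, key=alias_rule_sort_key)


if __name__ == "__main__":
    examples = [
        {"R": ["CA"]},
        {"V": ["hepburn", "heploc"]},
        {"L": ["en"], "R": ["GB"]},
        {"L": ["zh"], "V": ["hepburn", "simple"]},
        {"L": ["ja"], "V": ["hepburn", "heploc"]},
        {"L": ["zh"], "S": ["Hant"], "R": ["CN"]},
    ]
    for rule in order_alias_rules(examples):
        print(rule)
```

Applied to the thirteen example rules, this key yields the same order as the table above. Because `sorted` is stable, rules that compare equal on all three levels keep their input order.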
 
 ### <a name="processing-languageids" href="#processing-languageids">Processing LanguageIds</a>