From 8689ca7b704b1061681a1aea49d9927b9e5ff6b3 Mon Sep 17 00:00:00 2001
From: macchiati
Date: Thu, 28 Mar 2024 09:19:04 -0700
Subject: [PATCH] CLDR-16999 Change Annex C Preprocessing to use the suggested text (after verifying that the results are identical), with only minor edits.

---
 docs/ldml/tr35.md | 60 +++++++++++++++++++++++++++++++----------------
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index f1c38a22b52..e75a4abfa81 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -3847,30 +3847,50 @@ The data from supplementalMetadata is (logically) preprocessed as follows.
    2. ``
 4. Change the _type_ and _replacement_ values in the remaining rules into multimap rules, as per _Definition 1. Multimap Interpretation_.
    1. Note that the “und” value disappears.
-5. Order the set of rules by the following levels
-   1. First order by the size of the union of all field value sets, with larger sizes before smaller sizes.
-      * So V={hepburn, heploc}} is before {R={CA}}
-      * V={hepburn, heploc}} and {L={en}, R={GB}} are not ordered at this level
-   2. And then order by field, where L < S < R < V. Thus L is first and V is last.
-      * So {L={fr}, R={CA}} is before {V={fonipa, heploc}}.
-      * V={hepburn, heploc}} and {V={hepburn, heploc}} are not ordered at this level
-      * After this point we are guaranteed to have the same set of fields, with possibly different field value sets.
-   3. And then order by field value sets, traversing also in the order of their fields L < S < R < V.
-      * To determine the ordering between a field value set A and B, traverse each in parallel
-      * If the corresponding field value sets for A and B are identical, then the next pair of field value sets is processed
-      * Otherwise at the first pair of differing field values, A is before B if its field value is alphabetically less, otherwise B is before.
+5. Order the set of rules using the following comparison logic:
+   1. For each rule, count the number of items in each field value set (L, S, R, V) and sum the four counts.
+      If two rules have differing sums, order the rule with the greater sum before the rule with the smaller sum.
+      * For example, {V={hepburn,heploc}} is tied with {L={en}, R={GB}} (because both have 2 total field value items) and both precede {R={CA}} (which has 1).
+   2. For rule pairs that are not differentiated by the previous step, consider the value set for each field in the order L, then S, then R, then V.
+      If one rule has a non-empty value set for that field and the other rule does not,
+      then order the rule with the non-empty value set for that field before the other rule and disregard all later fields.
+      Otherwise, consider the next field.
+      * For example, {L={zh}, S={Hant}, R={CN}} is tied with {L={en}, S={Latn}, R={GB}} (because both have non-empty sets for L, S, and R but not for V),
+        and both precede {L={zh}, S={Hans}, V={pinyin}} (because it lacks values for R),
+        which precedes {L={en}, R={GB}, V={scouse}} (because it lacks values for S),
+        which precedes {V={fonipa,hepburn,heploc}} (because it lacks values for L),
+        which is tied with {V={hepburn,heploc,simple}} (because both have non-empty sets for V but not for L, S, or R).
+   3. For rule pairs that are not differentiated by the previous step,
+      consider the value set for each field in the order L, then S, then R, then V as a sequence of subtags.
+      If those lists for the same field of two rules differ,
+      then consider the first position of difference in the two lists and order the rules by code-point order
+      of the field value at that position and disregard all later fields.
+      Otherwise, consider the next field.
+      * For example, {L={ja}, V={hepburn, heploc}} precedes {L={zh}, V={1996, pinyin}}
+        (because it has a different field value set for L and "ja" precedes "zh" at the first position of difference),
+        which precedes {L={zh}, V={hepburn, heploc}}
+        (because it has the same field value set for L and a different field value set for V in which "1996" precedes "hepburn" at the first position of difference),
+        which precedes {L={zh}, V={hepburn, simple}}
+        (because it has the same field value set for L and a different field value set for V in which "heploc" precedes "simple" at the first position of difference).
 6. The result is the set of **Alias Rules**
 
 So using the examples above, we get the following order:
 
-| languageId | i. size of union | ii. field order | iii. field value sets |
-| --------------------- | ---------------- | --------------- | --------------------- |
-| {L={en}, R={GB}} | 2 | n/a | |
-| {L={fr}, R={CA}} | 2 | n/a | en < fr |
-| {V={fonipa, heploc}} | 2 | L < V | |
-| {V={hepburn, heploc}} | 2 | n/a | fonipa < hepburn |
-| {R={CA}} | 1 | n/a | |
-
+| languageId | 5.1 total field value set item count | 5.2 non-empty field value set | 5.3 field value set items |
+| --- | --- | --- | --- |
+| {L={en}, S={Latn}, R={GB}} | 3 | n/a | n/a |
+| {L={zh}, S={Hant}, R={CN}} | 3 | match (L, S, R) | in L, “en” before “zh” |
+| {L={zh}, S={Hans}, V={pinyin}} | 3 | (L, S, R, …) before (L, S, V) | |
+| {L={en}, R={GB}, V={scouse}} | 3 | (L, S, …) before (L, R, …) | |
+| {L={ja}, V={hepburn,heploc}} | 3 | (L, R, …) before (L, V) | |
+| {L={zh}, V={1996,pinyin}} | 3 | match (L, V) | in L, “ja” before “zh” |
+| {L={zh}, V={hepburn,heploc}} | 3 | match (L, V) | in V, “1996” before “hepburn” |
+| {L={zh}, V={hepburn,simple}} | 3 | match (L, V) | in V, “heploc” before “simple” |
+| {V={fonipa,hepburn,heploc}} | 3 | (L, …) before (V) | |
+| {V={hepburn,heploc,simple}} | 3 | match (V) | in V, “fonipa” before “hepburn” |
+| {L={en}, R={GB}} | 2 | | |
+| {V={hepburn,heploc}} | 2 | (L, …) before (V) | |
+| {R={CA}} | 1 | | |
 
 ### Processing LanguageIds
 
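
Note (not part of the patch): the three-level comparison in the new step 5 can be sketched in Python as below. This is only an illustration under stated assumptions, not CLDR tooling. The dict-based rule representation and the names `FIELDS`, `total_items`, `compare_rules`, and `sort_rules` are hypothetical, and each field's value list is assumed to already be in the subtag order the spec intends.

```python
from functools import cmp_to_key

# Field order used by steps 5.2 and 5.3: language, script, region, variants.
FIELDS = ("L", "S", "R", "V")


def total_items(rule):
    """Step 5.1: sum of the item counts of the four field value sets."""
    return sum(len(rule.get(field, ())) for field in FIELDS)


def compare_rules(a, b):
    """Return a negative number if rule a sorts before rule b, positive if after, 0 if tied.

    A rule is assumed to be a dict such as {"L": ["zh"], "V": ["1996", "pinyin"]};
    a missing key means an empty field value set.
    """
    # 5.1: the rule with the greater total item count comes first.
    diff = total_items(b) - total_items(a)
    if diff != 0:
        return diff

    # 5.2: at the first field where only one rule has values, that rule comes first.
    for field in FIELDS:
        a_has, b_has = bool(a.get(field)), bool(b.get(field))
        if a_has != b_has:
            return -1 if a_has else 1

    # 5.3: at the first field whose value lists differ, order by the first
    # differing value. Python compares lists element by element and str
    # values by code point, which matches the rule as described.
    for field in FIELDS:
        a_vals, b_vals = list(a.get(field, ())), list(b.get(field, ()))
        if a_vals != b_vals:
            return -1 if a_vals < b_vals else 1

    return 0


def sort_rules(rules):
    """Order rules as described in step 5 (sorted() is stable, so ties keep input order)."""
    return sorted(rules, key=cmp_to_key(compare_rules))
```

Calling `sort_rules` on the thirteen example rules from the table, in any input order, should reproduce the row order shown there, because no two of those rules are tied after step 5.3.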