CLDR-16999 Change Annex C Preprocessing (#3579)
macchiati authored Mar 28, 2024
1 parent c7e39f1 commit dd4be0a
Showing 1 changed file with 53 additions and 20 deletions.
73 changes: 53 additions & 20 deletions docs/ldml/tr35.md
@@ -3847,30 +3847,63 @@ The data from supplementalMetadata is (logically) preprocessed as follows.
2. `<territoryAlias type="und-AAA" replacement="und-AA" reason="overlong" />`
4. Change the _type_ and _replacement_ values in the remaining rules into multimap rules, as per _Definition 1. Multimap Interpretation_.
1. Note that the “und” value disappears.
5. Order the set of rules by the following levels
1. First order by the size of the union of all field value sets, with larger sizes before smaller sizes.
* So {V={hepburn, heploc}} is before {R={CA}}
* {V={hepburn, heploc}} and {L={en}, R={GB}} are not ordered at this level
2. And then order by field, where L < S < R < V. Thus L is first and V is last.
* So {L={fr}, R={CA}} is before {V={fonipa, heploc}}.
* {V={hepburn, heploc}} and {V={hepburn, heploc}} are not ordered at this level
* After this point we are guaranteed to have the same set of fields, with possibly different field value sets.
3. And then order by field value sets, traversing also in the order of their fields L < S < R < V.
* To determine the ordering between a field value set A and B, traverse each in parallel
* If the corresponding field value sets for A and B are identical, then the next pair of field value sets is processed
* Otherwise at the first pair of differing field values, A is before B if its field value is alphabetically less, otherwise B is before.
5. Order the set of rules using the following comparison logic:
1. For each rule, count the number of items in each field value set (L, S, R, V) and sum the four counts.
If two rules have differing sums, order the rule with the greater sum before the rule with the smaller sum.
* For example:
* {V={hepburn,heploc}} is tied with
* {L={en}, R={GB}} (because both have 2 total field value items) and both precede
* {R={CA}} (which has 1).
2. For rule pairs that are not differentiated by the previous step, consider the value set for each field in the order L, then S, then R, then V.
If one rule has a non-empty value set for that field and the other rule does not,
then order the rule with the non-empty value set for that field before the other rule and disregard all later fields.
Otherwise, consider the next field.
* For example:
* {L={zh}, S={Hant}, R={CN}} is tied with
* {L={en}, S={Latn}, R={GB}} (because both have non-empty sets for L, S, and R but not for V),
and both precede
* {L={zh}, S={Hans}, V={pinyin}} (because it lacks values for R),
which precedes
* {L={en}, R={GB}, V={scouse}} (because it lacks values for S),
which precedes
* {V={fonipa,hepburn,heploc}} (because it lacks values for L),
which is tied with
* {V={hepburn,heploc,simple}} (because both have non-empty sets for V but not for L, S, or R).
3. For rule pairs that are not differentiated by the previous step,
consider the value set for each field in the order L, then S, then R, then V, treating each as a sequence of subtags.
If the sequences for the same field of two rules differ,
then consider the first position of difference in the two sequences, order the rules by code-point order
of the field values at that position, and disregard all later fields.
Otherwise, consider the next field.
* For example:
* {L={ja}, V={hepburn, heploc}} precedes
* {L={zh}, V={1996, pinyin}}
(because it has a different field value set for L and "ja" precedes "zh" at the first position of difference),
which precedes
* {L={zh}, V={hepburn, heploc}}
(because it has the same field value set for L and a different field value set for V in which "1996" precedes "hepburn" at the first position of difference),
which precedes
* {L={zh}, V={hepburn, simple}}
(because it has the same field value set for L and a different field value set for V in which "heploc" precedes "simple" at the first position of difference).
6. The result is the set of **Alias Rules**.

So using the examples above, we get the following order:

| languageId | i. size of union | ii. field order | iii. field value sets |
| --------------------- | ---------------- | --------------- | --------------------- |
| {L={en}, R={GB}} | 2 | n/a | |
| {L={fr}, R={CA}} | 2 | n/a | en < fr |
| {V={fonipa, heploc}} | 2 | L < V | |
| {V={hepburn, heploc}} | 2 | n/a | fonipa < hepburn |
| {R={CA}} | 1 | n/a | |

| languageId | 5.1 total field value set item count | 5.2 non-empty field value set | 5.3 field value set items |
| --- | --- | --- | --- |
| {L={en}, S={Latn}, R={GB}} | 3 | n/a | n/a |
| {L={zh}, S={Hant}, R={CN}} | 3 | match (L, S, R) | in L, “en” before “zh” |
| {L={zh}, S={Hans}, V={pinyin}} | 3 | (L, S, R, …) before (L, S, V) | |
| {L={en}, R={GB}, V={scouse}} | 3 | (L, S, …) before (L, R, …) | |
| {L={ja}, V={hepburn,heploc}} | 3 | (L, R, …) before (L, V) | |
| {L={zh}, V={1996,pinyin}} | 3 | match (L, V) | in L, “ja” before “zh” |
| {L={zh}, V={hepburn,heploc}} | 3 | match (L, V) | in V, “1996” before “hepburn” |
| {L={zh}, V={hepburn,simple}} | 3 | match (L, V) | in V, “heploc” before “simple” |
| {V={fonipa,hepburn,heploc}} | 3 | (L, …) before (V) | |
| {V={hepburn,heploc,simple}} | 3 | match (V) | in V, “fonipa” before “hepburn” |
| {L={en}, R={GB}} | 2 | | |
| {V={hepburn,heploc}} | 2 | (L, …) before (V) | |
| {R={CA}} | 1 | | |

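The three comparison steps of the revised step 5 can be sketched as a single sort key. This is only an illustrative sketch, not code from CLDR: the dictionary representation of a rule and the names `FIELDS`, `alias_rule_sort_key`, and `order_alias_rules` are assumptions made here. It also assumes that each field value set is supplied already in the sequence order the text refers to, and it relies on Python comparing strings by code point.

```python
# Hypothetical representation: a rule is a dict mapping a field name
# ("L", "S", "R", "V") to a list of subtag values; absent fields are empty.
FIELDS = ("L", "S", "R", "V")


def alias_rule_sort_key(rule):
    """Build a key whose natural ordering follows steps 5.1, 5.2 and 5.3."""
    values = tuple(tuple(rule.get(field, ())) for field in FIELDS)
    # 5.1: greater total item count first, so negate the sum.
    total = sum(len(v) for v in values)
    # 5.2: walking L, S, R, V, a non-empty set (0) sorts before an empty one (1).
    presence = tuple(0 if v else 1 for v in values)
    # 5.3: compare the sequences field by field; Python tuple comparison
    # stops at the first differing position and compares str by code point.
    return (-total, presence, values)


def order_alias_rules(rules):
    """Return the rules in the order described by step 5 (stable for ties)."""
    return sorted(rules, key=alias_rule_sort_key)


if __name__ == "__main__":
    sample = [
        {"R": ["CA"]},
        {"V": ["hepburn", "heploc"]},
        {"L": ["zh"], "V": ["hepburn", "heploc"]},
        {"L": ["en"], "R": ["GB"]},
        {"L": ["zh"], "V": ["1996", "pinyin"]},
        {"L": ["en"], "S": ["Latn"], "R": ["GB"]},
    ]
    for rule in order_alias_rules(sample):
        print(rule)
```

Packing the three steps into one tuple works because each later step is consulted only when all earlier components compare equal, which mirrors the "not differentiated by the previous step" wording. One case the text leaves open is a value sequence that is a strict prefix of another; in this sketch Python's tuple comparison places the shorter sequence first.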
### <a name="processing-languageids" href="#processing-languageids">Processing LanguageIds</a>
