CLDR-16999 Change Annex C Preprocessing #3579

Merged
73 changes: 53 additions & 20 deletions docs/ldml/tr35.md
@@ -3847,30 +3847,63 @@ The data from supplementalMetadata is (logically) preprocessed as follows.
2. `<territoryAlias type="und-AAA" replacement="und-AA" reason="overlong" />`
4. Change the _type_ and _replacement_ values in the remaining rules into multimap rules, as per _Definition 1. Multimap Interpretation_.
1. Note that the “und” value disappears.
5. Order the set of rules by the following levels
   1. First order by the size of the union of all field value sets, with larger sizes before smaller sizes.
      * So {V={hepburn, heploc}} is before {R={CA}}
      * {V={hepburn, heploc}} and {L={en}, R={GB}} are not ordered at this level
   2. And then order by field, where L < S < R < V. Thus L is first and V is last.
      * So {L={fr}, R={CA}} is before {V={fonipa, heploc}}.
      * {V={hepburn, heploc}} and {V={hepburn, heploc}} are not ordered at this level
      * After this point we are guaranteed to have the same set of fields, with possibly different field value sets.
   3. And then order by field value sets, traversing also in the order of their fields L < S < R < V.
      * To determine the ordering between a field value set A and B, traverse each in parallel
      * If the corresponding field value sets for A and B are identical, then the next pair of field value sets is processed
      * Otherwise at the first pair of differing field values, A is before B if its field value is alphabetically less, otherwise B is before.
5. Order the set of rules using the following comparison logic:
   1. For each rule, count the number of items in each field value set (L, S, R, V) and sum the four counts.
      If two rules have differing sums, order the rule with the greater sum before the rule with the smaller sum.
      * For example:
        * {V={hepburn,heploc}} is tied with
        * {L={en}, R={GB}} (because both have 2 total field value items) and both precede
        * {R={CA}} (which has 1).
   2. For rule pairs that are not differentiated by the previous step, consider the value set for each field in the order L, then S, then R, then V.
      If one rule has a non-empty value set for that field and the other rule does not,
      then order the rule with the non-empty value set for that field before the other rule and disregard all later fields.
      Otherwise, consider the next field.
      * For example:
        * {L={zh}, S={Hant}, R={CN}} is tied with
        * {L={en}, S={Latn}, R={GB}} (because both have non-empty sets for L, S, and R but not for V),
          and both precede
        * {L={zh}, S={Hans}, V={pinyin}} (because it lacks values for R),
          which precedes
        * {L={en}, R={GB}, V={scouse}} (because it lacks values for S),
          which precedes
        * {V={fonipa,hepburn,heploc}} (because it lacks values for L),
          which is tied with
        * {V={hepburn,heploc,simple}} (because both have non-empty sets for V but not for L, S, or R).
   3. For rule pairs that are not differentiated by the previous step,
      consider the value set for each field in the order L, then S, then R, then V as a sequence of subtags.
      If those lists for the same field of two rules differ,
      then consider the first position of difference in the two lists and order the rules by code-point order
      of the field value at that position and disregard all later fields.
      Otherwise, consider the next field.
      * For example:
        * {L={ja}, V={hepburn, heploc}} precedes
        * {L={zh}, V={1996, pinyin}}
          (because it has a different field value set for L and "ja" precedes "zh" at the first position of difference),
          which precedes
        * {L={zh}, V={hepburn, heploc}}
          (because it has the same field value set for L and a different field value set for V in which "1996" precedes "hepburn" at the first position of difference),
          which precedes
        * {L={zh}, V={hepburn, simple}}
          (because it has the same field value set for L and a different field value set for V in which "heploc" precedes "simple" at the first position of difference).
6. The result is the set of **Alias Rules**
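
The three-level comparison that begins "Order the set of rules using the following comparison logic" can be sketched as a single sort key. This is only an illustration, not code from CLDR: the dict-of-lists rule representation and the names `FIELDS`, `alias_rule_sort_key`, and `order_alias_rules` are hypothetical, and the sketch assumes that each field value set is put into code-point order before it is read as a sequence of subtags.

```python
# Hypothetical sketch of the rule ordering; not taken from the CLDR tooling.
FIELDS = ("L", "S", "R", "V")  # the field order used by the second and third levels


def alias_rule_sort_key(rule):
    """Build a sort key for a rule given as {field: iterable of subtags}."""
    # Assumption: each field value set is read in code-point order.
    values = [tuple(sorted(rule.get(field, ()))) for field in FIELDS]
    # Level 1: the greater total item count sorts first, hence the negation.
    total = sum(len(v) for v in values)
    # Level 2: at the first field where only one rule has values, the rule
    # with the non-empty set sorts first (0 sorts before 1).
    emptiness = tuple(0 if v else 1 for v in values)
    # Level 3: compare the sequences field by field; Python compares str
    # values by code point, and tuples by first position of difference.
    return (-total, emptiness, tuple(values))


def order_alias_rules(rules):
    """Return the rules in the order described by the three levels."""
    return sorted(rules, key=alias_rule_sort_key)


if __name__ == "__main__":
    sample = [
        {"R": ["CA"]},
        {"V": ["hepburn", "heploc"]},
        {"L": ["en"], "R": ["GB"]},
        {"L": ["zh"], "V": ["1996", "pinyin"]},
    ]
    for rule in order_alias_rules(sample):
        print(rule)
```

Because tuples compare element by element, the first differing component decides the order and all later components are ignored, which is the "disregard all later fields" behavior in the second and third levels.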

So using the examples above, we get the following order:

| languageId | i. size of union | ii. field order | iii. field value sets |
| --------------------- | ---------------- | --------------- | --------------------- |
| {L={en}, R={GB}} | 2 | n/a | |
| {L={fr}, R={CA}} | 2 | n/a | en < fr |
| {V={fonipa, heploc}} | 2 | L < V | |
| {V={hepburn, heploc}} | 2 | n/a | fonipa < hepburn |
| {R={CA}} | 1 | n/a | |

| languageId | 5.1 total field value set item count | 5.2 non-empty field value set | 5.3 field value set items |
| --- | --- | --- | --- |
| {L={en}, S={Latn}, R={GB}} | 3 | n/a | n/a |
| {L={zh}, S={Hant}, R={CN}} | 3 | match (L, S, R) | in L, “en” before “zh” |
| {L={zh}, S={Hans}, V={pinyin}} | 3 | (L, S, R, …) before (L, S, V) | |
| {L={en}, R={GB}, V={scouse}} | 3 | (L, S, …) before (L, R, …) | |
| {L={ja}, V={hepburn,heploc}} | 3 | (L, R, …) before (L, V) | |
| {L={zh}, V={1996,pinyin}} | 3 | match (L, V) | in L, “ja” before “zh” |
| {L={zh}, V={hepburn,heploc}} | 3 | match (L, V) | in V, “1996” before “hepburn” |
| {L={zh}, V={hepburn,simple}} | 3 | match (L, V) | in V, “heploc” before “simple” |
| {V={fonipa,hepburn,heploc}} | 3 | (L, …) before (V) | |
| {V={hepburn,heploc,simple}} | 3 | match (V) | in V, “fonipa” before “hepburn” |
| {L={en}, R={GB}} | 2 | | |
| {V={hepburn,heploc}} | 2 | (L, …) before (V) | |
| {R={CA}} | 1 | | |
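
The last two columns of the table record which comparison level separates a row from the row above it. As a rough aid for checking such rows, the sketch below reports the deciding level for any pair of rules. The function name `deciding_step`, the dict-of-lists rule representation, and the `"5.1"`/`"5.2"`/`"5.3"` labels (borrowed from the table headers) are illustrative assumptions, as is sorting each field value set into code-point order.

```python
# Hypothetical helper for checking table rows; not part of the specification.
FIELDS = ("L", "S", "R", "V")


def deciding_step(a, b):
    """Return (level, field) for the level that separates rules a and b.

    The field is None for level 5.1, which looks at the total count only.
    Returns None when the two rules are not separated by any level.
    """
    # Assumption: field value sets are compared in code-point order.
    va = {f: sorted(a.get(f, ())) for f in FIELDS}
    vb = {f: sorted(b.get(f, ())) for f in FIELDS}
    # 5.1: total number of field value items across L, S, R, V.
    if sum(len(v) for v in va.values()) != sum(len(v) for v in vb.values()):
        return ("5.1", None)
    # 5.2: first field that is non-empty in exactly one of the two rules.
    for f in FIELDS:
        if bool(va[f]) != bool(vb[f]):
            return ("5.2", f)
    # 5.3: first field whose sequences of subtags differ.
    for f in FIELDS:
        if va[f] != vb[f]:
            return ("5.3", f)
    return None


if __name__ == "__main__":
    print(deciding_step(
        {"L": ["zh"], "S": ["Hans"], "V": ["pinyin"]},
        {"L": ["en"], "R": ["GB"], "V": ["scouse"]},
    ))
```

For the pair in the usage example, both rules have three items and both have a value for L, so the first difference is that only the first rule has a value for S.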

### <a name="processing-languageids" href="#processing-languageids">Processing LanguageIds</a>
