Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-17535 BRS Build language containment #3980

Merged

Conversation

macchiati
Copy link
Member

@macchiati macchiati commented Aug 22, 2024

CLDR-17535

The output is impossible to diff, so the DiffLanguageGroups.java is needed to make sense of the differences.

I modified DiffLanguageGroups to make it easier to compare the changes. The current results are in https://docs.google.com/spreadsheets/d/1vx3qzX_gO_FJmNl9NgcC-_z2D4Rxp02WgylnEINQdXc/edit?gid=653556033 (the Containment sheet).

It is highlighting the changes in the TC locales and Other Basic+ locales, to focus the review on them, although all of the differences are listed. I addressed the most concerning ones, by adding overrides (also cleaned up the code around that.

The DiffLanguageGroups is also automatically run at the end of the generation.

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@macchiati macchiati marked this pull request as ready for review August 23, 2024 00:00
@macchiati
Copy link
Member Author

Tests pass now, so I'd appreciate a review.

@macchiati macchiati requested a review from btangmu August 24, 2024 01:32
@srl295
Copy link
Member

srl295 commented Aug 24, 2024

a local run of git diff --word-diff gives a good approximation of the xml changes

Copy link
Member

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still looking a the spreadsheet but I'm going to say LGTM for now

@@ -1,14 +1,18 @@
package org.unicode.cldr.util;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package org.unicode.cldr.util;
package org.unicode.cldr.tool;

If this is really a tool, it should go under the tool package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I just didn't want to do that in a single PR.

@@ -603,67 +711,67 @@ private static Set<Set<String>> getAncestors(String leaf, Set<String> skipping)
}
return itemsFixed;
// TODO: delete this commented-out code?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete it instead of fixing its indent?

@srl295
Copy link
Member

srl295 commented Aug 24, 2024

Chart is helpful. This is a little concerning:

Missing Mapuche ⁅arn⁆, Betawi ⁅bew⁆, Choctaw ⁅cho⁆, Chickasaw ⁅cic⁆, Muscogee ⁅mus⁆, Papiamento ⁅pap⁆

any reason why these should be missing?

@macchiati
Copy link
Member Author

a local run of git diff --word-diff gives a good approximation of the xml changes

But that doesn't let you know what all the codes actually mean, and when they switch from one parent to another (or are just removed).

@macchiati
Copy link
Member Author

Chart is helpful. This is a little concerning:

Missing Mapuche ⁅arn⁆, Betawi ⁅bew⁆, Choctaw ⁅cho⁆, Chickasaw ⁅cic⁆, Muscogee ⁅mus⁆, Papiamento ⁅pap⁆

any reason why these should be missing?

The diffs are in the spreadsheet. https://docs.google.com/spreadsheets/d/1vx3qzX_gO_FJmNl9NgcC-_z2D4Rxp02WgylnEINQdXc/edit?gid=653556033#gid=653556033

I didn't focus on the OC cases, because they haven't reached basic; once they do, then this is worth following up on; but by that time, Wikidata might have data.

Version Key Locale Action Information
v45 OC Mapuche ⁅arn⁆ Removed Ω
v45 OC Betawi ⁅bew⁆ Removed Austronesian ⁅map⁆ ➡︎ Ω
v45 OC Choctaw ⁅cho⁆ Removed Ω
v45 OC Chickasaw ⁅cic⁆ Removed Ω
v45 OC Muscogee ⁅mus⁆ Removed Ω
v45 OC Papiamento ⁅pap⁆ Removed Creoles and pidgins, Portuguese-based ⁅cpp⁆ ➡︎ Romance ⁅roa⁆ ➡︎ Italic

FYI: the removal of Ω doesn't mean much; either way, there isn't a known parent.

@macchiati macchiati merged commit 5f091da into unicode-org:main Aug 24, 2024
12 checks passed
@macchiati macchiati deleted the CLDR-17535-BRS-Build-language-containment branch August 24, 2024 21:46
@srl295
Copy link
Member

srl295 commented Aug 24, 2024

Ok I misread I thought it has a parent before.

haytenf pushed a commit to haytenf/cldr that referenced this pull request Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants