Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-11155 Test for ST pages with too many rows #3519

Closed

Conversation

macchiati
Copy link
Member

CLDR-11155

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@macchiati macchiati requested a review from btangmu February 24, 2024 04:27
@macchiati
Copy link
Member Author

I took the code I made to find out the page sizes, and made it into a test. Once the emoji pages are handled, the number of failures will go down. (We can also raise the error threshold or make some exceptions if we want.)

final long minError = 600; // above this, emit error
final long minLog = 500; // otherwise above this, emit warning
Factory factory = CLDRConfig.getInstance().getCommonAndSeedAndMainAndAnnotationsFactory();
// "en", "cs", "ar", "pl"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// "en", "cs", "ar", "pl"

Comment on lines +1709 to +1714
+ "\t"
+ entry.getSectionId()
+ "\t"
+ entry
+ "\thas too many entries:\t"
+ count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • String.format will avoid taking up 6 lines
  • include minError also so it's clear what the target value is

@btangmu
Copy link
Member

btangmu commented Feb 26, 2024

#3508 is merged, so I'll restart the tests on this one

@btangmu
Copy link
Member

btangmu commented Feb 26, 2024

The test still fails, including:

build: tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestPathHeader.java#L1707
(TestPathHeader.java:1707) Error: am Characters Symbols3 has too many entries: 684
build: tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestPathHeader.java#L1707
(TestPathHeader.java:1707) Error: ar Date & Time Fields has too many entries: 705
build: tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestPathHeader.java#L1707
(TestPathHeader.java:1707) Error: ar Units Volume has too many entries: 740

@macchiati should I keep dividing pages until the test passes with the max being 600? Anyway I was about to start on Volume/Volume2

@macchiati
Copy link
Member Author

I think we identified the pages that we wanted to address. At this point we probably want to set the limit a bit higher so we just knock down the bad cases.

For volume, I recommend that we split by the system. Volume Metric and Volume Other.

Some notes:

From any unit you can get its system, and check it.

make a static final UnicodeSystems METRIC = Sets.of(UnitSystem.metric, UnitSystem.metric_adjacent);

extract the unit from the path
this is the long unit ID, which has an extra prefix (historic)

get a UnitConverter from the SupplementalDataInfo
use getShortId to convert to the short form
use getSourceToSystems to get a map, then use the short ID with it to get the system set
if the set of systems intersect with the constant metric, then put it in the first batch.
Collections.disjoint can be used to test intersection

@btangmu
Copy link
Member

btangmu commented Feb 26, 2024

Volume Metric and Volume Other

OK! I was in the middle of another approach, trying to split about halfway down starting Volume2 with "Gallon" but what you suggest sounds more meaningful

@macchiati macchiati marked this pull request as ready for review February 27, 2024 23:20
@macchiati
Copy link
Member Author

@btangmu Please integrate this in, setting the error size limit to a value that works, and incorporating into your work. You can reset the error size limit down after further work.

@btangmu
Copy link
Member

btangmu commented Apr 2, 2024

@macchiati This was the basis for #3563 and #3573 -- now redundant; OK to close?

@btangmu btangmu closed this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants