un-break UCA tool: Adlam new FracUCA byte after Garay #616

Merged: markusicu merged 1 commit into unicode-org:main from fracuca-adlam-new-byte on Nov 29, 2023

markusicu
Member

In the CLDR/ICU FractionalUCA.txt, adding the new script Garay between Medefaidrin and Adlam pushed Adlam across a primary-weight lead-byte boundary, which does not work for a primary-compressible script. I needed to split the sequence of scripts from Vai to Adlam across two lead bytes, and decided to simply start a new lead byte with Adlam.
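
The constraint described above can be illustrated with a minimal sketch. It assumes primary weights are available as `(script, weight_bytes)` pairs; the function name, the data shape, and the byte values are all hypothetical and are not taken from the UCA tool or from any real FractionalUCA.txt.

```python
from collections import defaultdict


def scripts_spanning_lead_bytes(primaries):
    """Return {script: sorted lead bytes} for scripts using more than one lead byte.

    `primaries` is an iterable of (script_code, weight_bytes) pairs, where the
    first byte of weight_bytes is the primary-weight lead byte.
    """
    lead_bytes = defaultdict(set)
    for script, weight in primaries:
        lead_bytes[script].add(weight[0])
    return {s: sorted(b) for s, b in lead_bytes.items() if len(b) > 1}


# Hypothetical weights: inserting Garay pushes part of Adlam past lead byte 0x7A.
broken = [
    ("Medf", bytes([0x7A, 0xF0])),
    ("Gara", bytes([0x7A, 0xF8])),
    ("Adlm", bytes([0x7A, 0xFE])),
    ("Adlm", bytes([0x7B, 0x04])),
]
straddling = scripts_spanning_lead_bytes(broken)

# Hypothetical fix: start a new lead byte with Adlam.
fixed = [
    ("Medf", bytes([0x7A, 0xF0])),
    ("Gara", bytes([0x7A, 0xF8])),
    ("Adlm", bytes([0x7B, 0x04])),
    ("Adlm", bytes([0x7B, 0x06])),
]
still_straddling = scripts_spanning_lead_bytes(fixed)
```

In this toy data, `straddling` reports Adlam under two lead bytes, while `still_straddling` is empty once Adlam begins its own lead byte.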

Some of these scripts are not widely used but take up a fair bit of primary weight space: because they are cased, they are stored with two-byte primary weights rather than three-byte primaries, so that their collation elements easily fit into 32 bits.
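
The trade-off can be sketched with simple byte arithmetic. This is a deliberately simplified model, not ICU's actual collation element encoding: it assumes one secondary byte and one tertiary byte per element, and it ignores the byte values that are reserved in practice. Both function names are hypothetical.

```python
def fits_in_32_bits(primary_len: int, secondary_len: int = 1, tertiary_len: int = 1) -> bool:
    """True if the weights of one collation element total at most 4 bytes."""
    return primary_len + secondary_len + tertiary_len <= 4


def max_primaries_per_lead_byte(primary_len: int) -> int:
    """Upper bound on distinct primaries under one lead byte (no reserved bytes)."""
    return 256 ** (primary_len - 1)


two_byte_fits = fits_in_32_bits(2)      # 2 + 1 + 1 = 4 bytes
three_byte_fits = fits_in_32_bits(3)    # 3 + 1 + 1 = 5 bytes

two_byte_capacity = max_primaries_per_lead_byte(2)
three_byte_capacity = max_primaries_per_lead_byte(3)
```

Under these assumptions a two-byte primary leaves room for the other weights in 32 bits, but each lead byte holds far fewer two-byte primaries than three-byte ones, which is why such scripts consume lead-byte space quickly.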

The CollationTest file diffs are large and hard to look at. I don't expect much of a review there.

@markusicu
Member Author

... and the “Check UCA data” workflow passes for the first time in a while!
Let's keep it green now for Unicode 16!

@markusicu added the uca label on Nov 29, 2023
Member

@macchiati left a comment

Looks good. I usually look over the generated Rules because I can make sense of those but not the plain numbers, but I guess those were unchanged for this PR.

@markusicu
Member Author

> I usually look over the generated Rules because I can make sense of those but not the plain numbers, but I guess those were unchanged for this PR.

I was looking at the generated FractionalUCA.txt while debugging and fixing, but that file is not checked into this repo, and it's too early to take Unicode 16 data into CLDR.

I am thinking that at some point I should modify the tool to directly generate a nearly complete CLDR FractionalUCA.txt file but with the weights “blanked”, and to check that into this repo, so that we see more directly what changes.
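
One way the “blanked” weights idea might look is sketched below. It assumes mapping lines of the shape `code points; [weights]` followed by an optional `#` comment; that line shape, the `XX` placeholder, and the function name are assumptions made for illustration and do not describe the tool's real output.

```python
import re

# Bracketed collation element, e.g. "[7B 04, 05, 9C]"
_CE = re.compile(r"\[([^\]]*)\]")
# One weight byte written as two hex digits
_HEX_BYTE = re.compile(r"\b[0-9A-Fa-f]{2}\b")


def blank_weights(line: str) -> str:
    """Replace each weight byte in a mapping line with XX.

    Lines without a ';' separator (headers, directives) are returned unchanged,
    and anything after '#' is left alone.
    """
    head, sep, rest = line.partition(";")
    if not sep:
        return line
    mapping, hash_mark, comment = rest.partition("#")
    blanked = _CE.sub(
        lambda m: "[" + _HEX_BYTE.sub("XX", m.group(1)) + "]", mapping
    )
    return head + sep + blanked + hash_mark + comment


# Hypothetical before/after lines for the same character.
before = blank_weights("1E900; [7A FE, 05, 9C]\t# ADLAM CAPITAL LETTER ALIF")
after = blank_weights("1E900; [7B 04, 05, 9C]\t# ADLAM CAPITAL LETTER ALIF")
```

With the weights blanked, the two hypothetical lines become identical, so a diff of the checked-in file would show only changes in order, structure, or weight length rather than every renumbered byte.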

@markusicu merged commit dd1d2d6 into unicode-org:main on Nov 29, 2023
10 checks passed
@markusicu deleted the fracuca-adlam-new-byte branch on November 29, 2023 02:39
@macchiati
Member

Good idea.
