CLDR CollationTest: omit simplified radicals #914

Merged


markusicu
Member

While updating ICU to the latest Unicode 16 files, I got collation conformance test failures. The CollationTest files are generated with the implicit-weights Han sort order and already omit Han characters that are known to sort differently in that sort order vs. radical-stroke order.

With the recent change to make the CLDR radical-stroke order match the one in UAX38 (unicodetools PR #909), we need to omit some more characters. Characters with traditional and simplified radicals are now intermingled, and some of them now sort differently in implicit-Han vs. radical-stroke order. I changed the CollationTest generator to omit all of the simplified radicals.

No DUCET CollationTest file changes.
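
To illustrate the omission rule described above, here is a minimal sketch, not the actual unicodetools generator code. It assumes the Unihan `kRSUnicode` convention in which one or more apostrophes after the radical number mark a simplified radical form (for example `149'.2`). The function names `has_simplified_radical` and `filter_for_collation_test` are hypothetical, and the sample data is a small hand-picked mapping rather than a parsed Unihan file.

```python
import re

# kRSUnicode value: radical number, optional apostrophes marking a
# simplified form of the radical, a dot, then the residual stroke count.
_RS_PATTERN = re.compile(r"^(\d{1,3})('{0,3})\.(-?\d{1,2})$")


def has_simplified_radical(rs_value: str) -> bool:
    """Return True if the first kRSUnicode value uses a simplified radical."""
    # A character may list several radical-stroke values separated by
    # spaces; this sketch assumes only the first one determines the order.
    first = rs_value.split()[0]
    match = _RS_PATTERN.match(first)
    if match is None:
        raise ValueError(f"unexpected kRSUnicode value: {rs_value!r}")
    return bool(match.group(2))


def filter_for_collation_test(rs_by_char: dict[str, str]) -> list[str]:
    """Keep only characters whose radical is not a simplified form."""
    return [ch for ch, rs in rs_by_char.items() if not has_simplified_radical(rs)]


# Hand-picked sample: U+4E00, U+8A08 (traditional radical 149),
# U+8BA1 (simplified radical 149').
sample = {"\u4e00": "1.0", "\u8a08": "149.2", "\u8ba1": "149'.2"}
kept = filter_for_collation_test(sample)
```

With this sample, U+8BA1 is dropped and the other two characters are kept. Omitting every simplified-radical character is a coarse filter: it avoids deciding case by case which characters now sort differently, at the cost of slightly less test coverage.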


echeran
echeran previously approved these changes Aug 19, 2024
Contributor

@echeran echeran left a comment


LGTM

@markusicu
Member Author

On this one, I will fix a TODO comment by turning it into a note with more information.

macchiati
macchiati previously approved these changes Aug 19, 2024
@markusicu markusicu dismissed stale reviews from macchiati and echeran via 62e35a5 August 19, 2024 22:42
@markusicu
Member Author

@macchiati thanks! I just pushed a second commit explaining why we now have more characters in the original Unihan block that don't sort in the improved radical-stroke order, resolving the TODO from my previous PR #909. @echeran FYI

@markusicu markusicu force-pushed the colltest-omit-simplified-radicals branch from 62e35a5 to d074ab7 Compare August 19, 2024 22:48
@markusicu markusicu requested review from echeran and macchiati August 19, 2024 22:51
@markusicu markusicu merged commit 5471274 into unicode-org:main Aug 20, 2024
16 checks passed
@markusicu markusicu deleted the colltest-omit-simplified-radicals branch August 20, 2024 00:20
@markusicu markusicu added the uca label Aug 27, 2024

3 participants