Kirat Rai #445

Merged
merged 39 commits into from
Oct 25, 2023

@eggrobin (Member) commented Apr 18, 2023

[171-C13] Consensus: Accept 58 Kirat Rai characters in a new Kirat Rai block that extends from U+16D40..U+16D7F for a future version of the standard, with properties as amended in discussion. (Reference L2/22-043R)

[176-C25] Consensus: Generalize conjoining behavior to include Kirat Rai vowel signs, and set the Grapheme_Cluster_Break property of the Kirat Rai vowel signs {E, AI, AA, O, AU} to "V". For Unicode 16.0. See L2/23-160 item 4.2.

[176-A83] Action Item for Robin Leroy, PAG: Set the Grapheme_Cluster_Break property of the Kirat Rai vowel signs {E, AI, AA, O, AU} to "V". For Unicode 16.0. See L2/23-160 item 4.2.
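For readers less familiar with UAX #29: rule GB7 is `(LV | V) × (V | T)`, so two adjacent code points with Grapheme_Cluster_Break=V never have a cluster boundary between them. The following is a minimal toy sketch, not the unicodetools or ICU implementation, that models only the V × V half of GB7 and treats every other pair as a break. Assumptions: the class and method names are hypothetical, and the code points for AA (U+16D63) and E (U+16D67) are inferred from the decomposition mappings visible in the allkeys.txt excerpt further down this thread.

```java
import java.util.Set;

public class KiratRaiGcbSketch {
    // Hypothetical lookup: the five Kirat Rai vowel signs {E, AI, AA, O, AU} get GCB=V.
    // Assumption: AA = U+16D63 and E = U+16D67, inferred from the allkeys.txt decompositions.
    static final Set<Integer> GCB_V = Set.of(0x16D63, 0x16D67, 0x16D68, 0x16D69, 0x16D6A);

    static boolean isV(int cp) {
        return GCB_V.contains(cp);
    }

    // Toy segmentation: only the V × V half of GB7 is modeled; every other pair breaks.
    public static boolean isBoundary(int prev, int next) {
        return !(isV(prev) && isV(next));
    }

    // Counts clusters in a code point sequence under the toy rule above.
    public static int countClusters(int... cps) {
        if (cps.length == 0) {
            return 0;
        }
        int clusters = 1;
        for (int i = 1; i < cps.length; i++) {
            if (isBoundary(cps[i - 1], cps[i])) {
                clusters++;
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Precomposed O and its decomposition AA + E each form a single cluster.
        System.out.println(countClusters(0x16D69));          // 1
        System.out.println(countClusters(0x16D63, 0x16D67)); // 1
    }
}
```

One consequence this illustrates: because V × V holds, a precomposed vowel sign and its canonical decomposition (for example U+16D69 versus U+16D63 U+16D67) segment into the same number of clusters.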

RMG tracking: https://github.com/unicode-org/utc-release-management/issues/28

@markusicu (Member)

Preliminary script code: Krai

@eggrobin (Member, Author) commented Oct 2, 2023

Checked consistency with Ken’s UnicodeData-16.0.0d6.txt and LineBreak-16.0.0d2.txt.

@eggrobin marked this pull request as ready for review October 2, 2023 13:52

@eggrobin (Member, Author) commented Oct 2, 2023

CI does not pass because of collation.

@eggrobin marked this pull request as draft October 2, 2023 13:59

@eggrobin (Member, Author) commented Oct 6, 2023

Re “do you want to come up with them yourself and want me to review them?”:

I tried that (looking at the proposal and the assignments for other characters, with a couple of forays into the standard to figure out what was going on), but I am of course still mostly clueless about this, so please review carefully.

@markusicu (Member)

Re the UCA Main failure:

  • In the debugger, I see that it maps U+16D69 KIRAT RAI VOWEL SIGN O to collation elements corresponding to its Decomposition_Mapping 16D63 16D67, and because we don't have allkeys.txt data for them, the tool doesn't know what to do and barfs.
  • I don't know why it lets other pull requests through despite lack of data.
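The failure mode described above can be sketched as a lookup that falls back on a character's Decomposition_Mapping when the character has no collation data of its own, and fails when a component has no data either. This is a hypothetical illustration only: the class, the two tables, and the use of a single primary weight are assumptions, not the UCA tool's actual data structures. The tables deliberately omit entries for U+16D63 and U+16D67 to mirror the situation in the debugger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DecompositionFallbackSketch {
    // Hypothetical collation data: code point -> primary weight (no entries for 16D63 or 16D67).
    static final Map<Integer, Integer> PRIMARY = Map.of(0x16D40, 0x4F5C);

    // Hypothetical Decomposition_Mapping excerpt: VOWEL SIGN O -> 16D63 16D67.
    static final Map<Integer, int[]> DECOMP = Map.of(0x16D69, new int[] {0x16D63, 0x16D67});

    public static List<Integer> primaries(int cp) {
        Integer direct = PRIMARY.get(cp);
        if (direct != null) {
            return List.of(direct);
        }
        int[] parts = DECOMP.get(cp);
        if (parts == null) {
            // Neither data nor a decomposition: this is where the tool gives up.
            throw new IllegalStateException(String.format("no collation data for U+%04X", cp));
        }
        List<Integer> out = new ArrayList<>();
        for (int part : parts) {
            out.addAll(primaries(part)); // recurse into each component of the decomposition
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            primaries(0x16D69);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // no collation data for U+16D63
        }
    }
}
```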

I will look at Ken's past notes and see if I can propose some data.

@markusicu (Member)

I added an initial sort order definition in unidata.txt here: 42bcaa8
@eggrobin please run your sifter and check in allkeys.txt.
@Ken-Whistler would you mind taking a look at my unidata.txt changes?

@roozbehp (Contributor) left a comment

Partial review

unicodetools/data/ucd/dev/IndicPositionalCategory.txt (outdated, resolved)
unicodetools/data/ucd/dev/IndicSyllabicCategory.txt (outdated, resolved)

@markusicu (Member)

Looks like the allkeys.txt file generated fine 🎉

The "variable" punctuation characters are a little hard to make out because they are not in collation order, but the script block looks as expected (I think):

...
16ABE ; [.4F5B.0020.0002] # TANGSA LETTER ZA
16D40 ; [.4F5C.0020.0002] # KIRAT RAI SIGN ANUSVARA
16D41 ; [.4F5D.0020.0002] # KIRAT RAI SIGN TONPI
16D42 ; [.4F5E.0020.0002] # KIRAT RAI SIGN VISARGA
16D43 ; [.4F5F.0020.0002] # KIRAT RAI LETTER A
16D44 ; [.4F60.0020.0002] # KIRAT RAI LETTER KA
...
16D68 ; [.4F84.0020.0002] # KIRAT RAI VOWEL SIGN AI
16D67 16D67 ; [.4F84.0020.0002] # KIRAT RAI VOWEL SIGN AI
16D69 ; [.4F85.0020.0002] # KIRAT RAI VOWEL SIGN O
16D63 16D67 ; [.4F85.0020.0002] # KIRAT RAI VOWEL SIGN O
16D6A ; [.4F86.0020.0002] # KIRAT RAI VOWEL SIGN AU
16D69 16D67 ; [.4F86.0020.0002] # KIRAT RAI VOWEL SIGN AU
16D6B ; [.4F87.0020.0002] # KIRAT RAI SIGN VIRAMA
16D6C ; [.4F87.0020.0004] # KIRAT RAI SIGN SAAT
10000 ; [.4F88.0020.0002] # LINEAR B SYLLABLE B008 A
...
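Each allkeys.txt line above has the shape `code points ; [collation elements] # name`, where each collation element is `[.pppp.ssss.tttt]` (primary, secondary, tertiary weights in hex) and a leading `*` instead of `.` marks a variable element. A minimal parser sketch for one such line, assuming well-formed input; the class and record names are hypothetical and this is not the sifter's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllkeysLineSketch {
    // One collation element; variable == true when the element is written with '*'.
    public record CollationElement(boolean variable, int primary, int secondary, int tertiary) {}

    public record Entry(int[] codePoints, List<CollationElement> elements, String name) {}

    private static final Pattern CE =
        Pattern.compile("\\[([.*])([0-9A-F]+)\\.([0-9A-F]+)\\.([0-9A-F]+)\\]");

    public static Entry parse(String line) {
        int hash = line.indexOf('#');
        String name = hash >= 0 ? line.substring(hash + 1).trim() : "";
        String data = hash >= 0 ? line.substring(0, hash) : line;
        String[] fields = data.split(";", 2);
        if (fields.length < 2) {
            throw new IllegalArgumentException("missing ';' in: " + line);
        }
        // Field 1: one or more hex code points (several for a contraction).
        String[] hex = fields[0].trim().split("\\s+");
        int[] cps = new int[hex.length];
        for (int i = 0; i < hex.length; i++) {
            cps[i] = Integer.parseInt(hex[i], 16);
        }
        // Field 2: one or more bracketed collation elements.
        List<CollationElement> ces = new ArrayList<>();
        Matcher m = CE.matcher(fields[1]);
        while (m.find()) {
            ces.add(new CollationElement(
                m.group(1).equals("*"),
                Integer.parseInt(m.group(2), 16),
                Integer.parseInt(m.group(3), 16),
                Integer.parseInt(m.group(4), 16)));
        }
        return new Entry(cps, ces, name);
    }
}
```

Applied to the excerpt, this makes it easy to see, for example, that 16D6B SIGN VIRAMA and 16D6C SIGN SAAT share primary 4F87 and differ only at the tertiary level (0002 versus 0004).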

FYI @Ken-Whistler but definitely no urgency!

I see that there is no explicit contraction for the recursive decomposition 16D6A --> 16D69 16D67 --> 16D63 16D67 16D67. Maybe we need to tweak the sifter code to provide the full canonical closure here?
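Full canonical closure needs the fully decomposed form, obtained by applying Decomposition_Mapping recursively until nothing further decomposes. A minimal sketch, assuming the three mappings implied by the allkeys.txt excerpt above (they are inferred here, not read from UnicodeData.txt); the class and method names are hypothetical, and canonical reordering is skipped because it only matters for nonzero combining classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RecursiveDecompositionSketch {
    // Assumed mappings, inferred from the contractions in the allkeys.txt excerpt.
    static final Map<Integer, int[]> DECOMP = Map.of(
        0x16D68, new int[] {0x16D67, 0x16D67},  // VOWEL SIGN AI
        0x16D69, new int[] {0x16D63, 0x16D67},  // VOWEL SIGN O
        0x16D6A, new int[] {0x16D69, 0x16D67}); // VOWEL SIGN AU

    // Applies the mappings recursively until no mapped code point remains.
    public static List<Integer> fullDecomposition(int cp) {
        int[] parts = DECOMP.get(cp);
        if (parts == null) {
            return List.of(cp);
        }
        List<Integer> out = new ArrayList<>();
        for (int part : parts) {
            out.addAll(fullDecomposition(part));
        }
        return out;
    }

    public static void main(String[] args) {
        for (int cp : new int[] {0x16D68, 0x16D69, 0x16D6A}) {
            StringBuilder sb = new StringBuilder();
            for (int part : fullDecomposition(cp)) {
                sb.append(String.format("%04X ", part));
            }
            System.out.printf("%04X -> %s%n", cp, sb.toString().trim());
        }
    }
}
```

The last line printed, `16D6A -> 16D63 16D67 16D67`, is the sequence for which the excerpt has no explicit contraction.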

@markusicu (Member)

I see that we now have a different CI failure:

Caused by: java.lang.UnsupportedOperationException: unknown reorderCode 168
    at org.unicode.text.UCA.ReorderCodes.getSampleCharacter (ReorderCodes.java:146)
    at org.unicode.text.UCA.ReorderCodes.getScriptStartString (ReorderCodes.java:154)

This depends on me working on the script metadata. I have most of that data available, but don't have time right now to work it in.

We definitely need to change the CI so that a UCA-tool failure does not block UCD progress.

@eggrobin (Member, Author) commented Oct 7, 2023

Re “We definitely need to change the CI so that a UCA-tool failure does not block UCD progress.”:

Indeed. On it.

@eggrobin marked this pull request as ready for review October 24, 2023 20:44
@markusicu previously approved these changes Oct 24, 2023

@eggrobin (Member, Author)

Re our friends Alphabetic, Diacritic, and Extender: all the interesting signs are Alphabetic by virtue of their General_Category, so that is easy. It would probably make some sense to make the viramas Diacritic, but it is probably best to deal with that as part of https://github.com/unicode-org/properties/issues/195.
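Alphabetic is a derived property: per DerivedCoreProperties.txt it is Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic, where Uppercase is Lu + Other_Uppercase and Lowercase is Ll + Other_Lowercase. That is why a sign with a letter General_Category needs no extra work, whereas Diacritic and Extender are assigned directly in PropList.txt and have to be decided per character. A minimal sketch of the derivation, with property values passed in rather than looked up; the class, method, and parameter names are hypothetical, and nothing here asserts which General_Category the Kirat Rai signs have.

```java
import java.util.Set;

public class AlphabeticDerivationSketch {
    // General_Category values that contribute to Alphabetic directly.
    static final Set<String> LETTER_LIKE_GC = Set.of("Lt", "Lm", "Lo", "Nl");

    // Alphabetic = Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic.
    public static boolean isAlphabetic(String gc, boolean otherUppercase,
                                       boolean otherLowercase, boolean otherAlphabetic) {
        boolean uppercase = gc.equals("Lu") || otherUppercase;
        boolean lowercase = gc.equals("Ll") || otherLowercase;
        return uppercase || lowercase || LETTER_LIKE_GC.contains(gc) || otherAlphabetic;
    }

    public static void main(String[] args) {
        // A letter (Lo) is Alphabetic by its General_Category alone.
        System.out.println(isAlphabetic("Lo", false, false, false)); // true
        // A nonspacing mark (Mn) is Alphabetic only if it has Other_Alphabetic.
        System.out.println(isAlphabetic("Mn", false, false, false)); // false
    }
}
```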

@eggrobin merged commit 6d3768b into unicode-org:main Oct 25, 2023
9 of 10 checks passed
@roozbehp (Contributor) commented Nov 7, 2023

I investigated the issue of InPC thoroughly, and I now agree with Ken and Robin that we need InPC for the right-side dependent vowels of Kirat Rai. I will include them in my copy of InPC.txt and will figure out how to merge them into main.


4 participants