Kirat Rai #445

eggrobin · 2023-04-18T16:48:50Z

[171-C13] Consensus: Accept 58 Kirat Rai characters in a new Kirat Rai block that extends from U+16D40..U+16D7F for a future version of the standard, with properties as amended in discussion. (Reference L2/22-043R)

[176-C25] Consensus: Generalize conjoining behavior to include Kirat Rai vowel signs, and set the Grapheme_Cluster_Break property of the Kirat Rai vowel signs {E, AI, AA, O, AU} to "V". For Unicode 16.0. See L2/23-160 item 4.2.

[176-A83] Action Item for Robin Leroy, PAG: Set the Grapheme_Cluster_Break property of the Kirat Rai vowel signs {E, AI, AA, O, AU} to "V". For Unicode 16.0. See L2/23-160 item 4.2.
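
The effect of the "V" assignment can be sketched as follows. This is a minimal illustration, not the UAX #29 algorithm: it implements only the V × V part of rule GB7 and treats every other pair as a break. It assumes that the five vowel signs are U+16D63, U+16D67, U+16D68, U+16D69 and U+16D6A; the allkeys.txt excerpt later in this thread names only the last three, so the identification of the other two is an assumption. The helper names are hypothetical.

```python
# Assumption: these five code points are the vowel signs {AA, E, AI, O, AU}.
KIRAT_RAI_V = {0x16D63, 0x16D67, 0x16D68, 0x16D69, 0x16D6A}


def gcb(cp):
    """Grapheme_Cluster_Break value in this sketch: V for the vowel signs, XX otherwise."""
    return "V" if cp in KIRAT_RAI_V else "XX"


def is_break(before, after):
    """Only the V x V case of GB7 is modelled; everything else breaks (GB999)."""
    return not (gcb(before) == "V" and gcb(after) == "V")


def clusters(code_points):
    """Split a list of code points into clusters using is_break."""
    out = []
    for cp in code_points:
        if out and not is_break(out[-1][-1], cp):
            out[-1].append(cp)
        else:
            out.append([cp])
    return out


# KA followed by AU, precomposed and fully decomposed.
precomposed = clusters([0x16D44, 0x16D6A])
decomposed = clusters([0x16D44, 0x16D63, 0x16D67, 0x16D67])
print(len(precomposed), len(decomposed))
```

Under these assumptions both spellings yield the same number of clusters, because the decomposed vowel sign sequence is kept together by V × V. That is the consistency between segmentation and canonical equivalence that the property change is meant to provide.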

RMG tracking: https://github.com/unicode-org/utc-release-management/issues/28

markusicu · 2023-09-06T18:28:54Z

Preliminary script code: Krai

eggrobin · 2023-10-02T13:51:57Z

Checked consistency with Ken’s UnicodeData-16.0.0d6.txt and LineBreak-16.0.0d2.txt.

eggrobin · 2023-10-02T13:59:15Z

CI does not pass because of collation.

eggrobin · 2023-10-06T13:14:19Z

> do you want to come up with them yourself and want me to review them?

I tried that (looking at the proposal and the assignments for other characters, with a couple of forays into the standard to figure out what was going on), but I am of course still mostly clueless about this, so please review carefully.

markusicu · 2023-10-06T17:18:31Z

Re the UCA Main failure:

In the debugger, I see that it maps U+16D69 KIRAT RAI VOWEL SIGN O to collation elements corresponding to its Decomposition_Mapping 16D63 16D67, and because we don't have allkeys.txt data for them, the tool doesn't know what to do and barfs.
I don't know why it lets other pull requests through despite lack of data.

I will look at Ken's past notes and see if I can propose some data.

markusicu · 2023-10-06T21:01:09Z

I added an initial sort order definition in unidata.txt here: 42bcaa8
@eggrobin please run your sifter and check in allkeys.txt.
@Ken-Whistler would you mind taking a look at my unidata.txt changes?

roozbehp

Partial review

unicodetools/data/ucd/dev/IndicPositionalCategory.txt

unicodetools/data/ucd/dev/IndicSyllabicCategory.txt

markusicu · 2023-10-07T00:19:08Z

Looks like the allkeys.txt file generated fine 🎉

The "variable" punctuation characters are a little hard to make out because they are not in collation order, but the script block looks as expected (I think):

...
16ABE ; [.4F5B.0020.0002] # TANGSA LETTER ZA
16D40 ; [.4F5C.0020.0002] # KIRAT RAI SIGN ANUSVARA
16D41 ; [.4F5D.0020.0002] # KIRAT RAI SIGN TONPI
16D42 ; [.4F5E.0020.0002] # KIRAT RAI SIGN VISARGA
16D43 ; [.4F5F.0020.0002] # KIRAT RAI LETTER A
16D44 ; [.4F60.0020.0002] # KIRAT RAI LETTER KA
...
16D68 ; [.4F84.0020.0002] # KIRAT RAI VOWEL SIGN AI
16D67 16D67 ; [.4F84.0020.0002] # KIRAT RAI VOWEL SIGN AI
16D69 ; [.4F85.0020.0002] # KIRAT RAI VOWEL SIGN O
16D63 16D67 ; [.4F85.0020.0002] # KIRAT RAI VOWEL SIGN O
16D6A ; [.4F86.0020.0002] # KIRAT RAI VOWEL SIGN AU
16D69 16D67 ; [.4F86.0020.0002] # KIRAT RAI VOWEL SIGN AU
16D6B ; [.4F87.0020.0002] # KIRAT RAI SIGN VIRAMA
16D6C ; [.4F87.0020.0004] # KIRAT RAI SIGN SAAT
10000 ; [.4F88.0020.0002] # LINEAR B SYLLABLE B008 A
...
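
One way to check the excerpt mechanically is to group entries by their collation elements, so that each precomposed vowel sign shows up next to its contraction. The following is a rough sketch that handles only the simplified line shape shown above (hex code points, a semicolon, bracketed collation elements, a comment); `parse_entry` and `group_by_elements` are hypothetical helper names, not part of unicodetools.

```python
import re
from collections import defaultdict

# One entry: hex code points, a semicolon, then one or more collation elements.
ENTRY = re.compile(r"^([0-9A-F ]+?)\s*;\s*((?:\[[.*][0-9A-F.]+\])+)")


def parse_entry(line):
    """Return (code_points, elements) for a data line, or None for anything else."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    code_points = tuple(int(cp, 16) for cp in match.group(1).split())
    return code_points, match.group(2)


def group_by_elements(lines):
    """Map each collation-element string to the sequences that carry it."""
    groups = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            code_points, elements = entry
            groups[elements].append(code_points)
    return dict(groups)


EXCERPT = """\
...
16D68 ; [.4F84.0020.0002] # KIRAT RAI VOWEL SIGN AI
16D67 16D67 ; [.4F84.0020.0002] # KIRAT RAI VOWEL SIGN AI
16D69 ; [.4F85.0020.0002] # KIRAT RAI VOWEL SIGN O
16D63 16D67 ; [.4F85.0020.0002] # KIRAT RAI VOWEL SIGN O
16D6A ; [.4F86.0020.0002] # KIRAT RAI VOWEL SIGN AU
16D69 16D67 ; [.4F86.0020.0002] # KIRAT RAI VOWEL SIGN AU
""".splitlines()

for elements, sequences in group_by_elements(EXCERPT).items():
    print(elements, [" ".join(f"{cp:04X}" for cp in seq) for seq in sequences])
```

Each group printed for the vowel signs should contain exactly two sequences: the precomposed character and the two-code-point contraction that carries the same collation element.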

FYI @Ken-Whistler but definitely no urgency!

I see that there is no explicit contraction for the recursive decomposition 16D6A --> 16D69 16D67 --> 16D63 16D67 16D67. We may need to tweak the sifter code to provide the full canonical closure here.
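
What a full canonical closure would contain for these characters can be sketched as below. This is an illustration only, not the sifter's code: the decomposition mappings are hard-coded from the allkeys.txt excerpt above, the function names are hypothetical, and canonical reordering is ignored on the assumption that none of these code points has a non-zero combining class.

```python
# Decomposition mappings as read off the allkeys.txt excerpt above.
DECOMP = {
    0x16D68: (0x16D67, 0x16D67),  # VOWEL SIGN AI
    0x16D69: (0x16D63, 0x16D67),  # VOWEL SIGN O
    0x16D6A: (0x16D69, 0x16D67),  # VOWEL SIGN AU
}
COMPOSE = {pair: cp for cp, pair in DECOMP.items()}


def full_decomposition(cp):
    """Recursively expand cp until no code point has a mapping left."""
    if cp not in DECOMP:
        return (cp,)
    return tuple(c for part in DECOMP[cp] for c in full_decomposition(part))


def canonical_closure(cp):
    """All sequences reachable from the full decomposition by composing adjacent pairs."""
    seen = set()
    todo = [full_decomposition(cp)]
    while todo:
        seq = todo.pop()
        if seq in seen:
            continue
        seen.add(seq)
        for i in range(len(seq) - 1):
            composite = COMPOSE.get(seq[i:i + 2])
            if composite is not None:
                todo.append(seq[:i] + (composite,) + seq[i + 2:])
    return seen


for seq in sorted(canonical_closure(0x16D6A)):
    print(" ".join(f"{cp:04X}" for cp in seq))
```

For U+16D6A this enumerates four sequences under the stated assumptions: the character itself, 16D69 16D67, 16D63 16D67 16D67, and 16D63 16D68. Only the first two appear in the excerpt, so the last two are the candidates for additional contractions.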

markusicu · 2023-10-07T00:22:30Z

I see that we now have a different CI failure:

Caused by: java.lang.UnsupportedOperationException: unknown reorderCode 168
    at org.unicode.text.UCA.ReorderCodes.getSampleCharacter (ReorderCodes.java:146)
    at org.unicode.text.UCA.ReorderCodes.getScriptStartString (ReorderCodes.java:154)

This depends on me working on the script metadata. I have most of that data available, but don't have time right now to work it in.

We definitely need to change the CI so that a UCA-tool failure does not block UCD progress.

eggrobin · 2023-10-07T00:24:16Z

> We definitely need to change the CI so that a UCA-tool failure does not block UCD progress.

Indeed. On it.

eggrobin · 2023-10-24T21:37:17Z

Re our friends alpha, dia, ext (Alphabetic, Diacritic, Extender): all the interesting signs are Alphabetic by their Gc, so that is easy. It would probably make some sense to make the viramas Diacritic, but it is probably best to deal with that as part of https://github.com/unicode-org/properties/issues/195.

Done.

roozbehp · 2023-11-07T11:55:57Z

I investigated the issue of InPC thoroughly, and I now agree with Ken and Robin that we need InPC for the right-side dependent vowels of Kirat Rai. I will include them in my copy of InPC.txt and figure out how to merge them into main.

eggrobin added 11 commits April 18, 2023 15:22

UnicodeData from L2/22-043R

6fced1a

LineBreak.txt lines from L2/22-043R

ebc8989

Ignore unused scripts when generating PVA

6c1d4f9

Extend names and types with private use script codes

e794664

spotless

8fae047

Merge branch 'scripts-from-the-future' into 171-C13

dff6281

Scripts

c1923a6

Blocks, ShortBlockNames

1fb0fb0

Regenerate UCD

a1c537d

GenerateEnums

1546f8c

somewhat overzealous invariant

a288c86

eggrobin added the data-for-new label Apr 18, 2023

eggrobin added 2 commits August 16, 2023 17:38

Merge remote-tracking branch 'la-vache/main' into 171-C13

9cc3f41

UTC-176-A83

851f175

eggrobin force-pushed the 171-C13 branch from edaecf2 to 851f175 Compare August 16, 2023 16:18

eggrobin mentioned this pull request Aug 16, 2023

Test consistency of segmentation with canonical equivalence #522

Open

eggrobin added the pipeline-16.0 label Sep 22, 2023

eggrobin added 3 commits October 2, 2023 15:40

Script codes from the future

0fa5ccd

Merge remote-tracking branch 'la-vache/main' into 171-C13

fff8aa3

Regenerate UCD postmerge

62dc4df

eggrobin marked this pull request as ready for review October 2, 2023 13:52

eggrobin marked this pull request as draft October 2, 2023 13:59

eggrobin added 5 commits October 3, 2023 12:24

Merge remote-tracking branch 'la-vache/main' into 171-C13

e3d1a3b

alphabetize

ea8d58e

Merge branch 'script-codes-from-the-future' into 171-C13

67f047a

Regenerate UCD

7dbcb48

Also sort LONG_SCRIPT

32066b5

eggrobin added 2 commits October 6, 2023 16:31

Intercalate

174eee7

allkeys.txt

c19a6c8

Kirat Rai initial sort order

42bcaa8

roozbehp previously requested changes Oct 6, 2023

unicodetools/data/ucd/dev/IndicPositionalCategory.txt (outdated)

unicodetools/data/ucd/dev/IndicSyllabicCategory.txt (outdated)

markusicu and others added 2 commits October 6, 2023 15:24

Kirat Rai sort order with feedback from KenW

b916418

cribravi

0dcc344

eggrobin mentioned this pull request Oct 7, 2023

Split the UCA checks into their own job, check UCD consistency #562

Merged

eggrobin added 7 commits October 9, 2023 16:07

Merge remote-tracking branch 'la-vache/main' into 171-C13

a88dfb6

GenerateEnums

aa727d6

Revert UCA changes for now

a1c9c55

Merge remote-tracking branch 'la-vache/main' into 171-C13

ad58661

Revert InPC per Roozbeh’s comment, albeit contra Ken’s

d72557b

Mention Kirat Rai in IndicMeowCategory headers

6c384ed

Regenerate UCD

a671601

eggrobin marked this pull request as ready for review October 24, 2023 20:44

markusicu previously approved these changes Oct 24, 2023

Merge remote-tracking branch 'la-vache/main' into 171-C13

d0a208f

eggrobin dismissed markusicu’s stale review via d0a208f October 25, 2023 12:53

markusicu approved these changes Oct 25, 2023

eggrobin merged commit 6d3768b into unicode-org:main Oct 25, 2023
9 of 10 checks passed

eggrobin mentioned this pull request Nov 9, 2023

Indic_Positional_Category for Kirat Rai #603

Merged
