Revert "Revert invariant tests"
This reverts commit 80acf01.
eggrobin committed Jan 10, 2024
1 parent 3ba16f5 commit 546071e
Showing 1 changed file with 31 additions and 0 deletions.
@@ -746,6 +746,37 @@ Let $PostBaseSpacingMarks_Tweak = [\u103B \u1056 \u1057 \u1A57 \u1A6D]
Let $PostBaseSpacingMarks_Missed = []
[$PostBaseSpacingMarks_All - $PostBaseSpacingMarks_Tweak - $PostBaseSpacingMarks_Missed] ⊂ [:GCB=XX:]

# Check the consistency of grapheme cluster segmentation (both legacy and
# extended) with canonical equivalence.
# Non-starters are GCB=Extend or GCB=SpacingMark, so that GB9 and GB9a keep
# together any sequences that may be reordered by the Canonical Ordering
# Algorithm. This has been true ever since Extended Grapheme Clusters were
# added.
\P{U5.1.0:ccc=0} ⊆ [\p{U5.1.0:GCB=Extend}\p{U5.1.0:GCB=SpacingMark}]
\P{ccc=0} ⊆ [\p{GCB=Extend}\p{GCB=SpacingMark}]
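To illustrate why non-starters have to stay inside one cluster, here is a minimal Python sketch. It is not part of the invariant test file. It assumes only the standard `unicodedata` module, whose Unicode version depends on the Python build. It shows two canonically equivalent spellings that differ only in the order of their non-starters; if a break were allowed between the two marks, the two spellings would be segmented differently.

```python
import unicodedata

# Two spellings of q + COMBINING DOT ABOVE (ccc=230) + COMBINING DOT BELOW (ccc=220).
dot_above_first = "q\u0307\u0323"
dot_below_first = "q\u0323\u0307"

# Both marks are non-starters, i.e. they match \P{ccc=0}.
ccc_values = [unicodedata.combining(c) for c in "\u0307\u0323"]

# The Canonical Ordering Algorithm sorts the marks by ccc, so both spellings
# share a single NFD form even though the strings themselves differ.
nfd_a = unicodedata.normalize("NFD", dot_above_first)
nfd_b = unicodedata.normalize("NFD", dot_below_first)
canonically_equivalent = nfd_a == nfd_b
```

The standard library does not expose Grapheme_Cluster_Break, so the sketch stops at showing the reordering; the subset check itself is what the invariant lines above express.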
# Non-starters are actually GCB=Extend, so that GB9 alone does the job, since
# there is no GB9a in legacy grapheme clusters.
# But not before Unicode Version 16.0, even though we were saying so since
# Unicode Version 4.0 (https://www.unicode.org/reports/tr29/tr29-4.html#Implementation_Notes),
# oops (see L2/24-009).
\P{U4.0.0:ccc=0} ⊆ \p{U4.0.0:Grapheme_Extend}

Check failure on line 762 in unicodetools/src/main/resources/org/unicode/text/UCD/UnicodeInvariantTest.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Expected empty, got: 5 [\U0001D166\U0001D16D\U0001D170-\U0001D172]
In \P{U4.0.0:ccc=0} But Not In \p{U4.0.0:Grapheme_Extend}
1D166 # (𝅦) MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D # (𝅭) MUSICAL SYMBOL COMBINING AUGMENTATION DOT
1D170..1D172 # [3] (𝅰..𝅲) MUSICAL SYMBOL COMBINING FLAG-3..MUSICAL SYMBOL COMBINING FLAG-5
\P{U4.1.0:ccc=0} ⊆ \p{U4.1.0:GCB=Extend}

Check failure on line 763 in unicodetools/src/main/resources/org/unicode/text/UCD/UnicodeInvariantTest.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Expected empty, got: 2 [\U0001D166\U0001D16D]
In \P{U4.1.0:ccc=0} But Not In \p{U4.1.0:GCB=Extend}
1D166 # (𝅦) MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D # (𝅭) MUSICAL SYMBOL COMBINING AUGMENTATION DOT
\P{U15.1.0:ccc=0} ⊆ \p{U15.1.0:GCB=Extend}

Check failure on line 764 in unicodetools/src/main/resources/org/unicode/text/UCD/UnicodeInvariantTest.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Expected empty, got: 18 [\u1715\u1734\u1B44\u1BAA\u1BF2\u1BF3\uA953\uA9C0\U000111C0\U00011235\U0001134D\U000116B6\U0001193D\U00011F41\U00016FF0\U00016FF1\U0001D166\U0001D16D]
In \P{U15.1.0:ccc=0} But Not In \p{U15.1.0:GCB=Extend}
1715 # (᜕) TAGALOG SIGN PAMUDPOD
1734 # (᜴) HANUNOO SIGN PAMUDPOD
1B44 # (᭄) BALINESE ADEG ADEG
1BAA # (᮪) SUNDANESE SIGN PAMAAEH
1BF2..1BF3 # [2] (᯲..᯳) BATAK PANGOLAT..BATAK PANONGONAN
A953 # (꥓) REJANG VIRAMA
A9C0 # (꧀) JAVANESE PANGKON
111C0 # (𑇀) SHARADA SIGN VIRAMA
11235 # (𑈵) KHOJKI SIGN VIRAMA
1134D # (𑍍) GRANTHA SIGN VIRAMA
116B6 # (𑚶) TAKRI SIGN VIRAMA
1193D # (𑤽) DIVES AKURU SIGN HALANTA
11F41 # (𑽁) KAWI SIGN KILLER
16FF0..16FF1 # [2] (𖿰..𖿱) VIETNAMESE ALTERNATE READING MARK CA..VIETNAMESE ALTERNATE READING MARK NHAY
1D166 # (𝅦) MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D # (𝅭) MUSICAL SYMBOL COMBINING AUGMENTATION DOT
\P{ccc=0} ⊆ \p{GCB=Extend}

Check failure on line 765 in unicodetools/src/main/resources/org/unicode/text/UCD/UnicodeInvariantTest.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Expected empty, got: 19 [\u1715\u1734\u1B44\u1BAA\u1BF2\u1BF3\uA953\uA9C0\U000111C0\U00011235\U0001134D\U000113CF\U000116B6\U0001193D\U00011F41\U00016FF0\U00016FF1\U0001D166\U0001D16D]
In \P{ccc=0} But Not In \p{GCB=Extend}
1715 # (᜕) TAGALOG SIGN PAMUDPOD
1734 # (᜴) HANUNOO SIGN PAMUDPOD
1B44 # (᭄) BALINESE ADEG ADEG
1BAA # (᮪) SUNDANESE SIGN PAMAAEH
1BF2..1BF3 # [2] (᯲..᯳) BATAK PANGOLAT..BATAK PANONGONAN
A953 # (꥓) REJANG VIRAMA
A9C0 # (꧀) JAVANESE PANGKON
111C0 # (𑇀) SHARADA SIGN VIRAMA
11235 # (𑈵) KHOJKI SIGN VIRAMA
1134D # (𑍍) GRANTHA SIGN VIRAMA
113CF # (�) TULU-TIGALARI SIGN LOOPED VIRAMA
116B6 # (𑚶) TAKRI SIGN VIRAMA
1193D # (𑤽) DIVES AKURU SIGN HALANTA
11F41 # (𑽁) KAWI SIGN KILLER
16FF0..16FF1 # [2] (𖿰..𖿱) VIETNAMESE ALTERNATE READING MARK CA..VIETNAMESE ALTERNATE READING MARK NHAY
1D166 # (𝅦) MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D16D # (𝅭) MUSICAL SYMBOL COMBINING AUGMENTATION DOT

# Characters that appear in non-initial position in the canonical decomposition
# of another character are either Extend, V, or T, so that sequences that are
# equivalent to a canonical composite are kept together by GB6..GB9.
# We only look at the starters, since we dealt with non-starters above.
# Characters that appear in non-initial position in the canonical decomposition
# of a primary composite are NFC_QC=Maybe. We would need to separately check
# the characters that appear in non-initial position in the canonical
# decomposition of a full composition exclusion.
# We would also need to separately check that the characters that are T or V
# only appear in canonical decompositions where they follow an LV, LVT, V, or
# T, or an LV or V, respectively.
[\p{NFC_QC=Maybe}&\p{ccc=0}] ⊆ [\p{GCB=Extend}\p{GCB=T}\p{GCB=V}]

Check failure on line 778 in unicodetools/src/main/resources/org/unicode/text/UCD/UnicodeInvariantTest.txt

GitHub Actions / Check UCD consistency, invariants, smoke-test generators

Invariant test failure

Expected empty, got: 3 [\U000113B8\U000113C2\U000113C9]
In [\p{NFC_QC=Maybe}&\p{ccc=0}] But Not In [\p{GCB=Extend}\p{GCB=T}\p{GCB=V}]
113B8 # (�) TULU-TIGALARI VOWEL SIGN AA
113C2 # (�) TULU-TIGALARI VOWEL SIGN EE
113C9 # (�) TULU-TIGALARI AU LENGTH MARK
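
As a rough cross-check of the set this last invariant is about, the following Python sketch collects the starters that occur after the first position of some canonical decomposition mapping. It is an illustration only, not the tool's implementation: `non_initial_starters` is a hypothetical helper, it relies on the standard `unicodedata` module (so the result depends on the Unicode version of the Python build), and it does not distinguish primary composites from full composition exclusions. Hangul syllables, whose decompositions are algorithmic, are not expected to show up through `unicodedata.decomposition`.

```python
import sys
import unicodedata


def non_initial_starters():
    """Starters (ccc=0) found after the first position of a canonical decomposition."""
    found = set()
    for cp in range(sys.maxunicode + 1):
        mapping = unicodedata.decomposition(chr(cp))
        # Skip characters without a mapping and compatibility mappings ("<tag> ...").
        if not mapping or mapping.startswith("<"):
            continue
        # Everything after the first code point is in non-initial position.
        for hex_cp in mapping.split()[1:]:
            c = chr(int(hex_cp, 16))
            if unicodedata.combining(c) == 0:
                found.add(ord(c))
    return found


starters = non_initial_starters()
```

For example, U+09BE BENGALI VOWEL SIGN AA is in the result because U+09CB decomposes to U+09C7 U+09BE. The standard library does not expose Grapheme_Cluster_Break, so comparing the result against Extend, V, and T is left to the invariant itself.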

##########################
# Emoji
##########################