
16.0 normalization woes #619

Merged: 30 commits merged on Jan 22, 2024
Changes from 24 commits

Commits (30):
cc76245 (eggrobin, Dec 1, 2023): A test which I expected to fail, but not in this way
e23d1c1 (eggrobin, Dec 1, 2023): Pre-16 and NFKCQC
24fe8e1 (eggrobin, Dec 2, 2023): 🤪
328c761 (eggrobin, Dec 29, 2023): Canonical closure tests
2d0ceaf (eggrobin, Dec 29, 2023): Generate canonical closures
3880c4f (eggrobin, Dec 29, 2023): Some interesting sequences
b3b53c0 (eggrobin, Dec 29, 2023): Some very crappy code
22dfd8c (eggrobin, Dec 29, 2023): Drop Hangul and make sure we have all overlaps
a742327 (eggrobin, Jan 3, 2024): Split it into its own part and look at chaining compositions, not dec…
182cc3a (eggrobin, Jan 3, 2024): despam
53459b0 (eggrobin, Jan 3, 2024): spots
5f16271 (eggrobin, Jan 3, 2024): Regenerate UCD
747f982 (eggrobin, Jan 3, 2024): Some comments.
695c95e (eggrobin, Jan 4, 2024): Allow a single non-decomposable starter at either end of the chain
9fea9ea (eggrobin, Jan 4, 2024): Deduplicate parts 4 and 5
7362f2d (eggrobin, Jan 5, 2024): Remove redundant test cases in NFC (covered by the NFC column of othe…
cdd391a (eggrobin, Jan 5, 2024): Clean things up
7bcb9b4 (eggrobin, Jan 5, 2024): more cleanup
cf4275c (eggrobin, Jan 5, 2024): more cleanup
3cb23ac (eggrobin, Jan 7, 2024): More testing
e41b3ea (eggrobin, Jan 7, 2024): Fix the QC properties
0c312ce (eggrobin, Jan 7, 2024): stray import
361a977 (eggrobin, Jan 7, 2024): factor
0380b27 (eggrobin, Jan 7, 2024): report all failures
7a6220b (eggrobin, Jan 20, 2024): Markus’s suggestions
89cdf7a (eggrobin, Jan 20, 2024): Merge remote-tracking branch 'la-vache/main' into normalization-woes
e1a01ed (eggrobin, Jan 20, 2024): More honest primaryCompositesByMeowNFDCodePoint maps
b0b4cf6 (eggrobin, Jan 20, 2024): Regenerate UCD
910039c (eggrobin, Jan 20, 2024): Merge branch 'normalization-woes' of https://github.com/eggrobin/unic…
c21622e (eggrobin, Jan 20, 2024): spotless

unicodetools/data/ucd/dev/DerivedNormalizationProps.txt: 22 changes (11 additions, 11 deletions)
@@ -1,5 +1,5 @@
 # DerivedNormalizationProps-16.0.0.txt
-# Date: 2023-11-10, 20:57:25 GMT
+# Date: 2024-01-07, 05:05:34 GMT
 # © 2023 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
@@ -1167,17 +1167,17 @@ FB46..FB4E ; NFC_QC; N # Lo [9] HEBREW LETTER TSADI WITH DAGESH..HEBREW LET
 113B8 ; NFC_QC; M # Mc TULU-TIGALARI VOWEL SIGN AA
 113BB ; NFC_QC; M # Mn TULU-TIGALARI VOWEL SIGN U
 113C2 ; NFC_QC; M # Mc TULU-TIGALARI VOWEL SIGN EE
-113C9 ; NFC_QC; M # Mc TULU-TIGALARI AU LENGTH MARK
+113C5 ; NFC_QC; M # Mc TULU-TIGALARI VOWEL SIGN AI
+113C7..113C9 ; NFC_QC; M # Mc [3] TULU-TIGALARI VOWEL SIGN OO..TULU-TIGALARI AU LENGTH MARK
 114B0 ; NFC_QC; M # Mc TIRHUTA VOWEL SIGN AA
 114BA ; NFC_QC; M # Mn TIRHUTA VOWEL SIGN SHORT E
 114BD ; NFC_QC; M # Mc TIRHUTA VOWEL SIGN SHORT O
 115AF ; NFC_QC; M # Mc SIDDHAM VOWEL SIGN AA
 11930 ; NFC_QC; M # Mc DIVES AKURU VOWEL SIGN AA
-1611E..16120 ; NFC_QC; M # Mn [3] GURUNG KHEMA VOWEL SIGN AA..GURUNG KHEMA VOWEL SIGN II
-16129 ; NFC_QC; M # Mn GURUNG KHEMA VOWEL LENGTH MARK
-16D67 ; NFC_QC; M # Lo KIRAT RAI VOWEL SIGN E
+1611E..16129 ; NFC_QC; M # Mn [12] GURUNG KHEMA VOWEL SIGN AA..GURUNG KHEMA VOWEL LENGTH MARK
+16D67..16D68 ; NFC_QC; M # Lo [2] KIRAT RAI VOWEL SIGN E..KIRAT RAI VOWEL SIGN AI
 
-# Total code points: 120
+# Total code points: 132
 
 # ================================================
 
@@ -2211,17 +2211,17 @@ FFED..FFEE ; NFKC_QC; N # So [2] HALFWIDTH BLACK SQUARE..HALFWIDTH WHITE CI
 113B8 ; NFKC_QC; M # Mc TULU-TIGALARI VOWEL SIGN AA
 113BB ; NFKC_QC; M # Mn TULU-TIGALARI VOWEL SIGN U
 113C2 ; NFKC_QC; M # Mc TULU-TIGALARI VOWEL SIGN EE
-113C9 ; NFKC_QC; M # Mc TULU-TIGALARI AU LENGTH MARK
+113C5 ; NFKC_QC; M # Mc TULU-TIGALARI VOWEL SIGN AI
+113C7..113C9 ; NFKC_QC; M # Mc [3] TULU-TIGALARI VOWEL SIGN OO..TULU-TIGALARI AU LENGTH MARK
 114B0 ; NFKC_QC; M # Mc TIRHUTA VOWEL SIGN AA
 114BA ; NFKC_QC; M # Mn TIRHUTA VOWEL SIGN SHORT E
 114BD ; NFKC_QC; M # Mc TIRHUTA VOWEL SIGN SHORT O
 115AF ; NFKC_QC; M # Mc SIDDHAM VOWEL SIGN AA
 11930 ; NFKC_QC; M # Mc DIVES AKURU VOWEL SIGN AA
-1611E..16120 ; NFKC_QC; M # Mn [3] GURUNG KHEMA VOWEL SIGN AA..GURUNG KHEMA VOWEL SIGN II
-16129 ; NFKC_QC; M # Mn GURUNG KHEMA VOWEL LENGTH MARK
-16D67 ; NFKC_QC; M # Lo KIRAT RAI VOWEL SIGN E
+1611E..16129 ; NFKC_QC; M # Mn [12] GURUNG KHEMA VOWEL SIGN AA..GURUNG KHEMA VOWEL LENGTH MARK
+16D67..16D68 ; NFKC_QC; M # Lo [2] KIRAT RAI VOWEL SIGN E..KIRAT RAI VOWEL SIGN AI
 
-# Total code points: 120
+# Total code points: 132
 
 # ================================================
 
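Each section's `# Total code points` footer is the sum of the sizes of the ranges listed in that section. The minimal Python sketch below, assuming the usual UCD `code point or range ; property ; value # comment` layout seen in this diff, checks that the NFC_QC=M lines changed in the second hunk account for the jump from 120 to 132. The removed lines cover 6 code points and the added lines cover 18. The helpers `parse_range` and `count_code_points` are hypothetical and are not unicodetools APIs.

```python
# Sketch: count code points in UCD-style data lines such as
#   "113C7..113C9 ; NFC_QC; M # Mc [3] ..."
# parse_range and count_code_points are hypothetical helpers, not unicodetools APIs.


def parse_range(field: str) -> tuple[int, int]:
    """Parse 'XXXX' or 'XXXX..YYYY' (hex) into an inclusive (start, end) pair."""
    field = field.strip()
    if ".." in field:
        start, end = field.split("..")
        return int(start, 16), int(end, 16)
    cp = int(field, 16)
    return cp, cp


def count_code_points(lines: list[str], prop: str, value: str) -> int:
    """Sum the sizes of all ranges whose property and value match."""
    total = 0
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3 or fields[1] != prop or fields[2] != value:
            continue
        start, end = parse_range(fields[0])
        total += end - start + 1
    return total


# NFC_QC=M lines removed and added by the second hunk of the diff.
REMOVED = [
    "113C9 ; NFC_QC; M # Mc TULU-TIGALARI AU LENGTH MARK",
    "1611E..16120 ; NFC_QC; M # Mn [3] GURUNG KHEMA VOWEL SIGN AA..GURUNG KHEMA VOWEL SIGN II",
    "16129 ; NFC_QC; M # Mn GURUNG KHEMA VOWEL LENGTH MARK",
    "16D67 ; NFC_QC; M # Lo KIRAT RAI VOWEL SIGN E",
]
ADDED = [
    "113C5 ; NFC_QC; M # Mc TULU-TIGALARI VOWEL SIGN AI",
    "113C7..113C9 ; NFC_QC; M # Mc [3] TULU-TIGALARI VOWEL SIGN OO..TULU-TIGALARI AU LENGTH MARK",
    "1611E..16129 ; NFC_QC; M # Mn [12] GURUNG KHEMA VOWEL SIGN AA..GURUNG KHEMA VOWEL LENGTH MARK",
    "16D67..16D68 ; NFC_QC; M # Lo [2] KIRAT RAI VOWEL SIGN E..KIRAT RAI VOWEL SIGN AI",
]

delta = count_code_points(ADDED, "NFC_QC", "M") - count_code_points(REMOVED, "NFC_QC", "M")
print(delta)        # prints 12
print(120 + delta)  # prints 132
```

The NFKC_QC hunk makes the identical change to its Maybe entries, so the same +12 explains its new total of 132.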