Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assign Sentence_Terminal and Terminal_Punctuation for 16.0 characters #698

Merged
merged 11 commits into from
May 6, 2024
16 changes: 11 additions & 5 deletions unicodetools/data/ucd/dev/PropList.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# PropList-16.0.0.txt
# Date: 2024-04-30, 19:07:23 GMT
# Date: 2024-04-30, 20:08:39 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -151,9 +151,10 @@ FF63 ; Quotation_Mark # Pe HALFWIDTH RIGHT CORNER BRACKET
1808..1809 ; Terminal_Punctuation # Po [2] MONGOLIAN MANCHU COMMA..MONGOLIAN MANCHU FULL STOP
1944..1945 ; Terminal_Punctuation # Po [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1AA8..1AAB ; Terminal_Punctuation # Po [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B4E..1B4F ; Terminal_Punctuation # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
1B5A..1B5B ; Terminal_Punctuation # Po [2] BALINESE PANTI..BALINESE PAMADA
1B5D..1B5F ; Terminal_Punctuation # Po [3] BALINESE CARIK PAMUNGKAH..BALINESE CARIK PAREREN
1B7D..1B7E ; Terminal_Punctuation # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B7D..1B7F ; Terminal_Punctuation # Po [3] BALINESE PANTI LANTANG..BALINESE PANTI BAWAK
1C3B..1C3F ; Terminal_Punctuation # Po [5] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION TSHOOK
1C7E..1C7F ; Terminal_Punctuation # Po [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
203C..203D ; Terminal_Punctuation # Po [2] DOUBLE EXCLAMATION MARK..INTERROBANG
Expand Down Expand Up @@ -203,6 +204,7 @@ FF64 ; Terminal_Punctuation # Po HALFWIDTH IDEOGRAPHIC COMMA
111DE..111DF ; Terminal_Punctuation # Po [2] SHARADA SECTION MARK-1..SHARADA SECTION MARK-2
11238..1123C ; Terminal_Punctuation # Po [5] KHOJKI DANDA..KHOJKI DOUBLE SECTION MARK
112A9 ; Terminal_Punctuation # Po MULTANI SECTION MARK
113D4..113D5 ; Terminal_Punctuation # Po [2] TULU-TIGALARI DANDA..TULU-TIGALARI DOUBLE DANDA
1144B..1144D ; Terminal_Punctuation # Po [3] NEWA DANDA..NEWA COMMA
1145A..1145B ; Terminal_Punctuation # Po [2] NEWA DOUBLE COMMA..NEWA PLACEHOLDER MARK
115C2..115C5 ; Terminal_Punctuation # Po [4] SIDDHAM DANDA..SIDDHAM SEPARATOR BAR
Expand All @@ -223,11 +225,12 @@ FF64 ; Terminal_Punctuation # Po HALFWIDTH IDEOGRAPHIC COMMA
16AF5 ; Terminal_Punctuation # Po BASSA VAH FULL STOP
16B37..16B39 ; Terminal_Punctuation # Po [3] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN CIM CHEEM
16B44 ; Terminal_Punctuation # Po PAHAWH HMONG SIGN XAUS
16D6E..16D6F ; Terminal_Punctuation # Po [2] KIRAT RAI DANDA..KIRAT RAI DOUBLE DANDA
16E97..16E98 ; Terminal_Punctuation # Po [2] MEDEFAIDRIN COMMA..MEDEFAIDRIN FULL STOP
1BC9F ; Terminal_Punctuation # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA87..1DA8A ; Terminal_Punctuation # Po [4] SIGNWRITING COMMA..SIGNWRITING COLON

# Total code points: 277
# Total code points: 284

# ================================================

Expand Down Expand Up @@ -1530,9 +1533,10 @@ FF65 ; Other_ID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT
1809 ; Sentence_Terminal # Po MONGOLIAN MANCHU FULL STOP
1944..1945 ; Sentence_Terminal # Po [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1AA8..1AAB ; Sentence_Terminal # Po [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B4E..1B4F ; Sentence_Terminal # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
1B5A..1B5B ; Sentence_Terminal # Po [2] BALINESE PANTI..BALINESE PAMADA
1B5E..1B5F ; Sentence_Terminal # Po [2] BALINESE CARIK SIKI..BALINESE CARIK PAREREN
1B7D..1B7E ; Sentence_Terminal # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B7D..1B7F ; Sentence_Terminal # Po [3] BALINESE PANTI LANTANG..BALINESE PANTI BAWAK
1C3B..1C3C ; Sentence_Terminal # Po [2] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION NYET THYOOM TA-ROL
1C7E..1C7F ; Sentence_Terminal # Po [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
203C..203D ; Sentence_Terminal # Po [2] DOUBLE EXCLAMATION MARK..INTERROBANG
Expand Down Expand Up @@ -1570,6 +1574,7 @@ FF61 ; Sentence_Terminal # Po HALFWIDTH IDEOGRAPHIC FULL STOP
11238..11239 ; Sentence_Terminal # Po [2] KHOJKI DANDA..KHOJKI DOUBLE DANDA
1123B..1123C ; Sentence_Terminal # Po [2] KHOJKI SECTION MARK..KHOJKI DOUBLE SECTION MARK
112A9 ; Sentence_Terminal # Po MULTANI SECTION MARK
113D4..113D5 ; Sentence_Terminal # Po [2] TULU-TIGALARI DANDA..TULU-TIGALARI DOUBLE DANDA
1144B..1144C ; Sentence_Terminal # Po [2] NEWA DANDA..NEWA DOUBLE DANDA
115C2..115C3 ; Sentence_Terminal # Po [2] SIDDHAM DANDA..SIDDHAM DOUBLE DANDA
115C9..115D7 ; Sentence_Terminal # Po [15] SIDDHAM END OF TEXT MARK..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
Expand All @@ -1586,11 +1591,12 @@ FF61 ; Sentence_Terminal # Po HALFWIDTH IDEOGRAPHIC FULL STOP
16AF5 ; Sentence_Terminal # Po BASSA VAH FULL STOP
16B37..16B38 ; Sentence_Terminal # Po [2] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS TSHAB CEEB
16B44 ; Sentence_Terminal # Po PAHAWH HMONG SIGN XAUS
16D6E..16D6F ; Sentence_Terminal # Po [2] KIRAT RAI DANDA..KIRAT RAI DOUBLE DANDA
16E98 ; Sentence_Terminal # Po MEDEFAIDRIN FULL STOP
1BC9F ; Sentence_Terminal # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA88 ; Sentence_Terminal # Po SIGNWRITING FULL STOP

# Total code points: 156
# Total code points: 163

# ================================================

Expand Down
Binary file not shown.
9 changes: 6 additions & 3 deletions unicodetools/data/ucd/dev/auxiliary/SentenceBreakProperty.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# SentenceBreakProperty-16.0.0.txt
# Date: 2024-04-25, 17:07:52 GMT
# Date: 2024-04-30, 20:08:52 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -2702,9 +2702,10 @@ FF0E ; ATerm # Po FULLWIDTH FULL STOP
1809 ; STerm # Po MONGOLIAN MANCHU FULL STOP
1944..1945 ; STerm # Po [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1AA8..1AAB ; STerm # Po [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B4E..1B4F ; STerm # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
1B5A..1B5B ; STerm # Po [2] BALINESE PANTI..BALINESE PAMADA
1B5E..1B5F ; STerm # Po [2] BALINESE CARIK SIKI..BALINESE CARIK PAREREN
1B7D..1B7E ; STerm # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B7D..1B7F ; STerm # Po [3] BALINESE PANTI LANTANG..BALINESE PANTI BAWAK
1C3B..1C3C ; STerm # Po [2] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION NYET THYOOM TA-ROL
1C7E..1C7F ; STerm # Po [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
203C..203D ; STerm # Po [2] DOUBLE EXCLAMATION MARK..INTERROBANG
Expand Down Expand Up @@ -2740,6 +2741,7 @@ FF61 ; STerm # Po HALFWIDTH IDEOGRAPHIC FULL STOP
11238..11239 ; STerm # Po [2] KHOJKI DANDA..KHOJKI DOUBLE DANDA
1123B..1123C ; STerm # Po [2] KHOJKI SECTION MARK..KHOJKI DOUBLE SECTION MARK
112A9 ; STerm # Po MULTANI SECTION MARK
113D4..113D5 ; STerm # Po [2] TULU-TIGALARI DANDA..TULU-TIGALARI DOUBLE DANDA
1144B..1144C ; STerm # Po [2] NEWA DANDA..NEWA DOUBLE DANDA
115C2..115C3 ; STerm # Po [2] SIDDHAM DANDA..SIDDHAM DOUBLE DANDA
115C9..115D7 ; STerm # Po [15] SIDDHAM END OF TEXT MARK..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
Expand All @@ -2756,11 +2758,12 @@ FF61 ; STerm # Po HALFWIDTH IDEOGRAPHIC FULL STOP
16AF5 ; STerm # Po BASSA VAH FULL STOP
16B37..16B38 ; STerm # Po [2] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS TSHAB CEEB
16B44 ; STerm # Po PAHAWH HMONG SIGN XAUS
16D6E..16D6F ; STerm # Po [2] KIRAT RAI DANDA..KIRAT RAI DOUBLE DANDA
16E98 ; STerm # Po MEDEFAIDRIN FULL STOP
1BC9F ; STerm # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA88 ; STerm # Po SIGNWRITING FULL STOP

# Total code points: 153
# Total code points: 160

# ================================================

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,9 @@ $combiningExclusions ⊇ [$firstNonStarter & \p{dt=canonical}]

\p{Terminal_Punctuation} ⊃ \p{STerm}

# We forgot about Sentence_Terminal while working on 16.0α; this would have caught us.
\p{Name=/DANDA/} ⊆ \p{Sentence_Terminal}

# 132-M3 if a character has a general category of "Number", it must have a numeric type that is not equal to "none".

[\p{General_Category=Decimal_Number}\p{General_Category=Letter_Number}\p{General_Category=Other_Number}] ∥ \p{Numeric_Type=None}
Expand Down
Loading