Skip to content

Commit

Permalink
Merge remote-tracking branch 'la-vache/main' into L2/24-018
Browse files Browse the repository at this point in the history
  • Loading branch information
eggrobin committed Nov 13, 2024
2 parents 1c39d7e + 2fb2e8b commit f5c8f3f
Show file tree
Hide file tree
Showing 38 changed files with 812 additions and 1,011 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/cache_retain.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,10 @@ jobs:
retain-maven-cache:
name: Run all tests with Maven
runs-on: ubuntu-latest
# Only run this on the upstream repo. Otherwise, running in a personal fork will cause
# Github to disable the personal fork copy of the workflow
# (Github complains about running a scheduled workflow on a repo with > 60 days of inactivity)
if: github.ref == 'refs/heads/main' && github.repository == 'unicode-org/unicodetools'
steps:
- name: Checkout and setup
uses: actions/checkout@v2
Expand Down
1 change: 1 addition & 0 deletions unicodetools/data/ucd/dev/ArabicShaping.txt
Original file line number Diff line number Diff line change
Expand Up @@ -482,6 +482,7 @@
088C; TAH WITH 3 DOTS BELOW; D; TAH
088D; KEHEH WITH VERTICAL 2 DOTS BELOW; D; GAF
088E; VERTICAL TAIL; R; VERTICAL TAIL
088F; DOTLESS NOON WITH SEPARATE RING ABOVE; D; NOON
0890; ARABIC POUND MARK ABOVE; U; No_Joining_Group
0891; ARABIC PIASTRE MARK ABOVE; U; No_Joining_Group

Expand Down
11 changes: 9 additions & 2 deletions unicodetools/data/ucd/dev/DerivedAge.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# DerivedAge-17.0.0.txt
# Date: 2024-10-16, 16:30:17 GMT
# Date: 2024-11-13, 21:34:52 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -2065,8 +2065,15 @@ A7DA..A7DC ; 16.0 # [3] LATIN CAPITAL LETTER LAMBDA..LATIN CAPITAL LETTER L

# Newly assigned in Unicode 17.0.0 (September, 2025)

088F ; 17.0 # ARABIC LETTER NOON WITH RING ABOVE
09FF ; 17.0 # BENGALI LETTER SANSKRIT BA
0B53..0B54 ; 17.0 # [2] ORIYA SIGN DOT ABOVE..ORIYA SIGN DOUBLE DOT ABOVE
0C5C ; 17.0 # TELUGU ARCHAIC SHRII
0CDC ; 17.0 # KANNADA ARCHAIC SHRII
1ACF..1ADD ; 17.0 # [15] COMBINING DOUBLE CARON..COMBINING DOT-AND-RING BELOW
1AE0..1AEB ; 17.0 # [12] COMBINING LEFT TACK ABOVE..COMBINING DOUBLE RIGHTWARDS ARROW ABOVE
2B96 ; 17.0 # EQUALS SIGN WITH INFINITY ABOVE

# Total code points: 1
# Total code points: 34

# EOF
90 changes: 51 additions & 39 deletions unicodetools/data/ucd/dev/DerivedCoreProperties.txt

Large diffs are not rendered by default.

16 changes: 9 additions & 7 deletions unicodetools/data/ucd/dev/EastAsianWidth.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# EastAsianWidth-16.0.0.txt
# Date: 2024-06-06, 16:52:22 GMT
# EastAsianWidth-17.0.0.txt
# Date: 2024-11-13, 21:35:14 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -332,7 +332,7 @@
0860..086A ; N # Lo [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
0870..0887 ; N # Lo [24] ARABIC LETTER ALEF WITH ATTACHED FATHA..ARABIC BASELINE ROUND DOT
0888 ; N # Sk ARABIC RAISED ROUND DOT
0889..088E ; N # Lo [6] ARABIC LETTER NOON WITH INVERTED SMALL V..ARABIC VERTICAL TAIL
0889..088F ; N # Lo [7] ARABIC LETTER NOON WITH INVERTED SMALL V..ARABIC LETTER NOON WITH RING ABOVE
0890..0891 ; N # Cf [2] ARABIC POUND MARK ABOVE..ARABIC PIASTRE MARK ABOVE
0897..089F ; N # Mn [9] ARABIC PEPET..ARABIC HALF MADDA OVER MADDA
08A0..08C8 ; N # Lo [41] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER GRAF
Expand Down Expand Up @@ -391,6 +391,7 @@
09FC ; N # Lo BENGALI LETTER VEDIC ANUSVARA
09FD ; N # Po BENGALI ABBREVIATION SIGN
09FE ; N # Mn BENGALI SANDHI MARK
09FF ; N # Lo BENGALI LETTER SANSKRIT BA
0A01..0A02 ; N # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI
0A03 ; N # Mc GURMUKHI SIGN VISARGA
0A05..0A0A ; N # Lo [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
Expand Down Expand Up @@ -454,7 +455,7 @@
0B47..0B48 ; N # Mc [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B4B..0B4C ; N # Mc [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU
0B4D ; N # Mn ORIYA SIGN VIRAMA
0B55..0B56 ; N # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
0B53..0B56 ; N # Mn [4] ORIYA SIGN DOT ABOVE..ORIYA AI LENGTH MARK
0B57 ; N # Mc ORIYA AU LENGTH MARK
0B5C..0B5D ; N # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
0B5F..0B61 ; N # Lo [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL
Expand Down Expand Up @@ -502,7 +503,7 @@
0C4A..0C4D ; N # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
0C55..0C56 ; N # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
0C58..0C5A ; N # Lo [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
0C5D ; N # Lo TELUGU LETTER NAKAARA POLLU
0C5C..0C5D ; N # Lo [2] TELUGU ARCHAIC SHRII..TELUGU LETTER NAKAARA POLLU
0C60..0C61 ; N # Lo [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
0C62..0C63 ; N # Mn [2] TELUGU VOWEL SIGN VOCALIC L..TELUGU VOWEL SIGN VOCALIC LL
0C66..0C6F ; N # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
Expand All @@ -528,7 +529,7 @@
0CCA..0CCB ; N # Mc [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO
0CCC..0CCD ; N # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CD5..0CD6 ; N # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CDD..0CDE ; N # Lo [2] KANNADA LETTER NAKAARA POLLU..KANNADA LETTER FA
0CDC..0CDE ; N # Lo [3] KANNADA ARCHAIC SHRII..KANNADA LETTER FA
0CE0..0CE1 ; N # Lo [2] KANNADA LETTER VOCALIC RR..KANNADA LETTER VOCALIC LL
0CE2..0CE3 ; N # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0CE6..0CEF ; N # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
Expand Down Expand Up @@ -806,7 +807,8 @@
1AA8..1AAD ; N # Po [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
1AB0..1ABD ; N # Mn [14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
1ABE ; N # Me COMBINING PARENTHESES OVERLAY
1ABF..1ACE ; N # Mn [16] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER INSULAR T
1ABF..1ADD ; N # Mn [31] COMBINING LATIN SMALL LETTER W BELOW..COMBINING DOT-AND-RING BELOW
1AE0..1AEB ; N # Mn [12] COMBINING LEFT TACK ABOVE..COMBINING DOUBLE RIGHTWARDS ARROW ABOVE
1B00..1B03 ; N # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG
1B04 ; N # Mc BALINESE SIGN BISAH
1B05..1B33 ; N # Lo [47] BALINESE LETTER AKARA..BALINESE LETTER HA
Expand Down
4 changes: 2 additions & 2 deletions unicodetools/data/ucd/dev/IndicPositionalCategory.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# IndicPositionalCategory-16.0.0.txt
# Date: 2024-04-30, 21:48:21 GMT
# Date: 2024-06-06, 09:37:46 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -423,7 +423,7 @@ AABB..AABC ; Visual_Order_Left # Lo [2] TAI VIET VOWEL AUE..TAI VIET VOWEL
0AFA..0AFF ; Top # Mn [6] GUJARATI SIGN SUKUN..GUJARATI SIGN TWO-CIRCLE NUKTA ABOVE
0B01 ; Top # Mn ORIYA SIGN CANDRABINDU
0B3F ; Top # Mn ORIYA VOWEL SIGN I
0B55..0B56 ; Top # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
0B53..0B56 ; Top # Mn [4] ORIYA SIGN DOT ABOVE..ORIYA AI LENGTH MARK
0B82 ; Top # Mn TAMIL SIGN ANUSVARA
0BC0 ; Top # Mn TAMIL VOWEL SIGN II
0BCD ; Top # Mn TAMIL SIGN VIRAMA
Expand Down
7 changes: 4 additions & 3 deletions unicodetools/data/ucd/dev/IndicSyllabicCategory.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# IndicSyllabicCategory-16.0.0.txt
# Date: 2024-04-30, 21:48:21 GMT
# IndicSyllabicCategory-17.0.0.txt
# Date: 2024-11-13, 16:22:02 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -525,7 +525,7 @@ ABD1 ; Vowel_Independent # Lo MEETEI MAYEK LETTER ATIYA
0B41..0B44 ; Vowel_Dependent # Mn [4] ORIYA VOWEL SIGN U..ORIYA VOWEL SIGN VOCALIC RR
0B47..0B48 ; Vowel_Dependent # Mc [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B4B..0B4C ; Vowel_Dependent # Mc [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU
0B55..0B56 ; Vowel_Dependent # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
0B53..0B56 ; Vowel_Dependent # Mn [4] ORIYA SIGN DOT ABOVE..ORIYA AI LENGTH MARK
0B57 ; Vowel_Dependent # Mc ORIYA AU LENGTH MARK
0B62..0B63 ; Vowel_Dependent # Mn [2] ORIYA VOWEL SIGN VOCALIC L..ORIYA VOWEL SIGN VOCALIC LL
0BBE..0BBF ; Vowel_Dependent # Mc [2] TAMIL VOWEL SIGN AA..TAMIL VOWEL SIGN I
Expand Down Expand Up @@ -814,6 +814,7 @@ AA74..AA76 ; Consonant_Placeholder # Lo [3] MYANMAR LOGOGRAM KHAMTI OAY..MY
09DC..09DD ; Consonant # Lo [2] BENGALI LETTER RRA..BENGALI LETTER RHA
09DF ; Consonant # Lo BENGALI LETTER YYA
09F0..09F1 ; Consonant # Lo [2] BENGALI LETTER RA WITH MIDDLE DIAGONAL..BENGALI LETTER RA WITH LOWER DIAGONAL
09FF ; Consonant # Lo BENGALI LETTER SANSKRIT BA
0A15..0A28 ; Consonant # Lo [20] GURMUKHI LETTER KA..GURMUKHI LETTER NA
0A2A..0A30 ; Consonant # Lo [7] GURMUKHI LETTER PA..GURMUKHI LETTER RA
0A32..0A33 ; Consonant # Lo [2] GURMUKHI LETTER LA..GURMUKHI LETTER LLA
Expand Down
15 changes: 9 additions & 6 deletions unicodetools/data/ucd/dev/LineBreak.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# LineBreak-17.0.0.txt
# Date: 2024-10-16, 16:31:16 GMT
# Date: 2024-11-13, 21:35:15 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -278,7 +278,7 @@
0860..086A ; AL # Lo [11] SYRIAC LETTER MALAYALAM NGA..SYRIAC LETTER MALAYALAM SSA
0870..0887 ; AL # Lo [24] ARABIC LETTER ALEF WITH ATTACHED FATHA..ARABIC BASELINE ROUND DOT
0888 ; AL # Sk ARABIC RAISED ROUND DOT
0889..088E ; AL # Lo [6] ARABIC LETTER NOON WITH INVERTED SMALL V..ARABIC VERTICAL TAIL
0889..088F ; AL # Lo [7] ARABIC LETTER NOON WITH INVERTED SMALL V..ARABIC LETTER NOON WITH RING ABOVE
0890..0891 ; NU # Cf [2] ARABIC POUND MARK ABOVE..ARABIC PIASTRE MARK ABOVE
0897..089F ; CM # Mn [9] ARABIC PEPET..ARABIC HALF MADDA OVER MADDA
08A0..08C8 ; AL # Lo [41] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER GRAF
Expand Down Expand Up @@ -338,6 +338,7 @@
09FC ; AL # Lo BENGALI LETTER VEDIC ANUSVARA
09FD ; AL # Po BENGALI ABBREVIATION SIGN
09FE ; CM # Mn BENGALI SANDHI MARK
09FF ; AL # Lo BENGALI LETTER SANSKRIT BA
0A01..0A02 ; CM # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI
0A03 ; CM # Mc GURMUKHI SIGN VISARGA
0A05..0A0A ; AL # Lo [6] GURMUKHI LETTER A..GURMUKHI LETTER UU
Expand Down Expand Up @@ -401,7 +402,7 @@
0B47..0B48 ; CM # Mc [2] ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B4B..0B4C ; CM # Mc [2] ORIYA VOWEL SIGN O..ORIYA VOWEL SIGN AU
0B4D ; CM # Mn ORIYA SIGN VIRAMA
0B55..0B56 ; CM # Mn [2] ORIYA SIGN OVERLINE..ORIYA AI LENGTH MARK
0B53..0B56 ; CM # Mn [4] ORIYA SIGN DOT ABOVE..ORIYA AI LENGTH MARK
0B57 ; CM # Mc ORIYA AU LENGTH MARK
0B5C..0B5D ; AL # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
0B5F..0B61 ; AL # Lo [3] ORIYA LETTER YYA..ORIYA LETTER VOCALIC LL
Expand Down Expand Up @@ -449,7 +450,7 @@
0C4A..0C4D ; CM # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
0C55..0C56 ; CM # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
0C58..0C5A ; AL # Lo [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
0C5D ; AL # Lo TELUGU LETTER NAKAARA POLLU
0C5C..0C5D ; AL # Lo [2] TELUGU ARCHAIC SHRII..TELUGU LETTER NAKAARA POLLU
0C60..0C61 ; AL # Lo [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
0C62..0C63 ; CM # Mn [2] TELUGU VOWEL SIGN VOCALIC L..TELUGU VOWEL SIGN VOCALIC LL
0C66..0C6F ; NU # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
Expand All @@ -475,7 +476,7 @@
0CCA..0CCB ; CM # Mc [2] KANNADA VOWEL SIGN O..KANNADA VOWEL SIGN OO
0CCC..0CCD ; CM # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
0CD5..0CD6 ; CM # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CDD..0CDE ; AL # Lo [2] KANNADA LETTER NAKAARA POLLU..KANNADA LETTER FA
0CDC..0CDE ; AL # Lo [3] KANNADA ARCHAIC SHRII..KANNADA LETTER FA
0CE0..0CE1 ; AL # Lo [2] KANNADA LETTER VOCALIC RR..KANNADA LETTER VOCALIC LL
0CE2..0CE3 ; CM # Mn [2] KANNADA VOWEL SIGN VOCALIC L..KANNADA VOWEL SIGN VOCALIC LL
0CE6..0CEF ; NU # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
Expand Down Expand Up @@ -776,7 +777,9 @@
1AA8..1AAD ; SA # Po [6] TAI THAM SIGN KAAN..TAI THAM SIGN CAANG
1AB0..1ABD ; CM # Mn [14] COMBINING DOUBLED CIRCUMFLEX ACCENT..COMBINING PARENTHESES BELOW
1ABE ; CM # Me COMBINING PARENTHESES OVERLAY
1ABF..1ACE ; CM # Mn [16] COMBINING LATIN SMALL LETTER W BELOW..COMBINING LATIN SMALL LETTER INSULAR T
1ABF..1ADD ; CM # Mn [31] COMBINING LATIN SMALL LETTER W BELOW..COMBINING DOT-AND-RING BELOW
1AE0..1AEA ; CM # Mn [11] COMBINING LEFT TACK ABOVE..COMBINING UPWARDS ARROW ABOVE
1AEB ; GL # Mn COMBINING DOUBLE RIGHTWARDS ARROW ABOVE
1B00..1B03 ; CM # Mn [4] BALINESE SIGN ULU RICEM..BALINESE SIGN SURANG
1B04 ; CM # Mc BALINESE SIGN BISAH
1B05..1B33 ; AK # Lo [47] BALINESE LETTER AKARA..BALINESE LETTER HA
Expand Down
Loading

0 comments on commit f5c8f3f

Please sign in to comment.