U+0300 and U+0308 from L2/23-280 #748

Merged on Apr 25, 2024 (3 commits)
2 changes: 2 additions & 0 deletions unicodetools/data/ucd/dev/ScriptExtensions.txt
@@ -39,13 +39,15 @@
 02CD ; Latn Lisu # Lm MODIFIER LETTER LOW MACRON
 02D7 ; Latn Thai # Sk MODIFIER LETTER MINUS SIGN
 02D9 ; Bopo Latn # Sk DOT ABOVE
+0300 ; Cher Copt Cyrl Grek Latn Perm Sunu Tale # Mn COMBINING GRAVE ACCENT
 0301 ; Cher Cyrl Grek Latn Osge Sunu Tale Todr # Mn COMBINING ACUTE ACCENT
 0302 ; Cher Cyrl Latn Tfng # Mn COMBINING CIRCUMFLEX ACCENT
 0303 ; Glag Latn Sunu Syrc Thai # Mn COMBINING TILDE
 0304 ; Aghb Cher Copt Cyrl Goth Grek Latn Osge Syrc Tfng Todr # Mn COMBINING MACRON
 0305 ; Copt Elba Glag Goth Kana Latn # Mn COMBINING OVERLINE
 0306 ; Cyrl Grek Latn Perm # Mn COMBINING BREVE
 0307 ; Copt Hebr Latn Perm Syrc Tale Tfng Todr # Mn COMBINING DOT ABOVE
+0308 ; Armn Cyrl Goth Grek Hebr Latn Perm Syrc Tale # Mn COMBINING DIAERESIS
 0309 ; Latn Tfng # Mn COMBINING HOOK ABOVE
 030A ; Latn Syrc # Mn COMBINING RING ABOVE
 030B ; Cher Cyrl Latn Osge # Mn COMBINING DOUBLE ACUTE ACCENT
10 changes: 6 additions & 4 deletions unicodetools/src/test/java/org/unicode/test/TestSecurity.java
@@ -453,15 +453,17 @@ public void TestScriptDetection() {
         Set<Set<Script_Values>> expected = new HashSet<>();
         String[][] tests = {
             {"℮", "Common"},
+            {"1ℓ ℮", "Common"},
+            {"75 cl ℮", "Latin"},
             {"ցօօց1℮", "Armenian"},
             {"ցօօց1℮ー", "Armenian; Japanese"},
             {"ー", "Japanese"},
             {"カー", "Japanese"},
             {"\u303C", "Han, Korean, Japanese"},
             {"\u303Cー", "Japanese"},
             {"\u303CA", "Latin; Han, Korean, Japanese"},
-            {"\u0300", "Common"},
-            {"\u0300.", "Common"},
+            {"\u0300", "Cherokee, Coptic, Cyrillic, Greek, Latin, Old_Permic, Sunuwar, Tai_Le"},
+            {"\u0300.", "Cherokee, Coptic, Cyrillic, Greek, Latin, Old_Permic, Sunuwar, Tai_Le"},
             {"a\u0300", "Latin"},
             {"ä", "Latin"},
         };
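For readers wondering why the expectation for a bare `\u0300` changes from "Common" to a script list while `a\u0300` stays "Latin", here is a rough, simplified sketch of the intersection idea. It is not the unicodetools implementation: the class `ScriptSetSketch`, its methods, and the two-entry table are hypothetical, and the script names follow Java's `Character.UnicodeScript` enum naming rather than the labels used in the test. It also does not model the alternative sets (the `;` separated values) that the real test reports for mixed-script strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified illustration only; names are hypothetical, not from unicodetools. */
final class ScriptSetSketch {
    // Hypothetical excerpt of ScriptExtensions.txt: just the two entries this PR adds.
    static final Map<Integer, Set<String>> SCRIPT_EXTENSIONS = Map.of(
            0x0300, Set.of("CHEROKEE", "COPTIC", "CYRILLIC", "GREEK",
                    "LATIN", "OLD_PERMIC", "SUNUWAR", "TAI_LE"),
            0x0308, Set.of("ARMENIAN", "CYRILLIC", "GOTHIC", "GREEK", "HEBREW",
                    "LATIN", "OLD_PERMIC", "SYRIAC", "TAI_LE"));

    /** Scripts a code point may belong to; null means "no restriction". */
    static Set<String> scriptsOf(int codePoint) {
        Set<String> listed = SCRIPT_EXTENSIONS.get(codePoint);
        if (listed != null) {
            return listed;
        }
        // Assumption: fall back to the JDK's Script property for everything else.
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        if (script == Character.UnicodeScript.COMMON
                || script == Character.UnicodeScript.INHERITED) {
            return null; // Common and Inherited do not narrow the result
        }
        return Set.of(script.name());
    }

    /** Intersects the per-character script sets; "COMMON" stands for "unrestricted". */
    static Set<String> resolve(String text) {
        TreeSet<String> result = null;
        for (int codePoint : text.codePoints().toArray()) {
            Set<String> scripts = scriptsOf(codePoint);
            if (scripts == null) {
                continue;
            }
            if (result == null) {
                result = new TreeSet<>(scripts);
            } else {
                result.retainAll(scripts);
            }
        }
        if (result == null) {
            result = new TreeSet<>();
            result.add("COMMON");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(resolve("\u0300"));   // the eight scripts listed for U+0300
        System.out.println(resolve("\u0300."));  // "." is Common, so the set is unchanged
        System.out.println(resolve("a\u0300"));  // narrowed to LATIN by the base letter
    }
}
```

Under these assumptions, an empty result would mean that no single script covers the whole string; the precomposed `ä` resolves to Latin through its own Script property, without consulting the table.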
@@ -524,9 +526,9 @@ public void TestWholeScripts() {
             {"⼒", Status.SAME}, // KANGXI RADICAL POWER
             {"力", Status.SAME}, // CJK UNIFIED IDEOGRAPH-529B
             {"!", Status.SAME, Status.OTHER},
-            {"\u0300", Status.SAME},
+            {"\u0300", Status.COMMON},
             {"a\u0300", Status.SAME, Status.COMMON, Status.OTHER},
-            {"ä", Status.SAME, Status.COMMON, Status.OTHER},
+            {"ä", Status.SAME, Status.OTHER},
Comment on lines +529 to +531

I haven't looked at what these expected values are, but (a) I assume they fall out naturally from the data changes, and (b) I believe that whole-script confusables have been put on ice until someone can figure out real use cases, so this probably does not matter a lot.

FYI @macchiati

             {"idSet", "[[:L:][:M:][:N:]-[:nfkcqc=n:]]"}, // a typical identifier set
             {"google", Status.SAME},
             {"ցօօց1℮", Status.OTHER},