Merge remote-tracking branch 'la-vache/main' into 8-affricate-ligatures
eggrobin committed Aug 15, 2024
2 parents d833b17 + 177880f commit acbbcfb
Showing 79 changed files with 17,758 additions and 15,494 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pipeline.yml
@@ -53,7 +53,7 @@ jobs:
uses: actions/checkout@v3
with:
sparse-checkout: py/pipeline-workflow
- name: Check L2 document
- name: Check L2 document and WG references
run: |
python3 py/pipeline-workflow/check-l2-document.py
utc-decision:
5 changes: 3 additions & 2 deletions UnicodeJsps/src/test/java/org/unicode/jsptest/TestJsp.java
@@ -929,7 +929,7 @@ public void TestIdna() {
checkValues(error, Uts46.SINGLETON);
checkValidIdna(Uts46.SINGLETON, "À。÷");
checkValidIdna(Uts46.SINGLETON, "≠"); // valid since Unicode 15.1
checkInvalidIdna(Uts46.SINGLETON, "\u0001");
checkInvalidIdna(Uts46.SINGLETON, "\u0080");
checkToUnicode(Uts46.SINGLETON, "ß。ab", "ß.ab");
// checkToPunyCode(Uts46.SINGLETON, "\u0002", "xn---");
checkToPunyCode(Uts46.SINGLETON, "ß。ab", "ss.ab");
@@ -973,7 +973,8 @@ public void TestIdna() {
private void checkValues(boolean[] error, Idna idna) {
checkToUnicodeAndPunyCode(idna, "α.xn--mxa", "α.α", "xn--mxa.xn--mxa");
checkValidIdna(idna, "a");
checkInvalidIdna(idna, "=");
// 33C2 ; disallowed # 1.1 SQUARE AM
checkInvalidIdna(idna, "㏂");
}

private void checkToUnicodeAndPunyCode(
36 changes: 16 additions & 20 deletions docs/help/changes.md
@@ -3,42 +3,38 @@
The Unicode Utilities have been modified to support both properties from the
released version of Unicode (via ICU) and from the new Unicode beta.

To get the beta version of the property, insert β *after* the property name.
To get the beta version of the property, insert `Uβ:` *before* the property name.
The explicit version number for the β can be used;
the resulting property is then only valid when that specific β is current.
Examples:

| `\p{Word_Break=ALetter}` | Released version of Unicode |
| `\p{Word_Breakβ=ALetter}` | Beta version of Unicode |
| Query | Result |
|---|---|
| `\p{Word_Break=ALetter}` | Released version of Unicode. |
| `\p{Uβ:Word_Break=ALetter}` | Beta version of Unicode; error outside of beta review. |
| `\p{U16β:Word_Break=ALetter}` | Beta version of Unicode 16.0; error during the beta review of any other version. |


For example, to see additions to that property value in the beta version, use:

<center>

[`\p{Word_Breakβ=ALetter}-\\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BWord_Break%CE%B2%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)
[`\p{Uβ:Word_Break=ALetter}-\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BU%CE%B2%3AWord_Break%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)

</center>


## Caveats

The support is not complete done, and there are some known problems.

1. Some properties are not supported in beta versions. See
<https://util.unicode.org/UnicodeJsps/properties.jsp>
for the list.
2. When characters are listed, the new blocks and subheads don't show up.
3. If you use a property that has a β version but no ICU version, you get no
error: just an empty listing.
4. The beta properties don't yet have the "shorthands" for cases like \\p{Lu}.
So make sure the property is listed, eg \\p{gcβ=Lu}
1. Example:
[`\p{gcβ=Lu}-\\p{gc=Lu}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bgc%CE%B2%3DLu%7D-%5Cp%7Bgc%3DLu%7D&g=&i=)
5. Tools for segmentation, etc. use the release properties; there isn't a way
The support is not completely done, and there are some known problems.

1. The General_Category groupings such as \\p{Uβ:L} are not correctly implemented.
Only actual values, such as \\p{Uβ:Lu} etc., work.
2. Tools for segmentation, etc. use the release properties; there isn't a way
to have them use the beta properties.
6. There are probably others...
3. There are probably others...

If you find a problem, please file a ticket at
<https://cldr.unicode.org/index/bug-reports>: make sure to start the summary with
"Unicode Utilities: "
https://github.com/unicode-org/unicodetools/issues.

[Back to Unicode Utilities Help Home](index)
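The revised help text in docs/help/changes.md describes the `Uβ:` prefix and links to a query for the code points added to a property value in the beta. The following is a minimal Python sketch of how such a query and its link could be assembled. The function names are hypothetical and not part of the Unicode Utilities; the base URL and the trailing `&g=&i=` parameters are copied from the link in the help page, and the percent-encoding is assumed to be plain UTF-8 URL quoting.

```python
from urllib.parse import quote

# Base URL copied from the link in docs/help/changes.md.
LIST_UNICODESET_URL = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"


def beta_additions_query(prop: str, value: str, version: str = "") -> str:
    """Build a UnicodeSet expression: beta set minus released set.

    `version` is optional, e.g. "16" yields the U16β: prefix; empty yields Uβ:.
    (Hypothetical helper, for illustration only.)
    """
    beta = rf"\p{{U{version}β:{prop}={value}}}"
    released = rf"\p{{{prop}={value}}}"
    return f"{beta}-{released}"


def list_unicodeset_url(expression: str) -> str:
    """Percent-encode the expression and place it in the `a` query parameter."""
    return f"{LIST_UNICODESET_URL}?a={quote(expression, safe='')}&g=&i="


if __name__ == "__main__":
    query = beta_additions_query("Word_Break", "ALetter")
    print(query)
    print(list_unicodeset_url(query))
```

For `Word_Break=ALetter` with no explicit version, this reproduces the expression and the link given in the help page.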
1 change: 1 addition & 0 deletions docs/pipeline.md
@@ -61,6 +61,7 @@ PR preparation:
- [ ] If from SAH — Link SAH issue
- [ ] If from ESC or CJK — Mention ESC or CJK in the PR description
- [ ] When for a UTC decision — Cite in the format UTC-\d\d\d-[MC]\d+ or with a link.
- [ ] Link RMG issue
- [ ] Whenever there is a Proposal document — Cite L2 number in the format L2/yy-nnn
- [ ] data-for-new — Set label
- [ ] pipeline-* — Set label to **pipeline-recommended-to-UTC** if the characters are not yet in the pipeline, and **pipeline-provisionally-assigned**, or **pipeline-`<version>`** depending on their status in [the Pipeline](https://unicode.org/alloc/Pipeline.html#future).
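The checklist above names two citation formats: `UTC-\d\d\d-[MC]\d+` for UTC decisions and `L2/yy-nnn` for proposal documents. A small Python sketch of matching them in a PR description follows. The first pattern is taken verbatim from the checklist; the second is an assumed reading of `L2/yy-nnn` as two digits, a hyphen, and three digits. The identifiers in the usage lines are made up for illustration, and this is not the code of check-l2-document.py.

```python
import re

# Pattern given verbatim in the checklist.
UTC_DECISION = re.compile(r"UTC-\d\d\d-[MC]\d+")
# Assumed interpretation of "L2/yy-nnn": two-digit year, three-digit number.
L2_DOCUMENT = re.compile(r"L2/\d\d-\d\d\d")


def citations(pr_body: str) -> dict:
    """Return the UTC decision and L2 document citations found in a PR body."""
    return {
        "utc": UTC_DECISION.findall(pr_body),
        "l2": L2_DOCUMENT.findall(pr_body),
    }


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    print(citations("Implements UTC-180-C12; see L2/24-123."))
```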
2 changes: 1 addition & 1 deletion pub/copy-beta-to-draft.sh
@@ -42,7 +42,7 @@ mv $DRAFT/UCD/ucd/zipped-ReadMe.txt $DRAFT/zipped/ReadMe.txt

mkdir -p $DRAFT/UCA
cp -r $UNITOOLS_DATA/uca/dev/* $DRAFT/UCA
sed -i -f $DEST/sed-readmes.txt $DRAFT/UCA/CollationTest.html
sed -i -f $DRAFT/sed-readmes.txt $DRAFT/UCA/CollationTest.html

mkdir -p $DRAFT/emoji
cp $UNITOOLS_DATA/emoji/dev/* $DRAFT/emoji
5 changes: 5 additions & 0 deletions py/pipeline-workflow/check-l2-document.py
@@ -18,4 +18,9 @@
"PRs for character additions must include a link to the SAH issue, or "
"the mention ESC or CJK.")
    errors += 1
if not re.search(r"(unicode-org/utc-release-management(#|/issues/)\d)", pr_body):
    print("::error title=Need RMG reference::"
          "PRs for character additions must include a link to the corresponding "
          "RMG issue.")
    errors += 1
exit(errors)
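To see which PR descriptions the new RMG check accepts, here is a short Python sketch that applies the same regular expression to a few sample bodies. The pattern is copied from the hunk above; the helper name and the sample issue numbers are hypothetical.

```python
import re

# Pattern copied from the check added to check-l2-document.py in this commit.
RMG_REFERENCE = re.compile(r"(unicode-org/utc-release-management(#|/issues/)\d)")


def has_rmg_reference(pr_body: str) -> bool:
    """True if the body cites an RMG issue in short or URL form."""
    return RMG_REFERENCE.search(pr_body) is not None


if __name__ == "__main__":
    # Sample bodies with made-up issue numbers.
    samples = [
        "See unicode-org/utc-release-management#123",
        "https://github.com/unicode-org/utc-release-management/issues/45",
        "Fixes #12",
    ]
    for body in samples:
        print(has_rmg_reference(body), body)
```

Both the `owner/repo#N` short form and the full `/issues/N` URL match; a bare `#N` does not, because the repository name is part of the pattern.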
14 changes: 7 additions & 7 deletions unicodetools/data/emoji/dev/emoji-test.txt
@@ -1,5 +1,5 @@
# emoji-test.txt
# Date: 2024-06-04, 16:46:01 GMT
# Date: 2024-08-14, 23:51:54 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -1751,12 +1751,12 @@
1F936 1F3FD ; fully-qualified # 🤶🏽 E3.0 Mrs. Claus: medium skin tone
1F936 1F3FE ; fully-qualified # 🤶🏾 E3.0 Mrs. Claus: medium-dark skin tone
1F936 1F3FF ; fully-qualified # 🤶🏿 E3.0 Mrs. Claus: dark skin tone
1F9D1 200D 1F384 ; fully-qualified # 🧑‍🎄 E13.0 Mx claus
1F9D1 1F3FB 200D 1F384 ; fully-qualified # 🧑🏻‍🎄 E13.0 Mx claus: light skin tone
1F9D1 1F3FC 200D 1F384 ; fully-qualified # 🧑🏼‍🎄 E13.0 Mx claus: medium-light skin tone
1F9D1 1F3FD 200D 1F384 ; fully-qualified # 🧑🏽‍🎄 E13.0 Mx claus: medium skin tone
1F9D1 1F3FE 200D 1F384 ; fully-qualified # 🧑🏾‍🎄 E13.0 Mx claus: medium-dark skin tone
1F9D1 1F3FF 200D 1F384 ; fully-qualified # 🧑🏿‍🎄 E13.0 Mx claus: dark skin tone
1F9D1 200D 1F384 ; fully-qualified # 🧑‍🎄 E13.0 Mx Claus
1F9D1 1F3FB 200D 1F384 ; fully-qualified # 🧑🏻‍🎄 E13.0 Mx Claus: light skin tone
1F9D1 1F3FC 200D 1F384 ; fully-qualified # 🧑🏼‍🎄 E13.0 Mx Claus: medium-light skin tone
1F9D1 1F3FD 200D 1F384 ; fully-qualified # 🧑🏽‍🎄 E13.0 Mx Claus: medium skin tone
1F9D1 1F3FE 200D 1F384 ; fully-qualified # 🧑🏾‍🎄 E13.0 Mx Claus: medium-dark skin tone
1F9D1 1F3FF 200D 1F384 ; fully-qualified # 🧑🏿‍🎄 E13.0 Mx Claus: dark skin tone
1F9B8 ; fully-qualified # 🦸 E11.0 superhero
1F9B8 1F3FB ; fully-qualified # 🦸🏻 E11.0 superhero: light skin tone
1F9B8 1F3FC ; fully-qualified # 🦸🏼 E11.0 superhero: medium-light skin tone
14 changes: 7 additions & 7 deletions unicodetools/data/emoji/dev/emoji-zwj-sequences.txt
@@ -1,5 +1,5 @@
# emoji-zwj-sequences.txt
# Date: 2024-06-04, 16:46:01 GMT
# Date: 2024-08-14, 23:51:54 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
@@ -665,7 +665,7 @@
1F9D1 200D 1F33E ; RGI_Emoji_ZWJ_Sequence ; farmer # E12.1 [1] (🧑‍🌾)
1F9D1 200D 1F373 ; RGI_Emoji_ZWJ_Sequence ; cook # E12.1 [1] (🧑‍🍳)
1F9D1 200D 1F37C ; RGI_Emoji_ZWJ_Sequence ; person feeding baby # E13.0 [1] (🧑‍🍼)
1F9D1 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx claus # E13.0 [1] (🧑‍🎄)
1F9D1 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx Claus # E13.0 [1] (🧑‍🎄)
1F9D1 200D 1F393 ; RGI_Emoji_ZWJ_Sequence ; student # E12.1 [1] (🧑‍🎓)
1F9D1 200D 1F3A4 ; RGI_Emoji_ZWJ_Sequence ; singer # E12.1 [1] (🧑‍🎤)
1F9D1 200D 1F3A8 ; RGI_Emoji_ZWJ_Sequence ; artist # E12.1 [1] (🧑‍🎨)
@@ -689,7 +689,7 @@
1F9D1 1F3FB 200D 1F33E ; RGI_Emoji_ZWJ_Sequence ; farmer: light skin tone # E12.1 [1] (🧑🏻‍🌾)
1F9D1 1F3FB 200D 1F373 ; RGI_Emoji_ZWJ_Sequence ; cook: light skin tone # E12.1 [1] (🧑🏻‍🍳)
1F9D1 1F3FB 200D 1F37C ; RGI_Emoji_ZWJ_Sequence ; person feeding baby: light skin tone # E13.0 [1] (🧑🏻‍🍼)
1F9D1 1F3FB 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx claus: light skin tone # E13.0 [1] (🧑🏻‍🎄)
1F9D1 1F3FB 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx Claus: light skin tone # E13.0 [1] (🧑🏻‍🎄)
1F9D1 1F3FB 200D 1F393 ; RGI_Emoji_ZWJ_Sequence ; student: light skin tone # E12.1 [1] (🧑🏻‍🎓)
1F9D1 1F3FB 200D 1F3A4 ; RGI_Emoji_ZWJ_Sequence ; singer: light skin tone # E12.1 [1] (🧑🏻‍🎤)
1F9D1 1F3FB 200D 1F3A8 ; RGI_Emoji_ZWJ_Sequence ; artist: light skin tone # E12.1 [1] (🧑🏻‍🎨)
@@ -713,7 +713,7 @@
1F9D1 1F3FC 200D 1F33E ; RGI_Emoji_ZWJ_Sequence ; farmer: medium-light skin tone # E12.1 [1] (🧑🏼‍🌾)
1F9D1 1F3FC 200D 1F373 ; RGI_Emoji_ZWJ_Sequence ; cook: medium-light skin tone # E12.1 [1] (🧑🏼‍🍳)
1F9D1 1F3FC 200D 1F37C ; RGI_Emoji_ZWJ_Sequence ; person feeding baby: medium-light skin tone # E13.0 [1] (🧑🏼‍🍼)
1F9D1 1F3FC 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx claus: medium-light skin tone # E13.0 [1] (🧑🏼‍🎄)
1F9D1 1F3FC 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx Claus: medium-light skin tone # E13.0 [1] (🧑🏼‍🎄)
1F9D1 1F3FC 200D 1F393 ; RGI_Emoji_ZWJ_Sequence ; student: medium-light skin tone # E12.1 [1] (🧑🏼‍🎓)
1F9D1 1F3FC 200D 1F3A4 ; RGI_Emoji_ZWJ_Sequence ; singer: medium-light skin tone # E12.1 [1] (🧑🏼‍🎤)
1F9D1 1F3FC 200D 1F3A8 ; RGI_Emoji_ZWJ_Sequence ; artist: medium-light skin tone # E12.1 [1] (🧑🏼‍🎨)
@@ -737,7 +737,7 @@
1F9D1 1F3FD 200D 1F33E ; RGI_Emoji_ZWJ_Sequence ; farmer: medium skin tone # E12.1 [1] (🧑🏽‍🌾)
1F9D1 1F3FD 200D 1F373 ; RGI_Emoji_ZWJ_Sequence ; cook: medium skin tone # E12.1 [1] (🧑🏽‍🍳)
1F9D1 1F3FD 200D 1F37C ; RGI_Emoji_ZWJ_Sequence ; person feeding baby: medium skin tone # E13.0 [1] (🧑🏽‍🍼)
1F9D1 1F3FD 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx claus: medium skin tone # E13.0 [1] (🧑🏽‍🎄)
1F9D1 1F3FD 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx Claus: medium skin tone # E13.0 [1] (🧑🏽‍🎄)
1F9D1 1F3FD 200D 1F393 ; RGI_Emoji_ZWJ_Sequence ; student: medium skin tone # E12.1 [1] (🧑🏽‍🎓)
1F9D1 1F3FD 200D 1F3A4 ; RGI_Emoji_ZWJ_Sequence ; singer: medium skin tone # E12.1 [1] (🧑🏽‍🎤)
1F9D1 1F3FD 200D 1F3A8 ; RGI_Emoji_ZWJ_Sequence ; artist: medium skin tone # E12.1 [1] (🧑🏽‍🎨)
@@ -761,7 +761,7 @@
1F9D1 1F3FE 200D 1F33E ; RGI_Emoji_ZWJ_Sequence ; farmer: medium-dark skin tone # E12.1 [1] (🧑🏾‍🌾)
1F9D1 1F3FE 200D 1F373 ; RGI_Emoji_ZWJ_Sequence ; cook: medium-dark skin tone # E12.1 [1] (🧑🏾‍🍳)
1F9D1 1F3FE 200D 1F37C ; RGI_Emoji_ZWJ_Sequence ; person feeding baby: medium-dark skin tone # E13.0 [1] (🧑🏾‍🍼)
1F9D1 1F3FE 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx claus: medium-dark skin tone # E13.0 [1] (🧑🏾‍🎄)
1F9D1 1F3FE 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx Claus: medium-dark skin tone # E13.0 [1] (🧑🏾‍🎄)
1F9D1 1F3FE 200D 1F393 ; RGI_Emoji_ZWJ_Sequence ; student: medium-dark skin tone # E12.1 [1] (🧑🏾‍🎓)
1F9D1 1F3FE 200D 1F3A4 ; RGI_Emoji_ZWJ_Sequence ; singer: medium-dark skin tone # E12.1 [1] (🧑🏾‍🎤)
1F9D1 1F3FE 200D 1F3A8 ; RGI_Emoji_ZWJ_Sequence ; artist: medium-dark skin tone # E12.1 [1] (🧑🏾‍🎨)
@@ -785,7 +785,7 @@
1F9D1 1F3FF 200D 1F33E ; RGI_Emoji_ZWJ_Sequence ; farmer: dark skin tone # E12.1 [1] (🧑🏿‍🌾)
1F9D1 1F3FF 200D 1F373 ; RGI_Emoji_ZWJ_Sequence ; cook: dark skin tone # E12.1 [1] (🧑🏿‍🍳)
1F9D1 1F3FF 200D 1F37C ; RGI_Emoji_ZWJ_Sequence ; person feeding baby: dark skin tone # E13.0 [1] (🧑🏿‍🍼)
1F9D1 1F3FF 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx claus: dark skin tone # E13.0 [1] (🧑🏿‍🎄)
1F9D1 1F3FF 200D 1F384 ; RGI_Emoji_ZWJ_Sequence ; Mx Claus: dark skin tone # E13.0 [1] (🧑🏿‍🎄)
1F9D1 1F3FF 200D 1F393 ; RGI_Emoji_ZWJ_Sequence ; student: dark skin tone # E12.1 [1] (🧑🏿‍🎓)
1F9D1 1F3FF 200D 1F3A4 ; RGI_Emoji_ZWJ_Sequence ; singer: dark skin tone # E12.1 [1] (🧑🏿‍🎤)
1F9D1 1F3FF 200D 1F3A8 ; RGI_Emoji_ZWJ_Sequence ; artist: dark skin tone # E12.1 [1] (🧑🏿‍🎨)
