Commit

Merge remote-tracking branch 'la-vache/main' into modifier-ψ-and-ω
eggrobin committed Aug 7, 2024
2 parents f741099 + 5cc9f49 commit ac03c07
Showing 58 changed files with 16,912 additions and 15,360 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pipeline.yml
@@ -53,7 +53,7 @@ jobs:
        uses: actions/checkout@v3
        with:
          sparse-checkout: py/pipeline-workflow
      - name: Check L2 document
      - name: Check L2 document and WG references
        run: |
          python3 py/pipeline-workflow/check-l2-document.py
  utc-decision:
5 changes: 3 additions & 2 deletions UnicodeJsps/src/test/java/org/unicode/jsptest/TestJsp.java
@@ -929,7 +929,7 @@ public void TestIdna() {
checkValues(error, Uts46.SINGLETON);
checkValidIdna(Uts46.SINGLETON, "À。÷");
checkValidIdna(Uts46.SINGLETON, "≠"); // valid since Unicode 15.1
checkInvalidIdna(Uts46.SINGLETON, "\u0001");
checkInvalidIdna(Uts46.SINGLETON, "\u0080");
checkToUnicode(Uts46.SINGLETON, "ß。ab", "ß.ab");
// checkToPunyCode(Uts46.SINGLETON, "\u0002", "xn---");
checkToPunyCode(Uts46.SINGLETON, "ß。ab", "ss.ab");
@@ -973,7 +973,8 @@ public void TestIdna() {
private void checkValues(boolean[] error, Idna idna) {
checkToUnicodeAndPunyCode(idna, "α.xn--mxa", "α.α", "xn--mxa.xn--mxa");
checkValidIdna(idna, "a");
checkInvalidIdna(idna, "=");
// 33C2 ; disallowed # 1.1 SQUARE AM
checkInvalidIdna(idna, "㏂");
}

private void checkToUnicodeAndPunyCode(
36 changes: 16 additions & 20 deletions docs/help/changes.md
@@ -3,42 +3,38 @@
The Unicode Utilities have been modified to support both properties from the
released version of Unicode (via ICU) and from the new Unicode beta.

To get the beta version of the property, insert β *after* the property name.
To get the beta version of the property, insert `Uβ:` *before* the property name.
The explicit version number for the β can be used;
the resulting property is then only valid when that specific β is current.
Examples:

| `\p{Word_Break=ALetter}` | Released version of Unicode |
| `\p{Word_Breakβ=ALetter}` | Beta version of Unicode |
| Query | Result |
|---|---|
| `\p{Word_Break=ALetter}` | Released version of Unicode. |
| `\p{Uβ:Word_Break=ALetter}` | Beta version of Unicode; error outside of beta review. |
| `\p{U16β:Word_Break=ALetter}` | Beta version of Unicode 16.0; error during the beta review of any other version. |


For example, to see additions to that property value in the beta version, use:

<center>

[`\p{Word_Breakβ=ALetter}-\\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BWord_Break%CE%B2%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)
[`\p{Uβ:Word_Break=ALetter}-\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BU%CE%B2%3AWord_Break%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)

</center>


## Caveats

The support is not complete done, and there are some known problems.

1. Some properties are not supported in beta versions. See
<https://util.unicode.org/UnicodeJsps/properties.jsp>
for the list.
2. When characters are listed, the new blocks and subheads don't show up.
3. If you use a property that has a β version but no ICU version, you get no
error: just an empty listing.
4. The beta properties don't yet have the "shorthands" for cases like \\p{Lu}.
So make sure the property is listed, eg \\p{gcβ=Lu}
1. Example:
[`\p{gcβ=Lu}-\\p{gc=Lu}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bgc%CE%B2%3DLu%7D-%5Cp%7Bgc%3DLu%7D&g=&i=)
5. Tools for segmentation, etc. use the release properties; there isn't a way
The support is not completely done, and there are some known problems.

1. The General_Category groupings such as \\p{Uβ:L} are not correctly implemented.
Only actual values, such as \\p{Uβ:Lu} etc., work.
2. Tools for segmentation, etc. use the release properties; there isn't a way
to have them use the beta properties.
6. There are probably others...
3. There are probably others...

If you find a problem, please file a ticket at
<https://cldr.unicode.org/index/bug-reports>: make sure to start the summary with
"Unicode Utilities: "
https://github.com/unicode-org/unicodetools/issues.

[Back to Unicode Utilities Help Home](index)
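
The link in the updated help text carries the UnicodeSet expression percent-encoded in the `a` query parameter of `list-unicodeset.jsp`. A minimal Python sketch of how such a link could be built is shown here. The helper name `unicodeset_url` is hypothetical and not part of the repository, and the empty `g` and `i` parameters are simply copied from the link in the help text.

```python
from urllib.parse import quote

BASE = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"


def unicodeset_url(expression: str) -> str:
    """Hypothetical helper: build a list-unicodeset.jsp link for a UnicodeSet expression."""
    # safe="" percent-encodes '\', '{', '}', ':' and '=';
    # the non-ASCII β is encoded as its UTF-8 bytes (%CE%B2).
    return f"{BASE}?a={quote(expression, safe='')}&g=&i="


# Characters that are ALetter in the beta but not in the released version.
beta_additions = r"\p{Uβ:Word_Break=ALetter}-\p{Word_Break=ALetter}"
print(unicodeset_url(beta_additions))
```

Swapping `Uβ:` for a versioned prefix such as `U16β:` in the expression should yield the corresponding versioned query, subject to the validity rules in the table above.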
1 change: 1 addition & 0 deletions docs/pipeline.md
@@ -61,6 +61,7 @@ PR preparation:
- [ ] If from SAH — Link SAH issue
- [ ] If from ESC or CJK — Mention ESC or CJK in the PR description
- [ ] When for a UTC decision — Cite in the format UTC-\d\d\d-[MC]\d+ or with a link.
- [ ] Link RMG issue
- [ ] Whenever there is a Proposal document — Cite L2 number in the format L2/yy-nnn
- [ ] data-for-new — Set label
- [ ] pipeline-* — Set label to **pipeline-recommended-to-UTC** if the characters are not yet in the pipeline, and **pipeline-provisionally-assigned**, or **pipeline-`<version>`** depending on their status in [the Pipeline](https://unicode.org/alloc/Pipeline.html#future).
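
The two citation formats named in this checklist can be checked mechanically. The sketch below is illustrative only: the UTC pattern is taken verbatim from the checklist, while the L2 pattern assumes that `yy` and `nnn` stand for two and three decimal digits. The sample identifiers are made up, and the function names are hypothetical.

```python
import re

# Pattern quoted in the checklist for UTC decisions.
UTC_DECISION = re.compile(r"UTC-\d\d\d-[MC]\d+")
# Assumption: "L2/yy-nnn" means two digits, a hyphen, then three digits.
L2_DOCUMENT = re.compile(r"L2/\d\d-\d\d\d")


def cites_utc_decision(text: str) -> bool:
    """Return True if the text contains a UTC decision citation."""
    return UTC_DECISION.search(text) is not None


def cites_l2_document(text: str) -> bool:
    """Return True if the text contains an L2 document number."""
    return L2_DOCUMENT.search(text) is not None


# Made-up identifiers, used only to exercise the patterns.
print(cites_utc_decision("Implements UTC-123-C45."))
print(cites_l2_document("See proposal L2/24-001."))
```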
5 changes: 5 additions & 0 deletions py/pipeline-workflow/check-l2-document.py
@@ -18,4 +18,9 @@
"PRs for character additions must include a link to the SAH issue, or "
"the mention ESC or CJK.")
    errors += 1
if not re.search(r"(unicode-org/utc-release-management(#|/issues/)\d)", pr_body):
    print("::error title=Need RMG reference::"
          "PRs for character additions must include a link to the corresponding "
          "RMG issue.")
    errors += 1
exit(errors)
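
To see what the new RMG check accepts, the regular expression can be exercised on its own. This is a small sketch, not the workflow script: the pattern is copied from the hunk above, the wrapper `has_rmg_reference` is a hypothetical name, and the issue numbers in the samples are made up.

```python
import re

# Pattern copied from the check added in this commit.
RMG_REFERENCE = re.compile(r"(unicode-org/utc-release-management(#|/issues/)\d)")


def has_rmg_reference(pr_body: str) -> bool:
    """Return True if the PR body references an RMG issue in either accepted form."""
    return RMG_REFERENCE.search(pr_body) is not None


# Made-up PR bodies; the issue numbers are illustrative only.
samples = [
    "RMG: unicode-org/utc-release-management#12",
    "See https://github.com/unicode-org/utc-release-management/issues/34",
    "See unicode-org/utc-release-management for details",
    "Fixes unicode-org/unicodetools#56",
]
for body in samples:
    print(has_rmg_reference(body), body)
```

Both the short `owner/repo#N` form and a full `/issues/N` URL pass. A bare mention of the repository, or a reference to an issue in another repository, does not.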