Commit

Merge remote-tracking branch 'la-vache/main' into pipeline-gap-176-C34
eggrobin committed Oct 16, 2024
2 parents e676337 + 69a376c commit 47b058e
Showing 353 changed files with 3,636,481 additions and 17,952 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/build-jsp.yml
@@ -1,8 +1,8 @@
name: Build JSP

env:
CURRENT_UVERSION: 16.0.0
PREVIOUS_UVERSION: 15.1.0 # not used at present
CURRENT_UVERSION: 17.0.0 # FIX_FOR_NEW_VERSION
PREVIOUS_UVERSION: 16.0.0 # not used at present

on:
push:
@@ -69,7 +69,7 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Upload UnicodeJsps.war
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v3
with:
name: UnicodeJsps
path: UnicodeJsps/target/UnicodeJsps.war
4 changes: 2 additions & 2 deletions .github/workflows/cli-build-instructions.yml
@@ -9,8 +9,8 @@ on:
- '*'

env:
CURRENT_UVERSION: 16.0.0
PREVIOUS_UVERSION: 15.1.0
CURRENT_UVERSION: 17.0.0 # FIX_FOR_NEW_VERSION
PREVIOUS_UVERSION: 16.0.0

jobs:

2 changes: 1 addition & 1 deletion .github/workflows/pipeline.yml
@@ -53,7 +53,7 @@ jobs:
uses: actions/checkout@v3
with:
sparse-checkout: py/pipeline-workflow
- name: Check L2 document
- name: Check L2 document and WG references
run: |
python3 py/pipeline-workflow/check-l2-document.py
utc-decision:
2 changes: 1 addition & 1 deletion .github/workflows/push-jsp-on-tag.yml
@@ -49,7 +49,7 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Upload UnicodeJsps.war
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v3
with:
name: UnicodeJsps
path: UnicodeJsps/target/UnicodeJsps.war
2 changes: 1 addition & 1 deletion .github/workflows/pythonpackage.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.8]
python-version: [3.12]

steps:
- uses: actions/checkout@v3
4 changes: 2 additions & 2 deletions README.md
@@ -25,6 +25,6 @@ The tools maintainers use GH issues for issues with the code in this repo.

Copyright © 2001-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.

The project is released under [LICENSE](./LICENSE).

A CLA is required to contribute to this project - please refer to the [CONTRIBUTING.md](https://github.com/unicode-org/.github/blob/main/.github/CONTRIBUTING.md) file (or start a Pull Request) for more information.

The contents of this repository are governed by the Unicode [Terms of Use](https://www.unicode.org/copyright.html) and are released under [LICENSE](./LICENSE).
5 changes: 3 additions & 2 deletions UnicodeJsps/src/test/java/org/unicode/jsptest/TestJsp.java
@@ -929,7 +929,7 @@ public void TestIdna() {
checkValues(error, Uts46.SINGLETON);
checkValidIdna(Uts46.SINGLETON, "À。÷");
checkValidIdna(Uts46.SINGLETON, "≠"); // valid since Unicode 15.1
checkInvalidIdna(Uts46.SINGLETON, "\u0001");
checkInvalidIdna(Uts46.SINGLETON, "\u0080");
checkToUnicode(Uts46.SINGLETON, "ß。ab", "ß.ab");
// checkToPunyCode(Uts46.SINGLETON, "\u0002", "xn---");
checkToPunyCode(Uts46.SINGLETON, "ß。ab", "ss.ab");
@@ -973,7 +973,8 @@ public void TestIdna() {
private void checkValues(boolean[] error, Idna idna) {
checkToUnicodeAndPunyCode(idna, "α.xn--mxa", "α.α", "xn--mxa.xn--mxa");
checkValidIdna(idna, "a");
checkInvalidIdna(idna, "=");
// 33C2 ; disallowed # 1.1 SQUARE AM
checkInvalidIdna(idna, "㏂");
}

private void checkToUnicodeAndPunyCode(
9 changes: 0 additions & 9 deletions docs/emoji/aac.md
@@ -5,15 +5,6 @@
Once the emoji are finalized for new version of TR51, or there is a new version
of CLDR, run AacOrder.java to generate 3 new files which will be checked in.

Fix the versions at the top of the file, such as:
```
private static final VersionInfo VERSION = Emoji.VERSION12;
private static final VersionInfo UCD_VERSION = Emoji.VERSION12;
```

The emoji version will be ≥ the UCD version.

**Results:**

:construction: **TODO**: Work with Mark on working replacements for "draft" URLs.
36 changes: 16 additions & 20 deletions docs/help/changes.md
@@ -3,42 +3,38 @@
The Unicode Utilities have been modified to support both properties from the
released version of Unicode (via ICU) and from the new Unicode beta.

To get the beta version of the property, insert β *after* the property name.
To get the beta version of the property, insert `Uβ:` *before* the property name.
The explicit version number for the β can be used;
the resulting property is then only valid when that specific β is current.
Examples:

| `\p{Word_Break=ALetter}` | Released version of Unicode |
| `\p{Word_Breakβ=ALetter}` | Beta version of Unicode |
| Query | Result |
|---|---|
| `\p{Word_Break=ALetter}` | Released version of Unicode. |
| `\p{Uβ:Word_Break=ALetter}` | Beta version of Unicode; error outside of beta review. |
| `\p{U16β:Word_Break=ALetter}` | Beta version of Unicode 16.0; error during the beta review of any other version. |


For example, to see additions to that property value in the beta version, use:

<center>

[`\p{Word_Breakβ=ALetter}-\\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BWord_Break%CE%B2%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)
[`\p{Uβ:Word_Break=ALetter}-\p{Word_Break=ALetter}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BU%CE%B2%3AWord_Break%3DALetter%7D-%5Cp%7BWord_Break%3DALetter%7D&g=&i=)

</center>


## Caveats

The support is not complete done, and there are some known problems.

1. Some properties are not supported in beta versions. See
<https://util.unicode.org/UnicodeJsps/properties.jsp>
for the list.
2. When characters are listed, the new blocks and subheads don't show up.
3. If you use a property that has a β version but no ICU version, you get no
error: just an empty listing.
4. The beta properties don't yet have the "shorthands" for cases like \\p{Lu}.
So make sure the property is listed, eg \\p{gcβ=Lu}
1. Example:
[`\p{gcβ=Lu}-\\p{gc=Lu}`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bgc%CE%B2%3DLu%7D-%5Cp%7Bgc%3DLu%7D&g=&i=)
5. Tools for segmentation, etc. use the release properties; there isn't a way
The support is not completely done, and there are some known problems.

1. The General_Category groupings such as \\p{Uβ:L} are not correctly implemented.
Only actual values, such as \\p{Uβ:Lu} etc., work.
2. Tools for segmentation, etc. use the release properties; there isn't a way
to have them use the beta properties.
6. There are probably others...
3. There are probably others...

If you find a problem, please file a ticket at
<https://cldr.unicode.org/index/bug-reports>: make sure to start the summary with
"Unicode Utilities: "
https://github.com/unicode-org/unicodetools/issues.

[Back to Unicode Utilities Help Home](index)
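
As an illustration of the `Uβ:` prefix syntax described in this file, the following minimal Python sketch builds the "additions in beta" query and its `list-unicodeset.jsp` URL. The function names are hypothetical and not part of the Unicode Utilities; only the URL shape and the `\p{Uβ:...}` syntax come from the documentation above.

```python
from urllib.parse import quote

# Base URL taken from the example link in the documentation.
BASE_URL = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"


def released_property(expression: str) -> str:
    """Return \\p{<expression>}, which queries the released version."""
    return "\\p{" + expression + "}"


def beta_property(expression: str, version: str = "") -> str:
    """Return \\p{U<version>β:<expression>}; an empty version means 'current beta'."""
    return "\\p{U" + version + "β:" + expression + "}"


def list_unicodeset_url(unicode_set: str) -> str:
    """Percent-encode the set expression and attach it as the `a` parameter."""
    return f"{BASE_URL}?a={quote(unicode_set, safe='')}&g=&i="


def beta_additions_url(expression: str) -> str:
    """URL listing characters that match in the beta but not in the release."""
    query = beta_property(expression) + "-" + released_property(expression)
    return list_unicodeset_url(query)


print(beta_additions_url("Word_Break=ALetter"))
```

Passing `version="16"` to `beta_property` yields the explicit form `\p{U16β:Word_Break=ALetter}` shown in the table.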
18 changes: 11 additions & 7 deletions docs/pipeline.md
@@ -61,6 +61,7 @@ PR preparation:
- [ ] If from SAH — Link SAH issue
- [ ] If from ESC or CJK — Mention ESC or CJK in the PR description
- [ ] When for a UTC decision — Cite in the format UTC-\d\d\d-[MC]\d+ or with a link.
- [ ] Link RMG issue
- [ ] Whenever there is a Proposal document — Cite L2 number in the format L2/yy-nnn
- [ ] data-for-new — Set label
- [ ] pipeline-* — Set label to **pipeline-recommended-to-UTC** if the characters are not yet in the pipeline, and **pipeline-provisionally-assigned**, or **pipeline-`<version>`** depending on their status in [the Pipeline](https://unicode.org/alloc/Pipeline.html#future).
@@ -113,7 +114,7 @@ git checkout la-vache/main unicodetools/data/ucd/dev/extracted/*;
git checkout la-vache/main unicodetools/data/ucd/dev/auxiliary/*;
rm .\Generated\* -recurse -force;
mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=.";
cp .\Generated\UCD\16.0.0\* .\unicodetools\data\ucd\dev -recurse -force;
cp .\Generated\UCD\17.0.0\* .\unicodetools\data\ucd\dev -recurse -force;
rm unicodetools\data\ucd\dev\zzz-unchanged-*;
rm unicodetools\data\ucd\dev\*\zzz-unchanged-*;
rm .\unicodetools\data\ucd\dev\extra\*;
@@ -123,19 +124,20 @@ git merge --continue
```

markusicu (Linux, out-of-source; main tracks unicode-org/main)
<!--FIX_FOR_NEW_VERSION-->
```sh
git merge main
# complains about merge conflicts as expected
git checkout main unicodetools/data/ucd/dev/Derived*
git checkout main unicodetools/data/ucd/dev/extracted/*
git checkout main unicodetools/data/ucd/dev/auxiliary/*
rm -r ../Generated/BIN/16.0.0.0/
rm -r ../Generated/BIN/UCD_Data16.0.0.bin
mvn -s ~/.m2/settings.xml compile exec:java -Dexec.mainClass="org.unicode.text.UCD.Main" -Dexec.args="version 16.0.0 build MakeUnicodeFiles" -am -pl unicodetools -DCLDR_DIR=$(cd ../../../cldr/mine/src ; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -DUVERSION=16.0.0
rm -r ../Generated/BIN/17.0.0.0/
rm -r ../Generated/BIN/UCD_Data17.0.0.bin
mvn -s ~/.m2/settings.xml compile exec:java -Dexec.mainClass="org.unicode.text.UCD.Main" -Dexec.args="version 17.0.0 build MakeUnicodeFiles" -am -pl unicodetools -DCLDR_DIR=$(cd ../../../cldr/mine/src ; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -DUVERSION=17.0.0
# fix merge conflicts in unicodetools/src/main/java/org/unicode/text/UCD/UCD_Types.java
# and in UCD_Names.java
# rerun mvn
cp -r ../Generated/UCD/16.0.0/* unicodetools/data/ucd/dev
cp -r ../Generated/UCD/17.0.0/* unicodetools/data/ucd/dev
rm unicodetools/data/ucd/dev/ZZZ-UNCHANGED-*
rm unicodetools/data/ucd/dev/*/ZZZ-UNCHANGED-*
rm unicodetools/data/ucd/dev/extra/*
@@ -156,10 +158,11 @@ Cf. https://github.com/unicode-org/unicodetools/pull/636
### Regenerate UCD

eggrobin (Windows, in-source).
<!--FIX_FOR_NEW_VERSION-->
```powershell
rm .\Generated\* -recurse -force
mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=."
cp .\Generated\UCD\16.0.0\* .\unicodetools\data\ucd\dev -recurse -force
cp .\Generated\UCD\17.0.0\* .\unicodetools\data\ucd\dev -recurse -force
rm unicodetools\data\ucd\dev\zzz-unchanged-*
rm unicodetools\data\ucd\dev\*\zzz-unchanged-*
rm .\unicodetools\data\ucd\dev\extra\*
@@ -171,10 +174,11 @@ git commit -m "Regenerate UCD"
### Regenerate LineBreak

eggrobin (Windows, in-source).
<!--FIX_FOR_NEW_VERSION-->
```powershell
rm .\Generated\* -recurse -force
mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=."
cp .\Generated\UCD\16.0.0\LineBreak.txt .\unicodetools\data\ucd\dev
cp .\Generated\UCD\17.0.0\LineBreak.txt .\unicodetools\data\ucd\dev
```
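
The `FIX_FOR_NEW_VERSION` markers added in this commit flag the places that need a manual edit at each version bump. A minimal Python sketch of how one might list them is given below; the helper name and the default set of file suffixes are assumptions made for illustration, not an existing script in this repository.

```python
from pathlib import Path

# Marker string as it appears in the workflows and in docs/pipeline.md.
MARKER = "FIX_FOR_NEW_VERSION"


def find_markers(root, suffixes=(".md", ".yml", ".sh", ".java")):
    """Return (path, line number, stripped line) for every marked line under root.

    The suffix filter is a hypothetical default; adjust it to the files of interest.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if MARKER in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_markers(".")` from the repository root and printing each tuple would give a checklist of lines to revisit when moving to the next version.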

### GenerateEnums
4 changes: 2 additions & 2 deletions pub/copy-alpha-to-draft.sh
@@ -12,8 +12,8 @@ UNITOOLS_DATA=$UNICODETOOLS/unicodetools/data

# Adjust the following for each year and version as needed.
COPY_YEAR=2024
UNI_VER=16.0.0
EMOJI_VER=16.0
UNI_VER=17.0.0
EMOJI_VER=17.0

TODAY=`date --iso-8601`

8 changes: 4 additions & 4 deletions pub/copy-beta-to-draft.sh
@@ -12,11 +12,11 @@ UNITOOLS_DATA=$UNICODETOOLS/unicodetools/data

# Adjust the following for each year and version as needed.
COPY_YEAR=2024
UNI_VER=16.0.0
EMOJI_VER=16.0
UNI_VER=17.0.0
EMOJI_VER=17.0
# UTS #10 release revision number to be used in CollationTest.html:
# One more than the last release revision number.
TR10_REV=tr10-50
TR10_REV=tr10-52

TODAY=`date --iso-8601`

@@ -42,7 +42,7 @@ mv $DRAFT/UCD/ucd/zipped-ReadMe.txt $DRAFT/zipped/ReadMe.txt

mkdir -p $DRAFT/UCA
cp -r $UNITOOLS_DATA/uca/dev/* $DRAFT/UCA
sed -i -f $DEST/sed-readmes.txt $DRAFT/UCA/CollationTest.html
sed -i -f $DRAFT/sed-readmes.txt $DRAFT/UCA/CollationTest.html

mkdir -p $DRAFT/emoji
cp $UNITOOLS_DATA/emoji/dev/* $DRAFT/emoji
6 changes: 3 additions & 3 deletions pub/copy-final.sh
@@ -12,11 +12,11 @@ UNITOOLS_DATA=$UNICODETOOLS/unicodetools/data

# Adjust the following for each year and version as needed.
COPY_YEAR=2024
UNI_VER=16.0.0
EMOJI_VER=16.0
UNI_VER=17.0.0
EMOJI_VER=17.0
# UTS #10 release revision number to be used in CollationTest.html:
# *Two* more than the last release revision number.
TR10_REV=tr10-51
TR10_REV=tr10-53

TODAY=`date --iso-8601`
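
The `TR10_REV` comments in `copy-beta-to-draft.sh` ("one more") and `copy-final.sh` ("two more") describe simple arithmetic on the last release revision number. The Python sketch below spells that out. The function is hypothetical, and the input `tr10-51` is only inferred from the new values `tr10-52` and `tr10-53`; it is not stated in the scripts.

```python
import re


def tr10_revisions(last_release: str) -> dict:
    """Derive the beta and final UTS #10 revision tags from the last release tag.

    Per the script comments: beta is one more than the last release revision,
    final is two more.
    """
    match = re.fullmatch(r"tr10-(\d+)", last_release)
    if match is None:
        raise ValueError(f"not a tr10 revision tag: {last_release!r}")
    number = int(match.group(1))
    return {"beta": f"tr10-{number + 1}", "final": f"tr10-{number + 2}"}


# Assumed last release revision; see the note above.
print(tr10_revisions("tr10-51"))
```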

5 changes: 5 additions & 0 deletions py/pipeline-workflow/check-l2-document.py
@@ -18,4 +18,9 @@
"PRs for character additions must include a link to the SAH issue, or "
"the mention ESC or CJK.")
errors += 1
if not re.search(r"(unicode-org/utc-release-management(#|/issues/)\d)", pr_body):
print("::error title=Need RMG reference::"
"PRs for character additions must include a link to the corresponding "
"RMG issue.")
errors += 1
exit(errors)
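
To show what the new RMG check accepts, here is a small standalone Python sketch that reuses the regular expression from the hunk above. The wrapper function `has_rmg_reference` is a name introduced for illustration; the actual script applies `re.search` inline on `pr_body`.

```python
import re

# Same pattern as in check-l2-document.py: either the short form
# unicode-org/utc-release-management#<n> or a .../issues/<n> link.
RMG_REFERENCE = re.compile(r"(unicode-org/utc-release-management(#|/issues/)\d)")


def has_rmg_reference(pr_body: str) -> bool:
    """Return True if the PR description cites an RMG issue."""
    return RMG_REFERENCE.search(pr_body) is not None


examples = [
    "See unicode-org/utc-release-management#12",
    "https://github.com/unicode-org/utc-release-management/issues/345",
    "Fixes #12",
]
for body in examples:
    print(has_rmg_reference(body), body)
```

Note that the pattern requires only one digit after `#` or `/issues/`, so it checks for the presence of a reference without validating the issue number.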
21 changes: 21 additions & 0 deletions unicodetools/data/emoji/16.0/ReadMe.txt
@@ -0,0 +1,21 @@
# Unicode Emoji
# © COPY_YEAR Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html

This directory contains PUB_STATUS data files for Unicode Emoji, Version EMOJI_VER

PUBLIC_EMOJI/

emoji-sequences.txt
emoji-zwj-sequences.txt
emoji-test.txt

The following related files are found in the UCD for Version EMOJI_VER

PUBLIC_UCD/ucd/emoji/

emoji-data.txt
emoji-variation-sequences.txt

For documentation, see UTS #51 Unicode Emoji, Version EMOJI_VER
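
The ReadMe above is a template: tokens such as `COPY_YEAR` and `EMOJI_VER` share their names with variables set in the `pub/copy-*.sh` scripts, which apply a sed script (`sed-readmes.txt`) to the copied files. That sed script is not part of this diff, so the Python sketch below is only an assumed equivalent of such a substitution, with a hypothetical function name and illustrative values.

```python
import re


def fill_placeholders(template: str, values: dict) -> str:
    """Replace whole-word placeholder tokens in template with their values."""
    if not values:
        return template
    # Longest names first so that a shorter token never shadows a longer one.
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: values[m.group(0)], template)


# Illustrative values only; the real ones are set in the copy-*.sh scripts.
template = "# © COPY_YEAR Unicode®, Inc.\nUnicode Emoji, Version EMOJI_VER"
print(fill_placeholders(template, {"COPY_YEAR": "2024", "EMOJI_VER": "17.0"}))
```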