Workflow: Publish Data #980

Open. Wants to merge 13 commits into base: main
59 changes: 59 additions & 0 deletions .github/workflows/publish-data.sh
@@ -0,0 +1,59 @@
# See publish-data.yml

TMP=pub/tmp
# -p also creates the parent pub/ directory if it does not exist yet.
mkdir -p $TMP

COPY_YEAR=$(date -u +%Y)
PUB_DATE=$(date -u +%Y-%m-%d)

cat > $TMP/sed-readmes.txt << eof
s/COPY_YEAR/$COPY_YEAR/
s/PUB_DATE/$PUB_DATE/
s/PUB_STATUS/draft/
s/UNI_VER/$UNI_VER/
s/EMOJI_VER/$EMOJI_VER/
s/TR10_REV/$TR10_REV/
s%PUBLIC_EMOJI%Public/draft/emoji%
s%PUBLIC_UCD%Public/draft/UCD%
eof

mkdir $TMP/UCD
cp -R unicodetools/data/ucd/dev $TMP/UCD/ucd
mv $TMP/UCD/ucd/version-ReadMe.txt $TMP/UCD/ReadMe.txt
rm -r $TMP/UCD/ucd/Unihan

if [ "$RELEASE_PHASE" = "Dev" ]; then
Review comment (Member):

I would normally go with bash conditional expressions (with [[ ]]) rather than the test command (aka [).

@markusicu, I am unsure about our bash style here. Which one do we prefer?

Reply (Member Author):

I wasn’t familiar with the difference between [ and [[, but now after some reading I do prefer [[ as it offers an experience similar to modern languages. Made the change.
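
For readers following this thread, here is a minimal sketch of the difference between the `test` command (`[`) and the bash conditional expression (`[[ ]]`). The function names and sample values are made up for illustration and are not part of the PR. It assumes bash, since `[[ ]]` is not available in plain POSIX sh.

```bash
#!/usr/bin/env bash
# Illustrative helpers; the function names are made up for this sketch.

# POSIX test command: two separate [ ] commands joined by the shell's || operator.
is_prerelease_test() {
    [ "$1" = "Alpha" ] || [ "$1" = "Beta" ]
}

# Bash conditional expression: || is part of the expression itself.
is_prerelease_cond() {
    [[ $1 == "Alpha" || $1 == "Beta" ]]
}

# [[ ]] does not word-split unquoted variables, so a value containing
# spaces is still compared as one string.
phase="Release Candidate"
if [[ $phase == "Release Candidate" ]]; then
    echo "matched without quoting the variable"
fi

# Both helpers should agree for every phase.
for p in Dev Alpha Beta; do
    if is_prerelease_test "$p" && is_prerelease_cond "$p"; then
        echo "$p: alpha or beta"
    else
        echo "$p: neither"
    fi
done
```

Both forms behave the same for the quoted comparisons used in this script, so the choice here is mostly one of style and of protection against quoting mistakes.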

    rm -r $TMP/UCD/ucd/emoji
fi

if [ "$RELEASE_PHASE" = "Alpha" ] || [ "$RELEASE_PHASE" = "Beta" ]; then
    cp -R unicodetools/data/emoji/dev $TMP/emoji

    cp -R unicodetools/data/idna/dev $TMP/idna

    mkdir $TMP/idna2008derived
    cp unicodetools/data/idna/idna2008derived/ReadMe.txt $TMP/idna2008derived
    cp unicodetools/data/idna/idna2008derived/Idna2008-$UNI_VER.txt $TMP/idna2008derived
fi

if [ "$RELEASE_PHASE" = "Beta" ]; then
    cp -R unicodetools/data/uca/dev $TMP/UCA
    sed -i -f $TMP/sed-readmes.txt $TMP/UCA/CollationTest.html

    cp -R unicodetools/data/security/dev $TMP/security
fi

# Update the readmes in-place (-i) as set up above.
find $TMP -name '*ReadMe.txt' | xargs sed -i -f $TMP/sed-readmes.txt
rm $TMP/sed-readmes.txt

mkdir $TMP/zipped
mv $TMP/UCD/ucd/zipped-ReadMe.txt $TMP/zipped/ReadMe.txt
(cd $TMP/UCD/ucd; zip -r UCD.zip *)
mv $TMP/UCD/ucd/UCD.zip $TMP/zipped

if [ "$RELEASE_PHASE" = "Beta" ]; then
    (cd $TMP/UCA; zip -r CollationTest.zip CollationTest; rm -r CollationTest)

    (cd $TMP/security; zip -r uts39-data-$UNI_VER.zip *)
fi
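
The script above fills placeholders such as `UNI_VER` and `PUB_DATE` in the ReadMe files by generating a sed script from a here-document and applying it with `sed -f`. The following is a minimal sketch of that technique in isolation. The `render_readme` helper, the template text, and the date value are assumptions made for illustration. Unlike the script, it writes to standard output rather than editing in place with `-i`.

```bash
# Sketch: fill placeholders in a ReadMe template using a generated sed script.
# render_readme and the sample template are hypothetical, not part of the PR.
render_readme() {
    # $1 = template file, $2 = Unicode version, $3 = publication date
    script=$(mktemp)
    # The here-document delimiter is unquoted, so $2 and $3 are expanded
    # by the shell before sed ever sees the script.
    cat > "$script" << eof
s/UNI_VER/$2/
s/PUB_DATE/$3/
s%PUBLIC_UCD%Public/draft/UCD%
eof
    sed -f "$script" "$1"
    rm "$script"
}

demo_dir=$(mktemp -d)
printf '%s\n' 'Unicode UNI_VER, published PUB_DATE' \
    'See https://www.unicode.org/PUBLIC_UCD/' > "$demo_dir/ReadMe.txt"
render_readme "$demo_dir/ReadMe.txt" "17.0.0" "2025-01-15"
rm -r "$demo_dir"
```

The `s%...%...%` form uses `%` as the delimiter because the replacement text contains slashes, which would otherwise need escaping.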
43 changes: 43 additions & 0 deletions .github/workflows/publish-data.yml
@@ -0,0 +1,43 @@
# See https://github.com/unicode-org/unicodetools/blob/main/docs/data-workflow.md#publication

# Test locally with https://github.com/nektos/act:
# act --workflows .github/workflows/publish-data.yml --input releasePhase=Alpha

name: Publish Data

run-name: "${{ github.workflow }}: ${{ inputs.releasePhase }}"

on:
  workflow_dispatch:
    inputs:
      releasePhase: # See ReleasePhase in https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/java/org/unicode/text/utility/Settings.java
        description: Release phase
        type: choice
        options:
          - Dev
          - Alpha
          - Beta
        default: Dev

env:
  UNI_VER: "17.0.0"
  EMOJI_VER: "17.0"
  TR10_REV: "tr10-52" # UTS #10 release revision number to be used in CollationTest.html: One more than the last release revision number.
  RELEASE_PHASE: ${{ inputs.releasePhase }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          sparse-checkout: |
            .github/workflows
            unicodetools/data/ucd/dev
            unicodetools/data/emoji/dev
            unicodetools/data/idna/dev
            unicodetools/data/idna/idna2008derived
      - run: .github/workflows/publish-data.sh
      - uses: actions/upload-artifact@v4
        with:
          path: pub/tmp # See TMP in publish-data.sh
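
The workflow hands its settings to publish-data.sh through the `env:` block, so the script depends on `UNI_VER`, `EMOJI_VER`, `TR10_REV`, and `RELEASE_PHASE` being set. As a rough sketch of how one might check that environment before running the script by hand, the hypothetical helper below (`check_publish_env`, not part of the PR) verifies that each variable is non-empty and that the phase is one of the workflow's three options.

```bash
# Sketch: check the environment that publish-data.sh expects before running it.
# check_publish_env is a hypothetical helper, not part of the PR.
check_publish_env() {
    for name in UNI_VER EMOJI_VER TR10_REV RELEASE_PHASE; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    # Accept only the options offered by the workflow_dispatch input.
    case "$RELEASE_PHASE" in
        Dev|Alpha|Beta) return 0 ;;
        *) echo "unknown release phase: $RELEASE_PHASE" >&2; return 1 ;;
    esac
}

# Values copied from the env: block of publish-data.yml; the phase is an example.
export UNI_VER="17.0.0" EMOJI_VER="17.0" TR10_REV="tr10-52" RELEASE_PHASE="Alpha"
if check_publish_env; then
    echo "environment ok for $RELEASE_PHASE"
    # From a full checkout one could then run:
    #   bash .github/workflows/publish-data.sh
fi
```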
21 changes: 11 additions & 10 deletions docs/data-workflow.md
@@ -79,6 +79,9 @@ https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/emoji/de

## Publication

+> An experimental GitHub workflow, [publish-data.yml](/.github/workflows/publish-data.yml),
+can create a dev (UCD), alpha, or beta snapshot.
+
Certain snapshots of the .../dev/ files are copied into https://www.unicode.org/Public/draft/
for Unicode alpha, beta, and final releases, and more as appropriate.
* UCD files go into https://www.unicode.org/Public/draft/UCD/
@@ -104,12 +107,10 @@ script from an up-to-date repo workspace.
The script copies the set of the .../dev/ data files for an alpha snapshot
from a unicodetools workspace to a target folder with the layout of https://www.unicode.org/Public/draft/ .

-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
-Ask Rick to add other files that are not tracked in the unicodetools repo:
+Send the resulting zip file to Gregg for posting to https://www.unicode.org/Public/draft/ .
+Ask Gregg to add other files that are not tracked in the unicodetools repo:
* Unihan.zip to .../draft/UCD/ucd

-TODO: Figure out new process & people replacing Rick in 2025.

Note: No version/delta infixes in names of data files.
We simply use the “draft” folder and the file-internal time stamps for versioning.

@@ -124,8 +125,8 @@ script from an up-to-date repo workspace.
The script copies the set of the .../dev/ data files for an alpha snapshot
from a unicodetools workspace to a target folder with the layout of https://www.unicode.org/Public/draft/ .

-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
-Ask Rick to add other files that are not tracked in the unicodetools repo:
+Send the resulting zip file to Gregg for posting to https://www.unicode.org/Public/draft/ .
+Ask Gregg to add other files that are not tracked in the unicodetools repo:
* Unihan.zip to .../draft/UCD/ucd
* alpha charts to .../draft/UCD/charts

@@ -141,8 +142,8 @@ script from an up-to-date repo workspace.
The script copies the set of the .../dev/ data files for a beta snapshot
from a unicodetools workspace to a target folder with the layout of https://www.unicode.org/Public/draft/ .

-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
-Ask Rick to add other files that are not tracked in the unicodetools repo:
+Send the resulting zip file to Gregg for posting to https://www.unicode.org/Public/draft/ .
+Ask Gregg to add other files that are not tracked in the unicodetools repo:
* Unihan.zip to .../draft/UCD/ucd
* UCDXML files to .../draft/UCD/ucdxml
* beta charts to .../draft/UCD/charts
@@ -158,8 +159,8 @@ Verify the final set of files in the draft folder.
Run the [pub/copy-final.sh](https://github.com/unicode-org/unicodetools/blob/main/pub/copy-final.sh)
script from an up-to-date repo workspace.

-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/ (not .../Public/draft/).
-Ask Rick to add other files that are not tracked in the unicodetools repo:
+Send the resulting zip file to Gregg for posting to https://www.unicode.org/Public/ (not .../Public/draft/).
+Ask Gregg to add other files that are not tracked in the unicodetools repo:
* Unihan.zip to .../{version}/ucd
* UCDXML files to .../{version}/ucdxml
* final charts to .../{version}/charts