Merge remote-tracking branch 'la-vache/main' into 171-C13
eggrobin committed Oct 2, 2023
2 parents 851f175 + 4ddfe22 commit fff8aa3
Showing 279 changed files with 3,294,828 additions and 2,041 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/build-jsp.yml
@@ -1,8 +1,8 @@
name: Build JSP

env:
CURRENT_UVERSION: 15.1.0
PREVIOUS_UVERSION: 15.0.0 # not used at present
CURRENT_UVERSION: 16.0.0
PREVIOUS_UVERSION: 15.1.0 # not used at present

on:
push:
4 changes: 2 additions & 2 deletions .github/workflows/cli-build-instructions.yml
@@ -9,8 +9,8 @@ on:
- '*'

env:
CURRENT_UVERSION: 15.1.0
PREVIOUS_UVERSION: 15.0.0
CURRENT_UVERSION: 16.0.0
PREVIOUS_UVERSION: 15.1.0

jobs:

1 change: 1 addition & 0 deletions .gitignore
@@ -85,3 +85,4 @@ rules.mk

.DS_Store
/output
/cldr
71 changes: 32 additions & 39 deletions LICENSE
@@ -1,46 +1,39 @@
UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
UNICODE LICENSE V3

See Terms of Use for definitions of Unicode Inc.'s
Data Files and Software.
COPYRIGHT AND PERMISSION NOTICE

NOTICE TO USER: Carefully read the following legal agreement.
BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
DATA FILES ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"),
YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
TERMS AND CONDITIONS OF THIS AGREEMENT.
IF YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE
THE DATA FILES OR SOFTWARE.
Copyright © 2001-2023 Unicode, Inc.

COPYRIGHT AND PERMISSION NOTICE
NOTICE TO USER: Carefully read the following legal agreement. BY
DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING DATA FILES, AND/OR
SOFTWARE, YOU UNEQUIVOCALLY ACCEPT, AND AGREE TO BE BOUND BY, ALL OF THE
TERMS AND CONDITIONS OF THIS AGREEMENT. IF YOU DO NOT AGREE, DO NOT
DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA FILES OR SOFTWARE.

Copyright © 1991-2020 Unicode, Inc. All rights reserved.
Distributed under the Terms of Use in https://www.unicode.org/copyright.html.
Permission is hereby granted, free of charge, to any person obtaining a
copy of data files and any associated documentation (the "Data Files") or
software and any associated documentation (the "Software") to deal in the
Data Files or Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, and/or sell
copies of the Data Files or Software, and to permit persons to whom the
Data Files or Software are furnished to do so, provided that either (a)
this copyright and permission notice appear with all copies of the Data
Files or Software, or (b) this copyright and permission notice appear in
associated Documentation.

Permission is hereby granted, free of charge, to any person obtaining
a copy of the Unicode data files and any associated documentation
(the "Data Files") or Unicode software and any associated documentation
(the "Software") to deal in the Data Files or Software
without restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, and/or sell copies of
the Data Files or Software, and to permit persons to whom the Data Files
or Software are furnished to do so, provided that either
(a) this copyright and permission notice appear with all copies
of the Data Files or Software, or
(b) this copyright and permission notice appear in associated
Documentation.
THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
THIRD PARTY RIGHTS.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT OF THIRD PARTY RIGHTS.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THE DATA FILES OR SOFTWARE.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE
BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES,
OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA
FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder
shall not be used in advertising or otherwise to promote the sale,
use or other dealings in these Data Files or Software without prior
written authorization of the copyright holder.
Except as contained in this notice, the name of a copyright holder shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written
authorization of the copyright holder.
10 changes: 4 additions & 6 deletions README.md
@@ -21,12 +21,10 @@ use the Unicode Contact Form: https://www.unicode.org/reporting.html
Do not use the GitHub Issues feature in this repo for those.
The tools maintainers use GH issues for issues with the code in this repo.

### Licenses
### Copyright & Licenses

- Data and software is governed by the [Unicode Terms of Use](https://www.unicode.org/copyright.html)
a copy of which is included as [LICENSE](./LICENSE).
Copyright © 2001-2023 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.

### Copyright
The project is released under [LICENSE](./LICENSE).

© 1991 and later: Unicode, Inc. and others.
License & terms of use: <https://www.unicode.org/copyright.html>
A CLA is required to contribute to this project - please refer to the [CONTRIBUTING.md](https://github.com/unicode-org/.github/blob/main/.github/CONTRIBUTING.md) file (or start a Pull Request) for more information.
2 changes: 1 addition & 1 deletion UnicodeJsps/Dockerfile
@@ -11,7 +11,7 @@ RUN ls -lh /build/source/bidiref1 && (/build/source/bidiref1 || true)
ADD ./target/cldr-unicodetools.tgz /build/data/
# move this into place (including unicodetools/unicodetools)
RUN rm -rf /build/data/cldr/.git # unneeded
FROM jetty:9-jre11-slim AS run
FROM jetty:9-jre11-alpine-eclipse-temurin AS run
ADD port-entrypoint.sh /port-entrypoint.sh
ADD ./jetty.d/ROOT /var/lib/jetty/webapps/ROOT/
ENTRYPOINT [ "/port-entrypoint.sh" ]
2 changes: 1 addition & 1 deletion UnicodeJsps/src/main/java/org/unicode/jsp/CachedProps.java
@@ -29,7 +29,7 @@
import org.unicode.props.UnicodeProperty;

public class CachedProps {
public static final boolean IS_BETA = true;
public static final boolean IS_BETA = false;

public static final Splitter HASH_SPLITTER = Splitter.on('#').trimResults();
public static final Splitter SEMI_SPLITTER = Splitter.on(';').trimResults();
38 changes: 24 additions & 14 deletions UnicodeJsps/src/main/java/org/unicode/jsp/UnicodeUtilities.java
@@ -637,16 +637,7 @@ private void showString(final String string, String separator, Appendable out)
if (UnicodeUtilities.RTL.containsSome(literal)) {
literal = '\u200E' + literal + '\u200E';
}
String name = UnicodeUtilities.getName(string, separator, false);
if (name == null || name.length() == 0) {
name = "<i>no name</i>";
} else {
boolean special = name.indexOf('<') >= 0;
name = UnicodeUtilities.toHTML.transliterate(name);
if (special) {
name = "<i>" + name + "</i>";
}
}
String name = UnicodeUtilities.getName(string, separator, false, false);
literal = UnicodeSetUtilities.addEmojiVariation(literal);
if (doTable) {
out.append(
@@ -801,7 +792,8 @@ String getPropString(List<UnicodeProperty> props, String codePoints, boolean sho
// }
}

private static String getName(String string, String separator, boolean andCode) {
private static String getName(
String string, String separator, boolean andCode, boolean plainText) {
StringBuilder result = new StringBuilder();
int cp;
for (int i = 0; i < string.length(); i += UTF16.getCharCount(cp)) {
@@ -812,7 +804,25 @@ private static String getName(String string, String separator, boolean andCode)
if (andCode) {
result.append("U+").append(com.ibm.icu.impl.Utility.hex(cp, 4)).append(' ');
}
result.append(CachedProps.NAMES.getValue(cp));
final String name = CachedProps.NAMES.getValue(cp);
if (name != null) {
result.append(name);
} else {
// TODO(egg): We only have Name_Aliasβ during β, which is silly. This will probably
// solve itself as part of https://github.com/unicode-org/unicodetools/issues/432.
String alias =
getFactory()
.getProperty(CachedProps.IS_BETA ? "Name_Aliasβ" : "Name_Alias")
.getValue(cp);
if (alias == null) {
alias = "no name";
}
if (plainText) {
result.append("(" + alias + ")");
} else {
result.append("<i>" + alias + "</i>");
}
}
}
return result.toString();
}
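Note on the hunk above: the new `getName` falls back from the character name to a name alias, and finally to the literal text "no name", and wraps any fallback value in parentheses or `<i>…</i>` depending on `plainText`. The following is a minimal, self-contained sketch of that decision order only. The class name `NameFallbackSketch`, the method `describe`, and the two maps are hypothetical stand-ins for `CachedProps.NAMES` and the `Name_Alias` property lookup; the map contents are illustrative sample data, and per-string iteration, separators, and the `U+` prefix are left out.

```java
import java.util.Map;

public class NameFallbackSketch {
    // Hypothetical stand-ins for CachedProps.NAMES and the Name_Alias property.
    private static final Map<Integer, String> NAMES = Map.of(0x0041, "LATIN CAPITAL LETTER A");
    private static final Map<Integer, String> ALIASES = Map.of(0x000A, "LINE FEED");

    /** Returns the name if there is one; otherwise a marked-up alias or "no name". */
    static String describe(int cp, boolean plainText) {
        String name = NAMES.get(cp);
        if (name != null) {
            return name; // Real names are returned without any markup.
        }
        String alias = ALIASES.get(cp);
        if (alias == null) {
            alias = "no name";
        }
        // Plain-text callers get parentheses; HTML callers get italics.
        return plainText ? "(" + alias + ")" : "<i>" + alias + "</i>";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0041, false)); // LATIN CAPITAL LETTER A
        System.out.println(describe(0x000A, true));  // (LINE FEED)
        System.out.println(describe(0x000A, false)); // <i>LINE FEED</i>
        System.out.println(describe(0xE000, true));  // (no name)
    }
}
```

The plain-text branch presumably exists because the `showBidiLine` callers pass the result through `toHTML.transform` before placing it in a `title` attribute, where italic tags would be escaped and shown literally.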
@@ -1931,7 +1941,7 @@ private static void showBidiLine(
writer.println("</tr><tr><th>Character</th>");
for (int i = 0; i < str.length(); ++i) {
final String s = str.substring(i, i + 1);
String title = toHTML.transform(getName(s, "", true));
String title = toHTML.transform(getName(s, "", true, true));
writer.println(
"<td class='bccell' title='"
+ title
@@ -1982,7 +1992,7 @@ private static void showBidiLine(
String title =
bidiChar.length() == 0
? "deleted"
: toHTML.transform(getName(bidiChar, "", true));
: toHTML.transform(getName(bidiChar, "", true, true));
String td = bidiChar.length() == 0 ? "bxcell" : "bccell";
writer.println(
"<td class='"
69 changes: 48 additions & 21 deletions docs/data-workflow.md
@@ -30,17 +30,15 @@ Process:
* Iterate between KenW and Michel.
* Generated from UnicodeData.txt and an annotations file, using some C program.
* Used for generating code charts.
* KenW posts NamesList.txt into https://www.unicode.org/Public/draft/UCD/ucd/ .
* KenW posts NamesList.txt somewhere.
* A unicodetools GitHub contributor fetches this file
and creates a pull request as for “regular” data files.

### Folder readmes

The “source of truth” for these is outside of GitHub for now.
KenW updates or vets these files and posts them to https://www.unicode.org/Public/draft/ .
A unicodetools GitHub contributor fetches these files and creates a pull request as above.

See https://github.com/unicode-org/properties/issues/8 “simplify versioning of readme files”
The various ReadMe.txt files are checked into the unicodetools repo.
They are templatized, and the publication scripts below replace variables with the
Unicode and emoji versions, copyright year, and publication date (date when the script was run).
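
Note on the paragraph above: the substitution step it describes can be pictured with a minimal sketch. The real work is done by the pub/*.sh scripts, and the actual placeholder syntax is not shown in this diff, so the `@NAME@` form, the variable names, and the class `ReadmeTemplateSketch` below are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReadmeTemplateSketch {
    /** Replaces every @KEY@ placeholder (hypothetical syntax) with its value. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("@" + entry.getKey() + "@", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Example values only; a real run would take these from the release being published.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("VERSION", "16.0.0");
        values.put("YEAR", "2023");
        values.put("DATE", "2023-10-02");
        String template = "# Unicode @VERSION@ ReadMe\n# Date: @DATE@\n# Copyright (c) @YEAR@ Unicode, Inc.";
        System.out.println(fill(template, values));
    }
}
```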

### “Regular” data files

@@ -97,6 +95,8 @@ and skip any others that are only for internal use.

For the alpha review, publish (at least) the UCD and emoji files, and the charts.

Review/edit the pub/*.sh scripts and advance the version numbers and copyright years.

Run the [pub/copy-alpha-to-draft.sh](https://github.com/unicode-org/unicodetools/blob/main/pub/copy-alpha-to-draft.sh)
script from an up-to-date repo workspace.
The script copies the set of the .../dev/ data files for an alpha snapshot
@@ -122,27 +122,54 @@ from a unicodetools workspace to a target folder with the layout of https://www.
Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
Ask Rick to add other files that are not tracked in the unicodetools repo:
* Unihan.zip to .../draft/UCD/ucd
* UCDXML files to .../draft/UCD/ucdxml
* beta charts to .../draft/UCD/charts

### Publish a release
### Publish a release snapshot

TODO: Write a script like /pub/copy-release-to-draft.sh that will be run on the unicode.org server
and copy the set of the .../dev/ data files for a beta snapshot
from a unicodetools workspace to the location behind https://www.unicode.org/Public/draft/ .
After the last UTC meeting for the release, collect all of the data file updates
(mostly from recently opened action items).

When complete, publish the draft files once more via the beta script.
Verify the final set of files in the draft folder.

TODO: Write a script like /pub/copy-draft-to-release.sh that will be run on the unicode.org server
and copy the files from the location behind https://www.unicode.org/Public/draft/
to the locations behind the version-specific release folders.
For example:
* https://www.unicode.org/Public/draft/UCD/ → https://www.unicode.org/Public/15.1.0/
* https://www.unicode.org/Public/draft/UCA/ → https://www.unicode.org/Public/UCA/15.1.0/
* https://www.unicode.org/Public/draft/emoji/ → https://www.unicode.org/Public/emoji/15.1/
* etc.
Run the [pub/copy-final.sh](https://github.com/unicode-org/unicodetools/blob/main/pub/copy-final.sh)
script from an up-to-date repo workspace.

Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/ (not .../Public/draft/).
Ask Rick to add other files that are not tracked in the unicodetools repo:
* Unihan.zip to .../{version}/ucd
* UCDXML files to .../{version}/ucdxml
* final charts to .../{version}/charts

This script works much like the beta script, except it:
* assembles all of the files for Public/ in their release folder structure,
rather than for Public/draft/
* creates a zipped/{version} folder with UCD.zip

### Before a release

After a Unicode release, copy a snapshot of the unicodetools repo .../dev/ files
(matching the released files, of course) to a versioned unicodetools folder;
for example: .../unicodetools/data/ucd/15.1.0/ .
When the data files are supposed to be final, about a week or two before the release:

Verify once more that the unicodetools repo .../dev/ files match the released/published files.

Create a release tag in the repo.
Example, from four days before Unicode 15.1 was released:
https://github.com/unicode-org/unicodetools/releases/tag/final-15.1-20230908

### After a release

Copy a snapshot of the unicodetools repo .../dev/ files to a versioned unicodetools folder;
for example: .../unicodetools/data/ucd/16.0.0/ .
(We no longer append a “-Update” suffix to the folder name.)
List: emoji, idna, security, uca, ucd, ucdxml
Watch for different naming conventions: emoji versions use only two fields, not three.
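
Note on the naming convention above: a small sketch of deriving the two-field folder name from a three-field version string. It assumes the emoji version tracks the major and minor fields of the Unicode version, as in the 15.1.0 and 15.1 folder names that appear elsewhere in this diff; the class and method names are hypothetical.

```java
public class EmojiVersionSketch {
    /** Keeps only the major and minor fields, e.g. "16.0.0" becomes "16.0". */
    static String twoFieldVersion(String version) {
        String[] fields = version.split("\\.");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return fields[0] + "." + fields[1];
    }

    public static void main(String[] args) {
        System.out.println(twoFieldVersion("16.0.0")); // 16.0
        System.out.println(twoFieldVersion("15.1"));   // 15.1
    }
}
```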

Edit the pub/*.sh scripts and advance the version numbers.

Change the Unicode Tools code as necessary for the start of work on the next version.
Settings.java lastVersion & latestVersion and more.

Example, Unicode 15.1→16.0: https://github.com/unicode-org/unicodetools/pull/539

Declare “main” to be open for the next version.
64 changes: 64 additions & 0 deletions docs/unicodejsps/gcp-run.md
@@ -0,0 +1,64 @@
# Manually Building and Pushing UnicodeJSPs to Docker / GCP Run

- This page is Under Construction by Steven `@srl295`

- see [index.md](./index.md) for the prior documentation

## maven stuff

- local build

```
mvn -B package -am -pl UnicodeJsps -DskipTests=true
```

- make a copy of CLDR - lots of ways to do this

```
git clone --reference-if-able ~/src/cldr https://github.com/unicode-org/cldr.git
mkdir -p UnicodeJsps/target && tar -cpz --exclude=.git -f UnicodeJsps/target/cldr-unicodetools.tgz ./cldr/ ./unicodetools/
```

## docker stuff

- build it

```
docker build -t unicode/unicode-jsps .
```

- try it

```
docker run --rm -p 8080:8080 unicode/unicode-jsps
```

=> <http://127.0.0.1:8080>


## cloudy stuff

- install gcloud sdk

- `gcloud init`

- login to docker

```
gcloud auth configure-docker \
us-central1-docker.pkg.dev
```

- build docker image

```
docker build -t us-central1-docker.pkg.dev/goog-unicode-dev/unicode-jsps/unicode-jsps:latest .
```

- push docker image

_(takes a while - ~4G to push)_

```
docker push us-central1-docker.pkg.dev/goog-unicode-dev/unicode-jsps/unicode-jsps:latest
```