CLDR-17566 initial text and md files
chpy04 committed Sep 2, 2024
1 parent eb4b003 commit 40f9f1a
Showing 9 changed files with 403 additions and 0 deletions.
46 changes: 46 additions & 0 deletions docs/site/TEMP-TEXT-FILES/creating-the-archive.txt
@@ -0,0 +1,46 @@
Checking out the CLDR Archive
A number of the tools in CLDR depend on access to older versions. These tools include:
Generating Charts
Update Validity XML
Updating English/Root
[Note: add others when we find them]
Some tests
TestCompatibility.java
TestTransforms.java
TestValidity.java
Some other tools (typically when given a version argument on the command line)
FindPluralDifferences
...
Here's how to do that.
Create an archive directory cldr-archive. The simplest setup is to put it at the same level as your local CLDR repository. In other words, if your CLDR_DIR is .../workspace/cldr, then create the directory …/workspace/cldr-archive
(Note: The Java property ARCHIVE can be used to override the path to cldr-archive.)
Open up ToolConstants.java and look at ToolConstants.CLDR_VERSIONS. You'll see something like:
public static final List<String> CLDR_VERSIONS = ImmutableList.of(
"1.1.1",
"1.2",
"1.3",
"1.4.1",
"1.5.1",
"1.6.1",
"1.7.2",
"1.8.1",
...
"41.0"
// add to this once the release is final!
);
NOTE: this should also match CldrVersion.java (those two need to be merged together)
Add the just-released version, such as "42.0", to the list above.
Also update DEV_VERSION to "43" (the next development version)
Finally, update CldrVersion.java and make similar changes.
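
The bookkeeping in the steps above can be sanity-checked with a few lines of Java. This is only a sketch: the class and method names are hypothetical and not part of ToolConstants or CldrVersion, and it assumes that list entries look like "42.0" and that DEV_VERSION is a bare major number such as "43".

```java
import java.util.List;

// Hypothetical sanity check; not part of ToolConstants or CldrVersion.
class VersionListCheck {
    /** Returns the leading numeric field of a version string, e.g. "42.0" -> 42. */
    static int majorOf(String version) {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }

    /**
     * True if the last entry of the released-versions list is the just-released
     * version and the development version is exactly one major version later.
     */
    static boolean isConsistent(List<String> released, String justReleased, String devVersion) {
        if (released.isEmpty()) {
            return false;
        }
        String last = released.get(released.size() - 1);
        return last.equals(justReleased) && majorOf(devVersion) == majorOf(last) + 1;
    }

    public static void main(String[] args) {
        List<String> versions = List.of("40.0", "41.0", "42.0");
        // List already ends with the just-released version.
        System.out.println(isConsistent(versions, "42.0", "43"));
        // "42.0" has not been added yet.
        System.out.println(isConsistent(versions.subList(0, 2), "42.0", "43"));
    }
}
```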
Now, run the tool org.unicode.cldr.tool.CheckoutArchive
Or from the command line:
mvn -DCLDR_DIR=path_to/cldr --file=tools/pom.xml -pl cldr-code compile -DskipTests=true exec:java -Dexec.mainClass=org.unicode.cldr.tool.CheckoutArchive  -Dexec.args=""
Note other options for this tool:
--help will give help
--prune will run a 'git worktree prune' before proceeding
--echo will just show the commands that would be run, without running anything
(For example,  -Dexec.args="--prune" in the above command line)
The end result (where you need all of the releases) looks something like the following:
Advanced Configuration
You can set the property  -DCLDR_ARCHIVE to point to a different parent directory for the archive
You can set -DCLDR_HAS_ARCHIVE=false to tell unit tests and tools not to look for the archive
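
As a rough illustration of how these two properties could interact with the default sibling layout described earlier, here is a minimal sketch. It is not the CLDR implementation: the class and method names are hypothetical, and it assumes that CLDR_ARCHIVE names the parent directory of cldr-archive and that only the literal value "false" disables the archive.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: the real lookup lives in the CLDR tools and may differ.
class ArchiveLocationSketch {
    /**
     * Resolves the archive directory: an explicit parent directory (as would come
     * from -DCLDR_ARCHIVE) wins; otherwise cldr-archive is a sibling of CLDR_DIR.
     */
    static Path archiveDir(String cldrDir, String archiveParentOrNull) {
        Path parent = archiveParentOrNull != null
                ? Paths.get(archiveParentOrNull)
                : Paths.get(cldrDir).toAbsolutePath().normalize().getParent();
        if (parent == null) {
            throw new IllegalArgumentException("CLDR_DIR has no parent directory: " + cldrDir);
        }
        return parent.resolve("cldr-archive");
    }

    /** Interprets the CLDR_HAS_ARCHIVE flag; only "false" turns the archive off here. */
    static boolean hasArchive(String flagOrNull) {
        return flagOrNull == null || !flagOrNull.equalsIgnoreCase("false");
    }

    public static void main(String[] args) {
        String cldrDir = System.getProperty("CLDR_DIR", "workspace/cldr");
        if (hasArchive(System.getProperty("CLDR_HAS_ARCHIVE"))) {
            System.out.println(archiveDir(cldrDir, System.getProperty("CLDR_ARCHIVE")));
        } else {
            System.out.println("archive lookups disabled");
        }
    }
}
```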
22 changes: 22 additions & 0 deletions docs/site/TEMP-TEXT-FILES/documenting-cldr-tools.txt
@@ -0,0 +1,22 @@
Documenting CLDR Tools
Developers: Make sure your tool is easily accessible from the command line.
You can add the @CLDRTool annotation to any class in cldr-code that has a main() function, and it will be documented as part of the JAR when cldr-code.jar is used.
See CLDR Tools for general information about obtaining and using CLDR tools.
Coding it
An example from ConsoleCheckCLDR.java will start us out here:
@CLDRTool(alias = "check",
description = "Run CheckCLDR against CLDR data")
public class ConsoleCheckCLDR { …
Then, calling java -jar cldr-tools.jar -l produces:
check - Run CheckCLDR against CLDR data
<http://cldr.unicode.org/tools/check>
= org.unicode.cldr.test.ConsoleCheckCLDR
And then java -jar cldr-tools.jar check can be used to run this tool. All additional arguments after "check" are passed to ConsoleCheckCLDR.main() as arguments.
Note these annotation parameters. Only "alias" is required.
alias - used from the command line instead of the full class name. Also forms part of the default URL for documentation.
description - a short description of the tool.
Additional parameters:
url - you can specify a custom URL for the tool. This is displayed with the listing.
hidden - if non-empty, this specifies a reason to not show the tool when running "java -jar" without "-l". For example, the main() function may be a less-useful internal tool, or a test.
Documenting it
Assuming your tool’s alias is myalias, create a new subpage with the URL http://cldr.unicode.org/tools/myalias (a subpage of CLDR Tools). Fill this page out with information about how to use your tool.
29 changes: 29 additions & 0 deletions docs/site/TEMP-TEXT-FILES/running-tests.txt
@@ -0,0 +1,29 @@
Running Tests
You will always need to run tests when you do a check-in.
Preconditions
If you change the DTD, be sure to read and follow Updating DTDs first.
If you added a new feature or fixed a significant bug, add a unit test for it.
See unittest/NumberingSystemsTest as an example.
Remember to add to unittest/TestAll
Run TestAll -e
These are the unit tests in exhaustive mode
If you are doing something you know to be simple, you could do the shorter run of just TestAll
Run ConsoleCheckCLDR -e -z final_testing -S common,seed
This runs the same set of tests that the Survey Tool does.
If you know what you are doing, you can run a set of filtered tests.
Other tests
The unit tests are not complete, so if you are doing anything fancy you get a better workout by also running:
NewLdml2IcuConverter
Generating Charts
If you have interesting new data, write a chart for it. See subclasses of Chart.java for examples.
Running tests on the command line
$ export CLDR_DIR=/path/to/svn/root/for/cldr
$ cd $CLDR_DIR/tools/java && ant all
$ cd $CLDR_DIR/tools/cldr-unittest && ant unittestExhaustive datacheck
[TODO: add more commands here; can't we automate all this into a single build rule for ant? See ticket:8864]
Debugging
[TODO: add more tips here]
Regexes
We use a lot of regexes!
The method org.unicode.cldr.util.RegexUtilities.showMismatch (and related methods) is really useful for debugging cases where regexes fail. You hand it a pattern or matcher and a string, and it shows how far the regex got before it failed.
To debug RegexLookup, there is a special call you can make where you pass in a set. On return, that set is filled with a set of strings showing how far each of the regex patterns progressed. You can thus see why a string didn't match as expected.
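
The "how far did the regex get" idea can be approximated with the standard library alone. The sketch below is not the RegexUtilities code. It assumes that trying successively shorter prefixes and asking Matcher.hitEnd() whether more input could still produce a match is a good enough heuristic, and the class name, method names, and marker text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standard-library sketch of the idea; this is not the RegexUtilities code.
class RegexProgressSketch {
    /**
     * Returns the length of the longest prefix of input that either matches the
     * pattern or could still be extended into a match (Matcher.hitEnd()).
     */
    static int furthestViablePrefix(Pattern pattern, String input) {
        for (int end = input.length(); end > 0; end--) {
            Matcher m = pattern.matcher(input.substring(0, end));
            if (m.matches() || m.hitEnd()) {
                return end;
            }
        }
        return 0;
    }

    /** Returns the input unchanged on a full match, else marks the failure point. */
    static String showProgress(Pattern pattern, String input) {
        if (pattern.matcher(input).matches()) {
            return input;
        }
        int end = furthestViablePrefix(pattern, input);
        return input.substring(0, end) + "<<FAILED HERE>>" + input.substring(end);
    }

    public static void main(String[] args) {
        // Lower-case calendar types only, so the capital "G" below stops the match.
        Pattern p = Pattern.compile("//ldml/dates/calendars/calendar\\[@type=\"[a-z]+\"\\]");
        System.out.println(showProgress(p, "//ldml/dates/calendars/calendar[@type=\"Gregorian\"]"));
    }
}
```

The heuristic can mislead for patterns that use lookarounds or end anchors, so treat the marked position as a hint about where to look.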
53 changes: 53 additions & 0 deletions docs/site/TEMP-TEXT-FILES/updating-englishroot.txt
@@ -0,0 +1,53 @@
Updating English/Root
Whenever you update English or Root, there is one additional step that needs to be done for the vetting viewer and tests to work properly.
Update CldrVersion.java to have the newest release in the list.
Run GenerateBirth
The tool is in tools/java/org/unicode/cldr/tool/GenerateBirth.java. It requires a set of sources from all previous major CLDR releases, trunk, and a generation directory. These three directories must be structured as follows. The tool takes environment parameters for the second two.
cldr (set with -t <target>, default=CldrUtility.BASE_DIRECTORY, set with environment variable -DCLDR_DIR)
... common/ ... tools/ java/ (apps such as GenerateBirth are run from here) ...
CldrUtility.ARCHIVE_DIRECTORY
Create the archive (Creating the Archive) with all releases (if you don't have it already)
The archive directory should have the latest version of every major and minor version (where versions before 21.0 have the major version split across the top two fields).
You will probably need to modify both CldrVersion.java and ToolConstants.java to bring them up to date.
log (set with -l <log>, default=CldrUtility.UTIL_DATA_DIR, set with CLDR_DIR)
Pass an argument for -t to specify the output directory. Takes a few minutes to run (and make sure you have set Java with enough memory)!
The tool generates (among other things) the following two binary files in the output directory specified with -t:
outdated.data
outdatedEnglish.data
These replace the previous versions in /cldr/tools/java/org/unicode/cldr/util/data/births/. The files are used to support OutdatedPaths.java, which is used in CheckNew.
Readable data is found in https://github.com/unicode-org/cldr-staging/tree/master/births/* That should also be checked in, for comparison over time. Easiest to read if you paste into a spreadsheet!
Binary File Format
outdatedEnglish.data (~50KB):
    int:size
    long:pathId str:oldValue
    long:pathId str:oldValue
    ...
    $END$
outdated.data (~100KB):
    str:locale
    int:size
    long:pathId
    long:pathId
    ...
    str:locale
    int:size
    long:pathId
    long:pathId
    $END$
In a limited release, the file SubmissionLocales.java is set up to allow just certain locales and paths in those locales.
Testing
Make sure TestOutdatedPaths.java passes. It may take some modifications, since it depends on the exact data.
Run TestCheckCLDR and TestBasic with the option -prop:logKnownIssue=false (that option is important!). This checks that the Limited Submission is set up properly and that SubmissionLocales are correct.
If you run into any problems, look below at debugging.
Check in the files
E.g., https://github.com/unicode-org/cldr/pull/243
Debugging
It also generates readable log files for double checking. These will be in {workspace}/cldr-aux/births/<version>/, that is: CLDRPaths.AUX_DIRECTORY + "births/" + trunkVersion. Examples: https://unicode.org/repos/cldr-aux/births/35.0/en.txt, https://unicode.org/repos/cldr-aux/births/35.0/fr.txt.
Their format is the following (TSV = tab-delimited-values) — to view, it is probably easier to copy the files into a spreadsheet.
English doesn't have the E... values, but is a complete record.
Other languages only have lines where the English value is more recently changed (younger) than the native’s.
So what the first line below says is that French has "bengali" dating back to version 1.1.1, while English has "Bangla" dating back to version 30.
A value of � indicates that there is no value for that version.
@@ -0,0 +1,97 @@
---
title: Updating English/Root
---

# Updating English/Root

Whenever you update English or Root, there is one additional step that needs to be done for the vetting viewer and tests to work properly.

Update CldrVersion.java to have the newest release in the list.

## Run GenerateBirth

The tool is in tools/java/org/unicode/cldr/tool/GenerateBirth.java. It requires a set of sources from all previous major CLDR releases, trunk, and a generation directory. These three directories must be structured as follows. The tool takes environment parameters for the second two.

**cldr (set with \-t \<target\>, default\=CldrUtility.BASE\_DIRECTORY, set with environment variable \-DCLDR\_DIR)**

... common/ ... tools/ java/ (apps such as GenerateBirth are run from here) ...

**CldrUtility.ARCHIVE\_DIRECTORY**

1. Create the archive ([Creating the Archive](https://cldr.unicode.org/development/creating-the-archive)) with all releases (if you don't have it already)
2. The archive directory should have the latest version of every major and minor version (where versions before 21\.0 have the major version split across the top two fields).
3. You will probably need to modify both CldrVersion.java and ToolConstants.java to bring them up to date.
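
One way to read the rule about versions before 21.0 is that the "major version" of an old release is its first two fields, while for newer releases it is the first field alone. The sketch below illustrates that reading only; the class and method names are hypothetical and this is not how the CLDR tools necessarily compute it.

```java
// Hypothetical helper illustrating one reading of the rule above; not CLDR code.
class ReleaseSeriesSketch {
    /**
     * Returns the "major version" key of a release: the first two fields for
     * versions before 21.0 (e.g. "1.4.1" -> "1.4"), otherwise the first field.
     */
    static String seriesOf(String version) {
        String[] fields = version.split("\\.");
        int first = Integer.parseInt(fields[0]);
        if (first < 21 && fields.length > 1) {
            return fields[0] + "." + fields[1];
        }
        return fields[0];
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.4.1", "1.8.1", "21.0", "41.0"}) {
            System.out.println(v + " belongs to series " + seriesOf(v));
        }
    }
}
```

Under this reading, the archive would hold one directory per series, each containing the last release of that series.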

**log (set with \-l \<log\>, default\=CldrUtility.UTIL\_DATA\_DIR, set with CLDR\_DIR)**

Pass an argument for \-t to specify the output directory. Takes a few minutes to run (and make sure you have set Java with enough memory)!

The tool generates (among other things) the following two binary files in the output directory specified with \-t:

- **outdated.data**
- **outdatedEnglish.data**

These replace the previous versions in /cldr/tools/java/org/unicode/cldr/util/data/births/. The files are used to support OutdatedPaths.java, which is used in CheckNew.

Readable data is found in https://github.com/unicode\-org/cldr\-staging/tree/master/births/\* That should also be checked in, for comparison over time. Easiest to read if you paste into a spreadsheet!

## Binary File Format

| outdatedEnglish.data | outdated.data |
|---|---|
| **int:size** | **str:locale** |
| long:pathId str:oldValue | **int:size** |
| long:pathId str:oldValue | long:pathId |
| ... | long:pathId |
| | ... |
| | **str:locale** |
| | **int:size** |
| | long:pathId |
| | long:pathId |
| | ... |
| **\$END\$** | **\$END\$** |
| ~50KB | ~100KB |
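
To make the outdatedEnglish.data column concrete, here is a small round-trip sketch of that layout. It rests on an assumption the table does not confirm: that the `str` fields are written with `DataOutputStream.writeUTF`. The class name and the path IDs are made up for illustration, and the real reader and writer in the CLDR tools may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the outdatedEnglish.data layout from the table above. Assumption:
// strings are written with DataOutputStream.writeUTF; the real files may differ.
class OutdatedEnglishFormatSketch {
    static final String END_MARKER = "$END$";

    static byte[] write(Map<Long, String> oldValuesByPathId) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(oldValuesByPathId.size());      // int:size
            for (Map.Entry<Long, String> e : oldValuesByPathId.entrySet()) {
                out.writeLong(e.getKey());               // long:pathId
                out.writeUTF(e.getValue());              // str:oldValue
            }
            out.writeUTF(END_MARKER);                    // $END$
        }
        return bytes.toByteArray();
    }

    static Map<Long, String> read(byte[] data) throws IOException {
        Map<Long, String> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                long pathId = in.readLong();
                result.put(pathId, in.readUTF());
            }
            if (!END_MARKER.equals(in.readUTF())) {
                throw new IOException("missing " + END_MARKER + " marker");
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<Long, String> original = new LinkedHashMap<>();
        original.put(101L, "Bengali");   // made-up path IDs
        original.put(102L, "Gallegan");
        System.out.println(read(write(original)).equals(original)); // round-trip check
    }
}
```

The outdated.data column would follow the same pattern, with a locale string and a size before each block of path IDs.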

In a limited release, the file **SubmissionLocales.java** is set up to allow just certain locales and paths in those locales.

## Testing

Make sure TestOutdatedPaths.java passes. It may take some modifications, since it depends on the exact data.

Run TestCheckCLDR and TestBasic with the option **\-prop:logKnownIssue\=false** (that option is important!). This checks that the Limited Submission is set up properly and that SubmissionLocales are correct.



If you run into any problems, look below at debugging.

**Check in the files**

Eg https://github.com/unicode-org/cldr/pull/243

## Debugging

It also generates readable log files for double checking. These will be in {workspace}/cldr\-aux/births/\<version\>/, that is: CLDRPaths.AUX\_DIRECTORY \+ "births/" \+ trunkVersion. Examples: https://unicode.org/repos/cldr-aux/births/35.0/en.txt, https://unicode.org/repos/cldr-aux/births/35.0/fr.txt.

Their format is the following (TSV \= tab\-delimited\-values) — to view, it is probably easier to copy the files into a spreadsheet.

- English doesn't have the E... values, but is a complete record.
- Other languages only have lines where the English value is more recently changed (younger) than the native’s.
- So what the first line below says is that French has "bengali" dating back to version 1\.1\.1, while English has "Bangla" dating back to version 30\.

| Loc | Version | Value | PrevValue | EVersion | EValue | EPrevValue | Path |
|---|:---:|---|---|:---:|---|---|---|
| fr | 1.1.1 | bengali || 30 | Bangla | Bengali | //ldml/localeDisplayNames/languages/language[@type="bn"] |
| fr | 1.1.1 | galicien || 1.4.1 | Galician | Gallegan | //ldml/localeDisplayNames/languages/language[@type="gl"] |
| fr | 1.1.1 | kirghize || 24 | Kyrgyz | Kirghiz | //ldml/localeDisplayNames/languages/language[@type="ky"] |
| fr | 1.1.1 | ndébélé du Nord || 1.3 | North Ndebele | Ndebele, North | //ldml/localeDisplayNames/languages/language[@type="nd"] |
| fr | 1.1.1 | ndébélé du Sud || 1.3 | South Ndebele | Ndebele, South | //ldml/localeDisplayNames/languages/language[@type="nr"] |
| ... | | | | | | | |
| fr | 34 | exclamation \| point d’exclamation blanc \| ponctuation | exclamation \| point d’exclamation blanc | trunk | ! \| exclamation \| mark \| outlined \| punctuation \| white exclamation mark | exclamation \| mark \| outlined \| punctuation \| white exclamation mark | //ldml/annotations/annotation[@cp="❕"] |
| fr | 34 | exclamation \| point d’exclamation \| ponctuation | exclamation \| point d’exclamation | trunk | ! \| exclamation \| mark \| punctuation | exclamation \| mark \| punctuation | //ldml/annotations/annotation[@cp="❗"] |
| fr | 34 | cœur \| cœur point d’exclamation \| exclamation \| ponctuation | cœur \| cœur point d’exclamation | trunk | exclamation \| heart exclamation \| mark \| punctuation | exclamation \| heavy heart exclamation \| mark \| punctuation | //ldml/annotations/annotation[@cp="❣"] |
| fr | 34 | couple \| deux hommes se tenant la main \| hommes \| jumeaux | couple \| deux hommes se tenant la main \| jumeaux | trunk | couple \| Gemini \| man \| twins \| men \| holding hands \| zodiac | couple \| Gemini \| man \| twins \| two men holding hands \| zodiac | //ldml/annotations/annotation[@cp="👬"] |
| fr | 34 | couple \| deux femmes se tenant la main \| femmes \| jumelles | couple \| deux femmes se tenant la main \| jumelles | trunk | couple \| hand \| holding hands \| women | couple \| hand \| two women holding hands \| woman | //ldml/annotations/annotation[@cp="👭"] |

A value of � indicates that there is no value for that version.
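
If a spreadsheet is not handy, a line of such a log can also be split programmatically. The sketch below assumes the file has exactly the eight tab-separated columns shown in the table header, in that order; the class and field names are hypothetical.

```java
// Hypothetical holder for one line of a births log; columns follow the table header above.
class BirthLogLineSketch {
    final String locale;
    final String version;
    final String value;
    final String prevValue;
    final String englishVersion;
    final String englishValue;
    final String englishPrevValue;
    final String path;

    BirthLogLineSketch(String tsvLine) {
        // The -1 limit keeps empty columns, such as a missing PrevValue.
        String[] f = tsvLine.split("\t", -1);
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 tab-separated fields, got " + f.length);
        }
        locale = f[0];
        version = f[1];
        value = f[2];
        prevValue = f[3];
        englishVersion = f[4];
        englishValue = f[5];
        englishPrevValue = f[6];
        path = f[7];
    }

    public static void main(String[] args) {
        BirthLogLineSketch line = new BirthLogLineSketch(
                "fr\t1.1.1\tbengali\t\t30\tBangla\tBengali\t"
                        + "//ldml/localeDisplayNames/languages/language[@type=\"bn\"]");
        System.out.println(line.locale + " value from " + line.version
                + ", English value from " + line.englishVersion);
    }
}
```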

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
46 changes: 46 additions & 0 deletions docs/site/development/coding-cldr-tools/documenting-cldr-tools.md
@@ -0,0 +1,46 @@
---
title: Documenting CLDR Tools
---

# Documenting CLDR Tools

*Developers: Make sure your tool is easily accessible from the command line.*

You can add the @CLDRTool annotation to any class in cldr\-code that has a main() function, and it will be documented as part of the JAR when cldr\-code.jar is used.

See [CLDR Tools](https://cldr.unicode.org/development/cldr-tools) for general information about obtaining and using CLDR tools.

## Coding it

An example from ConsoleCheckCLDR.java will start us out here:

&emsp;&emsp;@CLDRTool(alias \= "check",

&emsp;&emsp;description \= "Run CheckCLDR against CLDR data")

&emsp;&emsp;public class ConsoleCheckCLDR { …

Then, calling ```java -jar cldr-tools.jar -l``` produces:

&emsp;&emsp;*check \- Run CheckCLDR against CLDR data*

&emsp;&emsp;*\<http://cldr.unicode.org/tools/check\>*

&emsp;&emsp;*\= org.unicode.cldr.test.ConsoleCheckCLDR*

And then ```java -jar cldr-tools.jar check``` can be used to run this tool. All additional arguments after "check" are passed to **ConsoleCheckCLDR.main()** as arguments.
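
The argument hand-off described above amounts to stripping the alias and forwarding the rest. The following is a minimal sketch of that behavior, assuming a simple alias-to-entry-point map; it is a hypothetical dispatcher, not the launcher that ships in the CLDR jar.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatcher; the real launcher in the CLDR jar is not shown in this document.
class AliasDispatchSketch {
    private final Map<String, Consumer<String[]>> tools = new LinkedHashMap<>();

    void register(String alias, Consumer<String[]> main) {
        tools.put(alias, main);
    }

    /** Runs the tool named by args[0], passing args[1..] through unchanged. */
    boolean dispatch(String[] args) {
        if (args.length == 0) {
            return false;
        }
        Consumer<String[]> main = tools.get(args[0]);
        if (main == null) {
            return false;
        }
        main.accept(Arrays.copyOfRange(args, 1, args.length));
        return true;
    }

    public static void main(String[] args) {
        AliasDispatchSketch launcher = new AliasDispatchSketch();
        launcher.register("check", rest -> System.out.println("check got " + Arrays.toString(rest)));
        launcher.dispatch(new String[] {"check", "-e", "-z", "final_testing"});
    }
}
```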

Note these annotation parameters. Only "alias" is required.

- **alias** \- used from the command line instead of the full class name. Also forms part of the default URL for documentation.
- **description** \- a short description of the tool.

Additional parameters:

- **url** \- you can specify a custom URL for the tool. This is displayed with the listing.
- **hidden** \- if non\-empty, this specifies a reason to *not* show the tool when running "java \-jar" without "\-l". For example, the main() function may be a less\-useful internal tool, or a test.
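
To show how the four parameters might fit together, here is a self-contained sketch that uses a stand-in annotation and reflection to produce a listing entry like the one shown earlier. The annotation is deliberately named `ToolInfo` (hypothetical) so it is not mistaken for the real @CLDRTool definition, and the default-URL rule is inferred from the example output rather than from the CLDR source.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch only: ToolInfo is a stand-in for illustration, not the real @CLDRTool.
class ToolListingSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ToolInfo {
        String alias();
        String description() default "";
        String url() default "";
        String hidden() default "";
    }

    @ToolInfo(alias = "check", description = "Run CheckCLDR against CLDR data")
    static class DemoCheckTool {
        public static void main(String[] args) {}
    }

    static final String DEFAULT_URL_PREFIX = "http://cldr.unicode.org/tools/";

    /** Formats one listing entry, falling back to the alias-based default URL. */
    static String describe(Class<?> toolClass) {
        ToolInfo info = toolClass.getAnnotation(ToolInfo.class);
        if (info == null) {
            return null; // not an annotated tool
        }
        String url = info.url().isEmpty() ? DEFAULT_URL_PREFIX + info.alias() : info.url();
        return info.alias() + " - " + info.description() + "\n<" + url + ">\n= " + toolClass.getName();
    }

    /** A tool with a non-empty "hidden" reason is left out of the default listing. */
    static boolean shownByDefault(Class<?> toolClass) {
        ToolInfo info = toolClass.getAnnotation(ToolInfo.class);
        return info != null && info.hidden().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(describe(DemoCheckTool.class));
    }
}
```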

## Documenting it

Assuming your tool’s alias is *myalias*, create a new subpage with the URL http://cldr.unicode.org/tools/myalias (a subpage of [CLDR Tools](https://cldr.unicode.org/development/cldr-tools)). Fill this page out with information about how to use your tool.

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)