diff --git a/docs/site/development/cldr-development-site/updating-englishroot.md b/docs/site/development/cldr-development-site/updating-englishroot.md
new file mode 100644
index 00000000000..74e184e3da4
--- /dev/null
+++ b/docs/site/development/cldr-development-site/updating-englishroot.md
@@ -0,0 +1,97 @@
+---
+title: Updating English/Root
+---
+
+# Updating English/Root
+
+Whenever you update English or Root, there is one additional step that needs to be done for the vetting viewer and tests to work properly.
+
+Update CldrVersion.java to have the newest release in the list.
+
+## Run GenerateBirth
+
+The tool is in tools/java/org/unicode/cldr/tool/GenerateBirth.java. It requires a set of sources from all previous major CLDR releases, trunk, and a generation directory. These three directories must be structured as follows. The tool takes environment parameters for the second two.
+
+**cldr (set with \-t \, default\=CldrUtility.BASE\_DIRECTORY, set with environment variable \-DCLDR\_DIR)**
+
+... common/ ... tools/ java/ (apps such as GenerateBirth are run from here) ...
+
+**CldrUtility.ARCHIVE\_DIRECTORY**
+
+1. Create the archive ([Creating the Archive](https://cldr.unicode.org/development/creating-the-archive)) with all releases (if you don't have it already).
+2. The archive directory should have the latest version of every major and minor version (where versions before 21\.0 have the major version split across the top two fields).
+3. You will probably need to modify both CldrVersion.java and ToolConstants.java to bring them up to date.
+
+**log (set with \-l \, default\=CldrUtility.UTIL\_DATA\_DIR, set with CLDR\_DIR)**
+
+Pass an argument for \-t to specify the output directory. The tool takes a few minutes to run (and make sure you have set up Java with enough memory)!
+
+The tool generates (among other things) the following two binary files in the output directory specified with \-t:
+
+- **outdated.data**
+- **outdatedEnglish.data**
+
+Use them to replace the previous versions in /cldr/tools/java/org/unicode/cldr/util/data/births/. These files are used to support OutdatedPaths.java, which is used in CheckNew.
+
+Readable data is found in https://github.com/unicode\-org/cldr\-staging/tree/master/births/\* and should also be checked in, for comparison over time. It is easiest to read if you paste it into a spreadsheet!
+
+## Binary File Format
+
+| outdatedEnglish.data | outdated.data |
+|---|---|
+| **int:size** | **str:locale** |
+| long:pathId str:oldValue | **int:size** |
+| long:pathId str:oldValue | long:pathId |
+| ... | long:pathId |
+| | ... |
+| | **str:locale** |
+| | **int:size** |
+| | long:pathId |
+| | long:pathId |
+| | ... |
+| **\$END\$** | **\$END\$** |
+| ~50KB | ~100KB |
+
+In a limited release, the file **SubmissionLocales.java** is set up to allow just certain locales and paths in those locales.
+
+## Testing
+
+Make sure TestOutdatedPaths.java passes. It may take some modifications, since it depends on the exact data.
+
+Run TestCheckCLDR and TestBasic with the option **\-prop:logKnownIssue\=false** (that option is important!). This checks that the Limited Submission is set up properly and that SubmissionLocales are correct.
+
+
+
+If you run into any problems, see Debugging below.
+
+**Check in the files**
+
+For example: https://github.com/unicode-org/cldr/pull/243
+
+## Debugging
+
+GenerateBirth also generates readable log files for double checking. These will be in {workspace}/cldr\-aux/births/\/, that is: CLDRPaths.AUX\_DIRECTORY \+ "births/" \+ trunkVersion. Examples: https://unicode.org/repos/cldr-aux/births/35.0/en.txt, https://unicode.org/repos/cldr-aux/births/35.0/fr.txt.
+
+Their format is the following (TSV \= tab\-separated values). To view them, it is probably easier to copy the files into a spreadsheet.
+
+- English doesn't have the E... values, but is a complete record.
+- Other languages only have lines where the English value is more recently changed (younger) than the native’s.
+- For example, the first line below says that French has "bengali" dating back to version 1\.1\.1, while English has "Bangla" dating back to version 30\.
+
+| Loc | Version | Value | PrevValue | EVersion | EValue | EPrevValue | Path |
+|---|:---:|---|---|:---:|---|---|---|
+| fr | 1.1.1 | bengali | � | 30 | Bangla | Bengali | //ldml/localeDisplayNames/languages/language[@type="bn"] |
+| fr | 1.1.1 | galicien | � | 1.4.1 | Galician | Gallegan | //ldml/localeDisplayNames/languages/language[@type="gl"] |
+| fr | 1.1.1 | kirghize | � | 24 | Kyrgyz | Kirghiz | //ldml/localeDisplayNames/languages/language[@type="ky"] |
+| fr | 1.1.1 | ndébélé du Nord | � | 1.3 | North Ndebele | Ndebele, North | //ldml/localeDisplayNames/languages/language[@type="nd"] |
+| fr | 1.1.1 | ndébélé du Sud | � | 1.3 | South Ndebele | Ndebele, South | //ldml/localeDisplayNames/languages/language[@type="nr"] |
+| ... | | | | | | | |
+| fr | 34 | exclamation \| point d’exclamation blanc \| ponctuation | exclamation \| point d’exclamation blanc | trunk | ! \| exclamation \| mark \| outlined \| punctuation \| white exclamation mark | exclamation \| mark \| outlined \| punctuation \| white exclamation mark | //ldml/annotations/annotation[@cp="❕"] |
+| fr | 34 | exclamation \| point d’exclamation \| ponctuation | exclamation \| point d’exclamation | trunk | !
+ \| exclamation \| mark \| punctuation | exclamation \| mark \| punctuation | //ldml/annotations/annotation[@cp="❗"] |
+| fr | 34 | cœur \| cœur point d’exclamation \| exclamation \| ponctuation | cœur \| cœur point d’exclamation | trunk | exclamation \| heart exclamation \| mark \| punctuation | exclamation \| heavy heart exclamation \| mark \| punctuation | //ldml/annotations/annotation[@cp="❣"] |
+| fr | 34 | couple \| deux hommes se tenant la main \| hommes \| jumeaux | couple \| deux hommes se tenant la main \| jumeaux | trunk | couple \| Gemini \| man \| twins \| men \| holding hands \| zodiac | couple \| Gemini \| man \| twins \| two men holding hands \| zodiac | //ldml/annotations/annotation[@cp="👬"] |
+| fr | 34 | couple \| deux femmes se tenant la main \| femmes \| jumelles | couple \| deux femmes se tenant la main \| jumelles | trunk | couple \| hand \| holding hands \| women | couple \| hand \| two women holding hands \| woman | //ldml/annotations/annotation[@cp="👭"] |
+
+A value of � indicates that there is no value for that version.
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/development/coding-cldr-tools/documenting-cldr-tools.md b/docs/site/development/coding-cldr-tools/documenting-cldr-tools.md
new file mode 100644
index 00000000000..d206f6cbc8f
--- /dev/null
+++ b/docs/site/development/coding-cldr-tools/documenting-cldr-tools.md
@@ -0,0 +1,46 @@
+---
+title: Documenting CLDR Tools
+---
+
+# Documenting CLDR Tools
+
+*Developers: Make sure your tool is easily accessible from the command line.*
+
+You can add the @CLDRTool annotation to any class in cldr\-code that has a main() function, and it will be listed in the documentation shown when the JAR cldr\-code.jar is used.
+
+See [CLDR Tools](https://cldr.unicode.org/development/cldr-tools) for general information about obtaining and using CLDR tools.
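
As a rough illustration of how an annotation such as @CLDRTool can drive a tool listing, here is a minimal, self-contained sketch. It does not use the real CLDR classes: `ToolInfo`, `DemoCheckTool`, and `ToolListing` are hypothetical names invented for this example, and the actual annotation and launcher in cldr\-code may be declared and wired up differently. The only thing assumed is standard Java reflection (`Class.getAnnotation`).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Optional;

/** Hypothetical launcher helper: reads the annotation by reflection. */
class ToolListing {
    /** Returns one listing line, or empty if the class is unannotated or hidden. */
    static Optional<String> describe(Class<?> toolClass) {
        ToolInfo info = toolClass.getAnnotation(ToolInfo.class);
        if (info == null || !info.hidden().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(
                info.alias() + " - " + info.description() + " = " + toolClass.getName());
    }

    /** Finds the first candidate class whose annotation carries the given alias. */
    static Optional<Class<?>> findByAlias(String alias, Class<?>... candidates) {
        for (Class<?> candidate : candidates) {
            ToolInfo info = candidate.getAnnotation(ToolInfo.class);
            if (info != null && info.alias().equals(alias)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Print the listing line for the demo tool, if it has one.
        describe(DemoCheckTool.class).ifPresent(System.out::println);
    }
}

/** Hypothetical stand-in for @CLDRTool (assumption: not the real declaration). */
@Retention(RetentionPolicy.RUNTIME) // kept at runtime so reflection can see it
@Target(ElementType.TYPE)
@interface ToolInfo {
    String alias();                  // required: short name used on the command line
    String description() default ""; // short description shown in the listing
    String url() default "";         // optional custom documentation URL
    String hidden() default "";      // non-empty: reason to leave the tool out of the listing
}

/** Hypothetical tool class carrying the annotation. */
@ToolInfo(alias = "check", description = "Run CheckCLDR against CLDR data")
class DemoCheckTool {
    public static void main(String[] args) {
        System.out.println("DemoCheckTool called with " + args.length + " argument(s)");
    }
}
```

The point of the sketch is the runtime retention policy: without it, the annotation would not be visible to reflection and no listing could be built from it. A launcher built this way can map an alias given on the command line to a class and then pass the remaining arguments on to that class's main().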
+
+## Coding it
+
+An example from ConsoleCheckCLDR.java will start us out here:
+
+  @CLDRTool(alias \= "check",
+
+  description \= "Run CheckCLDR against CLDR data")
+
+  public class ConsoleCheckCLDR { …
+
+Then, calling ```java -jar cldr-tools.jar -l``` produces:
+
+  *check \- Run CheckCLDR against CLDR data*
+
+  *\*
+
+  *\= org.unicode.cldr.test.ConsoleCheckCLDR*
+
+And then ```java -jar cldr-tools.jar check``` can be used to run this tool. All additional arguments after "check" are passed to **ConsoleCheckCLDR.main()** as arguments.
+
+Note these annotation parameters. Only "alias" is required.
+
+- **alias** \- used from the command line instead of the full class name. Also forms part of the default URL for documentation.
+- **description** \- a short description of the tool.
+
+Additional parameters:
+
+- **url** \- you can specify a custom URL for the tool. This is displayed with the listing.
+- **hidden** \- if non\-empty, this specifies a reason to *not* show the tool when running "java \-jar" without "\-l". For example, the main() function may be a less\-useful internal tool, or a test.
+## Documenting it
+
+Assuming your tool’s alias is *myalias*, create a new subpage with the URL http://cldr.unicode.org/tools/myalias (a subpage of [CLDR Tools](https://cldr.unicode.org/development/cldr-tools)). Fill this page out with information about how to use your tool.
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/development/creating-the-archive.md b/docs/site/development/creating-the-archive.md
new file mode 100644
index 00000000000..8909967950c
--- /dev/null
+++ b/docs/site/development/creating-the-archive.md
@@ -0,0 +1,61 @@
+---
+title: Checking out the CLDR Archive
+---
+
+# Checking out the CLDR Archive
+
+A number of the tools in CLDR depend on access to older versions.
+ These tools include:
+
+- [Generating Charts](https://cldr.unicode.org/development/cldr-big-red-switch/generating-charts)
+- [Update Validity XML](https://cldr.unicode.org/development/updating-codes/update-validity-xml)
+- [Updating English/Root](https://cldr.unicode.org/development/cldr-development-site/updating-englishroot)
+    - \[Note: add others when we find them]
+    - Some tests
+        - TestCompatibility.java
+        - TestTransforms.java
+        - TestValidity.java
+    - Some other tools (typically when given a version argument on the command line)
+        - FindPluralDifferences
+        - ...
+
+### Here's how to do that.
+
+1. Create an archive directory **cldr\-archive**. The simplest is if it is on the same level as your local CLDR repository. In other words, if your [CLDR\_DIR](https://cldr.unicode.org/development/cldr-development-site/running-cldr-tools) is .../workspace/cldr, then create the directory **…/workspace/cldr\-archive**
+(Note: The Java property **ARCHIVE** can be used to override the path to cldr\-archive).
+2. Open up ToolConstants.java and look at ToolConstants.CLDR\_VERSIONS. You'll see something like:
+    1. **public static final** List\ ***CLDR\_VERSIONS*** \= ImmutableList.of(
+    2. "1\.1\.1",
+    3. "1\.2",
+    4. "1\.3",
+    5. "1\.4\.1",
+    6. "1\.5\.1",
+    7. "1\.6\.1",
+    8. "1\.7\.2",
+    9. "1\.8\.1",
+    10. ...
+    11. "41\.0"
+    12. // add to this once the release is final!
+    13. );
+    - NOTE: this should also match CldrVersion.java (those two need to be merged together)
+3. Add the just\-released version, such as "**42\.0**", to the list above
+    - Also update **DEV\_VERSION** to "43" (the next development version)
+    - Finally, update CldrVersion.java and make similar changes.
+4. Now, run the tool **org.unicode.cldr.tool.CheckoutArchive**
+    - Or from the command line:
+      **mvn \-DCLDR\_DIR\=** *path\_to/cldr* **\-\-file\=tools/pom.xml \-pl cldr\-code compile \-DskipTests\=true exec:java \-Dexec.mainClass\=org.unicode.cldr.tool.CheckoutArchive \-Dexec.args\=""**
+    - Note other options for this tool:
+        - *\-\-help* will give help
+        - *\-\-prune* will run a 'git worktree prune' before proceeding
+        - *\-\-echo* will just show the commands that would be run, without running anything
+
+      (For example, **\-Dexec.args\="\-\-prune"** in the above command line)
+
+The end result (where you need all of the releases) looks something like the following:
+
+![alt-text](../images/development/creatingTheArchive.png)
+
+## Advanced Configuration
+
+- You can set the property **\-DCLDR\_ARCHIVE** to point to a different parent directory for the archive
+- You can set **\-DCLDR\_HAS\_ARCHIVE\=false** to tell unit tests and tools not to look for the archive
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/development/running-tests.md b/docs/site/development/running-tests.md
new file mode 100644
index 00000000000..42bd0b31e9b
--- /dev/null
+++ b/docs/site/development/running-tests.md
@@ -0,0 +1,49 @@
+---
+title: Running Tests
+---
+
+# Running Tests
+
+You will always need to run tests when you do a check\-in.
+
+1. Preconditions
+    - If you change the DTD, be sure to read and follow [Updating DTDs](https://cldr.unicode.org/development/updating-dtds) first.
+    - If you added a new feature or fixed a significant bug, add a unit test for it.
+        - See unittest/NumberingSystemsTest as an example.
+        - Remember to add to unittest/TestAll
+2. Run **TestAll \-e**
+    - These are the unit tests in exhaustive mode
+    - If you are doing something you know to be simple, you could do the shorter run of just **TestAll**
+3. Run **ConsoleCheckCLDR \-e \-z final\_testing \-S common,seed**
+    - This runs the same set of tests that the Survey Tool does.
+    - If you know what you are doing, you can run a set of filtered tests.
+4. Other tests
+    1. The unit tests are not complete, so if you are doing anything fancy you get a better workout by also running:
+    2. [**NewLdml2IcuConverter**](https://cldr.unicode.org/development/coding-cldr-tools/newldml2icuconverter)
+    3. [**Generating Charts**](https://cldr.unicode.org/development/cldr-big-red-switch/generating-charts)
+        1. If you have interesting new data, write a chart for it. See subclasses of Chart.java for examples.
+
+## Running tests on the command line
+
+```bash
+$ export CLDR_DIR=/path/to/svn/root/for/cldr
+
+$ cd $CLDR_DIR/tools/java && ant all
+
+$ cd $CLDR_DIR/tools/cldr-unittest && ant unittestExhaustive datacheck
+```
+
+\[TODO: add more commands here; can't we automate all this into a single build rule for ant?] TODO: [ticket:8864](http://unicode.org/cldr/trac/ticket/8864)
+
+## Debugging
+
+\[TODO: add more tips here]
+
+### Regexes
+
+We use a lot of regexes!
+
+1. org.unicode.cldr.util.RegexUtilities.showMismatch (and related methods) is really useful in debugging cases where regexes fail. You hand it a pattern or matcher and a string, and it shows how far the regex got before it failed.
+2. To debug RegexLookup, there is a special call you can make where you pass in a set. On return, that set is filled with strings showing how far each of the regex patterns progressed. You can thus see why a string didn't match as expected.
+
+![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
\ No newline at end of file
diff --git a/docs/site/images/development/creatingTheArchive.png b/docs/site/images/development/creatingTheArchive.png
new file mode 100644
index 00000000000..1a6a51a8919
Binary files /dev/null and b/docs/site/images/development/creatingTheArchive.png differ