CLDR-17566 Converting Dev P2 (unicode-org#4008)
chpy04 authored and haytenf committed Sep 17, 2024
1 parent d4ca01e commit f8b301f
Showing 5 changed files with 253 additions and 0 deletions.
@@ -0,0 +1,97 @@
---
title: Updating English/Root
---

# Updating English/Root

Whenever you update English or Root, there is one additional step that needs to be done for the vetting viewer and tests to work properly.

Update CldrVersion.java to have the newest release in the list.

## Run GenerateBirth

The tool is in tools/java/org/unicode/cldr/tool/GenerateBirth.java. It requires a set of sources from all previous major CLDR releases, trunk, and a generation directory. These three directories must be structured as follows. The tool takes environment parameters for the last two.

**cldr (set with \-t \<target\>, default\=CldrUtility.BASE\_DIRECTORY, set with environment variable \-DCLDR\_DIR)**

- ...
- common/
- ...
- tools/
  - java/ (apps such as GenerateBirth are run from here)
- ...

**CldrUtility.ARCHIVE\_DIRECTORY**

1. Create the archive ([Creating the Archive](https://cldr.unicode.org/development/creating-the-archive)) with all releases (if you don't have it already)
2. The archive directory should have the latest version of every major and minor version (where versions before 21\.0 have the major version split across the top two fields).
3. You will probably need to modify both CldrVersion.java and ToolConstants.java to bring them up to date.
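
The version-selection rule in step 2 can be made concrete with a small sketch. This is a hypothetical helper, not part of the CLDR tools, and it encodes one reading of the rule: before 21.0 a "major" line is the first two fields (1.7.2 belongs to 1.7), from 21.0 on it is the first field (22.1 belongs to 22), and only the highest release in each line is kept.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: keep only the latest release of each major version line. */
final class ArchiveVersions {
    public static void main(String[] args) {
        // Illustrative input only; the real list lives in ToolConstants / CldrVersion.
        List<String> releases = List.of("1.1", "1.1.1", "1.2", "1.4", "1.4.1", "22.0", "22.1");
        System.out.println(latestPerMajor(releases));
    }

    /** Before 21.0 the major line is the top two fields ("1.7"); from 21.0 on, the first field ("41"). */
    static String majorLine(String version) {
        String[] fields = version.split("\\.");
        if (Integer.parseInt(fields[0]) < 21 && fields.length > 1) {
            return fields[0] + "." + fields[1];
        }
        return fields[0];
    }

    /** Compares dotted versions numerically, field by field; missing fields count as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Returns the highest release of each major line, sorted oldest first. */
    static List<String> latestPerMajor(Collection<String> releases) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String v : releases) {
            latest.merge(majorLine(v), v, (old, cur) -> compare(cur, old) > 0 ? cur : old);
        }
        List<String> result = new ArrayList<>(latest.values());
        result.sort(ArchiveVersions::compare);
        return result;
    }
}
```

With the sample input this keeps 1.1.1, 1.2, 1.4.1 and 22.1, which matches the shape of the CLDR_VERSIONS list shown on the archive page.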

**log (set with \-l \<log\>, default\=CldrUtility.UTIL\_DATA\_DIR, set with CLDR\_DIR)**

Pass an argument for \-t to specify the output directory. The tool takes a few minutes to run, so make sure Java has been given enough memory (for example, with -Xmx).

The tool generates (among other things) the following two binary files in the output directory specified with \-t:

- **outdated.data**
- **outdatedEnglish.data**

Copy them over the previous versions in /cldr/tools/java/org/unicode/cldr/util/data/births/. These files are used to support OutdatedPaths.java, which is used in CheckNew.

Readable data is found in https://github.com/unicode-org/cldr-staging/tree/master/births/. It should also be checked in, for comparison over time. It is easiest to read if you paste it into a spreadsheet.

## Binary File Format

| outdatedEnglish.data | outdated.data |
|---|---|
| **int:size** | **str:locale** |
| long:pathId str:oldValue | **int:size** |
| long:pathId str:oldValue | long:pathId |
| ... | long:pathId |
| | ... |
| | **str:locale** |
| | **int:size** |
| | long:pathId |
| | long:pathId |
| | ... |
| **\$END\$** | **\$END\$** |
| ~50KB | ~100KB |
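
The two layouts can be exercised with a short round-trip sketch. It assumes, without confirmation from OutdatedPaths.java, that the files are written with java.io.DataOutputStream (writeInt for int, writeLong for long, writeUTF for str) and that the `$END$` terminator is written as a string; the class name OutdatedFormats and the sample path IDs are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of the outdated*.data layouts, ASSUMING DataOutputStream encoding. */
final class OutdatedFormats {
    static final String END = "$END$";

    public static void main(String[] args) {
        Map<Long, String> english = new LinkedHashMap<>();
        english.put(123L, "Bengali"); // hypothetical path ID and old English value
        System.out.println(readEnglish(writeEnglish(english))); // prints {123=Bengali}
    }

    /** outdatedEnglish.data: int:size, size x (long:pathId, str:oldValue), then $END$. */
    static byte[] writeEnglish(Map<Long, String> oldValues) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(oldValues.size());
            for (Map.Entry<Long, String> entry : oldValues.entrySet()) {
                out.writeLong(entry.getKey());
                out.writeUTF(entry.getValue());
            }
            out.writeUTF(END);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<Long, String> readEnglish(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            Map<Long, String> oldValues = new LinkedHashMap<>();
            for (int i = 0; i < size; i++) {
                long pathId = in.readLong();
                oldValues.put(pathId, in.readUTF());
            }
            String marker = in.readUTF();
            if (!END.equals(marker)) {
                throw new IllegalStateException("expected " + END + " but found " + marker);
            }
            return oldValues;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** outdated.data: repeated (str:locale, int:size, size x long:pathId), then $END$ in place of a locale. */
    static byte[] writeOutdated(Map<String, Set<Long>> pathIdsByLocale) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, Set<Long>> entry : pathIdsByLocale.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (long pathId : entry.getValue()) {
                    out.writeLong(pathId);
                }
            }
            out.writeUTF(END);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<String, Set<Long>> readOutdated(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Map<String, Set<Long>> pathIdsByLocale = new LinkedHashMap<>();
            // Each block starts with a locale; the $END$ marker takes the place of the next locale.
            for (String locale = in.readUTF(); !END.equals(locale); locale = in.readUTF()) {
                int size = in.readInt();
                Set<Long> pathIds = new LinkedHashSet<>();
                for (int i = 0; i < size; i++) {
                    pathIds.add(in.readLong());
                }
                pathIdsByLocale.put(locale, pathIds);
            }
            return pathIdsByLocale;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the real files turn out to use a different encoding, only the read/write calls change; the block structure (size-prefixed records, string terminator) stays the same.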

In a limited release, the file **SubmissionLocales.java** is set up to allow just certain locales and paths in those locales.

## Testing

Make sure TestOutdatedPaths.java passes. It may take some modifications, since it depends on the exact data.

Run TestCheckCLDR and TestBasic with the option **\-prop:logKnownIssue\=false** (that option is important!). This checks that the Limited Submission is set up properly and that SubmissionLocales are correct.



If you run into any problems, see Debugging below.

**Check in the files**

For example, see https://github.com/unicode-org/cldr/pull/243.

## Debugging

GenerateBirth also generates readable log files for double-checking. These are written to `{workspace}/cldr-aux/births/<version>/`, that is, `CLDRPaths.AUX_DIRECTORY + "births/" + trunkVersion`. Examples: https://unicode.org/repos/cldr-aux/births/35.0/en.txt, https://unicode.org/repos/cldr-aux/births/35.0/fr.txt.

Their format is as follows (TSV = tab-separated values). To view them, it is probably easiest to copy the files into a spreadsheet.

- English doesn't have the E... values, but is a complete record.
- Other languages only have lines where the English value was changed more recently (is younger) than the native value.
- For example, the first row below says that French has "bengali" dating back to version 1\.1\.1, while English has "Bangla" dating back to version 30\.

| Loc | Version | Value | PrevValue | EVersion | EValue | EPrevValue | Path |
|---|:---:|---|---|:---:|---|---|---|
| fr | 1.1.1 | bengali || 30 | Bangla | Bengali | //ldml/localeDisplayNames/languages/language[@type="bn"] |
| fr | 1.1.1 | galicien || 1.4.1 | Galician | Gallegan | //ldml/localeDisplayNames/languages/language[@type="gl"] |
| fr | 1.1.1 | kirghize || 24 | Kyrgyz | Kirghiz | //ldml/localeDisplayNames/languages/language[@type="ky"] |
| fr | 1.1.1 | ndébélé du Nord || 1.3 | North Ndebele | Ndebele, North | //ldml/localeDisplayNames/languages/language[@type="nd"] |
| fr | 1.1.1 | ndébélé du Sud || 1.3 | South Ndebele | Ndebele, South | //ldml/localeDisplayNames/languages/language[@type="nr"] |
| ... | | | | | | | |
| fr | 34 | exclamation \| point d’exclamation blanc \| ponctuation | exclamation \| point d’exclamation blanc | trunk | ! \| exclamation \| mark \| outlined \| punctuation \| white exclamation mark | exclamation \| mark \| outlined \| punctuation \| white exclamation mark | //ldml/annotations/annotation[@cp="❕"] |
| fr | 34 | exclamation \| point d’exclamation \| ponctuation | exclamation \| point d’exclamation | trunk | ! \| exclamation \| mark \| punctuation | exclamation \| mark \| punctuation | //ldml/annotations/annotation[@cp="❗"] |
| fr | 34 | cœur \| cœur point d’exclamation \| exclamation \| ponctuation | cœur \| cœur point d’exclamation | trunk | exclamation \| heart exclamation \| mark \| punctuation | exclamation \| heavy heart exclamation \| mark \| punctuation | //ldml/annotations/annotation[@cp="❣"] |
| fr | 34 | couple \| deux hommes se tenant la main \| hommes \| jumeaux | couple \| deux hommes se tenant la main \| jumeaux | trunk | couple \| Gemini \| man \| twins \| men \| holding hands \| zodiac | couple \| Gemini \| man \| twins \| two men holding hands \| zodiac | //ldml/annotations/annotation[@cp="👬"] |
| fr | 34 | couple \| deux femmes se tenant la main \| femmes \| jumelles | couple \| deux femmes se tenant la main \| jumelles | trunk | couple \| hand \| holding hands \| women | couple \| hand \| two women holding hands \| woman | //ldml/annotations/annotation[@cp="👭"] |

A value of � indicates that there is no value for that version.
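
If you would rather filter these logs in code than in a spreadsheet, each row can be split on tabs. This is a hypothetical sketch, not part of the CLDR tools: it takes the column order from the table above, assumes one row per line with exactly eight tab-separated fields (so it targets the non-English files, since English lacks the E... values), and does not handle a header line.

```java
/**
 * Hypothetical parser for one row of a births log (e.g. fr.txt). Column order is taken
 * from the documented table; assumes exactly 8 tab-separated fields and no header line.
 */
record BirthRow(String locale, String version, String value, String prevValue,
        String eVersion, String eValue, String ePrevValue, String path) {

    public static void main(String[] args) {
        // First row of the example table
        String line = String.join("\t", "fr", "1.1.1", "bengali", "", "30", "Bangla", "Bengali",
                "//ldml/localeDisplayNames/languages/language[@type=\"bn\"]");
        BirthRow row = parse(line);
        System.out.println(row.locale() + " has \"" + row.value() + "\" since " + row.version()
                + "; English has \"" + row.eValue() + "\" since " + row.eVersion());
    }

    static BirthRow parse(String line) {
        // limit -1 keeps empty fields, such as an empty PrevValue column
        String[] f = line.split("\t", -1);
        if (f.length != 8) {
            throw new IllegalArgumentException("expected 8 tab-separated fields, got " + f.length);
        }
        return new BirthRow(f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);
    }
}
```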

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
46 changes: 46 additions & 0 deletions docs/site/development/coding-cldr-tools/documenting-cldr-tools.md
@@ -0,0 +1,46 @@
---
title: Documenting CLDR Tools
---

# Documenting CLDR Tools

*Developers: Make sure your tool is easily accessible from the command line.*

You can add the @CLDRTool annotation to any class in cldr\-code that has a main() function, and the tool will then be documented as part of the cldr\-code.jar JAR.

See [CLDR Tools](https://cldr.unicode.org/development/cldr-tools) for general information about obtaining and using CLDR tools.

## Coding it

Here is an example from ConsoleCheckCLDR.java to start with:

```java
@CLDRTool(alias = "check",
    description = "Run CheckCLDR against CLDR data")
public class ConsoleCheckCLDR { …
```

Then, calling `java -jar cldr-tools.jar -l` produces:

```
check - Run CheckCLDR against CLDR data
<http://cldr.unicode.org/tools/check>
= org.unicode.cldr.test.ConsoleCheckCLDR
```

And then `java -jar cldr-tools.jar check` can be used to run this tool. All additional arguments after "check" are passed to **ConsoleCheckCLDR.main()** as arguments.

Note these annotation parameters. Only "alias" is required.

- **alias** \- used from the command line instead of the full class name. Also forms part of the default URL for documentation.
- **description** \- a short description of the tool.

Additional parameters:

- **url** \- you can specify a custom URL for the tool. This is displayed with the listing.
- **hidden** \- if non\-empty, this specifies a reason to *not* show the tool when running "java \-jar" without "\-l". For example, the main() function may be a less\-useful internal tool, or a test.
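
The way a runtime annotation like this can drive a tool listing is sketched below with plain reflection. This is a hypothetical stand-in, not the real @CLDRTool or its lister: ToolInfo, ToolLister, CheckTool and InternalTool are invented names, showAll plays the role of "-l", and the default URL pattern is taken from the listing output shown above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

final class ToolLister {
    public static void main(String[] args) {
        List<Class<?>> tools = List.of(CheckTool.class, InternalTool.class);
        list(tools, false).forEach(System.out::println);
        System.out.println(docUrl(CheckTool.class.getAnnotation(ToolInfo.class)));
    }

    /** Lists "alias - description" for annotated classes; showAll plays the role of "-l". */
    static List<String> list(List<Class<?>> classes, boolean showAll) {
        List<String> lines = new ArrayList<>();
        for (Class<?> c : classes) {
            ToolInfo info = c.getAnnotation(ToolInfo.class);
            if (info == null) {
                continue; // not annotated: not a documented tool
            }
            if (!info.hidden().isEmpty() && !showAll) {
                continue; // a non-empty "hidden" reason keeps the tool out of the default listing
            }
            lines.add(info.alias() + " - " + info.description());
        }
        return lines;
    }

    /** Default documentation URL built from the alias, unless a custom url is given. */
    static String docUrl(ToolInfo info) {
        return info.url().isEmpty() ? "http://cldr.unicode.org/tools/" + info.alias() : info.url();
    }
}

/** Hypothetical stand-in for @CLDRTool with the parameters described above. */
@Retention(RetentionPolicy.RUNTIME) // must be RUNTIME so getAnnotation() can see it
@Target(ElementType.TYPE)
@interface ToolInfo {
    String alias();                  // required
    String description() default "";
    String url() default "";         // custom documentation URL
    String hidden() default "";      // non-empty: reason to hide from the default listing
}

@ToolInfo(alias = "check", description = "Run CheckCLDR against CLDR data")
final class CheckTool {
    public static void main(String[] args) { /* tool logic */ }
}

@ToolInfo(alias = "internal", description = "Internal test tool", hidden = "internal test only")
final class InternalTool {
    public static void main(String[] args) { /* tool logic */ }
}
```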

## Documenting it

Assuming your tool’s alias is *myalias*, create a new subpage with the URL http://cldr.unicode.org/tools/myalias (a subpage of [CLDR Tools](https://cldr.unicode.org/development/cldr-tools)). Fill this page out with information about how to use your tool.

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
61 changes: 61 additions & 0 deletions docs/site/development/creating-the-archive.md
@@ -0,0 +1,61 @@
---
title: Checking out the CLDR Archive
---

# Checking out the CLDR Archive

A number of the tools in CLDR depend on access to older versions. These tools include:

- [Generating Charts](https://cldr.unicode.org/development/cldr-big-red-switch/generating-charts)
- [Update Validity XML](https://cldr.unicode.org/development/updating-codes/update-validity-xml)
- [Updating English/Root](https://cldr.unicode.org/development/cldr-development-site/updating-englishroot)
- \[Note: add others when we find them]
- Some tests
  - TestCompatibility.java
  - TestTransforms.java
  - TestValidity.java
- Some other tools (typically when given a version argument on the command line)
  - FindPluralDifferences
  - ...

### Here's how to do that.

1. Create an archive directory **cldr\-archive**. It is simplest to put it at the same level as your local CLDR repository: if your [CLDR\_DIR](https://cldr.unicode.org/development/cldr-development-site/running-cldr-tools) is .../workspace/cldr, then create the directory **…/workspace/cldr\-archive**.
   (Note: the Java property **ARCHIVE** can be used to override the path to cldr\-archive.)
2. Open ToolConstants.java and look at ToolConstants.CLDR\_VERSIONS. You'll see something like:
   ```java
   public static final List<String> CLDR_VERSIONS = ImmutableList.of(
       "1.1.1",
       "1.2",
       "1.3",
       "1.4.1",
       "1.5.1",
       "1.6.1",
       "1.7.2",
       "1.8.1",
       ...
       "41.0"
       // add to this once the release is final!
       );
   ```
   - NOTE: this should also match CldrVersion.java (those two need to be merged together).
3. Add the just\-released version, such as "**42\.0**", to the list above.
   - Also update **DEV\_VERSION** to "43" (the next development version).
   - Finally, update CldrVersion.java and make similar changes.
4. Now run the tool **org.unicode.cldr.tool.CheckoutArchive**.
   - Or, from the command line:
     ```bash
     mvn -DCLDR_DIR=path_to/cldr --file=tools/pom.xml -pl cldr-code compile -DskipTests=true exec:java -Dexec.mainClass=org.unicode.cldr.tool.CheckoutArchive -Dexec.args=""
     ```
   - Other options for this tool (for example, **\-Dexec.args\="\-\-prune"** in the command line above):
     - *\-\-help* shows help
     - *\-\-prune* runs a `git worktree prune` before proceeding
     - *\-\-echo* just shows the commands that would be run, without running anything
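
A quick way to confirm that the checkout produced every release is to look for one directory per version. The sketch below is hypothetical (ArchiveCheck is an invented name) and assumes each release is checked out as `cldr-archive/cldr-<version>`; compare against the actual directory names in your archive before relying on it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check for releases missing from the archive directory. */
final class ArchiveCheck {
    public static void main(String[] args) {
        // Default layout from step 1: cldr-archive sits next to the CLDR_DIR checkout.
        Path cldrDir = Paths.get(System.getProperty("CLDR_DIR", ".")).toAbsolutePath().normalize();
        Path archive = cldrDir.resolveSibling("cldr-archive");
        List<String> versions = List.of("1.1.1", "1.2", "41.0"); // illustrative subset of CLDR_VERSIONS
        System.out.println("Missing from " + archive + ": " + missingReleases(archive, versions));
    }

    /** ASSUMPTION: each release is checked out as <archive>/cldr-<version>, e.g. cldr-41.0. */
    static List<String> missingReleases(Path archiveDir, List<String> versions) {
        List<String> missing = new ArrayList<>();
        for (String version : versions) {
            if (!Files.isDirectory(archiveDir.resolve("cldr-" + version))) {
                missing.add(version);
            }
        }
        return missing;
    }
}
```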

The end result (where you need all of the releases) looks something like the following:

![Example cldr-archive directory](../images/development/creatingTheArchive.png)

## Advanced Configuration

- You can set the property  **\-DCLDR\_ARCHIVE** to point to a different parent directory for the archive
- You can set **\-DCLDR\_HAS\_ARCHIVE\=false** to tell unit tests and tools not to look for the archive
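
How a tool might combine these two properties is sketched below. This is an illustration only: ArchiveConfig and resolve are hypothetical names, and the sketch assumes that CLDR\_ARCHIVE names the directory holding the release checkouts and that the default is a cldr-archive directory next to CLDR\_DIR, as described in step 1.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical resolution of the archive location from -D properties. */
final class ArchiveConfig {
    public static void main(String[] args) {
        Optional<Path> archive = resolve(
                System.getProperty("CLDR_DIR", "."),
                System.getProperty("CLDR_ARCHIVE"),
                System.getProperty("CLDR_HAS_ARCHIVE"));
        System.out.println(archive.map(p -> "Archive: " + p).orElse("Archive disabled"));
    }

    static Optional<Path> resolve(String cldrDir, String cldrArchive, String hasArchive) {
        if ("false".equalsIgnoreCase(hasArchive)) {
            return Optional.empty(); // -DCLDR_HAS_ARCHIVE=false: do not look for the archive
        }
        if (cldrArchive != null && !cldrArchive.isEmpty()) {
            return Optional.of(Paths.get(cldrArchive)); // -DCLDR_ARCHIVE=... overrides the default
        }
        // Default: a cldr-archive directory next to the CLDR checkout
        return Optional.of(Paths.get(cldrDir).toAbsolutePath().normalize().resolveSibling("cldr-archive"));
    }
}
```

Only an explicit `false` disables the archive here, so a missing or misspelled value falls back to looking for it.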

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)
49 changes: 49 additions & 0 deletions docs/site/development/running-tests.md
@@ -0,0 +1,49 @@
---
title: Running Tests
---

# Running Tests

You will always need to run tests when you do a check\-in.

1. Preconditions
   - If you change the DTD, be sure to read and follow [Updating DTDs](https://cldr.unicode.org/development/updating-dtds) first.
   - If you added a new feature or fixed a significant bug, add a unit test for it.
     - See unittest/NumberingSystemsTest as an example.
     - Remember to add it to unittest/TestAll.
2. Run **TestAll \-e**
   - These are the unit tests in exhaustive mode.
   - If you are doing something you know to be simple, you could do the shorter run of just **TestAll**.
3. Run **ConsoleCheckCLDR \-e \-z final\_testing \-S common,seed**
   - This runs the same set of tests that the Survey Tool does.
   - If you know what you are doing, you can run a filtered set of tests.
4. Other tests
   - The unit tests are not complete, so if you are doing anything fancy, you get a better workout by also running:
     - [**NewLdml2IcuConverter**](https://cldr.unicode.org/development/coding-cldr-tools/newldml2icuconverter)
     - [**Generating Charts**](https://cldr.unicode.org/development/cldr-big-red-switch/generating-charts)
       - If you have interesting new data, write a chart for it. See subclasses of Chart.java for examples.

## Running tests on the command line

```bash
$ export CLDR_DIR=/path/to/svn/root/for/cldr

$ cd $CLDR_DIR/tools/java && ant all

$ cd $CLDR_DIR/tools/cldr-unittest && ant unittestExhaustive datacheck
```

\[TODO: add more commands here; can't we automate all this into a single build rule for ant?] TODO: [ticket:8864](http://unicode.org/cldr/trac/ticket/8864)

## Debugging

\[TODO: add more tips here]

### Regexes

We use a lot of regexes!

1. org.unicode.cldr.util.RegexUtilities.showMismatch (and related methods) are really useful for debugging cases where regexes fail. You hand it a pattern or matcher and a string, and it shows how far the regex got before it failed.
2. To debug RegexLookup, there is a special call you can make where you pass in a set. On return, that set is filled with a set of strings showing how far each of the regex patterns progressed. You can thus see why a string didn't match as expected.
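
The idea of "how far the regex got" can be illustrated with plain java.util.regex. This is not CLDR's RegexUtilities implementation: RegexProgress and matchedPrefixLength are hypothetical names, and the sketch uses Matcher.hitEnd() to find the longest prefix of the input that matches or could still be extended into a match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not RegexUtilities): how far into the input does a regex get? */
final class RegexProgress {
    public static void main(String[] args) {
        Pattern phone = Pattern.compile("\\d{3}-\\d{4}");
        String input = "555-12a4";
        int ok = matchedPrefixLength(phone, input);
        System.out.println(input.substring(0, ok) + " <-- fails here: " + input.substring(ok));
    }

    /**
     * Returns the length of the longest prefix of input that either matches the pattern or
     * could still be completed into a match; input.length() means no mismatch was found.
     */
    static int matchedPrefixLength(Pattern pattern, String input) {
        for (int end = input.length(); end >= 0; end--) {
            Matcher m = pattern.matcher(input.substring(0, end));
            // matches(): the prefix matches outright; hitEnd(): the engine ran out of input,
            // so more characters might still complete the match.
            if (m.matches() || m.hitEnd()) {
                return end;
            }
        }
        return 0;
    }
}
```

Trying prefixes from longest to shortest is quadratic in the input length, which is fine for the short strings typical of CLDR paths and values.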

![Unicode copyright](https://www.unicode.org/img/hb_notice.gif)