CLDR-16403 Add v44 PersonName features (#3304)
* CLDR-16403 Add v44 Personname features

* CLDR-16403 Add the new modifiers

* CLDR-16403 formatting
macchiati authored Oct 3, 2023
1 parent 3dee601 commit 2dd770a
Showing 1 changed file with 62 additions and 11 deletions.
73 changes: 62 additions & 11 deletions docs/ldml/tr35-personNames.md
@@ -135,6 +135,13 @@ The following features are currently out of scope for Person Names formating:

A draft API for formatting personal names is included in ICU4J 73. (“Draft” means that the full functionality is present, but the API might be refined before it is stabilized.) The implementation can be found at [PersonNameFormatter.java](https://github.com/unicode-org/icu/blob/main/icu4j/main/classes/core/src/com/ibm/icu/text/PersonNameFormatter.java) and [SimplePersonName.java](https://github.com/unicode-org/icu/blob/main/icu4j/main/classes/core/src/com/ibm/icu/text/SimplePersonName.java).

In addition to the settings in this document, it is recommended that implementations provide some additional features in their APIs to allow more control for clients, notably:

1. forceGivenFirst — no matter what the values are in nameOrderLocales or in the NameObject, display the name as givenFirst.
2. forceSurnameFirst — no matter what the values are in nameOrderLocales or in the NameObject, display the name as surnameFirst.
3. forceNativeOrdering — no matter what the values are in nameOrderLocales or in the NameObject, display the name with the same ordering as the native locale.
4. surnameFirstAllCaps — display the surname and surname2 fields in all caps **if** not using native order. Thus where the foreign name ordering is surnameFirst, the name {given=Shinzo, surname=Abe} would display as “ABE Shinzo”.
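
The four options above can be sketched as follows. This is a minimal illustration, not the ICU4J API: `OrderOptions`, `resolve_order`, and `display` are hypothetical names, the precedence among the three `force*` flags is an assumption (the text does not define one), and `native_order` is assumed to mean the order the formatting locale uses for its own names. The reading of `surnameFirstAllCaps` used here (capitalize only when the resolved order is surnameFirst and differs from the native order) is one interpretation of the description.

```python
from dataclasses import dataclass

GIVEN_FIRST = "givenFirst"
SURNAME_FIRST = "surnameFirst"


@dataclass
class OrderOptions:
    """Hypothetical container for the four recommended API options."""
    force_given_first: bool = False
    force_surname_first: bool = False
    force_native_ordering: bool = False
    surname_first_all_caps: bool = False


def resolve_order(data_order, native_order, opts):
    """Pick the name order.

    data_order: order derived from nameOrderLocales or the NameObject.
    native_order: order the formatting locale uses for native names (assumption).
    The precedence of the force flags below is an assumption.
    """
    if opts.force_given_first:
        return GIVEN_FIRST
    if opts.force_surname_first:
        return SURNAME_FIRST
    if opts.force_native_ordering:
        return native_order
    return data_order


def display(name, data_order, native_order, opts):
    """Format given + surname only; real patterns carry many more fields."""
    order = resolve_order(data_order, native_order, opts)
    surname = name["surname"]
    # Assumed reading: all caps only for surnameFirst that is not the native order.
    if opts.surname_first_all_caps and order == SURNAME_FIRST and order != native_order:
        surname = surname.upper()
    parts = [surname, name["given"]] if order == SURNAME_FIRST else [name["given"], surname]
    return " ".join(parts)


abe = {"given": "Shinzo", "surname": "Abe"}
print(display(abe, SURNAME_FIRST, GIVEN_FIRST, OrderOptions(surname_first_all_caps=True)))
```

With a givenFirst formatting locale and a surnameFirst foreign name, the sketch yields the “ABE Shinzo” form from the example in item 4.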

### Person Name Formatting Overview

Logically, the model used for applying the CLDR data is the following:
Expand Down Expand Up @@ -200,7 +207,7 @@ Person name formatting data is stored as LDML with schema defined as follows. Ea
### personNames Element

```xml
<!ELEMENT personNames ( nameOrderLocales*, foreignSpaceReplacement?, initialPattern*, personName+, sampleName* ) >
<!ELEMENT personNames ( nameOrderLocales*, parameterDefault*, nativeSpaceReplacement*, foreignSpaceReplacement*, initialPattern*, personName*, sampleName* ) >
```

The LDML top-level `<personNames>` element contains information regarding the formatting of person names, and the formatting of person names in specific contexts for a specific locale.
Expand Down Expand Up @@ -268,9 +275,18 @@ An example from English may look like the following
This would tell the formatting code, when handling person name data from an English locale, to use patterns with the `givenFirst` order attribute for all data except name data from Korean, Vietnamese, Cantonese, and Chinese locales, where the `surnameFirst` patterns should be used.

### parameterDefault Element
```xml
<!ELEMENT parameterDefault ( #PCDATA ) >
<!ATTLIST parameterDefault parameter (length | formality) #REQUIRED >
```
Many clients of the person-names functionality don’t really care about formal versus informal; they just want whatever the “normal” formality level is for the user’s language. The same goes for the default length.

This parameter provides that information, so that APIs can allow users to use default values for the formality and length. The exact form that this takes depends on the API conventions, of course.
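
As a rough sketch of how an API might consume these defaults: the XML fragment below is illustrative (the values `informal` and `medium` are made up for the example, not taken from any locale's data), and `load_parameter_defaults` and `resolve_parameters` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; the default values here are invented for the example.
SAMPLE = """<personNames>
  <parameterDefault parameter="formality">informal</parameterDefault>
  <parameterDefault parameter="length">medium</parameterDefault>
</personNames>"""


def load_parameter_defaults(xml_text):
    """Map each parameterDefault's 'parameter' attribute to its text value."""
    root = ET.fromstring(xml_text)
    return {
        el.get("parameter"): (el.text or "").strip()
        for el in root.findall("parameterDefault")
    }


def resolve_parameters(defaults, length=None, formality=None):
    """Use the caller's explicit choice when given, else the locale default."""
    return {
        "length": length or defaults["length"],
        "formality": formality or defaults["formality"],
    }


defaults = load_parameter_defaults(SAMPLE)
print(resolve_parameters(defaults, formality="formal"))
```

A caller that does not care about formality or length simply omits both arguments and gets the locale's defaults.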

### foreignSpaceReplacement Element

The `<foreignSpaceReplacement>` element is used to specify how spaces should be handled when the name language is different from the formatting language.
The `<foreignSpaceReplacement>` element is used to specify how spaces should be handled when the name language is **different from** the formatting language. It is used in languages that don't normally require spaces between words. For example, Japanese and Chinese have the value of a middle dot (‘·’ U+00B7 MIDDLE DOT or ‘・’ U+30FB KATAKANA MIDDLE DOT), so that it is used between words in a foreign name; most other languages have the value of SPACE.

```xml
<!ELEMENT foreignSpaceReplacement ( #PCDATA ) >
```
@@ -280,6 +296,18 @@ The `<foreignSpaceReplacement>` element is used to specify how spaces should be
* `xml:space` must be set to `'preserve'` so that actual spaces in the pattern are preserved. See [W3C XML White Space Handling](https://www.w3.org/TR/xml/#sec-white-space).
* The `#PCDATA` is the character sequence used to replace spaces when postprocessing a pattern.

### nativeSpaceReplacement Element

The `<nativeSpaceReplacement>` element is used to specify how spaces should be handled when the name language is **the same as** the formatting language. It is used in languages that don't normally require spaces between words, but may use spaces within names. For example, Japanese and Chinese have the value of an empty string between words in a native name; most other languages have the value of SPACE.

```xml
<!ELEMENT nativeSpaceReplacement ( #PCDATA ) >
<!ATTLIST nativeSpaceReplacement xml:space preserve #REQUIRED >
```

* `xml:space` must be set to `'preserve'` so that actual spaces in the pattern are preserved. See [W3C XML White Space Handling](https://www.w3.org/TR/xml/#sec-white-space).
* The `#PCDATA` is the character sequence used to replace spaces when postprocessing a pattern.

### initialPattern Element

The `<initialPattern>` element is used to specify how to format initials of name parts.
@@ -468,13 +496,41 @@ The modifiers transform the input data as described in the following table:
| initialCap | Request the element with the first grapheme capitalized, and remaining characters unchanged. This is used in cases where an element is usually in lower case but may need to be modified. For example in Dutch, the name<br/>{ title: “dhr.”, given: “Johannes”, surname: “van den Berg” },<br/>when addressed formally, would need to be “dhr. Van den Berg”. This would be represented as<br/>“{title} {surname-initialCap}”<br/><br/>Only the _“-allCaps”_ or the _“-initialCap”_ modifier may be used, but not both. They are mutually exclusive. |
| initial | Requests the initial grapheme cluster of each word in a field. The `initialPattern` patterns for the locale are used to create the format and layout for lists of initials. For example, if the initialPattern types are<br/>`<initialPattern type="initial">{0}.</initialPattern>`<br/>`<initialPattern type="initialSequence">{0} {1}</initialPattern>`<br/>then a name such as<br/>{ given: “John”, given2: “Ronald Reuel”, surname: “Tolkien” }<br/>could be represented as<br/>“{given-initial-allCaps} {given2-initial-allCaps} {surname}”<br/>and will format to “**J. R. R. Tolkien**”<br/><br/>_The default implementation uses the first grapheme cluster of each word for the value for the field; if the PersonName object has a locale, and CLDR supports a locale-specific grapheme cluster algorithm for that locale, then that algorithm is used. The PersonName object can override this, as detailed below._<br/><br/>Only the _“-initial”_ or the _“-monogram”_ modifier may be used, but not both. They are mutually exclusive. |
| monogram | Requests initial grapheme. Example: A name such as<br/>{ given: “Landon”, given2: “Bainard Crawford”, surname: “Johnson” }<br/>could be represented as<br/>“{given-monogram-allCaps}{given2-monogram-allCaps}{surname-monogram-allCaps}”<br/>or “**LBJ**”<br/><br/>_The default implementation uses the first grapheme cluster of the value for the field; if the PersonName object has a locale, and CLDR supports a locale-specific grapheme cluster algorithm for that locale, then that algorithm is used. The PersonName object can override this, as detailed below. The difference between monogram and initial is that monogram only returns one element, not one element per word._<br/><br/>Only the _“-initial”_ or the _“-monogram”_ modifier may be used, but not both. They are mutually exclusive. |
| retain | This is needed in languages that preserve punctuation when forming initials. For example, normally the name {given=Anne-Marie} is converted into initials with {given-initialCaps} as “A. M.”. However, where a language preserves the hyphen, the pattern should use {given-initialCaps**-retain**} instead. In that case, the result is “A.-M.”. (The periods are added by the pattern-initialSequence.) |
| genitive, vocative | Patterns can use these modifiers so that better results can be obtained for inflected languages. However, see the details below. |
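
The difference between `initial`, `monogram`, and `retain` in the table above can be illustrated with a small sketch. It makes several simplifying assumptions: `first_cluster` takes the first code point as a stand-in for real grapheme cluster segmentation, only the hyphen is treated as retained punctuation, and the two pattern strings are the `initial` and `initialSequence` examples from the table. The function names are hypothetical.

```python
import re


def first_cluster(text):
    """Stand-in for grapheme segmentation: returns the first code point only."""
    return text[:1]


def monogram(value):
    """One element for the whole field, regardless of how many words it has."""
    return first_cluster(value)


def initials(value, retain=False, initial="{0}.", sequence="{0} {1}"):
    """One initial per word, laid out with the initialSequence pattern."""

    def one(word):
        return initial.format(first_cluster(word))

    if retain:
        # Assumption: keep hyphens between the initials of hyphenated words.
        pieces = ["-".join(one(p) for p in w.split("-") if p) for w in value.split()]
    else:
        # Without retain, hyphens separate words just like spaces do.
        pieces = [one(w) for w in re.split(r"[\s-]+", value) if w]
    pieces = [p for p in pieces if p]
    if not pieces:
        return ""
    result = pieces[0]
    for piece in pieces[1:]:
        result = sequence.format(result, piece)
    return result


print(initials("Ronald Reuel"))
print(initials("Anne-Marie"))
print(initials("Anne-Marie", retain=True))
print(monogram("Bainard Crawford"))
```

The sketch does not apply `allCaps`; the inputs are assumed to be capitalized already.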

#### Grammatical Modifiers for Names

The CLDR person name formatting does not itself support grammatical inflection.
However, name sources (NameObject) can support inflections, either by having additional fields or by using an inflection engine that can handle personal name parts.

In the current release, the focus is on supporting `referring` and `addressing` forms.
Typically the `referring` forms will be in the most neutral (*nominative*) case, and the `addressing` forms will be in the *vocative* case.
Some modifiers have been added to facilitate this, so that there can be patterns like: {given-vocative} {surname-vocative}.

Notice that some **parts** of the formatted name may be in different grammatical cases, so the cases may not be consistent across the whole name.
For example:

| English Pattern | Examples | Latvian Pattern | Examples |
| ---- | ---- | ---- | ---- |
| {given} {surname} | John Smith | {given} {surname} | Kārlis Ozoliņš |
| {title} {surname} | Mr Smith | {surname} {title} | Ozoliņa kungs |

Notice that the `surname` in Latvian needs to change to the genitive case with that pattern:

Ozoliņš ➡︎ **Ozoliņa**

That is accomplished by changing the pattern to be {surname<b>-genitive</b>} {title}. In this case the {surname} should only be genitive if followed by the {title}.
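
One way a name source could supply inflected forms is by carrying them as additional fields, which the text mentions as an option. The sketch below assumes that approach: `SimpleName` and `format_pattern` are hypothetical, the key convention `surname-genitive` is invented for the example, and the fallback to the uninflected field when no inflected form is stored is an assumption rather than specified behavior.

```python
import re


class SimpleName:
    """Hypothetical NameObject that stores inflected forms as extra fields."""

    def __init__(self, fields):
        self.fields = fields

    def get(self, field, modifiers):
        key = "-".join([field, *modifiers])
        # Assumed fallback: use the plain field if no inflected form is stored.
        return self.fields.get(key, self.fields.get(field, ""))


def format_pattern(pattern, name):
    """Replace each {field-modifier...} placeholder with the name's value."""

    def replace(match):
        field, *modifiers = match.group(1).split("-")
        return name.get(field, modifiers)

    return re.sub(r"\{([^{}]+)\}", replace, pattern)


ozolins = SimpleName({
    "given": "Kārlis",
    "surname": "Ozoliņš",
    "surname-genitive": "Ozoliņa",
    "title": "kungs",
})
print(format_pattern("{given} {surname}", ozolins))
print(format_pattern("{surname-genitive} {title}", ozolins))
```

The formatter itself stays free of grammatical logic; it only passes the requested modifier through to the name source.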

#### Future Modifiers

There may be more modifiers in the future.
Additional modifiers may be added in future versions of CLDR.

Examples:

1. For the initial of the surname **_“de Souza”_**, in a language that treats the “de” as a tussenvoegsel, the PersonName object can automatically recast `{surname-initial}` to:<br/>`{surname-prefix-initial}{surname-core-initial-allCaps}` to get “dS” instead of “d”.
2. If the locale expects a surname prefix to be sorted after a surname, then both `{surname-core}` and then `{surname-prefix}` would be used, as in<br/>`{surname-core}, {given} {given2} {surname-prefix}`
3. Only the grammatical modifiers requested by translators for `referring` or `addressing` have been added as yet, but additional grammatical modifiers may be added in the future.

## Formatting Process

@@ -710,14 +766,8 @@ Here are examples for Albert Einstein in Japanese and Chinese:

#### Setting the spaceReplacement

1. The foreignSpaceReplacement is provided by the value for the `foreignSpaceReplacement` element; the default value is " ".
2. The nativeSpaceReplacement is determined by the following algorithm, choosing between " " and "".
1. Get the script of the formatting locale
2. If the likely script is Thai, let nativeSpaceReplacement = " " (space)
3. Otherwise let nativeSpaceReplacement = "" (empty string) if either of the following applies:
1. The script is Jpan, Hant, or Hans
2. The script has the script metadata property lbLetters = YES (this can also be algorithmically derived from the LineBreak property data).
4. Otherwise, let nativeSpaceReplacement = " " (space)
1. The foreignSpaceReplacement is provided by the value for the `foreignSpaceReplacement` element; the default value is a SPACE (" ").
2. The nativeSpaceReplacement is provided by the value for the `nativeSpaceReplacement` element; the default value is SPACE (" ").
3. If the formatter base language matches the name base language, then let spaceReplacement = nativeSpaceReplacement, otherwise let spaceReplacement = foreignSpaceReplacement.
4. Replace all sequences of space in the formatted value string by the spaceReplacement.
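
Steps 3 and 4 can be sketched as below. This is a minimal illustration under stated assumptions: `base_language` simply takes the first subtag of the locale identifier (a real implementation may need likely-subtag resolution), "sequences of space" is read as runs of U+0020, and both function names are hypothetical. The replacement values in the usage lines match the `ja` example (empty string for native names, ‘・’ for foreign names).

```python
import re


def base_language(locale):
    """First subtag of a locale identifier such as 'ja_JP' or 'de-CH'."""
    return re.split(r"[-_]", locale)[0].lower()


def apply_space_replacement(formatted, formatter_locale, name_locale,
                            native=" ", foreign=" "):
    """Choose native or foreign replacement, then replace runs of spaces."""
    same_language = base_language(formatter_locale) == base_language(name_locale)
    space_replacement = native if same_language else foreign
    # Assumption: a "sequence of space" is one or more U+0020 characters.
    return re.sub(r" +", space_replacement, formatted)


# Formatting locale ja_JP with nativeSpaceReplacement "" and foreignSpaceReplacement "・".
print(apply_space_replacement("Albert Einstein", "ja_JP", "de_CH", native="", foreign="・"))
print(apply_space_replacement("宮崎 駿", "ja_JP", "ja", native="", foreign="・"))
```

Both defaults are SPACE, so a locale that specifies neither element leaves the formatted string unchanged apart from collapsing runs of spaces.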

@@ -748,6 +798,7 @@ Suppose the PersonNames formatting patterns for `ja_JP` and `de_CH` contained th
&lt;personNames&gt;
&lt;nameOrderLocales order="givenFirst"&gt;und&lt;/nameOrderLocales&gt;
&lt;<strong>nameOrderLocales</strong> order="<strong>surnameFirst</strong>"&gt;hu <strong>ja</strong> ko vi yue zh <strong>und_JP</strong>&lt;/nameOrderLocales&gt;
&lt;<strong>nativeSpaceReplacement</strong> xml:space="preserve"&gt;<span style="background-color:aqua"></span>&lt;/nativeSpaceReplacement&gt;
&lt;<strong>foreignSpaceReplacement</strong> xml:space="preserve"&gt;<span style="background-color:aqua">・</span>&lt;/foreignSpaceReplacement&gt;
. . .
&lt;personName order="<strong>givenFirst</strong>" length="medium" usage="referring" formality="formal"&gt;