diff --git a/docs/ldml/tr35-keyboards.md b/docs/ldml/tr35-keyboards.md index f90fd6791b9..02257391816 100644 --- a/docs/ldml/tr35-keyboards.md +++ b/docs/ldml/tr35-keyboards.md @@ -31,7 +31,7 @@ This document is a _technical preview_ of the Keyboard standard. To process earlier XML files, use the data and specification from v43.1, found at -The CLDR [Keyboard Workgroup](https://cldr.unicode.org/index/keyboard-workgroup) is currently +The CLDR [Keyboard Workgroup][keyboard-workgroup] is currently developing this technical preview to the CLDR keyboard specification. ## Parts @@ -150,12 +150,11 @@ The LDML specification is divided into the following parts: The Unicode Standard and related technologies such as CLDR have dramatically improved the path to language support. However, keyboard support remains platform and vendor specific, causing inconsistencies in implementation as well as timeline. -> “More and more language communities are determining that digitization is vital to their approach to language preservation and that engagement with Unicode is essential to becoming fully digitized. For many of these communities, however, getting new characters or a new script added to The Unicode Standard is not the end of their journey. The next, often more challenging stage is to get device makers, operating systems, apps and services to implement the script requirements that Unicode has just added to support their language. … -> -> “However, commensurate improvements to streamline new language support on the input side have been lacking. CLDR’s new Keyboard Subcommittee has been established to address this very gap.” -> _(Cornelius et. 
al, “Standardizing Keyboards with CLDR,” presented at the 45th Internationalization and Unicode Conference, Santa Clara, California, USA, October 2021)_ +More and more language communities are determining that digitization is vital to their approach to language preservation and that engagement with Unicode is essential to becoming fully digitized. For many of these communities, however, getting new characters or a new script added to The Unicode Standard is not the end of their journey. The next, often more challenging stage is to get device makers, operating systems, apps and services to implement the script requirements that Unicode has just added to support their language. + +However, commensurate improvements to streamline new language support on the input side have been lacking. CLDR’s Keyboard specification has been updated in an attempt to address this gap. -The CLDR keyboard format seeks to address these challenges, by providing an interchange format for the communication of keyboard mapping data independent of vendors and platforms. Keyboard authors can then create a single mapping file for their language, which implementations can use to provide that language’s keyboard mapping on their own platform. +This document specifies an interchange format for the communication of keyboard mapping data independent of vendors and platforms. Keyboard authors can then create a single mapping file for their language, which implementations can use to provide that language’s keyboard mapping on their own platform. Additionally, the standardized identifier for keyboards can be used to communicate, internally or externally, a request for a particular keyboard mapping that is to be used to transform either text or keystrokes. The corresponding data can then be used to perform the requested actions. 
For example, a remote screen-access application (such as used for customer service or server management) would be able to communicate and choose the same keyboard layout on the remote device as is used in front of the user, even if the two systems used different platforms. @@ -177,8 +176,6 @@ Some goals of this format are: 2. Provide definitive platform-independent definitions for new keyboard layouts. * For example, a new French standard keyboard layout would have a single definition which would be usable across all implementations. 3. Allow platforms to be able to use CLDR keyboard data for the character-emitting keys (non-frame) aspects of keyboard layouts. - * For example, platform-specific keys such as Fn, Numpad, IME swap keys, and cursor keys are out of scope. - * This also means that modifier (frame) keys cannot generate output, such as capslock -> backslash. 4. Deprecate & archive existing LDML platform-specific layouts so they are not part of future releases. +**Implementation:** see **Keyboard implementation** **Input Method Editor (IME):** a component or program that supports input of large character sets. Typically, IMEs employ contextual logic and candidate UI to identify the Unicode characters intended by the user. - +1. A _compile/build tool_ part used by **Keyboard authors** to parse the XML file and produce a compact runtime format, and +2. A _runtime_ part which interprets the runtime format when the keyboard is selected by the end user, and delivers the output plain text to the platform or application. **Key:** A physical key on a hardware keyboard, or a virtual key on a touch keyboard. @@ -271,9 +267,15 @@ If it becomes necessary in the future, the format could extend the ISO layout to **Virtual keyboard:** see **Touch keyboard** +## Notation + +- Ellipses (`…`) in syntax examples are used to denote substituted parts. 
+ + For example, `id="…keyId"` denotes that `…keyId` (the part between double quotes) is to be replaced with something, in this case a key identifier. As another example, `\u{…usv}` denotes that the `…usv` is to be replaced with something, in this case a Unicode scalar value in hex. + ### Escaping -When explicitly specified, attribute values can contain escaped characters. This specification uses two methods of escaping, the _UnicodeSet_ notation and the `\u{...}` notation. +When explicitly specified, attribute values can contain escaped characters. This specification uses two methods of escaping, the _UnicodeSet_ notation and the `\u{…usv}` notation. ### UnicodeSet Escaping @@ -289,7 +291,7 @@ Currently, the following attribute values allow _UnicodeSet_ notation: ### UTS18 Escaping -The `\u{...}` notation, a subset of hex notation, is described in [UTS #18 section 1.1](https://www.unicode.org/reports/tr18/#Hex_notation). It can refer to one or multiple individual codepoints. Currently, the following attribute values allow the `\u{...}` notation: +The `\u{…usv}` notation, a subset of hex notation, is described in [UTS #18 section 1.1](https://www.unicode.org/reports/tr18/#Hex_notation). It can refer to one or multiple individual codepoints. Currently, the following attribute values allow the `\u{…}` notation: * `output` on the `` element * `from` or `to` on the `` element @@ -306,19 +308,18 @@ Attribute values escaped in this manner are annotated with the ` ``` @@ -646,7 +647,7 @@ For purposes of this current draft specification, the value should always be `te _Attribute:_ `locale` (required) -This attribute represents the primary locale of the keyboard using BCP 47 [Unicode locale identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers) - for example `"el"` for Greek. Sometimes, the locale may not specify the base language. For example, a Devanagari keyboard for many languages could be specified by BCP-47 code: `"und-Deva"`. 
However, it is better to list out the languages explicitly using the [`locales`](#element-locales) element. +This attribute value contains the primary locale of the keyboard using BCP 47 [Unicode locale identifiers](tr35.md#Canonical_Unicode_Locale_Identifiers) - for example `"el"` for Greek. Sometimes, the locale may not specify the base language. For example, a Devanagari keyboard for many languages could be specified by BCP-47 code: `"und-Deva"`. However, it is better to list out the languages explicitly using the [`locales`](#element-locales) element. For further details about the choice of locale ID, see [Keyboard IDs](#keyboard-ids). @@ -696,7 +697,7 @@ The optional `` element allows specifying additional or alternate local **Syntax** ```xml - + ``` > @@ -738,7 +739,7 @@ Element used to keep track of the source data version. **Syntax** ```xml - + ``` > @@ -779,10 +780,10 @@ Element containing informative properties about the layout, for displaying in us ```xml + name="…name" + author="…author" + layout="…hint of the layout" + indicator="…short identifier" /> ``` > @@ -907,16 +908,16 @@ This element defines a mapping between an abstract key and its output. This elem ```xml ``` @@ -937,21 +938,21 @@ _Attribute:_ `id` > The `id` attribute uniquely identifies the key. NMTOKEN. It can (but needn't be) the key name (a, b, c, A, B, C, …), or any other valid token (e-acute, alef, alif, alpha, …). > -> In the future, this attribute’s definition is expected to be updated to align with [UAX#31](https://www.unicode.org/reports/tr31/). Please see [CLDR-17043](https://unicode-org.atlassian.net/browse/CLDR-17043) for more details. +> In the future, this attribute’s definition is expected to be updated to align with [UAX#31](https://www.unicode.org/reports/tr31/). -_Attribute:_ `flickId="{flick id}"` (optional) +_Attribute:_ `flickId="…flickId"` (optional) > The `flickId` attribute indicates that this key makes use of a [`flick`](#element-flick) set with the specified id. 
_Attribute:_ `gap="true"` (optional) -> The `gap` attribute indicates that this key does not have any appearance, but represents a "gap" of the specified number of key widths. Can be used with `width` to set a width. +> The `gap` attribute indicates that this key does not have any appearance, but causes a "gap" of the specified number of key widths. Can be used with `width` to set a width. ```xml ``` -_Attribute:_ `longPressKeyIds="{list of key ids}"` (optional) +_Attribute:_ `longPressKeyIds="…list of keyIds"` (optional) > A space-separated ordered list of `key` element ids, which keys which can be emitted by "long-pressing" this key. This feature is prominent in mobile devices. > @@ -980,7 +981,7 @@ _Attribute:_ `longPressKeyIds="{list of key ids}"` (optional) > > ``` -_Attribute:_ `longPressDefaultKeyId="{key-id}"` (optional) +_Attribute:_ `longPressDefaultKeyId="…keyId"` (optional) > Specifies the default key, by id, in a list of long-press keys. See the discussion of `LongPressKeyIds`, above. @@ -1022,14 +1023,14 @@ _Attribute:_ `layerId="shift"` (optional) > > This attribute is an NMTOKEN. > -> In the future, this attribute’s definition is expected to be updated to align with [UAX#31](https://www.unicode.org/reports/tr31/). Please see [CLDR-17043](https://unicode-org.atlassian.net/browse/CLDR-17043) for more details. +> In the future, this attribute’s definition is expected to be updated to align with [UAX#31](https://www.unicode.org/reports/tr31/). _Attribute:_ `output` -> The `output` attribute value contains the sequence of characters that is emitted when pressing this particular key. Control characters, whitespace (other than the regular space character) and combining marks in this attribute are escaped using the `\u{...}` notation. More than one key may output the same output. +> The `output` attribute value contains the sequence of characters that is emitted when pressing this particular key. 
Control characters, whitespace (other than the regular space character) and combining marks in this attribute are escaped using the `\u{…}` notation. More than one key may output the same output. > -> The `output` attribute may also contain the `\m{…}` syntax to insert a marker. See the definition of [markers](#markers). +> The `output` attribute may also contain the `\m{…markerId}` syntax to insert a marker. See the definition of [markers](#markers). _Attribute:_ `width="1.2"` (optional, default "1.0") @@ -1135,7 +1136,7 @@ _Attribute:_ `id` (required) > The `flick` elements do not share a namespace with the `key`s, so it would also be allowed > to have `` > -> In the future, this attribute’s definition is expected to be updated to align with [UAX#31](https://www.unicode.org/reports/tr31/). Please see [CLDR-17043](https://unicode-org.atlassian.net/browse/CLDR-17043) for more details. +> In the future, this attribute’s definition is expected to be updated to align with [UAX#31](https://www.unicode.org/reports/tr31/). * * * @@ -1272,36 +1273,18 @@ After loading, the above example will be the equivalent of the following. ### Element: displays -The displays can be used to describe what is to be displayed on the keytops for various keys. For the most part, such explicit information is unnecessary since the `@to` element from the `keys/key` element can be used. But there are some characters, such as diacritics, that do not display well on their own and so explicit overrides for such characters can help. -Another useful scenario is where there are doubled diacritics, or multiple characters with spacing issues. - -The `displays` consists of a list of display subelements. - -`displays` elements are designed to be shared across many different keyboard layout descriptions, and imported with `` where needed. - -For combining characters, U+25CC `◌` is used as a base. It is an error to use a combining character without a base in the `display` attribute. 
- -For example, a key which outputs a combining tilde (U+0303) can be represented as follows: - -```xml - -``` - -This way, a key which outputs a combining tilde (U+0303) will be represented as `◌̃` (a tilde on a dotted circle). - -Some scripts/languages may prefer a different base than U+25CC. -See [``](#element-displayoptions). +The `displays` element consists of a list of [`display`](#element-display) subelements. **Syntax** ```xml - {a set of display elements} + + + … ``` -**Note**: There is currently no way to indicate a custom display for a key without output (i.e. without a `to=` attribute), nor is there a way to indicate that such a key has a standardized identity (e.g. that a key should be identified as a “Shift”). These may be addressed in future versions of this standard. - > > > Parents: [keyboard3](#element-keyboard3) @@ -1316,12 +1299,34 @@ See [``](#element-displayoptions). ### Element: display -The `display` element describes how a character, that has come from a `keys/key` element, should be displayed on a keyboard layout where such display is possible. +The `display` elements can be used to describe what is to be displayed on the keytops for various keys. For the most part, such explicit information is unnecessary since the `@to` element from the `keys/key` element will be used for keytop display. + +- Some characters, such as diacritics, do not display well on their own. +- Another useful scenario is where there are doubled diacritics, or multiple characters with spacing issues. +- Finally, the `display` element provides a way to specify the keytop for keys which do not otherwise produce output. Keys which switch layers using the `@layerId` attribute typically do not produce output. + +> Note: `displays` elements are designed to be shared across many different keyboard layout descriptions, and imported with `` where needed. + +#### Non-spacing marks on keytops + +For non-spacing marks, U+25CC `◌` is used as a base. 
It is an error to use a nonspacing character without a base in the `display` attribute. For example, `display="\u{0303}"` would produce an error. + +A key which outputs a combining tilde (U+0303) could be represented as either of the following: + +```xml + + +``` + +This way, a key which outputs a combining tilde (U+0303) will be represented as `◌̃` (a tilde on a dotted circle). + +Users of some scripts/languages may prefer a different base than U+25CC. See [``](#element-displayoptions). + **Syntax** ```xml - + ``` > @@ -1336,6 +1341,9 @@ The `display` element describes how a character, that has come from a `keys/key` One of the `output` or `id` attributes is required. +**Note**: There is currently no way to indicate a custom display for a key without output (i.e. without a `to=` attribute), nor is there a way to indicate that such a key has a standardized identity (e.g. that a key should be identified as a “Shift”). These may be addressed in future versions of this standard. + + _Attribute:_ `output` (optional) > Specifies the character or character sequence from the `keys/key` element that is to have a special display. @@ -1419,7 +1427,7 @@ This attribute may be escaped with `\u` notation, see [Escaping](#escaping). ### Element: forms -This element represents a set of `form` elements which define the layout of a particular hardware form. +This element contains a set of `form` elements which define the layout of a particular hardware form. > @@ -1437,10 +1445,10 @@ This element represents a set of `form` elements which define the layout of a pa ```xml
- +
- +
``` @@ -1449,7 +1457,7 @@ This element represents a set of `form` elements which define the layout of a pa ### Element: form -This element represents a specific `form` element which defines the layout of a particular hardware form. +This element contains a specific `form` element which defines the layout of a particular hardware form. > *Note:* Most keyboards will not need to use this element directly, and the CLDR repository will not accept keyboards which define a custom `form` element. This element is provided for two reasons: @@ -1507,7 +1515,7 @@ Here is a summary of the implied form elements. Keyboards included in the CLDR R ### Element: scanCodes -This element represents a keyboard row, and defines the scan codes for the non-frame keys in that row. +This element contains a keyboard row, and defines the scan codes for the non-frame keys in that row. > > @@ -1533,7 +1541,7 @@ This element represents a keyboard row, and defines the scan codes for the non-f ### Element: layers -This element represents a set of `layer` elements with a specific physical form factor, whether +This element contains a set of `layer` elements with a specific physical form factor, whether hardware or touch layout. > @@ -1554,7 +1562,7 @@ _Attribute:_ `form` (required) > or that the form is a `touch` layout. > > When using an on-screen touch keyboard, if the keyboard does not specify a `` -> element, a `` element can be used as an fallback alternative. +> element, a `` element can be used as a fallback alternative. > If there is no `hardware` form, the implementation may need > to choose a different keyboard file, or use some other fallback behavior when using a > hardware keyboard. @@ -1715,7 +1723,7 @@ A `row` element describes the keys that are present in the row of a keyboard.
**Syntax** ```xml - + ``` > @@ -1782,7 +1790,7 @@ Note that the `id=` attribute value must be unique across all children of the `v > Occurrence: optional, multiple > -> This element represents a single string which is used by the [transform](#element-transform) elements for string matching and substitution, as well as by the [key](#element-key) and [display](#element-display) elements. +> This element contains a single string which is used by the [transform](#element-transform) elements for string matching and substitution, as well as by the [key](#element-key) and [display](#element-display) elements. _Attribute:_ `id` (required) @@ -1839,7 +1847,7 @@ These may be then used in multiple contexts: > Occurrence: optional, multiple > -> This element represents a set of strings used by the [transform](#element-transform) elements for string matching and substitution. +> This element contains a set of strings used by the [transform](#element-transform) elements for string matching and substitution. _Attribute:_ `id` (required) @@ -1897,7 +1905,7 @@ See [transform](#element-transform) for further details and syntax. > Occurrence: optional, multiple > -> This element represents a set, using a subset of the [UnicodeSet](tr35.md#Unicode_Sets) format, used by the [`transform`](#element-transform) elements for string matching and substitution. +> This element contains a set, using a subset of the [UnicodeSet](tr35.md#Unicode_Sets) format, used by the [`transform`](#element-transform) elements for string matching and substitution. > Note important restrictions on the syntax below. _Attribute:_ `id` (required) @@ -1946,8 +1954,10 @@ There can be multiple `` elements, but only one for each `type`. **Syntax** ```xml - - {a set of transform groups} + + + + … ``` @@ -1975,7 +1985,7 @@ There are other keying behaviors that are needed particularly in handing complex Markers are placeholders which record some state, but without producing normal visible text output. 
They were designed particularly to support dead-keys. -The marker ID is any valid `NMTOKEN` (But see [CLDR-17043](https://unicode-org.atlassian.net/browse/CLDR-17043) for future discussion.) +The marker ID is any valid `NMTOKEN`. Consider the following abbreviated example: @@ -2094,7 +2104,7 @@ Such implementations must take care to remove all such markers (see prior sectio > Occurrence: optional, multiple > -A `transformGroup` represents a set of transform elements or reorder elements. +A `transformGroup` contains a set of transform elements or reorder elements. Each `transformGroup` is processed entirely before proceeding to the next one. @@ -2111,7 +2121,7 @@ This is a `transformGroup` that consists of one or more [`transform`](#element-t ```xml - + @@ -2126,8 +2136,8 @@ This is a `transformGroup` that consists of one or more [`transform`](#element-t ```xml - - + + ``` @@ -2136,7 +2146,7 @@ This is a `transformGroup` that consists of one or more [`transform`](#element-t ### Element: transform -This element represents a single transform that may be performed using the keyboard layout. A transform is an element that specifies a set of conversions from sequences of code points into (one or more) other code points. For example, in most French keyboards hitting the `^` dead-key followed by the `e` key produces `ê`. +This element contains a single transform that may be performed using the keyboard layout. A transform is an element that specifies a set of conversions from sequences of code points into (one or more) other code points. For example, in most French keyboards hitting the `^` dead-key followed by the `e` key produces `ê`. Matches are processed against the "input context", a temporary buffer containing all relevant text up to the insertion point. If the user moves the insertion point, the input context is discarded and recreated from the application’s text buffer. Implementations may discard the input context at any time. 
@@ -2149,7 +2159,7 @@ All of the `transform` elements in a `transformGroup` are tested for a match, in **Syntax** ```xml - + ``` > @@ -2438,13 +2448,13 @@ The relative ordering of `` elements is not significant. ```xml - - + + ``` @@ -2704,7 +2714,7 @@ In text editing mode, different keyboard layouts may behave differently in the s ```xml - + ``` @@ -2796,41 +2806,6 @@ Beyond what the DTD imposes, certain other restrictions on the data are imposed Please note the constraints given under each element section above. DTD validation alone is not sufficient to verify a keyboard file. - - * * * ## Keyboard IDs @@ -2977,7 +2952,7 @@ This attribute value specifies a name for this overall test file. These names co > Occurrence: Optional, Multiple > -This element represents a repertoire test, to validate the available characters and their reachability. This test ensures that each of the specified characters is somehow typeable on the keyboard, after transforms have been applied. The characters in the repertoire will be matched against the complete set of possible generated outputs, post-transform, of all keys on the keyboard. +This element contains a repertoire test, to validate the available characters and their reachability. This test ensures that each of the specified characters is somehow typeable on the keyboard, after transforms have been applied. The characters in the repertoire will be matched against the complete set of possible generated outputs, post-transform, of all keys on the keyboard. _Attribute:_ `name` (required) @@ -3112,7 +3087,7 @@ Specifies the starting context. This text may be escaped with `\u` notation, see > Occurrence: Optional, Multiple > -This element represents a single keystroke or other gesture event, identified by a particular key element. +This element contains a single keystroke or other gesture event, identified by a particular key element. 
Optionally, one of the gesture attributes, either `flick`, `longPress`, or `tapCount` may be specified. If none of the gesture attribute values are specified, then a regular keypress is effected on the key. It is an error to specify more than one gesture attribute. @@ -3157,7 +3132,7 @@ This attribute value specifies that a multi-tap gesture should be performed on t > Occurrence: Optional, Multiple > -This element also represents an input event, except that the input is specified in terms of textual value rather than key or gesture identity. This element is particularly useful for testing transforms. +This element also contains an input event, except that the input is specified in terms of textual value rather than key or gesture identity. This element is particularly useful for testing transforms. Processing of the specified text continues with the transform and other elements before updating the test output buffer. @@ -3186,7 +3161,7 @@ This attribute value may be escaped with `\u` notation, see [Escaping](#escaping > Occurrence: Optional, Multiple > -This element represents a backspace action, as if the user typed the backspace key +This element contains a backspace action, as if the user typed the backspace key. **Example** @@ -3205,7 +3180,7 @@ This element represents a backspace action, as if the user typed the backspace k > Occurrence: Optional, Multiple > -This element represents a check on the current output buffer. +This element contains a check on the current output buffer. _Attribute:_ `result` (required) @@ -3247,3 +3222,6 @@ This attribute value specifies the expected resultant text in a document after p Copyright © 2001–2024 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions.
No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply. Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions. + + +[keyboard-workgroup]: https://cldr.unicode.org/index/keyboard-workgroup
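
The `\u{…usv}` notation that this change documents under Escaping can be illustrated with a minimal Python sketch of how an implementation might expand such escapes in an attribute value. It assumes the UTS #18 form, in which the braces hold one or more hex values separated by single spaces. The function name `unescape_usv` and the decision to raise `ValueError` for surrogates and out-of-range values are assumptions made for illustration; they are not defined by the specification. Marker escapes (`\m{…markerId}`) are deliberately left untouched.

```python
import re

# Matches \u{...} where the braces hold one or more hex values (1 to 6 digits
# each) separated by single spaces. This pattern is an assumption for the sketch.
_USV_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6}(?: [0-9A-Fa-f]{1,6})*)\}")


def unescape_usv(value: str) -> str:
    """Expand every \\u{...} escape in `value` into the characters it names."""

    def expand(match):
        chars = []
        for hex_digits in match.group(1).split(" "):
            code_point = int(hex_digits, 16)
            # A Unicode scalar value is any code point up to U+10FFFF
            # that is not a surrogate (U+D800..U+DFFF).
            if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
                raise ValueError(f"not a Unicode scalar value: {hex_digits}")
            chars.append(chr(code_point))
        return "".join(chars)

    # Text outside the escapes, including \m{...} markers, passes through as-is.
    return _USV_ESCAPE.sub(expand, value)
```

For instance, `unescape_usv(r"\u{25CC}\u{0303}")` yields U+25CC followed by U+0303, the dotted-circle-plus-tilde keytop form described under `display`.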