diff --git a/unicodetools/data/ucd/dev/NamesList.html b/unicodetools/data/ucd/dev/NamesList.html index 8b1d51a67..31b0e52f5 100644 --- a/unicodetools/data/ucd/dev/NamesList.html +++ b/unicodetools/data/ucd/dev/NamesList.html @@ -1,66 +1,130 @@ - - "http://www.w3.org/TR/html4/loose.dtd"> - - + - - -UCD: Unicode NamesList File Format + +Unicode NamesList Format + - - - - - - - - - -
- - [Unicode] - - - Unicode Character Database -
 
+ +
 
+

Unicode® NamesList File Format

- +
- - + + - - + + - - + + - - + + - - + + - - + +
Revision15.0.0Revision15.1.0
AuthorsAsmus Freytag, Ken WhistlerAuthorsAsmus Freytag, Ken Whistler
Date2022-08-08Date2023-01-19
This Version - - http://www.unicode.org/Public/15.0.0/ucd/NamesList.htmlThis Version + + https://www.unicode.org/Public/15.1.0/ucd/NamesList.html
Previous Version - - http://www.unicode.org/Public/14.0.0/ucd/NamesList.htmlPrevious Version + + https://www.unicode.org/Public/15.0.0/ucd/NamesList.html
Latest Versionhttp://www.unicode.org/Public/UCD/latest/ucd/NamesList.htmlLatest Versionhttps://www.unicode.org/Public/UCD/latest/ucd/NamesList.html
@@ -71,13 +135,13 @@

Summary

Status

-

The file and the files described herein are part of the Unicode - Character Database (UCD). The Unicode +

The file and the files described herein are part of the Unicode + Character Database (UCD). The Unicode Terms of Use apply.

-
+
-

1.0 Introduction

+

1.0 Introduction

The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used to drive the layout of the character code charts in the Unicode @@ -103,16 +167,16 @@

1.0 Introduction

information in the name list file that is not needed (and in fact removed during parsing) for the Unicode code charts.

-

With access to the layout program (Unibook) it is a simple matter of +

With access to the layout program (Unibook) it is a simple matter of creating name lists for the purpose of formatting working drafts or other documents containing proposed characters.

The content of the NamesList.txt file is optimized for code chart creation. Some information that can be inferred by the reader from context has been suppressed to make the code charts more readable. See the chapter on Code - Charts in the Unicode + Charts in the Unicode Standard.

-

1.1 NamesList File Overview

+

1.1 NamesList File Overview

The NamesList files are plain text files which in their most simple form look like this:

@@ -135,12 +199,12 @@

1.1 NamesList File Overview

The full syntax with all the options is provided in the following sections.

-

2.0 NamesList File Structure

+

2.0 NamesList File Structure

This section defines the overall file structure

-
NAMELIST:     TITLE_PAGE* EXTENDED_BLOCK*
-
+
NAMELIST:     TITLE_PAGE* EXTENDED_BLOCK*
+
 TITLE_PAGE:   TITLE 
 		| TITLE_PAGE SUBTITLE 
 		| TITLE_PAGE SUBHEADER 
@@ -181,8 +245,8 @@ 

2.0 NamesList File Structure

+ | CHAR_ENTRY VARIATION_LINE +

In other words:

@@ -202,7 +266,7 @@

2.0 NamesList File StructureThe conventional order of elements in a char entry: NAME_LINE, FORMALALIAS_LINE, ALIAS, COMMENT_LINE or NOTICE_LINE, CROSS_REFs, VARIATION_LINE, and optionally ending in either DECOMPOSITION or COMPAT_MAPPING is not enforced by the layout program - (Unibook). + (Unibook).

Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and FILE_COMMENT, none of these lines may @@ -224,7 +288,7 @@

2.0 NamesList File Structure

Several of these elements, while part of the formal definition of the file format, do not occur in final published versions of - NamesList.txt in the UCD.

+ NamesList.txt in the
UCD.

Blocks followed by Summaries

A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:

@@ -239,7 +303,7 @@

Blocks followed by Summaries

VARIATION_SUMMARY: VARIATION_SUBHEADER | VARIATION_SUMMARY SUMMARY_LINE - + MIXED_SUMMARY: MIXED_SUBHEADER | MIXED_SUMMARY SUMMARY_LINE @@ -264,7 +328,7 @@

Blocks followed by Summaries

information via SUBHEADER and NOTICE elements. The final published version of the Unicode names list is machine generated and will always explicitly provide any summary subheaders.

-

2.1 NamesList File Elements

+

2.1 NamesList File Elements

This section provides the details of the syntax for the individual elements.

@@ -275,7 +339,7 @@

2.1 NamesList File Elements< // followed by the name as given in NAME | CHAR TAB "<" LCNAME ">" LF - // Control and noncharacters use this form of + // Control and noncharacters use this form of // lowercase, bracketed pseudo character name | CHAR TAB NAME SP COMMENT LF @@ -284,11 +348,11 @@

2.1 NamesList File Elements< | CHAR TAB "<" LCNAME ">" SP COMMENT LF // Control and noncharacters may also have comments - + RESERVED_LINE: CHAR TAB "<reserved>" LF // The CHAR is echoed followed by an icon for the // reserved character and a fixed string e.g. "<reserved>" - + COMMENT_LINE: TAB "*" SP EXPAND_LINE // * is replaced by BULLET, output line as comment @@ -299,15 +363,15 @@

2.1 NamesList File Elements< // Replace = by itself, output line as alias FORMALALIAS_LINE: - TAB "%" SP NAME LF + TAB "%" SP NAME LF // Replace % by U+203B, output line as formal alias -CROSS_REF: TAB "x" SP CHAR SP LCNAME LF +CROSS_REF: TAB "x" SP CHAR SP LCNAME LF | TAB "x" SP CHAR SP "<" LCNAME ">" LF // x is replaced by a right arrow - | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF - | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF + | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF + | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF // x is replaced by a right arrow; // (second type as used for control and noncharacters) @@ -321,11 +385,11 @@

2.1 NamesList File Elements< // and is used for ideographs) VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF - | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")"LF + | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF // output standardized variation sequence or simply the char code in case of alternate // glyphs, followed by the alternate glyph or variation glyph and the label and context -FILE_COMMENT: ";" LINE +FILE_COMMENT: ";" LINE EMPTY_LINE: LF // Empty and ignored lines as well as @@ -337,15 +401,15 @@

2.1 NamesList File Elements< SIDEBAR_LINE: ";;" LINE // Output LINE as marginal note -DECOMPOSITION: TAB ":" SP EXPAND_LINE - | TAB ":" SP "<" TAG ">" SP EXPAND_LINE +DECOMPOSITION: TAB ":" SP EXPAND_LINE + | TAB ":" SP "<" TAG ">" SP EXPAND_LINE // Replace ':' by EQUIV, expand line into decomposition // The <tag> gives optional information, - // e.g., about composition exclusion. + // e.g., about composition exclusion. // by convention the tag has initial lowercase -COMPAT_MAPPING: TAB "#" SP EXPAND_LINE - | TAB "#" SP "<" TAG ">" SP EXPAND_LINE +COMPAT_MAPPING: TAB "#" SP EXPAND_LINE + | TAB "#" SP "<" TAG ">" SP EXPAND_LINE // Replace '#' by APPROX, output line as mapping // The <tag> is the optional compatibility decomposition tag. // by convention the tag has initial lowercase @@ -361,44 +425,44 @@

2.1 NamesList File Elements< // a character code apply to the page/block/column // and are italicized, but not indented -TITLE: "@@@" TAB LINE +TITLE: "@@@" TAB LINE // Output LINE as text // Title is used in page headers -SUBTITLE: "@@@+" TAB LINE +SUBTITLE: "@@@+" TAB LINE // Output LINE as subtitle -SUBHEADER: "@" TAB LINE +SUBHEADER: "@" TAB LINE // Output LINE as column header -VARIATION_SUBHEADER: "@~" TAB LINE +VARIATION_SUBHEADER: "@~" TAB LINE // Output LINE as column header (summary subheader) - | "@~" + | "@~" LF // Output a default standard variation sequences summary subheader - | "@~" TAB "!" + | "@~" TAB "!" LF // Suppress output of a default standard variant sequences summary subheader // and disable display of summary - | "@~" TAB "!" VARSEL_LIST + | "@~" TAB "!" VARSEL_LIST LF | "@~" TAB "!" VARSEL_LIST LINE // Output a standard summary subheader, using default or LINE respectively // Suppress any std variation sequences using selectors from the list - -ALTGLYPH_SUBHEADER: "@@~" TAB LINE + +ALTGLYPH_SUBHEADER: "@@~" TAB LINE // Output LINE as column header (summary subheader) - | "@@~" + | "@@~" LF // Output a default alternate glyph summary subheader - | "@@~" TAB "!" + | "@@~" TAB "!" LF // Suppress output of a default alternate glyph summary subheader // and disable display of summary -MIXED_SUBHEADER: "@@@~" TAB LINE +MIXED_SUBHEADER: "@@@~" TAB LINE // Output LINE as column header (summary subheader) - | "@@@~" + | "@@@~" LF // Output a default combined variation and alternate glyph summary subheader - | "@@@~" TAB "!" + | "@@@~" TAB "!" LF // Suppress output of a default alternate glyph summary subheader // and disable display of summary - | "@@@~" TAB "!" VARSEL_LIST + | "@@@~" TAB "!" VARSEL_LIST LF | "@@@~" TAB "!" VARSEL_LIST LINE // Output a combined summary subheader, using default or LINE respectively // Suppress any std variation sequences using selectors from the list @@ -406,14 +470,14 @@

2.1 NamesList File Elements< BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF // Cause a page break and optional // blank page, then output one or more charts - // followed by the list of character names. + // followed by the list of character names. // Use BLOCKSTART and BLOCKEND to define // what characters belong to a block. // Use BLOCKNAME in page and table headers BLOCKNAME: LABEL - | LABEL SP "(" LABEL ")" - // If an alternate label is present it replaces + | LABEL SP "(" LABEL ")" + // If an alternate label is present it replaces // the BLOCKNAME when an ISO-style names list is // laid out; it is ignored in the Unicode charts @@ -423,11 +487,11 @@

2.1 NamesList File Elements< INDEX_TAB: "@@+" // Start a new index tab at latest BLOCKSTART EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF - // Instances of CHAR (see Notes) are replaced by + // Instances of CHAR (see Notes) are replaced by // CHAR NBSP x NBSP where x is the single Unicode // character corresponding to CHAR. // If character is combining, it is replaced with - // CHAR NBSP <circ> x NBSP where <circ> is the + // CHAR NBSP <circ> x NBSP where <circ> is the // dotted circle @@ -451,7 +515,7 @@

2.1 NamesList File Elements< -

2.2 NamesList File Primitives

+

2.2 NamesList File Primitives

The following are the primitives and terminals for the NamesList syntax.

@@ -461,15 +525,14 @@

2.2 NamesList File Primitive | "*" NAME: <sequence of uppercase ASCII letters, digits, space and hyphen> -LCNAME: <sequence of lowercase ASCII letters, digits, space and hyphen> - | LCNAME "-" CHAR +LCNAME: <sequence of lowercase ASCII letters, digits, space and hyphen> ("-" CHAR)? TAG: <sequence of ASCII letters> LCTAG: <sequence of lowercase ASCII letters> STRING: <sequence of characters in the range U+0020..U+02FF, except controls> LABEL: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")"> VARSEL: CHAR - | ALT ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" ) + | "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" ) VARSEL_LIST: "{" CHAR_LIST "}" CHAR_LIST: CHAR | CHAR_LIST SP CHAR @@ -477,13 +540,13 @@

2.2 NamesList File Primitive | X X X X X | X X X X X X X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F" -ESC_CHAR: ESC CHAR -ESC: "\" +ESC_CHAR: ESC CHAR +ESC: "\" // Special semantics of backslash (\) are supported // only in EXPAND_LINE. -TAB: <sequence of one or more ASCII tab characters 0x09> +TAB: <sequence of one or more ASCII tab characters 0x09> SP: <ASCII 20> -LF: <any sequence of ASCII 0A and 0D> +LF: <any sequence of a single ASCII 0A or 0D, or both>
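As an illustration of how the primitives above might be used by someone writing a parser, the following is a minimal Python sketch, not part of the specification. The regular expressions are one possible reading of the CHAR, NAME, LCNAME and VARSEL productions. The helper `parse_name_line` is a hypothetical name; it handles only the two comment-free forms of NAME_LINE and assumes the line terminator has already been removed.

```python
import re

# Hypothetical regex renderings of the primitives; assumptions, not normative.
CHAR = r"[0-9A-F]{4,6}"                  # 4 to 6 uppercase hex digits
NAME = r"[A-Z0-9 -]+"                    # uppercase letters, digits, space, hyphen
LCNAME = rf"[a-z0-9 -]+(?:-{CHAR})?"     # lowercase form, optional "-" CHAR suffix
VARSEL = rf"(?:{CHAR}|ALT[1-9])"         # a code point or the literal ALT1..ALT9

# CHAR TAB NAME  or  CHAR TAB "<" LCNAME ">"  (comment variants are not covered).
NAME_LINE_RE = re.compile(rf"({CHAR})\t+(?:({NAME})|<({LCNAME})>)")


def parse_name_line(line: str):
    """Return (code_point, name) for a simple NAME_LINE, or None otherwise.

    `line` is expected without its trailing line terminator.
    """
    m = NAME_LINE_RE.fullmatch(line)
    if m is None:
        return None
    # group(2) is a NAME, group(3) a bracketed LCNAME; exactly one is set.
    return int(m.group(1), 16), m.group(2) or m.group(3)


if __name__ == "__main__":
    print(parse_name_line("0041\tLATIN CAPITAL LETTER A"))
    print(parse_name_line("0000\t<control>"))
```

Because NAME itself may contain spaces, the `NAME SP COMMENT` form cannot be separated by a regular expression of this kind alone, which is why the sketch leaves it out.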

Notes:

@@ -524,7 +587,7 @@

2.2 NamesList File Primitive Otherwise, characters in the range U+0020..U+02FF are allowed in STRING or LABEL elements, and elements derived from them.
  • The code chart layout program - (Unibook) + (Unibook) can accept files in several other formats. These include little-endian UTF-16, prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.
  • While the format allows multiple <tab> characters, by convention the @@ -535,10 +598,19 @@

    2.2 NamesList File Primitive being corrected, to retain stability of the published versions. Anyone writing a parser for older versions of this file may need to be prepared to handle such exceptions.

  • +
  • Lines are terminated by \r, \n, \r\n or \n\r. Repeated terminators imply empty lines, e.g. \r\r\n is treated as 2 lines, as is \r\n\r\n.
  • The final LF in the file must be present.
  • -
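The line-termination rule in the notes above can be sketched in Python as follows. This is an illustrative reading of the rule, not code taken from Unibook. The function name `split_lines` is hypothetical, and raising an error when the final terminator is missing is an assumption about how a strict parser might behave.

```python
def split_lines(text: str) -> list[str]:
    """Split text into lines, treating \\r, \\n, \\r\\n and \\n\\r as terminators."""
    lines = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in "\r\n":
            lines.append(text[start:i])
            # One terminator is a CR or LF, optionally followed by the
            # *other* character; a repeat of the same character starts
            # a new (empty) line instead.
            if i + 1 < n and text[i + 1] in "\r\n" and text[i + 1] != ch:
                i += 2
            else:
                i += 1
            start = i
        else:
            i += 1
    if start < n:
        # Assumption: a strict reader rejects a file without the final LF.
        raise ValueError("final line terminator is missing")
    return lines


if __name__ == "__main__":
    print(split_lines("a\r\r\nb\n"))
```

With this reading, `"\r\r\n"` and `"\r\n\r\n"` each yield two lines, matching the examples in the note.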

    Modifications

    +

    Modifications

    +

    Version 15.1.0

    +
      +
    • Reissued for Unicode 15.0.0.
    • +
    • Corrected and clarified the BNF statement of nameslist syntax.
    • +
    • Some literals had not been quoted, some productions were missing the trailing LF
    • +
    • The LF and LCNAME productions were clarified
    • +
    • Updated to HTML5
    • +

    Version 15.0.0

    • Reissued for Unicode 15.0.0.
    • @@ -672,19 +744,15 @@

      Modifications

      • Use of 4-6 digit hex notation is now supported.
      -
      -
      -
      - - - - -
      - Access to Copyright and terms of use
      -
      -

    +
    +
    + + Access to Copyright and terms of use +
    +