+
@@ -71,13 +135,13 @@ Summary
Status
- The file and the files described herein are part of the Unicode
- Character Database (UCD). The Unicode
+ The file and the files described herein are part of the Unicode
+ Character Database (UCD). The Unicode
Terms of Use apply.
-
+
-
+
The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
text file used to drive the layout of the character code charts in the Unicode
@@ -103,16 +167,16 @@
information in the name list file that is not needed (and in fact removed
during parsing) for the Unicode code charts.
-With access to the layout program (Unibook) it is a simple matter of
+
With access to the layout program (Unibook) it is a simple matter of
creating name lists for the purpose of formatting working drafts or other documents containing
proposed characters.
The content of the NamesList.txt file is optimized for code chart creation.
Some information that can be inferred by the reader from context has been
suppressed to make the code charts more readable. See the chapter on Code
- Charts in the Unicode
+ Charts in the Unicode
Standard.
-
+
The NamesList files are plain text files which in their most simple form look
like this:
@@ -135,12 +199,12 @@
The full syntax with all the options is provided in the following sections.
-
+
This section defines the overall file structure
-NAMELIST: TITLE_PAGE* EXTENDED_BLOCK*
-
+NAMELIST: TITLE_PAGE* EXTENDED_BLOCK*
+
TITLE_PAGE: TITLE
| TITLE_PAGE SUBTITLE
| TITLE_PAGE SUBHEADER
@@ -181,8 +245,8 @@
+ | CHAR_ENTRY VARIATION_LINE
+
In other words:
@@ -202,7 +266,7 @@
Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
FILE_COMMENT, none of these lines may
@@ -224,7 +288,7 @@
A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:
@@ -239,7 +303,7 @@ Blocks followed by Summaries
VARIATION_SUMMARY: VARIATION_SUBHEADER
| VARIATION_SUMMARY SUMMARY_LINE
-
+
MIXED_SUMMARY: MIXED_SUBHEADER
| MIXED_SUMMARY SUMMARY_LINE
@@ -264,7 +328,7 @@ Blocks followed by Summaries
information via SUBHEADER and NOTICE elements. The final published version of the Unicode names list
is machine generated and will always explicitly provide any summary subheaders.
-2.1 NamesList File Elements
+2.1 NamesList File Elements
This section provides the details of the syntax for the individual elements.
@@ -275,7 +339,7 @@ 2.1 NamesList File Elements<
// followed by the name as given in NAME
| CHAR TAB "<" LCNAME ">" LF
- // Control and noncharacters use this form of
+ // Control and noncharacters use this form of
// lowercase, bracketed pseudo character name
| CHAR TAB NAME SP COMMENT LF
@@ -284,11 +348,11 @@ 2.1 NamesList File Elements<
| CHAR TAB "<" LCNAME ">" SP COMMENT LF
// Control and noncharacters may also have comments
-
+
RESERVED_LINE: CHAR TAB "<reserved>" LF
// The CHAR is echoed followed by an icon for the
// reserved character and a fixed string e.g. "<reserved>"
-
+
COMMENT_LINE: TAB "*" SP EXPAND_LINE
// * is replaced by BULLET, output line as comment
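A minimal Python sketch of recognizing the NAME_LINE and RESERVED_LINE forms above. It assumes CHAR is 4 to 6 uppercase hex digits and TAB is one or more tab characters, both as given in the primitives section. COMMENT is not defined in this excerpt, so the sketch treats whatever follows the name, once the text can no longer continue NAME, as the comment. The names parse_name_line and NameLine are hypothetical, and this is not the Unibook parser.

```python
import re
from typing import NamedTuple, Optional

_NAME_LINE = re.compile(
    r"(?P<code>[0-9A-F]{4,6})\t+"                       # CHAR TAB
    r"(?:<(?P<lc>[a-z0-9 -]+(?:-[0-9A-F]{4,6})?)>"      # "<" LCNAME ">"
    r"|(?P<name>[A-Z0-9-]+(?: [A-Z0-9-]+)*))"           # or NAME
    r"(?: (?P<comment>.+))?"                            # optional SP COMMENT
)

class NameLine(NamedTuple):
    code: int
    name: str                # NAME, or LCNAME without the angle brackets
    bracketed: bool          # True for the "<lcname>" pseudo-name form
    reserved: bool           # True for RESERVED_LINE
    comment: Optional[str]

def parse_name_line(line: str) -> Optional[NameLine]:
    """Parse a NAME_LINE or RESERVED_LINE; return None for any other line."""
    m = _NAME_LINE.fullmatch(line.rstrip("\r\n"))
    if m is None:
        return None
    code = int(m["code"], 16)
    if m["lc"] is not None:
        reserved = m["lc"] == "reserved" and m["comment"] is None
        return NameLine(code, m["lc"], True, reserved, m["comment"])
    return NameLine(code, m["name"], False, False, m["comment"])
```

Because NAME may contain spaces, a comment is only separable when it begins with a character outside NAME's alphabet, such as "(".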
@@ -299,15 +363,15 @@ 2.1 NamesList File Elements<
// Replace = by itself, output line as alias
FORMALALIAS_LINE:
- TAB "%" SP NAME LF
+ TAB "%" SP NAME LF
// Replace % by U+203B, output line as formal alias
-CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
+CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
| TAB "x" SP CHAR SP "<" LCNAME ">" LF
// x is replaced by a right arrow
- | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
- | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF
+ | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
+ | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF
// x is replaced by a right arrow;
// (second type as used for control and noncharacters)
@@ -321,11 +385,11 @@ 2.1 NamesList File Elements<
// and is used for ideographs)
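The four CROSS_REF forms spelled out above can each be matched by one regular expression. This Python sketch covers only those four forms and returns the referenced code point, the name, and whether the name was bracketed. The names parse_cross_ref and CrossRef are hypothetical; LCNAME is approximated with the pattern from the primitives section.

```python
import re
from typing import NamedTuple, Optional

_CHAR = r"[0-9A-F]{4,6}"
_LC = r"[a-z0-9 -]+(?:-[0-9A-F]{4,6})?"     # LCNAME, optionally ending in "-" CHAR

# One pattern per form, in the order they appear in the grammar.
_FORMS = [
    re.compile(rf"\t+x (?P<code>{_CHAR}) (?P<name>{_LC})"),
    re.compile(rf"\t+x (?P<code>{_CHAR}) <(?P<name>{_LC})>"),
    re.compile(rf"\t+x \((?P<name>{_LC}) - (?P<code>{_CHAR})\)"),
    re.compile(rf"\t+x \(<(?P<name>{_LC})> - (?P<code>{_CHAR})\)"),
]

class CrossRef(NamedTuple):
    code: int
    name: str
    bracketed: bool      # True if the name was written as "<lcname>"

def parse_cross_ref(line: str) -> Optional[CrossRef]:
    text = line.rstrip("\r\n")
    for index, form in enumerate(_FORMS):
        m = form.fullmatch(text)
        if m:
            return CrossRef(int(m["code"], 16), m["name"], index in (1, 3))
    return None
```

In the parenthesized forms, LCNAME may itself contain hyphens and spaces; the regex backtracks so that the last " - " before CHAR is the separator.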
VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
- | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")"LF
+ | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF
// output standardized variation sequence or simply the char code in case of alternate
// glyphs, followed by the alternate glyph or variation glyph and the label and context
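A Python sketch of splitting a VARIATION_LINE into base character, selector, label, and optional context tag. The grammar writes CHAR VARSEL with no SP between them. Because this sketch is an assumption and not a statement of the format, it accepts an optional space there. The names parse_variation_line and Variation are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Union

_CHAR = r"[0-9A-F]{4,6}"
_LABEL = r"[\x20-\x27\x2A-\x7E\xA0-\u02FF]+"   # LABEL: no controls, "(" or ")"
_VARIATION_LINE = re.compile(
    r"\t+~ (?P<base>" + _CHAR + r") ?"          # optional space: an assumption
    + r"(?P<varsel>" + _CHAR + r"|ALT[1-9])"    # VARSEL
    + r" (?P<label>" + _LABEL + r")"
    + r"(?:\((?P<context>[a-z]+)\))?"           # optional "(" LCTAG ")"
)

class Variation(NamedTuple):
    base: int
    selector: Union[int, str]   # a code point, or "ALT1".."ALT9"
    label: str
    context: Optional[str]

def parse_variation_line(line: str) -> Optional[Variation]:
    m = _VARIATION_LINE.fullmatch(line.rstrip("\r\n"))
    if m is None:
        return None
    v = m["varsel"]
    selector = v if v.startswith("ALT") else int(v, 16)
    # LABEL may end with the space that precedes "(" LCTAG ")".
    return Variation(int(m["base"], 16), selector, m["label"].rstrip(" "), m["context"])
```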
-FILE_COMMENT: ";" LINE
+FILE_COMMENT: ";" LINE
EMPTY_LINE: LF
// Empty and ignored lines as well as
@@ -337,15 +401,15 @@ 2.1 NamesList File Elements<
SIDEBAR_LINE: ";;" LINE
// Output LINE as marginal note
-DECOMPOSITION: TAB ":" SP EXPAND_LINE
- | TAB ":" SP "<" TAG ">" SP EXPAND_LINE
+DECOMPOSITION: TAB ":" SP EXPAND_LINE
+ | TAB ":" SP "<" TAG ">" SP EXPAND_LINE
// Replace ':' by EQUIV, expand line into decomposition
// The <tag> gives optional information,
- // e.g., about composition exclusion.
+ // e.g., about composition exclusion.
// by convention the tag has initial lowercase
-COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
- | TAB "#" SP "<" TAG ">" SP EXPAND_LINE
+COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
+ | TAB "#" SP "<" TAG ">" SP EXPAND_LINE
// Replace '#' by APPROX, output line as mapping
// The <tag> is the optional compatibility decomposition tag.
// by convention the tag has initial lowercase
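Most lines can be classified by their leading characters before any detailed parsing. The following Python sketch does that. It assumes the "=" form is named ALIAS_LINE, because that production's header falls outside this excerpt. It also lumps every other indented line, such as notice lines, into a catch-all. The helper classify is hypothetical.

```python
import re

# Longest prefixes first, so "@@@+" is not mistaken for "@@@" or "@".
_AT_PREFIXES = [
    ("@@@+", "SUBTITLE"), ("@@@~", "MIXED_SUBHEADER"), ("@@@", "TITLE"),
    ("@@~", "ALTGLYPH_SUBHEADER"), ("@@+", "INDEX_TAB"), ("@@", "BLOCKHEADER"),
    ("@~", "VARIATION_SUBHEADER"), ("@", "SUBHEADER"),
]
# Indented lines: TAB, a marker character, then SP.
_INDENTED = {
    "*": "COMMENT_LINE", "=": "ALIAS_LINE", "%": "FORMALALIAS_LINE",
    "x": "CROSS_REF", "~": "VARIATION_LINE",
    ":": "DECOMPOSITION", "#": "COMPAT_MAPPING",
}
_CHAR_TAB = re.compile(r"[0-9A-F]{4,6}\t")

def classify(line: str) -> str:
    text = line.rstrip("\r\n")
    if text == "":
        return "EMPTY_LINE"
    if text.startswith(";;"):          # check ";;" before ";"
        return "SIDEBAR_LINE"
    if text.startswith(";"):
        return "FILE_COMMENT"
    for prefix, kind in _AT_PREFIXES:
        if text.startswith(prefix):
            return kind
    if text.startswith("\t"):
        body = text.lstrip("\t")       # TAB may be several tab characters
        if len(body) >= 2 and body[1] == " " and body[0] in _INDENTED:
            return _INDENTED[body[0]]
        return "OTHER_INDENTED"        # e.g. notice lines; not handled here
    if _CHAR_TAB.match(text):
        return "NAME_LINE"             # also covers RESERVED_LINE
    return "UNKNOWN"
```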
@@ -361,44 +425,44 @@ 2.1 NamesList File Elements<
// a character code apply to the page/block/column
// and are italicized, but not indented
-TITLE: "@@@" TAB LINE
+TITLE: "@@@" TAB LINE
// Output LINE as text
// Title is used in page headers
-SUBTITLE: "@@@+" TAB LINE
+SUBTITLE: "@@@+" TAB LINE
// Output LINE as subtitle
-SUBHEADER: "@" TAB LINE
+SUBHEADER: "@" TAB LINE
// Output LINE as column header
-VARIATION_SUBHEADER: "@~" TAB LINE
+VARIATION_SUBHEADER: "@~" TAB LINE
// Output LINE as column header (summary subheader)
- | "@~"
+ | "@~" LF
// Output a default standard variation sequences summary subheader
- | "@~" TAB "!"
+ | "@~" TAB "!" LF
// Suppress output of a default standard variant sequences summary subheader
// and disable display of summary
- | "@~" TAB "!" VARSEL_LIST
+ | "@~" TAB "!" VARSEL_LIST LF
| "@~" TAB "!" VARSEL_LIST LINE
// Output a standard summary subheader, using default or LINE respectively
// Suppress any std variation sequences using selectors from the list
-
-ALTGLYPH_SUBHEADER: "@@~" TAB LINE
+
+ALTGLYPH_SUBHEADER: "@@~" TAB LINE
// Output LINE as column header (summary subheader)
- | "@@~"
+ | "@@~" LF
// Output a default alternate glyph summary subheader
- | "@@~" TAB "!"
+ | "@@~" TAB "!" LF
// Suppress output of a default alternate glyph summary subheader
// and disable display of summary
-MIXED_SUBHEADER: "@@@~" TAB LINE
+MIXED_SUBHEADER: "@@@~" TAB LINE
// Output LINE as column header (summary subheader)
- | "@@@~"
+ | "@@@~" LF
// Output a default combined variation and alternate glyph summary subheader
- | "@@@~" TAB "!"
+ | "@@@~" TAB "!" LF
// Suppress output of a default alternate glyph summary subheader
// and disable display of summary
- | "@@@~" TAB "!" VARSEL_LIST
+ | "@@@~" TAB "!" VARSEL_LIST LF
| "@@@~" TAB "!" VARSEL_LIST LINE
// Output a combined summary subheader, using default or LINE respectively
// Suppress any std variation sequences using selectors from the list
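The VARIATION_SUBHEADER and MIXED_SUBHEADER alternatives differ only in what follows the prefix. Below is a Python sketch that reduces them to three facts: the subheader text (None meaning the default), whether the summary is displayed, and which selectors are suppressed. It assumes that a LINE beginning with "!" is always read as the suppression directive. The names parse_summary_header and SummaryHeader are hypothetical.

```python
import re
from typing import FrozenSet, NamedTuple, Optional

# VARSEL_LIST: "{" CHAR (SP CHAR)* "}"
_LIST = re.compile(r"\{([0-9A-F]{4,6}(?: [0-9A-F]{4,6})*)\}")

class SummaryHeader(NamedTuple):
    text: Optional[str]          # None: use the default subheader text
    show_summary: bool
    suppressed: FrozenSet[int]   # selectors whose sequences are not listed

def parse_summary_header(line: str, prefix: str = "@~") -> Optional[SummaryHeader]:
    """Parse a "@~" (or, with prefix="@@@~", a mixed) summary subheader."""
    text = line.rstrip("\r\n")
    if text == prefix:                          # default subheader
        return SummaryHeader(None, True, frozenset())
    if not text.startswith(prefix + "\t"):
        return None
    rest = text[len(prefix):].lstrip("\t")
    if not rest.startswith("!"):                # explicit subheader LINE
        return SummaryHeader(rest, True, frozenset())
    rest = rest[1:]
    if rest == "":                              # "!" alone disables the summary
        return SummaryHeader(None, False, frozenset())
    m = _LIST.match(rest)
    if m is None:
        return None
    selectors = frozenset(int(c, 16) for c in m.group(1).split(" "))
    return SummaryHeader(rest[m.end():] or None, True, selectors)
```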
@@ -406,14 +470,14 @@ 2.1 NamesList File Elements<
BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF
// Cause a page break and optional
// blank page, then output one or more charts
- // followed by the list of character names.
+ // followed by the list of character names.
// Use BLOCKSTART and BLOCKEND to define
// what characters belong to a block.
// Use BLOCKNAME in page and table headers
BLOCKNAME: LABEL
- | LABEL SP "(" LABEL ")"
- // If an alternate label is present it replaces
+ | LABEL SP "(" LABEL ")"
+ // If an alternate label is present, it replaces
// the BLOCKNAME when an ISO-style names list is
// laid out; it is ignored in the Unicode charts
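A Python sketch of reading a BLOCKHEADER, including the optional alternate label in BLOCKNAME. It assumes BLOCKSTART and BLOCKEND are CHAR values, since their productions fall outside this excerpt. The names parse_block_header and BlockHeader are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_CHAR = r"[0-9A-F]{4,6}"
_LABEL = r"[\x20-\x27\x2A-\x7E\xA0-\u02FF]+"   # LABEL: no controls, "(" or ")"
_BLOCKHEADER = re.compile(
    r"@@\t+(?P<start>" + _CHAR + r")\t+"
    + r"(?P<label>" + _LABEL + r")"
    + r"(?: \((?P<alt>" + _LABEL + r")\))?"     # optional SP "(" LABEL ")"
    + r"\t+(?P<end>" + _CHAR + r")"
)

class BlockHeader(NamedTuple):
    start: int
    end: int
    name: str                 # BLOCKNAME as used in the Unicode charts
    alt_name: Optional[str]   # alternate label, used for ISO-style lists

def parse_block_header(line: str) -> Optional[BlockHeader]:
    m = _BLOCKHEADER.fullmatch(line.rstrip("\r\n"))
    if m is None:
        return None
    return BlockHeader(int(m["start"], 16), int(m["end"], 16), m["label"], m["alt"])
```

A consumer laying out an ISO-style list would use alt_name when present and fall back to name otherwise.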
@@ -423,11 +487,11 @@ 2.1 NamesList File Elements<
INDEX_TAB: "@@+" // Start a new index tab at latest BLOCKSTART
EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF
- // Instances of CHAR (see Notes) are replaced by
+ // Instances of CHAR (see Notes) are replaced by
// CHAR NBSP x NBSP where x is the single Unicode
// character corresponding to CHAR.
// If character is combining, it is replaced with
- // CHAR NBSP <circ> x NBSP where <circ> is the
+ // CHAR NBSP <circ> x NBSP where <circ> is the
// dotted circle
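The CHAR replacement rule for EXPAND_LINE can be sketched in Python as follows. Several assumptions apply, because the referenced Notes are not part of this excerpt. First, a CHAR instance is taken to be 4 to 6 uppercase hex digits not adjacent to other letters or digits. Second, a preceding backslash (ESC CHAR) is assumed to suppress expansion. Third, "combining" is approximated as Unicode general category M*. The function expand_line is hypothetical.

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"
# Optional ESC, then a CHAR not touching other letters or digits (assumption).
_TOKEN = re.compile(r"(\\?)(?<![0-9A-Za-z])([0-9A-F]{4,6})(?![0-9A-Za-z])")

def expand_line(text: str) -> str:
    def repl(m: re.Match) -> str:
        escaped, digits = m.group(1), m.group(2)
        if escaped:
            return digits                   # ESC CHAR: keep CHAR, no glyph (assumed)
        cp = int(digits, 16)
        if cp > 0x10FFFF:
            return digits                   # not a valid code point; leave as is
        glyph = chr(cp)
        if unicodedata.category(glyph).startswith("M"):
            glyph = DOTTED_CIRCLE + glyph   # show combining marks on a dotted circle
        return digits + NBSP + glyph + NBSP # CHAR NBSP x NBSP
    return _TOKEN.sub(repl, text)
```

Ordinary words made only of hex letters, such as "CAFE", would also be expanded under this rule, which is one reason an escape mechanism is useful.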
@@ -451,7 +515,7 @@ 2.1 NamesList File Elements<
-2.2 NamesList File Primitives
+2.2 NamesList File Primitives
The following are the primitives and terminals for the NamesList syntax.
@@ -461,15 +525,14 @@ 2.2 NamesList File Primitive
| "*"
NAME: <sequence of uppercase ASCII letters, digits, space and hyphen>
-LCNAME: <sequence of lowercase ASCII letters, digits, space and hyphen>
- | LCNAME "-" CHAR
+LCNAME: <sequence of lowercase ASCII letters, digits, space and hyphen> ("-" CHAR)?
TAG: <sequence of ASCII letters>
LCTAG: <sequence of lowercase ASCII letters>
STRING: <sequence of characters in the range U+0020..U+02FF, except controls>
LABEL: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
VARSEL: CHAR
- | ALT ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
+ | "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
VARSEL_LIST: "{" CHAR_LIST "}"
CHAR_LIST: CHAR
| CHAR_LIST SP CHAR
@@ -477,13 +540,13 @@ 2.2 NamesList File Primitive
| X X X X X
| X X X X X X
X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"
-ESC_CHAR: ESC CHAR
-ESC: "\"
+ESC_CHAR: ESC CHAR
+ESC: "\"
// Special semantics of backslash (\) are supported
// only in EXPAND_LINE.
-TAB: <sequence of one or more ASCII tab characters 0x09>
+TAB: <sequence of one or more ASCII tab characters 0x09>
SP: <ASCII 20>
-LF: <any sequence of ASCII 0A and 0D>
+LF: <ASCII 0A, ASCII 0D, or one of the two-character sequences 0D 0A or 0A 0D>
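Most of these primitives translate directly into regular expressions. The following Python sketch assumes the ranges exactly as stated above and reads "except controls" as excluding U+0000..U+001F and U+007F..U+009F. The PRIMITIVES table and the is_a helper are hypothetical.

```python
import re

CHAR = r"[0-9A-F]{4,6}"   # 4 to 6 uppercase hex digits (X is uppercase only)
PRIMITIVES = {
    "CHAR": re.compile(CHAR),
    "NAME": re.compile(r"[A-Z0-9 -]+"),
    "LCNAME": re.compile(r"[a-z0-9 -]+(?:-" + CHAR + r")?"),
    "TAG": re.compile(r"[A-Za-z]+"),
    "LCTAG": re.compile(r"[a-z]+"),
    "STRING": re.compile(r"[\x20-\x7E\xA0-\u02FF]+"),
    "LABEL": re.compile(r"[\x20-\x27\x2A-\x7E\xA0-\u02FF]+"),  # also no "(" or ")"
    "VARSEL": re.compile(CHAR + r"|ALT[1-9]"),
    "VARSEL_LIST": re.compile(r"\{" + CHAR + r"(?: " + CHAR + r")*\}"),
}

def is_a(kind: str, text: str) -> bool:
    """True if the whole of text matches the named primitive."""
    return PRIMITIVES[kind].fullmatch(text) is not None
```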
Notes:
@@ -524,7 +587,7 @@ 2.2 NamesList File Primitive
Otherwise, characters in the range U+0020..U+02FF
are allowed in STRING or LABEL elements, and elements derived from them.
The code chart layout program
- (Unibook)
+ (Unibook)
can accept files in several other formats. These include little-endian UTF-16,
prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.
While the format allows multiple <tab> characters, by convention the
@@ -535,10 +598,19 @@ 2.2 NamesList File Primitive
being corrected, to retain stability of the published versions. Anyone
writing a parser for older versions of this file may need to be prepared to
handle such exceptions.
+ Lines are terminated by \r, \n, \r\n or \n\r. Repeated terminators denote empty lines: for example, \r\r\n and \r\n\r\n each count as two line terminators.
The final LF in the file must be present.
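The line-termination rule above can be applied with a single alternation that tries the two-character terminators first. This Python sketch assumes the file has already been decoded to text. The helper split_lines is hypothetical.

```python
import re
from typing import List

# Two-character terminators must be tried before the single characters.
_TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")

def split_lines(data: str) -> List[str]:
    """Split NamesList text into lines; repeated terminators yield empty lines."""
    if data and not data.endswith(("\r", "\n")):
        raise ValueError("the final LF in the file must be present")
    parts = _TERMINATOR.split(data)
    return parts[:-1]   # the piece after the final terminator is always empty
```

With this scan, \r\r\n is read as \r followed by \r\n, and \r\n\r\n as two \r\n pairs, so both produce one empty line, matching the rule above.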
- Modifications
+ Modifications
+ Version 15.1.0
+
+ - Reissued for Unicode 15.1.0.
+ - Corrected and clarified the BNF statement of the NamesList syntax.
+ - Some literals had not been quoted, and some productions were missing the trailing LF.
+ - The LF and LCNAME productions were clarified.
+ - Updated to HTML5.
+
Version 15.0.0
- Reissued for Unicode 15.0.0.
@@ -672,19 +744,15 @@ Modifications
- Use of 4-6 digit hex notation is now supported.
-
-
-
-
-
-
- |
-
-
-
-
+
+
+
+
+
+