Skip to content

Latest commit

 

History

History
54 lines (32 loc) · 3.45 KB

Entity and numeric character references.md

File metadata and controls

54 lines (32 loc) · 3.45 KB

{{($page.frontmatter.start = 321) ? null : null}}

Entity and numeric character references

Valid HTML entity references and numeric character references can be used in place of the corresponding Unicode character, with the following exceptions:

  • Entity and character references are not recognized in code blocks and code spans.

  • Entity and character references cannot stand in place of special characters that define structural elements in CommonMark. For example, although * can be used in place of a literal * character, * cannot replace * in emphasis delimiters, bullet list markers, or thematic breaks.

Conforming CommonMark parsers need not store information about whether a particular character was represented in the source using a Unicode character or an entity reference.

Entity references consist of & + any of the valid HTML5 entity names + ;. The document https://html.spec.whatwg.org/multipage/entities.json is used as an authoritative source for the valid entity references and their corresponding code points.

Decimal numeric character consist of &# + a string of 1–7 arabic digits + ;. A numeric character reference is parsed as the corresponding Unicode character. Invalid Unicode code points will be replaced by the REPLACEMENT CHARACTER (U+FFFD). For security reasons, the code point U+0000 will also be replaced by U+FFFD.

Hexadecimal numeric character consist of &# + either X or x + a string of 1-6 hexadecimal digits + ;. They too are parsed as the corresponding Unicode character (this time specified with a hexadecimal numeral instead of decimal).

Here are some nonentities:

Although HTML5 does accept some entity references without a trailing semicolon (such as &copy), these are not recognized here, because it makes the grammar too ambiguous:

Strings that are not on the list of HTML5 named entities are not recognized as entity references either:

Entity and numeric character references are recognized in any context besides code spans or code blocks, including URLs, link titles, and fenced code block info strings:

Entity and numeric character references are treated as literal text in code spans and code blocks:

=

Entity and numeric character references cannot be used in place of symbols indicating structure in CommonMark documents.