Skip to content
/ mdast Public
forked from syntax-tree/mdast

Markdown Abstract Syntax Tree format

Notifications You must be signed in to change notification settings

royoan/mdast

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 

Repository files navigation

mdast

Markdown Abstract Syntax Tree.


mdast is a specification for representing Markdown in a syntax tree. It implements the unist spec. It can represent several flavours of Markdown, such as CommonMark, and GitHub Flavored Markdown extensions.

This document may not be released. See releases for released documents. The latest released version is 3.0.0.

Contents

Introduction

This document defines a format for representing Markdown as an abstract syntax tree. Development of mdast started in July 2014, in remark, before unist existed. This specification is written in a Web IDL-like grammar.

Where this specification fits

mdast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

mdast relates to JavaScript in that it has a rich ecosystem of utilities for working with compliant syntax trees in JavaScript. However, mdast is not limited to JavaScript and can be used in other programming languages.

mdast relates to the unified and remark projects in that mdast syntax trees are used throughout their ecosystems.

Nodes

Parent

interface Parent <: UnistParent {
  children: [Content]
}

Parent (UnistParent) represents a node in mdast containing other nodes (said to be children).

Its content is limited to only other mdast content.

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in mdast containing a value.

Its value field is a string.

Root

interface Root <: Parent {
  type: "root"
}

Root (Parent) represents a document.

Root can be used as the root of a tree, never as a child. Its content model is not limited to top-level content, but can contain any content with the restriction that all content must be of the same category.

Paragraph

interface Paragraph <: Parent {
  type: "paragraph"
  children: [PhrasingContent]
}

Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.

Paragraph can be used where block content is expected. Its content model is phrasing content.

For example, the following Markdown:

Alpha bravo charlie.

Yields:

{
  type: 'paragraph',
  children: [{type: 'text', value: 'Alpha bravo charlie.'}]
}

Heading

interface Heading <: Parent {
  type: "heading"
  depth: 1 <= number <= 6
  children: [PhrasingContent]
}

Heading (Parent) represents a heading of a section.

Heading can be used where block content is expected. Its content model is phrasing content.

A depth field must be present. A value of 1 is said to be the highest rank and 6 the lowest.

For example, the following Markdown:

# Alpha

Yields:

{
  type: 'heading',
  depth: 1,
  children: [{type: 'text', value: 'Alpha'}]
}

ThematicBreak

interface ThematicBreak <: Node {
  type: "thematicBreak"
}

ThematicBreak (Node) represents a thematic break, such as a scene change in a story, a transition to another topic, or a new document.

ThematicBreak can be used where block content is expected. It has no content model.

For example, the following Markdown:

***

Yields:

{type: 'thematicBreak'}

Blockquote

interface Blockquote <: Parent {
  type: "blockquote"
  children: [BlockContent]
}

Blockquote (Parent) represents a section quoted from somewhere else.

Blockquote can be used where block content is expected. Its content model is also block content.

For example, the following Markdown:

> Alpha bravo charlie.

Yields:

{
  type: 'blockquote',
  children: [{
    type: 'paragraph',
    children: [{type: 'text', value: 'Alpha bravo charlie.'}]
  }]
}

List

interface List <: Parent {
  type: "list"
  ordered: boolean?
  start: number?
  spread: boolean?
  children: [ListContent]
}

List (Parent) represents a list of items.

List can be used where block content is expected. Its content model is list content.

An ordered field can be present. It represents that the items have been intentionally ordered (when true), or that the order of items is not important (when false or not present).

A start field can be present. It represents, when the ordered field is true, the starting number of the list.

A spread field can be present. It represents that one or more of its children are separated with a blank line from its siblings (when true), or not (when false or not present).

For example, the following Markdown:

1. [x] foo

Yields:

{
  type: 'list',
  ordered: true,
  start: 1,
  spread: false,
  children: [{
    type: 'listItem',
    checked: true,
    spread: false,
    children: [{
      type: 'paragraph',
      children: [{type: 'text', value: 'foo'}]
    }]
  }]
}

ListItem

interface ListItem <: Parent {
  type: "listItem"
  checked: boolean?
  spread: boolean?
  children: [BlockContent]
}

ListItem (Parent) represents an item in a List.

ListItem can be used where list content is expected. Its content model is block content.

A checked field can be present. It represents whether the item is done (when true), not done (when false), or indeterminate or not applicable (when null or not present).

A spread field can be present. It represents that the item contains two or more children separated by a blank line (when true), or not (when false or not present).

For example, the following Markdown:

* [x] bar

Yields:

{
  type: 'listItem',
  checked: true,
  spread: false,
  children: [{
    type: 'paragraph',
    children: [{type: 'text', value: 'bar'}]
  }]
}

Table

interface Table <: Parent {
  type: "table"
  align: [alignType]?
  children: [TableContent]
}

Table (Parent) represents two-dimensional data.

Table can be used where block content is expected. Its content model is table content.

The head of the node represents the labels of the columns.

An align field can be present. If present, it must be a list of alignTypes. It represents how cells in columns are aligned.

For example, the following Markdown:

| foo | bar |
| :-- | :-: |
| baz | qux |

Yields:

{
  type: 'table',
  align: ['left', 'center'],
  children: [
    {
      type: 'tableRow',
      children: [
        {
          type: 'tableCell',
          children: [{type: 'text', value: 'foo'}]
        },
        {
          type: 'tableCell',
          children: [{type: 'text', value: 'bar'}]
        }
      ]
    },
    {
      type: 'tableRow',
      children: [
        {
          type: 'tableCell',
          children: [{type: 'text', value: 'baz'}]
        },
        {
          type: 'tableCell',
          children: [{type: 'text', value: 'qux'}]
        }
      ]
    }
  ]
}

TableRow

interface TableRow <: Parent {
  type: "tableRow"
  children: [RowContent]
}

TableRow (Parent) represents a row of cells in a table.

TableRow can be used where table content is expected. Its content model is row content.

If the node is a head, it represents the labels of the columns for its parent Table.

For an example, see Table.

TableCell

interface TableCell <: Parent {
  type: "tableCell"
  children: [PhrasingContent]
}

TableCell (Parent) represents a header cell in a Table, if its parent is a head, or a data cell otherwise.

TableCell can be used where row content is expected. Its content model is phrasing content excluding Break nodes.

For an example, see Table.

HTML

interface HTML <: Literal {
  type: "html"
}

HTML (Literal) represents a fragment of raw HTML.

HTML can be used where block or phrasing content is expected. Its content is represented by its value field.

HTML nodes do not have the restriction of being valid or complete HTML ([HTML]) constructs.

For example, the following Markdown:

<div>

Yields:

{type: 'html', value: '<div>'}

Code

interface Code <: Literal {
  type: "code"
  lang: string?
  meta: string?
}

Code (Literal) represents a block of preformatted text, such as ASCII art or computer code.

Code can be used where block content is expected. Its content is represented by its value field.

This node relates to the phrasing content concept InlineCode.

A lang field can be present. It represents the language of computer code being marked up.

If the lang field is present, a meta field can be present. It represents custom information relating to the node.

For example, the following Markdown:

    foo()

Yields:

{
  type: 'code',
  lang: null,
  meta: null,
  value: 'foo()'
}

And the following Markdown:

```js highlight-line="2"
foo()
bar()
baz()
```

Yields:

{
  type: 'code',
  lang: 'javascript',
  meta: 'highlight-line="2"',
  value: 'foo()\nbar()\nbaz()'
}

YAML

interface YAML <: Literal {
  type: "yaml"
}

YAML (Literal) represents a collection of metadata for the document in the YAML ([YAML]) data serialisation language.

YAML can be used where frontmatter content is expected. Its content is represented by its value field.

For example, the following Markdown:

---
foo: bar
---

Yields:

{type: 'yaml', value: 'foo: bar'}

Definition

interface Definition <: Node {
  type: "definition"
}

Definition includes Association
Definition includes Resource

Definition (Node) represents a resource.

Definition can be used where definition content is expected. It has no content model.

Definition includes the mixins Association and Resource.

Definition should be associated with LinkReferences and ImageReferences.

For example, the following Markdown:

[Alpha]: https://example.com

Yields:

{
  type: 'definition',
  identifier: 'alpha',
  label: 'Alpha',
  url: 'https://example.com',
  title: null
}

FootnoteDefinition

interface FootnoteDefinition <: Parent {
  type: "footnoteDefinition"
  children: [BlockContent]
}

FootnoteDefinition includes Association

FootnoteDefinition (Parent) represents content relating to the document that is outside its flow.

FootnoteDefinition can be used where definition content is expected. Its content model is block content.

FootnoteDefinition includes the mixin Association.

FootnoteDefinition should be associated with FootnoteReferences.

For example, the following Markdown:

[^alpha]: bravo and charlie.

Yields:

{
  type: 'footnoteDefinition',
  identifier: 'alpha',
  label: 'alpha',
  children: [{
    type: 'paragraph',
    children: [{type: 'text', value: 'bravo and charlie.'}]
  }]
}

Text

interface Text <: Literal {
  type: "text"
}

Text (Literal) represents everything that is just text.

Text can be used where phrasing content is expected. Its content is represented by its value field.

For example, the following Markdown:

Alpha bravo charlie.

Yields:

{type: 'text', value: 'Alpha bravo charlie.'}

Emphasis

interface Emphasis <: Parent {
  type: "emphasis"
  children: [PhrasingContent]
}

Emphasis (Parent) represents stress emphasis of its contents.

Emphasis can be used where phrasing content is expected. Its content model is also phrasing content.

For example, the following Markdown:

*alpha* _bravo_

Yields:

{
  type: 'paragraph',
  children: [
    {
      type: 'emphasis',
      children: [{type: 'text', value: 'alpha'}]
    },
    {type: 'text', value: ' '},
    {
      type: 'emphasis',
      children: [{type: 'text', value: 'bravo'}]
    }
  ]
}

Strong

interface Strong <: Parent {
  type: "strong"
  children: [PhrasingContent]
}

Strong (Parent) represents strong importance, seriousness, or urgency for its contents.

Strong can be used where phrasing content is expected. Its content model is also phrasing content.

For example, the following Markdown:

**alpha** __bravo__

Yields:

{
  type: 'paragraph',
  children: [
    {
      type: 'strong',
      children: [{type: 'text', value: 'alpha'}]
    },
    {type: 'text', value: ' '},
    {
      type: 'strong',
      children: [{type: 'text', value: 'bravo'}]
    }
  ]
}

Delete

interface Delete <: Parent {
  type: "delete"
  children: [PhrasingContent]
}

Delete (Parent) represents contents that are no longer accurate or no longer relevant.

Delete can be used where phrasing content is expected. Its content model is also phrasing content.

For example, the following Markdown:

~~alpha~~

Yields:

{
  type: 'delete',
  children: [{type: 'text', value: 'alpha'}]
}

InlineCode

interface InlineCode <: Literal {
  type: "inlineCode"
}

InlineCode (Literal) represents a fragment of computer code, such as a file name, computer program, or anything a computer could parse.

InlineCode can be used where phrasing content is expected. Its content is represented by its value field.

This node relates to the block content concept Code.

For example, the following Markdown:

`foo()`

Yields:

{type: 'inlineCode', value: 'foo()'}

Break

interface Break <: Node {
  type: "break"
}

Break (Node) represents a line break, such as in poems or addresses.

Break can be used where phrasing content is expected. It has no content model.

For example, the following Markdown:

foo··
bar

Yields:

{
  type: 'paragraph',
  children: [
    {type: 'text', value: 'foo'},
    {type: 'break'},
    {type: 'text', value: 'bar'}
  ]
}

Link

interface Link <: Parent {
  type: "link"
  children: [StaticPhrasingContent]
}

Link includes Resource

Link (Parent) represents a hyperlink.

Link can be used where phrasing content is expected. Its content model is static phrasing content.

Link includes the mixin Resource.

For example, the following Markdown:

[alpha](https://example.com "bravo")

Yields:

{
  type: 'link',
  url: 'https://example.com',
  title: 'bravo',
  children: [{type: 'text', value: 'alpha'}]
}

Image

interface Image <: Node {
  type: "image"
}

Image includes Resource
Image includes Alternative

Image (Node) represents an image.

Image can be used where phrasing content is expected. It has no content model, but is described by its alt field.

Image includes the mixins Resource and Alternative.

For example, the following Markdown:

![alpha](https://example.com/favicon.ico "bravo")

Yields:

{
  type: 'image',
  url: 'https://example.com/favicon.ico',
  title: 'bravo',
  alt: 'alpha'
}

LinkReference

interface LinkReference <: Parent {
  type: "linkReference"
  children: [StaticPhrasingContent]
}

LinkReference includes Reference

LinkReference (Parent) represents a hyperlink through association, or its original source if there is no association.

LinkReference can be used where phrasing content is expected. Its content model is static phrasing content.

LinkReference includes the mixin Reference.

LinkReferences should be associated with a Definition.

For example, the following Markdown:

[alpha][Bravo]

Yields:

{
  type: 'linkReference',
  identifier: 'bravo',
  label: 'Bravo',
  referenceType: 'full',
  children: [{type: 'text', value: 'alpha'}]
}

ImageReference

interface ImageReference <: Node {
  type: "imageReference"
}

ImageReference includes Reference
ImageReference includes Alternative

ImageReference (Node) represents an image through association, or its original source if there is no association.

ImageReference can be used where phrasing content is expected. It has no content model, but is described by its alt field.

ImageReference includes the mixins Reference and Alternative.

ImageReference should be associated with a Definition.

For example, the following Markdown:

![alpha][bravo]

Yields:

{
  type: 'imageReference',
  identifier: 'bravo',
  label: 'bravo',
  referenceType: 'full',
  alt: 'alpha'
}

Footnote

interface Footnote <: Parent {
  type: "footnote"
  children: [PhrasingContent]
}

Footnote (Parent) represents content relating to the document that is outside its flow.

Footnote can be used where phrasing content is expected. Its content model is also phrasing content.

For example, the following Markdown:

[^alpha bravo]

Yields:

{
  type: 'footnote',
  children: [{type: 'text', value: 'alpha bravo'}]
}

FootnoteReference

interface FootnoteReference <: Node {
  type: "footnoteReference"
}

FootnoteReference includes Association

FootnoteReference (Node) represents a marker through association.

FootnoteReference can be used where phrasing content is expected. It has no content model.

FootnoteReference includes the mixin Association.

FootnoteReference should be associated with a FootnoteDefinition.

For example, the following Markdown:

[^alpha]

Yields:

{
  type: 'footnoteReference',
  identifier: 'alpha',
  label: 'alpha'
}

Mixin

Resource

interface mixin Resource {
  url: string
  title: string?
}

Resource represents a reference to resource.

A url field must be present. It represents a URL to the referenced resource.

A title field can be present. It represents advisory information for the resource, such as would be appropriate for a tooltip.

Association

interface mixin Association {
  identifier: string
  label: string?
}

Association represents an internal relation from one node to another.

An identifier field must be present. It can match an identifier field on another node.

A label field can be present. It represents the original value of the normalised identifier field.

Whether the value of identifier is expected to be a unique identifier or not depends on the type of node including the Association. An example of this is that identifier on Definition should be a unique identifier, whereas multiple LinkReferences can have the same identifier and be associated with one definition.

Reference

interface mixin Reference {
  referenceType: string
}

Reference includes Association

Reference represents a marker that is associated to another node.

A referenceType field must be present. Its value must be a referenceType. It represents the explicitness of the reference.

Alternative

interface mixin Alternative {
  alt: string?
}

Alternative represents a node with a fallback

An alt field should be present. It represents equivalent content for environments that cannot represent the node as intended.

Enumeration

alignType

enum alignType {
  "left" | "right" | "center" | null
}

alignType represents how phrasing content is aligned ([CSSTEXT]).

  • 'left': See the left value of the text-align CSS property
  • 'right': See the right value of the text-align CSS property
  • 'center': See the center value of the text-align CSS property
  • null: phrasing content is aligned as defined by the host environment

referenceType

enum referenceType {
  "shortcut" | "collapsed" | "full"
}

referenceType represents the explicitness of a reference.

  • shortcut: the reference is implicit, its identifier inferred from its content
  • collapsed: the reference is explicit, its identifier inferred from its content
  • full: the reference is explicit, its identifier explicitly set

Content

type Content =
  TopLevelContent | ListContent | TableContent | RowContent | PhrasingContent

Each node in mdast falls into one or more categories of Content that group nodes with similar characteristics together.

TopLevelContent

type TopLevelContent = BlockContent | FrontmatterContent | DefinitionContent

Top-level content represent the sections of document (block content), and metadata such as frontmatter and definitions.

BlockContent

type BlockContent =
  Paragraph | Heading | ThematicBreak | Blockquote | List | Table | HTML | Code

Block content represent the sections of document.

FrontmatterContent

type FrontmatterContent = YAML

Frontmatter content represent out-of-band information about the document.

If frontmatter is present, it must be limited to one node in the tree, and can only exist as a head.

DefinitionContent

type DefinitionContent = Definition | FootnoteDefinition

Definition content represents out-of-band information that typically affects the document through Association.

ListContent

type ListContent = ListItem

List content represent the items in a list.

TableContent

type TableContent = TableRow

Table content represent the rows in a table.

RowContent

type RowContent = TableCell

Row content represent the cells in a row.

PhrasingContent

type PhrasingContent = StaticPhrasingContent | Link | LinkReference

Phrasing content represent the text in a document, and its markup.

StaticPhrasingContent

type StaticPhrasingContent =
  Text | Emphasis | Strong | Delete | HTML | InlineCode | Break | Image |
  ImageReference | Footnote | FootnoteReference

StaticPhrasing content represent the text in a document, and its markup, that is not intended for user interaction.

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

References

Security

As mdast can contain HTML and be used to represent HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, improper use of mdast is also unsafe. When transforming to HTML (typically through hast), always be careful with user input and use hast-util-santize to make the hast tree safe.

Related

  • hast — Hypertext Abstract Syntax Tree format
  • nlcst — Natural Language Concrete Syntax Tree format
  • xast — Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!

Thanks to @anandthakker, @arobase-che, @BarryThePenguin, @chinesedfan, @ChristianMurphy, @craftzdog, @d4rekanguok, @detj, @dominictarr, @gkatsev, @Hamms, @Hypercubed, @ikatyang, @izumin5210, @jasonLaster, @Justineo, @justjake, @KyleAMathews, @laysent, @macklinu, @mike-north, @Murderlon, @nevik, @Rokt33r, @rhysd, @rubys, @Sarah-Seo, @sethvincent, @silvenon, @simov, @staltz, @stefanprobst, @tmcw, and @vhf for contributing to mdast and related projects!

License

CC-BY-4.0 © Titus Wormer

About

Markdown Abstract Syntax Tree format

Resources

Stars

Watchers

Forks

Packages

No packages published