Automatically generated unique heading identifiers #504

Closed
wants to merge 10 commits into from
121 changes: 89 additions & 32 deletions spec.txt
@@ -328,8 +328,10 @@ that is not a [whitespace character].

An [ASCII punctuation character](@)
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`.
`*`, `+`, `,`, `-`, `.`, `/` (U+0021–002F),
`:`, `;`, `<`, `=`, `>`, `?`, `@` (U+003A–0040),
`[`, `\`, `]`, `^`, `_`, `` ` `` (U+005B–0060),
`{`, `|`, `}`, or `~` (U+007B–007E).
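
As a quick sanity check on the four code point ranges above, the following sketch builds the character set from the ranges and compares it with Python's `string.punctuation`. The names `RANGES` and `is_ascii_punctuation` are illustrative only and do not come from the spec.

```python
import string

# The four inclusive code point ranges from the definition above.
RANGES = [(0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]


def is_ascii_punctuation(ch: str) -> bool:
    """Return True if ch is a single character inside one of the ranges."""
    return len(ch) == 1 and any(lo <= ord(ch) <= hi for lo, hi in RANGES)


# Every character covered by the ranges, for comparison with the stdlib list.
from_ranges = {chr(cp) for lo, hi in RANGES for cp in range(lo, hi + 1)}
```

The ranges cover 32 characters in total, the same set that `string.punctuation` lists.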

A [punctuation character](@) is an [ASCII
punctuation character] or anything in
@@ -525,7 +527,7 @@ Markdown document.

## Thematic breaks

A line consisting of 0-3 spaces of indentation, followed by a sequence
A line consisting of up to three spaces of indentation, followed by a sequence
of three or more matching `-`, `_`, or `*` characters, each followed
optionally by any number of spaces, forms a
[thematic break](@).
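
The line-level rule above can be sketched as a regular expression. This is a simplified illustration, not the reference implementation: it looks at a single line in isolation and ignores context, such as a `---` line that would instead be read as a setext heading underline. The names `THEMATIC_BREAK` and `is_thematic_break` are hypothetical.

```python
import re

# Up to three spaces of indentation, then three or more matching
# `-`, `_`, or `*` characters, each optionally followed by spaces.
THEMATIC_BREAK = re.compile(r" {0,3}([-_*])(?: *\1){2,} *")


def is_thematic_break(line: str) -> bool:
    """Check one line, without its line ending, against the rule."""
    return THEMATIC_BREAK.fullmatch(line.rstrip("\n")) is not None
```

The backreference `\1` is what enforces that all the characters match, so `*-*` is rejected while `* * *` is accepted.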
@@ -570,7 +572,7 @@ __</p>
````````````````````````````````


One to three spaces indent are allowed:
Indentation of one to three spaces is allowed:

```````````````````````````````` example
***
@@ -581,8 +583,6 @@ One to three spaces indent are allowed:
<hr />
<hr />
````````````````````````````````


Four spaces is too many:

```````````````````````````````` example
@@ -657,6 +657,21 @@ a------
<p>---a---</p>
````````````````````````````````

Unlike in paragraph text, two trailing spaces do not create a hard line break; the next line starts a new block:

.
---
---

---
a
.
<hr />
<hr />

<hr />
<p>a</p>
.

It is required that all of the [non-whitespace characters] be the same.
So, this is not a thematic break:
@@ -756,7 +771,7 @@ closing sequence of any number of unescaped `#` characters.
The opening sequence of `#` characters must be followed by a
[space] or by the end of line. The optional closing sequence of `#`s must be
preceded by a [space] and may be followed by spaces only. The opening
`#` character may be indented 0-3 spaces. The raw contents of the
`#` character may be indented up to three spaces. The raw contents of the
heading are stripped of leading and trailing spaces before being parsed
as inline content. The heading level is equal to the number of `#`
characters in the opening sequence.
@@ -832,8 +847,6 @@ Leading and trailing blanks are ignored in parsing inline content:
.
<h1>foo</h1>
````````````````````````````````


One to three spaces indentation are allowed:

```````````````````````````````` example
@@ -968,12 +981,61 @@ ATX headings can be empty:
<h3></h3>
````````````````````````````````

### Heading identifiers

Headings may be used to generate a table of contents (ToC).
Authors and readers commonly want to target links at headings.
Therefore, an implementation may choose to
add a unique identifier to each heading,
automatically generated from its textual content:

.
# Foo
## Bar
## Baz
.
<h1 id="foo">Foo</h1>
<h2 id="bar">Bar</h2>
<h2 id="baz">Baz</h2>
.

If necessary for disambiguation,
the textual content of ancestor headings can also be used:

.
# Foo
## Bar
# Quuz
## Bar
.
<h1 id="foo">Foo</h1>
<h2 id="bar_(foo)">Bar</h2>
<h1 id="quuz">Quuz</h1>
<h2 id="bar_(quuz)">Bar</h2>
.

If that does not result in unique identifiers,
a counter may be added:

.
# Foo
# Foo
.
<h1 id="foo-1">Foo</h1>
<h1 id="foo-2">Foo</h1>
.

The exact algorithm for generating identifiers is not mandated.
Implementations may apply Unicode normalization,
case folding, whitespace collapsing, affix addition,
romanization, removal of diacritical marks, etc.,
as needed to satisfy the identifier rules of the target language.
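
Since the algorithm is left open, the following is only one possible sketch that reproduces the three examples in this section. It assumes headings arrive as `(level, text)` pairs in document order; the function names `slug` and `heading_ids`, the `_(parent)` suffix, and the `-N` counter format are taken from the examples or invented for illustration, not mandated by the text.

```python
import unicodedata
from collections import Counter


def slug(text):
    """Normalize, strip diacritical marks, case-fold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "-".join(stripped.casefold().split())


def heading_ids(headings):
    """Return one identifier per (level, text) heading, in document order."""
    base, parents, stack = [], [], []  # stack holds (level, slug) of open ancestors
    for level, text in headings:
        s = slug(text)
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents.append(stack[-1][1] if stack else None)
        base.append(s)
        stack.append((level, s))

    # Qualify clashing slugs with the slug of the nearest ancestor heading.
    base_counts = Counter(base)
    qualified = [
        f"{s}_({p})" if base_counts[s] > 1 and p is not None else s
        for s, p in zip(base, parents)
    ]

    # Whatever still clashes gets a running counter.
    counts, seen, ids = Counter(qualified), Counter(), []
    for q in qualified:
        if counts[q] > 1:
            seen[q] += 1
            ids.append(f"{q}-{seen[q]}")
        else:
            ids.append(q)
    return ids
```

This sketch does not guard against a generated identifier such as `foo-1` colliding with a heading whose own text slugs to `foo-1`; a real implementation would need a final uniqueness pass.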

## Setext headings

A [setext heading](@) consists of one or more
lines of text, each containing at least one [non-whitespace
character], with no more than 3 spaces indentation, followed by
character], with no more than three spaces indentation, followed by
a [setext heading underline]. The lines of text must be such
that, were they not followed by the setext heading underline,
they would be interpreted as a paragraph: they cannot be
@@ -1056,8 +1118,6 @@ not line up with the underlining:
<h2>Foo</h2>
<h1>Foo</h1>
````````````````````````````````


Four spaces indent is too much:

```````````````````````````````` example
@@ -1075,7 +1135,6 @@ Foo
<hr />
````````````````````````````````


The setext heading underline can be indented up to three spaces, and
may have trailing spaces:

@@ -1085,8 +1144,6 @@ Foo
.
<h2>Foo</h2>
````````````````````````````````


Four spaces is too much:

```````````````````````````````` example
@@ -1617,6 +1674,9 @@ as inlines. The first word of the [info string] is typically used to
specify the language of the code sample, and rendered in the `class`
attribute of the `code` tag. However, this spec does not mandate any
particular treatment of the [info string].
This means that extensions may use the [info string], for instance,
to highlight the syntax with different colors
or to parse and evaluate the code, e.g. to create a diagram.
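
The typical treatment described above, where the first word of the info string becomes a class on the `code` tag, might look like the following sketch. The `language-` prefix follows the convention used in the spec's own examples; the function names `code_class` and `render_fenced` are hypothetical, and the escaping is simplified.

```python
import html


def code_class(info_string):
    """Return a class name built from the first word, or None if empty."""
    words = info_string.split()
    return f"language-{words[0]}" if words else None


def render_fenced(info_string, code):
    """Render a fenced code block; the rest of the info string is ignored."""
    cls = code_class(info_string)
    attr = f' class="{html.escape(cls, quote=True)}"' if cls else ""
    return f"<pre><code{attr}>{html.escape(code, quote=False)}</code></pre>"
```

An extension that highlights syntax or evaluates the code would branch on the same first word instead of, or in addition to, emitting the class.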

Here is a simple example with backticks:

@@ -1813,7 +1873,7 @@ aaa
````````````````````````````````


Four spaces indentation produces an indented code block:
Indentation of four spaces produces an indented code block:

```````````````````````````````` example
```
@@ -1827,7 +1887,7 @@ aaa
````````````````````````````````


Closing fences may be indented by 0-3 spaces, and their indentation
Closing fences may be indented by up to three spaces, and their indentation
need not match that of the opening fence:

```````````````````````````````` example
@@ -1850,7 +1910,7 @@ aaa
````````````````````````````````


This is not a closing fence, because it is indented 4 spaces:
This is not a closing fence, because it is indented four spaces:

```````````````````````````````` example
```
@@ -2571,7 +2631,7 @@ function matchwo(a,b)
````````````````````````````````


The opening tag can be indented 1-3 spaces, but not 4:
The opening tag can be indented one to three spaces, but not four:

```````````````````````````````` example
<!-- foo -->
@@ -3228,7 +3288,7 @@ these constructions. (A recipe is provided below in the section entitled
## Block quotes

A [block quote marker](@)
consists of 0-3 spaces of initial indent, plus (a) the character `>` together
consists of up to three spaces of initial indent, plus (a) the character `>` together
with a following space, or (b) a single character `>` not followed by a space.

The following rules define [block quotes]:
@@ -3283,7 +3343,7 @@ baz</p>
````````````````````````````````


The `>` characters can be indented 1-3 spaces:
The `>` characters can be indented one to three spaces:

```````````````````````````````` example
> # Foo
@@ -3993,7 +4053,7 @@ A start number may not be negative:

An indented code block will have to be indented four spaces beyond
the edge of the region where text will be included in the list item.
In the following case that is 6 spaces:
In the following case that is six spaces:

```````````````````````````````` example
- foo
@@ -4010,7 +4070,7 @@ In the following case that is 6 spaces:
````````````````````````````````


And in this case it is 11 spaces:
And in this case it is eleven spaces:

```````````````````````````````` example
10. foo
@@ -4118,7 +4178,7 @@ bar


This is not a significant restriction, because when a block begins
with 1-3 spaces indent, the indentation can always be removed without
with one to three spaces indent, the indentation can always be removed without
a change in interpretation, allowing rule #1 to be applied. So, in
the above case:

@@ -4275,7 +4335,7 @@ foo

4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
according to rule #1, #2, or #3, then the result of indenting each line
of *Ls* by 1-3 spaces (the same for each line) also constitutes a
of *Ls* by one to three spaces (the same for each line) also constitutes a
list item with the same contents and attributes. If a line is
empty, then it need not be indented.

@@ -4487,7 +4547,6 @@ So, in this case we need two spaces indent:
</ul>
````````````````````````````````


One is not enough:

```````````````````````````````` example
@@ -4520,7 +4579,6 @@ Here we need four, because the list marker is wider:
</ol>
````````````````````````````````


Three is not enough:

```````````````````````````````` example
@@ -4599,7 +4657,7 @@ John Gruber's Markdown spec says the following about list items:
But if you don't want to, you don't have to."

3. "List items may consist of multiple paragraphs. Each subsequent
paragraph in a list item must be indented by either 4 spaces or one
paragraph in a list item must be indented by either four spaces or one
tab."

4. "It looks nice if you indent every line of the subsequent paragraphs,
@@ -4609,7 +4667,7 @@ John Gruber's Markdown spec says the following about list items:
delimiters need to be indented."

6. "To put a code block within a list item, the code block needs to be
indented twice — 8 spaces or two tabs."
indented twice — eight spaces or two tabs."

These rules specify that a paragraph under a list item must be indented
four spaces (presumably, from the left margin, rather than the start of
@@ -4692,7 +4750,7 @@ The choice of four spaces is arbitrary. It can be learned, but it is
not likely to be guessed, and it trips up beginners regularly.

Would it help to adopt a two-space rule? The problem is that such
a rule, together with the rule allowing 1--3 spaces indentation of the
a rule, together with the rule allowing one to three spaces indentation of the
initial list marker, allows text that is indented *less than* the
original list marker to be included in the list item. For example,
`Markdown.pl` parses
@@ -4777,8 +4835,7 @@ takes four spaces (a common case), but diverge in other cases.

A [list](@) is a sequence of one or more
list items [of the same type]. The list items
may be separated by any number of blank lines.

may be separated by any number of blank lines.
Two list items are [of the same type](@)
if they begin with a [list marker] of the same type.
Two list markers are of the