Inline HTML comments at start of line #760
Comments
I agree that this is surprising behavior, and undesirable (for example, if you rewrap a paragraph you might easily make an inline HTML comment into block HTML). Your proposal would be one way to go. More radically, one could disallow raw HTML blocks from interrupting paragraphs. (But there may be a reason not to do this?) |
Not sure if there are reasons to avoid the more radical solution. I can imagine scenarios where I'd welcome the more radical solution (e.g., I might want an HTML comment to occupy an entire line of a running paragraph, but not to introduce a paragraph break). I can't easily imagine scenarios where I'd regret the more radical solution --- if I wanted an HTML block element I could always just ensure it's surrounded by block boundaries, e.g. blank lines. What if there's a block boundary (e.g. a blank line or the edge of the document) on one side of a comment, and running text on the other side? Would the comment then count as an inline element belonging to the adjacent paragraph? That'd be the least surprising. |
If you do go with (something like) my more conservative proposal, you should be careful not to parse a line whose text is exactly The commonmark parser currently in pandoc (correctly) treats that as the start of an unclosed comment.
|
Why? According to HTML that’s a comment.
In my experience, many markdown authors are kinda messy, and often don’t use blank lines. I’m not so sure about this. The downside isn’t that bad. There are also other kinds of “HTML” “blocks” than comments in CommonMark. Do you only want to change comments? |
Huh, I didn't realize that. Have just confirmed that Chrome will render
These are the only HTML blocks where the subtleties of the parsing matter to me. For other raw html, I'd probably always make it a block set off by blank lines. I'd be happy with a good solution to the parsing of comments, that is then extrapolated to the other elements in the way that seems to interested parties to make most sense / be most natural. Maybe that means changing all the elements to have consistent parsing rules. Maybe it means treating comments specially. I don't have an opinion. |
The kind of issue I had is I'd have source text like this:
And then the |
True, but there are block-level elements that we currently don't allow to interrupt paragraphs: notably, ordered lists (other than those beginning with 1.). And it would be relatively natural to insist that HTML blocks begin with a blank line (or container edge), since they generally have to end with a blank line. The only downside I see is that in things like
the tags would be parsed as inline HTML. This would be a breaking backwards-incompatible change, which of course is a big strike against it. But also easily fixed by inserting a blank line. The thing I don't like about the more moderate proposal is that it special-cases comments. Why should comments behave differently from other raw HTML? |
For comments (block type 2), the proposal is that:
We could modify the end condition for blocks type 3, 4, and 5 in the same way. Blocks type 6 and 7 don't need their end condition modified. |
I think @jgm was thinking more about the concept of interruption, which already exists in CM in several places, including HTML kind 7. Which does what you want.
<!--
xxx
-->.
Currently, markdown will interpret this entirely as raw HTML and emit it as is. The end user will see a dot. Which could be a mistake by the author but not terrible. With your proposal, markdown parsers will not see any of this as HTML. It will create three paragraphs. And emit all those characters for users to see. There are also adjacent comments and more cases I can think of:
<!-- --><!-- -->
Hmm, not sure. Only the tags do, kinds 6 and 7. But that has more to do with them starting at something that looks like XML and otherwise never exiting. A blank line is the only thing we can look for. And we want to interleave markdown and HTML. And because tags (in markdown) never contain blank lines. You previously mentioned “And then the I don’t think that’s correct. Everything’s working fine: https://spec.commonmark.org/dingus/?text=*%20abc%0A%20%20%3C!--%20def%20--%3E%20ghi%0A%20%20jkl. I think I’d currently recommend not doing this. HTML in markdown is surprising. There is no way around that. So authors need to learn some rules. The 2 proposals are different, but also have edge cases that need to be learned. |
Sorry, I made a mistake. When the second and third lines are indented as in the example I gave:
it does indeed parse correctly. My actual difficulties were because I had omitted the indent from those lines, which works fine without a comment at the start of the line:
But if the comment is at the start of the line (unindented) then it's parsed as an HTML block:
I guess the simplest fix for this issue would have been to indent the continuation lines. But I didn't realize that until now. |
Let's forget lists then...I think that is not an issue. There's still the following consideration. Suppose you have a document
And then you reflow the lines to
All of a sudden, you have a new paragraph "bop"! (Instead of [Paragraph], you have [Paragraph, RawBlock, Paragraph] -- and "bim" is absorbed in the RawBlock.) I find this confusing and undesirable. This could be addressed by saying that none of the types of HTML blocks can interrupt paragraphs (thanks for pointing out that we already say this for type 7 HTML blocks, I'd forgotten that). That seems to me a pretty reasonable rule. I suspect there are problems with it that I'm not thinking of. |
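To make the reflow hazard concrete, here is a minimal Python sketch of a toy block splitter. It is not the CommonMark reference algorithm: it only knows paragraphs, blank lines, and type 2 (comment) HTML blocks, and the function name `split_blocks`, the flag `html_interrupts_paragraph`, and the sample input are all hypothetical, since the original example text did not survive in this thread. It compares the current behaviour (a comment at the start of a line interrupts the paragraph) with the "HTML blocks cannot interrupt paragraphs" rule.

```python
from typing import List, Tuple

Block = Tuple[str, List[str]]


def split_blocks(lines: List[str], html_interrupts_paragraph: bool) -> List[Block]:
    """Toy splitter: paragraphs, blank lines, and comment (type 2) HTML blocks only."""
    blocks: List[Block] = []
    para: List[str] = []

    def flush() -> None:
        # Close the paragraph being accumulated, if any.
        if para:
            blocks.append(("Paragraph", list(para)))
            para.clear()

    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            flush()
            i += 1
            continue
        indent = len(line) - len(line.lstrip(" "))
        opens_comment = indent <= 3 and line.startswith("<!--", indent)
        # Under the alternative rule, a comment may open a block only
        # when no paragraph is currently open.
        if opens_comment and (html_interrupts_paragraph or not para):
            flush()
            # The block runs through the first line containing "-->",
            # or to the end of input if the comment is never closed.
            j = i
            while j < len(lines) - 1 and "-->" not in lines[j]:
                j += 1
            blocks.append(("RawBlock", lines[i:j + 1]))
            i = j + 1
            continue
        para.append(line)
        i += 1
    flush()
    return blocks


# Hypothetical reflowed paragraph: the comment now sits at the start of a line.
reflowed = ["foo bar", "<!-- note --> bim", "bop"]
current = split_blocks(reflowed, html_interrupts_paragraph=True)
alternative = split_blocks(reflowed, html_interrupts_paragraph=False)
print([kind for kind, _ in current])
print([kind for kind, _ in alternative])
```

With interruption allowed, the whole comment line (including "bim") lands in the raw block and "bop" becomes its own paragraph; with interruption disallowed, all three lines stay in one paragraph and the comment is left for inline parsing.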
Well, it’s kinda important. Because fixing this issue with interruption doesn’t solve the root problem the user had, with lazy lines.
My main thing is that it isn’t that bad. Everything that’s supposed to be shown is still displayed, and everything that’s supposed to be hidden is hidden. Because the “block” comment is also an “inline” comment. Sure there’s some whitespace visually, but it all shows. And, it’s also a “sudden” thing that so many things in markdown already have too:
Foo # bar
baz.

Foo
# bar
baz.

Adding on,
This becomes a problem when a) people expect the current state or don’t think blank lines are that important, and b) people have blank lines in things, breaking out of the paragraph.

1:
aaa
<!--
# bbb
-->
Current:
<p>aaa</p>
<!--
# bbb
-->
Proposed:
<p>aaa\n<!--</p>
<h1>bbb</h1>
<p>--></p>

2:
aaa
<pre>
[bbb](ccc)
</pre>
Current:
<p>aaa</p>
<pre>
[bbb](ccc)
</pre>
Proposed:
<p>aaa<pre></p>
<p><a href="ccc">bbb</a></p>
</pre>
Interpreted by browser as:
<p>aaa</p><pre><p></p>
<p><a href="ccc">bbb</a></p>
</pre>

3:
aaa
<div>
# bbb
</div>
Current:
<p>aaa</p>
<div>
<h1>bbb</h1>
</div>
Proposed:
<p>aaa<div></p>
<h1>bbb</h1>
</div>
Interpreted by browser as:
<p>aaa</p><div><p></p>
<h1>bbb</h1>
</div> |
The semantics completely changes, and depending on your output format, "bim" may disappear entirely (since raw HTML isn't passed on to non-HTML formats).
That is true. I agree that you might find people getting confused with things like 1 and 2, and backwards incompatible changes already have a big strike against them. On the other hand, all you have to learn is that you need a blank line here to avoid these results. That's similar to what people already need to learn about the difference between
and
|
I wouldn’t call an accidentally split-up paragraph a complete change in semantics. If that’s important, stuff like this seems more important to solve:
<h1>
asd
</h1>
^-- Also dropped entirely.
To me this is the same as the original root question. “All you have to learn is to indent your list items to put things in list items” and “all you have to learn is that paragraphs are interrupted by blocks”. And how far to take blank lines? Blank lines before fenced code / headings / thematic breaks / block quotes / other lists / definitions? Common extensions such as GFM footnote definitions/GFM tables/math/directives?
^-- Weird indent style that some people like, so not seen as indented code.
^-- Probably not a numbered list as it doesn’t start with CM specifically doesn’t need blank lines before like all other things. Because people often don’t write them. I think HTML kind 1 through 6 are similar. |
According to the current spec, the HTML comment in this source is an inline element:
But the line containing the HTML comment in this source is an HTML block element (meeting condition 2 in section 4.6):
(The same holds if the comment is preceded by 1-3 spaces.) Intuitively, though, one would expect the comment in the second example to still be inline. (I was surprised by how some of my markup was being rendered, and traced the difficulty back to here. It forces me to pay attention in my source text to where a comment occurs on a line, or where line breaks happen in a longer paragraph containing comments.)
Perhaps consider changing the CommonMark spec so that HTML block condition 2 instead requires:
Start condition: line begins with the string <!--.
End condition: line contains the string -->, and the first occurrence of --> in the string is at the end of the line.
Perhaps the End condition could allow whitespace to occur after the -->.
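The proposed start and end conditions can be sketched as a small Python check. This is an illustration under stated assumptions, not spec text: the function name `proposed_comment_block_end` is hypothetical, the start line may carry up to three leading spaces as in the current spec, trailing whitespace after the closing --> is tolerated (the optional relaxation mentioned above), and a comment that is never closed, or whose first --> is followed by other text, is treated as "not an HTML block" so that it falls back to paragraph and inline parsing.

```python
from typing import Optional, Sequence


def proposed_comment_block_end(lines: Sequence[str], start: int = 0) -> Optional[int]:
    """Return the index of the line that closes a comment block opened at
    lines[start] under the proposed rule, or None if no block is formed."""
    first = lines[start]
    indent = len(first) - len(first.lstrip(" "))
    # Start condition: the line begins with "<!--" (after at most 3 spaces).
    if indent > 3 or not first.startswith("<!--", indent):
        return None
    for i in range(start, len(lines)):
        pos = lines[i].find("-->")
        if pos == -1:
            continue
        # End condition: the first "-->" must be the last thing on the line,
        # apart from optional trailing whitespace (assumed relaxation).
        rest = lines[i][pos + 3:]
        return i if rest.strip() == "" else None
    # Assumption: an unclosed comment does not form a block in this sketch.
    return None


print(proposed_comment_block_end(["<!-- a comment -->"]))
print(proposed_comment_block_end(["<!-- a comment --> more text"]))
print(proposed_comment_block_end(["<!--", "xxx", "-->."]))
```

The third call mirrors the example raised in the discussion: because a dot follows the closing -->, the sketch reports no block, so none of those lines would be emitted as raw HTML.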