(I'm unsure whether parsing "normal HTML" - and not only "custom HTML tags" - is in scope of the library. As a user I expected to be able to parse "normal HTML" as it is, so I opened this issue; if it's not in scope, feel free to close!)
HTML5 elements are not necessarily valid XML, e.g.
HTML void elements such as <br> must not have a closing tag (here: </br>).
The MDN documentation even states that, in general, "Self-closing tags (<tag />) do not exist in HTML", although a trailing slash can be added to void elements in order to be XHTML compliant. Something like <p /> isn't valid HTML5, though!
In contrast, HtmlParser requires all "HTML" to be valid XML.
In practice this means that:
Void elements currently require either a closing tag (e.g. <br></br>) or a self-closing tag (<br />); a bare <br> doesn't work.
Non-void elements are currently allowed to be self-closing (e.g. <p />) even though this is invalid HTML5.
My pragmatic suggestion would be to
fix the former, i.e. assume void elements always self-close -> treat <br> as <br />
and tolerate the latter, as <p /> is still valid XHTML and nobody would write this "by accident" anyway -> leave it as it is.
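To make the first point concrete, here is a minimal sketch of what "treat <br> as <br />" could look like as a pre-processing pass that runs before an XML-only parser. It is written in Python purely for illustration and is not HtmlParser's code; the function name `normalize_void_elements` and the regex-based approach are my own assumptions. The list of void elements is the one from the HTML standard. The sketch also drops stray closing tags such as </br>, since void elements must not have one.

```python
import re
import xml.etree.ElementTree as ET

# Void elements as listed in the HTML Living Standard.
VOID_ELEMENTS = (
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
)

_NAMES = "|".join(VOID_ELEMENTS)

# Opening void tag with optional attributes and an optional trailing slash.
# The lookahead keeps custom tags such as <link-foo> from matching.
_OPEN_VOID = re.compile(rf"<({_NAMES})(?=[\s/>])([^<>]*?)\s*/?>", re.IGNORECASE)

# Stray closing tag for a void element, e.g. </br>.
_CLOSE_VOID = re.compile(rf"</({_NAMES})\s*>", re.IGNORECASE)


def normalize_void_elements(html: str) -> str:
    """Hypothetical helper: rewrite void elements so the markup is XML-friendly."""
    # Drop closing tags that void elements must not have.
    html = _CLOSE_VOID.sub("", html)
    # Rewrite <br>, <br/> and <br /> to the same self-closing form.
    return _OPEN_VOID.sub(lambda m: f"<{m.group(1)}{m.group(2)} />", html)


if __name__ == "__main__":
    fixed = normalize_void_elements("<p>line one<br>line two</p>")
    print(fixed)
    # The normalized markup can now be read by a strict XML parser.
    root = ET.fromstring(fixed)
    print(root.find("br").tail)
```

Non-void elements such as <p /> are left untouched, which matches the "tolerate the latter" part of the suggestion. A regex pass like this is only a rough approximation: it would misread a ">" inside an attribute value, so a real fix would more likely live in the tokenizer itself.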
(I'm unsure whether parsing "normal HTML" - and not only "custom HTML tags" - is in scope of the library. As a user I expected to be able to parse "normal HTML" as it is, so I opened this issue; if it's not in scope, feel free to close!)
There isn't a clear-cut answer, but I laid out a few possible approaches. I still need to gather ideas and settle on a final design there, and would be glad to hear your thoughts as well!
My pragmatic suggestion would be to
fix the former, i.e. assume void elements always self-close -> treat <br> as <br />
and tolerate the latter, as <p /> is still valid XHTML and nobody would write this "by accident" anyway -> leave it as it is.
I agree with your reasoning here. With a self-closing tag, the intention is very clear so I don't see any harm in supporting that in any context regardless of whether it is valid XHTML or HTML.
And void elements should work as expected in HTML, so since <br> works in the browser (and, I believe, is arguably the preferred syntax according to some standards), it makes sense for that to be valid. I'm not sure about the scope of void elements with regard to parsers, but I guess that would mean always treating void tags as self-closing, and always ignoring or throwing away a closing tag for a void element (such as </br>) during parsing.
I don't have the bandwidth to work on this, but I'd be happy to help review and merge a PR for either of these changes; they both sound like great improvements.
Thank you for the thoughtful writeup and context on this!