-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not silently ignore unclosed tags #338
Conversation
This is not desirable -- at least not for HTML templates. HTML defines which tags can be closed automatically and which must not be closed at all. For XML, on the other hand, all tags must be explicitly closed. It has a special short hand notation to let a tag function as both start and end tag. The HTML/XHTML compatibility rules indicate how the HTML/XHTML discrepancy concerning element definitions should be handled. I can see that parsing HTML and XML with the same parser is not easy. In my view, however, the In my opinion, the parser should handle HTML and XML differently. For HTML, it should implement the HTML rules for empty and automatically closed tags; for XML, it should follow the XML rules - with the HTML/XHMTL compatibility guidelines. |
Jens Vagelpohl wrote at 2021-2-19 06:59 -0800:
Fixes #327
@d-maurer This is a draft as food for discussion on a fix for #327. The code change will flag up any unclosed tag - but is that really desirable? As the failing test shows it will also flag up simple tags that few people close, like `<br>`.
When I remember right, then `zope.pagetemplate` insists that
elements with TAL/METAL attributes are explicitly closed
and leaves everything else to the renderer. This might be a good
compromise.
|
I had the same idea after looking through the code in |
@d-maurer It would be possible to inspect the parsing results for the tag attributes and see if any of them are from the namespaces we consider interesting ( |
Jens Vagelpohl wrote at 2021-2-24 06:47 -0800:
@d-maurer It would be possible to inspect the parsing results for the tag attributes and see if any of them are from the namespaces we consider interesting (`tal`, `metal`, `i18n`) but the parse result is nested several levels and that seems a performance drag.
`chameleon` uses quite aggressive caching. This means that it parses
not very often. This might indicate that some parsing slowdown
could be acceptable.
Instead, I have taken inspiration from `zope.tal` to use the same definition they have for tags that are always supposed to be empty. Those will not be put onto the temporary stack that is used to hold unclosed tags. What's your opinion on that?
As I wrote earlier, XML requires that all tags are closed explicitly.
Whatever you do, it should correctly work for X(HT)ML.
If you know that you have a HTML (in contrast to an X(TM)ML) template,
tags declared empty by the HTML DTD need not be put onto the stack.
However, I doubt that this will resolve the initial problem
which has been that unclosed elements were silently ignored even if
they carried TAL/METAL attributes.
|
OK, I'll try that next
I don't make these changes blindly or into the blue. If you look at the PR changes you will notice that it includes a test that, among others, proves that your original examples are covered. Unclosed elements that are not among those considered empty by the HTML specs will throw up a parsing error. |
@d-maurer Only putting tags with attributes we care about onto the parser's stack of unclosed tags won't help. It breaks as closing tags are encountered that are not on that stack because they didn't contain "interesting" attributes. Consider this PR a suggestion. It attempts to get closer to a solution without rewriting larger parts of that parser. It works correctly with XHTML/XML, the list of "empty" tags will be left off the stack regardless of them being unclosed for HTML ( If you believe this is a bad solution or it is too simplistic then you're welcome to say so. Like I said, I'm not going to rewrite the parser. The underlying issue is not urgent for me, anyway, and I don't mind closing the PR and leaving the issue around. |
Jens Vagelpohl wrote at 2021-2-25 03:31 -0800:
...
If you believe this is a bad solution or it is too simplistic then you're welcome to say so.
I do not know: it depends on what `chameleon` wants to achieve.
If it restricts itself to valid X[HT]ML documents, the solution
would be sufficient
(and the special handling of empty tags could even be omitted).
If it wants to support HTML, then the current solution will
reject many valid HTML templates -- most of those using
that HTML defines many end tags as optional, e.g.
table related end tags such as in the following example.
```python:
from chameleon.zpt.template import PageTemplate
ts = """<table><tr><th>H1<th>H2<tr><td>D1><td>D2</table>"""
t = PageTemplate(ts)
```
My feeling is that a good solution would require
separate parsers for HTML and XML templates to properly handle
the quite important differences between HTML (e.g. empty elements,
elements with optional (start and end) tags --
see "https://html.spec.whatwg.org/#syntax-tag-omission")
and XML (all elements must be opened and closed explicitly).
Because the template engine needs to actively process only
elements with "active" ("TAL", "METAL", ...) attributes, it might be an option
to check proper structure only for those elements and
pass on tags for all other elements unchecked (to be handled by upper levels).
I have the impression that `zope.pagetemplate` does something like this.
|
Like I said, I won't get into reimplementing the parser, and my attempt at ignoring tags that don't have TAL/METAL attributes did not work. I'll close this PR and create a release then. |
It might be better to require proper XML formatting and then have an output option to render to HTML5, e.g. render down a |
Requiring proper XML formatting will break thousands of existing templates... |
@dataflake yeah I suppose an HTML parsing layer is desirable. That shouldn't be hard, it's just a matter of modifying |
Malthe Borch wrote at 2021-2-26 00:15 -0800:
@dataflake yeah I suppose an HTML parsing layer is desirable. That shouldn't be hard, it's just a matter of modifying `ElementParser` to automatically close tags that appear in a list of self-closing tags.
The rules (linked to in a previous comment) are not simple
(or more precisely: there are lots of rules (each on not very complex))..
|
It might be good enough to start with the subset of typically used rules. I don't think very many people interested in Chameleon will use an |
Malthe Borch wrote at 2021-2-26 01:30 -0800:
It might be good enough to start with the subset of typically used rules. I don't think very many people interested in Chameleon will use an `<li>` without closing it.
I am one of those people:
<ul>
<li> ...
<li> ...
...
</ul>
|
Fixes #327
@d-maurer This is a draft as food for discussion on a fix for #327. The code change will flag up any unclosed tag - but is that really desirable? As the failing test shows it will also flag up simple tags that few people close, like
<br>
. But it feels a fix shouldn't introduce code to introspect the unclosed tag to see if it's OK to ignore it.