Do not silently ignore unclosed tags #338

dataflake · 2021-02-19T14:58:58Z

Fixes #327

@d-maurer This is a draft as food for discussion on a fix for #327. The code change will flag up any unclosed tag - but is that really desirable? As the failing test shows it will also flag up simple tags that few people close, like <br>. But it feels a fix shouldn't introduce code to introspect the unclosed tag to see if it's OK to ignore it.

d-maurer · 2021-02-19T17:14:30Z

the code change will flag up any unclosed tag - but is that really desirable?

This is not desirable -- at least not for HTML templates. HTML defines which tags can be closed automatically and which must not be closed at all. br, for example, is defined as empty which means, it cannot have an end tag.

For XML, on the other hand, all tags must be explicitly closed. It has a special short hand notation to let a tag function as both start and end tag. The HTML/XHTML compatibility rules indicate how the HTML/XHTML discrepancy concerning element definitions should be handled.

I can see that parsing HTML and XML with the same parser is not easy. In my view, however, the zope.pagetemplate parser did a better job in this respect than that of chameleon.

In my opinion, the parser should handle HTML and XML differently. For HTML, it should implement the HTML rules for empty and automatically closed tags; for XML, it should follow the XML rules - with the HTML/XHMTL compatibility guidelines.

d-maurer · 2021-02-20T08:49:48Z

Jens Vagelpohl wrote at 2021-2-19 06:59 -0800:

Fixes #327 @d-maurer This is a draft as food for discussion on a fix for #327. The code change will flag up any unclosed tag - but is that really desirable? As the failing test shows it will also flag up simple tags that few people close, like `<br>`.

When I remember right, then `zope.pagetemplate` insists that elements with TAL/METAL attributes are explicitly closed and leaves everything else to the renderer. This might be a good compromise.

dataflake · 2021-02-20T09:05:30Z

When I remember right, then zope.pagetemplate insists that elements with TAL/METAL attributes are explicitly closed and leaves everything else to the renderer. This might be a good compromise.

I had the same idea after looking through the code in zope.pagetemplate and zope.tal, their processing is a lot more involved and the pain of getting that right seemed greater than the pain of unclear error messages for users who forget to close a tag with tal statements on it. I'll look at that option next week.

dataflake · 2021-02-24T14:47:36Z

@d-maurer It would be possible to inspect the parsing results for the tag attributes and see if any of them are from the namespaces we consider interesting (tal, metal, i18n) but the parse result is nested several levels and that seems a performance drag. Instead, I have taken inspiration from zope.tal to use the same definition they have for tags that are always supposed to be empty. Those will not be put onto the temporary stack that is used to hold unclosed tags. What's your opinion on that?

d-maurer · 2021-02-24T17:07:14Z

Jens Vagelpohl wrote at 2021-2-24 06:47 -0800:

@d-maurer It would be possible to inspect the parsing results for the tag attributes and see if any of them are from the namespaces we consider interesting (`tal`, `metal`, `i18n`) but the parse result is nested several levels and that seems a performance drag.

`chameleon` uses quite aggressive caching. This means that it parses not very often. This might indicate that some parsing slowdown could be acceptable.

Instead, I have taken inspiration from `zope.tal` to use the same definition they have for tags that are always supposed to be empty. Those will not be put onto the temporary stack that is used to hold unclosed tags. What's your opinion on that?

As I wrote earlier, XML requires that all tags are closed explicitly. Whatever you do, it should correctly work for X(HT)ML. If you know that you have a HTML (in contrast to an X(TM)ML) template, tags declared empty by the HTML DTD need not be put onto the stack. However, I doubt that this will resolve the initial problem which has been that unclosed elements were silently ignored even if they carried TAL/METAL attributes.

dataflake · 2021-02-25T08:03:12Z

chameleon uses quite aggressive caching. This means that it parses not very often. This might indicate that some parsing slowdown could be acceptable.

OK, I'll try that next

However, I doubt that this will resolve the initial problem which has been that unclosed elements were silently ignored even if they carried TAL/METAL attributes.

I don't make these changes blindly or into the blue. If you look at the PR changes you will notice that it includes a test that, among others, proves that your original examples are covered. Unclosed elements that are not among those considered empty by the HTML specs will throw up a parsing error.

…the queue

dataflake · 2021-02-25T11:31:04Z

@d-maurer Only putting tags with attributes we care about onto the parser's stack of unclosed tags won't help. It breaks as closing tags are encountered that are not on that stack because they didn't contain "interesting" attributes.

Consider this PR a suggestion. It attempts to get closer to a solution without rewriting larger parts of that parser. It works correctly with XHTML/XML, the list of "empty" tags will be left off the stack regardless of them being unclosed for HTML (<br>) or closed for XML (<br/>). It does solve the problem you pointed out, see the tests.

If you believe this is a bad solution or it is too simplistic then you're welcome to say so. Like I said, I'm not going to rewrite the parser. The underlying issue is not urgent for me, anyway, and I don't mind closing the PR and leaving the issue around.

d-maurer · 2021-02-26T07:36:43Z

Jens Vagelpohl wrote at 2021-2-25 03:31 -0800:

... If you believe this is a bad solution or it is too simplistic then you're welcome to say so.

I do not know: it depends on what `chameleon` wants to achieve. If it restricts itself to valid X[HT]ML documents, the solution would be sufficient (and the special handling of empty tags could even be omitted). If it wants to support HTML, then the current solution will reject many valid HTML templates -- most of those using that HTML defines many end tags as optional, e.g. table related end tags such as in the following example. ```python: from chameleon.zpt.template import PageTemplate ts = """<table><tr><th>H1<th>H2<tr><td>D1><td>D2</table>""" t = PageTemplate(ts) ``` My feeling is that a good solution would require separate parsers for HTML and XML templates to properly handle the quite important differences between HTML (e.g. empty elements, elements with optional (start and end) tags -- see "https://html.spec.whatwg.org/#syntax-tag-omission") and XML (all elements must be opened and closed explicitly). Because the template engine needs to actively process only elements with "active" ("TAL", "METAL", ...) attributes, it might be an option to check proper structure only for those elements and pass on tags for all other elements unchecked (to be handled by upper levels). I have the impression that `zope.pagetemplate` does something like this.

dataflake · 2021-02-26T07:40:29Z

Like I said, I won't get into reimplementing the parser, and my attempt at ignoring tags that don't have TAL/METAL attributes did not work. I'll close this PR and create a release then.

malthe · 2021-02-26T07:50:22Z

It might be better to require proper XML formatting and then have an output option to render to HTML5, e.g. render down a <br /> to <br>.

dataflake · 2021-02-26T07:51:45Z

Requiring proper XML formatting will break thousands of existing templates...

malthe · 2021-02-26T08:14:50Z

@dataflake yeah I suppose an HTML parsing layer is desirable. That shouldn't be hard, it's just a matter of modifying ElementParser to automatically close tags that appear in a list of self-closing tags.

d-maurer · 2021-02-26T09:27:29Z

Malthe Borch wrote at 2021-2-26 00:15 -0800:

@dataflake yeah I suppose an HTML parsing layer is desirable. That shouldn't be hard, it's just a matter of modifying `ElementParser` to automatically close tags that appear in a list of self-closing tags.

The rules (linked to in a previous comment) are not simple (or more precisely: there are lots of rules (each on not very complex))..

malthe · 2021-02-26T09:30:35Z

It might be good enough to start with the subset of typically used rules. I don't think very many people interested in Chameleon will use an <li> without closing it.

d-maurer · 2021-02-26T11:02:57Z

Malthe Borch wrote at 2021-2-26 01:30 -0800:

It might be good enough to start with the subset of typically used rules. I don't think very many people interested in Chameleon will use an `<li>` without closing it.

I am one of those people: <ul> <li> ... <li> ... ... </ul>

- Do not silently ignore unclosed tags

b236a32

dataflake self-assigned this Feb 19, 2021

dataflake added 2 commits February 24, 2021 15:26

- don't put empty HTML taks on the stack

91dadc3

Merge branch 'master' into dataflake/issue_327

28faba2

- add another test case tha fails if only tags containing TAL are in …

409804d

…the queue

dataflake closed this Feb 26, 2021

dataflake deleted the dataflake/issue_327 branch February 26, 2021 07:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not silently ignore unclosed tags #338

Do not silently ignore unclosed tags #338

dataflake commented Feb 19, 2021

d-maurer commented Feb 19, 2021

d-maurer commented Feb 20, 2021 via email

dataflake commented Feb 20, 2021

dataflake commented Feb 24, 2021

d-maurer commented Feb 24, 2021 via email

dataflake commented Feb 25, 2021

dataflake commented Feb 25, 2021

d-maurer commented Feb 26, 2021 via email

dataflake commented Feb 26, 2021

malthe commented Feb 26, 2021

dataflake commented Feb 26, 2021

malthe commented Feb 26, 2021

d-maurer commented Feb 26, 2021 via email

malthe commented Feb 26, 2021

d-maurer commented Feb 26, 2021 via email

Do not silently ignore unclosed tags #338

Do not silently ignore unclosed tags #338

Conversation

dataflake commented Feb 19, 2021

d-maurer commented Feb 19, 2021

d-maurer commented Feb 20, 2021 via email

dataflake commented Feb 20, 2021

dataflake commented Feb 24, 2021

d-maurer commented Feb 24, 2021 via email

dataflake commented Feb 25, 2021

dataflake commented Feb 25, 2021

d-maurer commented Feb 26, 2021 via email

dataflake commented Feb 26, 2021

malthe commented Feb 26, 2021

dataflake commented Feb 26, 2021

malthe commented Feb 26, 2021

d-maurer commented Feb 26, 2021 via email

malthe commented Feb 26, 2021

d-maurer commented Feb 26, 2021 via email