-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ProtectEsiTags Document filter #343
Conversation
Should this be disabled by default? The vast majority of sites wouldn't be needing it. But sites that do should easily be able to add it. |
Good point! I think we can have the filter class in our core but make it optional by not including it in the @schlessera please share your thought on this. |
We have a lot of work-arounds in place that only a minority will profit from. But unless this is a huge performance overhead, I'd prefer to keep it enabled by default, rather than to have unnecessary support requests and people with frustrating hard-to-diagnose breakage from time to time. |
To me, ESI seems like a different category of filter. If you aren't intending to use ESI tags in the first place, then they should not be passed through or else they'll cause validation errors. |
Absolutely agree with @westonruter here, are there places we would consider esi tags valid without an edge server? Would validation be the edge server's concern in this model? |
If someone is using ESI then it would be their responsibility to ensure they're composing a valid AMP page. Any validation would have to be happening on the edge server and it could not be done on the origin server in PHP here. My concern was that PHP sanitization (#195) and validation (#194) wouldn't be able to see the ESI tags in the DOM. But after looking at the changes here more deeply, I see that since it's basically just converting I was originally thinking the implementation would replace ESI tags with placeholders anywhere they appear, as I seemed to remember that ESI tags could be used inside HTML attributes. But after trying to find examples of that, I can't find anything. I didn't think that ESI parsers actually parsed HTML, but apparently they do. I may have been led astray by old server-side includes. Or maybe I'm just wrong. 😄 So if the additional processing time for doing two search-and-replacements isn't a concern (once before parsing and once after serialization), then I'm fine with it being enabled by default. |
This is not about letting them pass through or not. This is only about making sure we're not corrupting the DOM when they are encountered. They were being passed through already, only the subsequent HTML was parsed in a broken way. This ensures that we can parse a document that contains ESI as well without creating garbage or fatals. |
Yes, or our own validator's concern depending on setup. The above PR is really just about "not breaking during parsing". |
The processing here will most likely happen on the origin server. So if directives for edge servers are included, I think we should keep these and not break them, unless we're going for strict valid AMP. For non-strict validation, ESI tags should be perfectly fine, I think, as we're not expecting the content to run from an AMP cache, but be distributed through some CDN. |
If this is the case, aren't there other examples of XML-namespaced tags that should also be accounted for? I'm thinking of There are some instances of XML namespaces being used for tags in HTML documents, however. There's the (deprecated) XFBML, for example. These would have the same parsing problem as ESI tags, so maybe the solution should be generalized from |
50e787a
to
c059a53
Compare
c059a53
to
8749350
Compare
I'd keep these as separate filters. With the ESI tags, they are technically not really XML namespace tags, they are not even meant to be part of the document. They are more like Mustache tags in that they are placeholders that have to be replaced before the "actual" HTML/XML hits the browser. |
The PHP DOMDocument alters the ESI tags that use colon in the tag and remove the
esi:
part and hence breaks the system. This PR adds a new Document filter to fix this issue.Fixes #202