Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for conditional HTML commenting e.g. <!--[if mso]> #379

Open
mitchelljy opened this issue Jul 11, 2023 · 6 comments
Open

Support for conditional HTML commenting e.g. <!--[if mso]> #379

mitchelljy opened this issue Jul 11, 2023 · 6 comments

Comments

@mitchelljy
Copy link

mitchelljy commented Jul 11, 2023

The syntax specified here for conditional Outlook CSS does not seem to be compatible with HTMLPurifier, everything inside the conditional comment will be removed.

The simplified version of the syntax I'm trying to use is:

<!--[if mso]>
   <a href="https://swifthalf.com/" target="_blank">
      conditional link
   </a>
<![endif]-->
<!--[if !mso]>
  <!-- -->
    <a href="https://swifthalf.com"  target="_blank">
        default link
    </a>
  <!--
<![endif]-->

As these are HTML comments, I initially thought I might be able to use the HTML.AllowedComments config option, however it does not have an option to allow ALL comments. The exact comment body has to be specified, which will not work for this use case as the content is not always known.

I next tried HTML.AllowedCommentsRegexp with a regexp that should allow everything:

$config->set('HTML.AllowedCommentsRegexp', '/^.*?$/');

However this also does not work. It seems that in the <!--[if mso]> comment marker, the non-standard [if mso] seems to cause the comment to be removed regardless.

Is there a way to support this syntax currently or is it not possible?

Thank you for any help!

@bytestream
Copy link
Contributor

This is because of

protected static function removeIEConditional($string)

Conditional comments are only supported by IE 5 - 9, so lack of support only really affects a tiny minority of people?

@mitchelljy
Copy link
Author

mitchelljy commented Jul 12, 2023

This is because of

protected static function removeIEConditional($string)

Conditional comments are only supported by IE 5 - 9, so lack of support only really affects a tiny minority of people?

These conditionals are automatically inserted by a service that some of our users use to build email HTML documents. I'm guessing the service is including these conditionals in order to support older email clients.

It would be really nice if these conditional comments didn't unavoidably break a HTML document that's been passed through HTMLPurifier, as it seems they are still in use sometimes in modern browsers and email clients for maximum backwards compatibility.

If it's easy enough to implement, could that be made optional?

@fkoyer
Copy link
Contributor

fkoyer commented Mar 7, 2024

Conditional comments are still used extensively in email to target Outlook clients.

Why do we even need to remove these? If you simply remove all comments, then conditional comments will get removed also. The problem with removeIEConditional is that it removes too much. Consider this example:

<!--[if mso]>
    This is inside an HTML comment. It will be displayed by Outlook only
<!-- <![endif]-->
<!--[if !mso]><!---->
    This is not inside an HTML comment. It will be displayed by all non-Outlook email clients
<!-- <![endif]-->

Unfortunately, HTMLPurifier removes both of these so neither one is displayed. I suggest getting rid of removeIEConditional altogether but if it's absolutely necessary then at least modify the regex like so

diff --git a/library/HTMLPurifier/Lexer.php b/library/HTMLPurifier/Lexer.php
index 1f552a17..bbe47769 100644
--- a/library/HTMLPurifier/Lexer.php
+++ b/library/HTMLPurifier/Lexer.php
@@ -277,7 +277,7 @@ class HTMLPurifier_Lexer
     protected static function removeIEConditional($string)
     {
         return preg_replace(
-            '#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', // probably should generalize for all strings
+            '#<!--\[if (?!!)[^>]+\]>.*?<!\[endif\]-->#si', // probably should generalize for all strings
             '',
             $string
         );

Currently I'm working around the issue by removing all HTML comments before passing to HTMLPurifier.

@ezyang
Copy link
Owner

ezyang commented Mar 12, 2024

I'm unlikely to ever merge PRs that add support for this, because the conditional comment syntax is not well specified and I don't have time to exactly reverse engineer how it is actually implemented to ensure it can't be used as the basis for an XSS bypass.

@fkoyer
Copy link
Contributor

fkoyer commented Mar 12, 2024

I agree with you. That's why I think the best course of action is to eliminate removeIEConditional altogether. If you really want to remove conditional comments then you'd have to follow the spec to parse and remove them properly. The current method of removing conditional comments is flawed and actually does more harm than good.

It's not necessary to remove conditional comments because HTMLPurifier removes all comments by default. Conditional comments are just specially formatted HTML comments. I submitted PR #401 that includes a unit test to demonstrate that conditional comments are converted to objects of HTMLPurifier_Token_Comment.

By doing it this way:

  • Code is simplified. No special handling is required.
  • Conditional comments still get removed by default.
  • If users want to keep them they can use HTML.AllowedCommentsRegexp.

The title of this issue could be changed from Support for conditional HTML commenting to Remove special handling of conditional HTML commenting

@fkoyer
Copy link
Contributor

fkoyer commented Mar 14, 2024

Thanks for merging the PR. I think that fixes the issue. Although conditional comments are still not supported by HTMLPurifier, users can opt to allow them with this regexp:

$config->set('HTML.AllowedCommentsRegexp', '/\[(if|endif)\b/');

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants