New methods for excluding elements with specific missing or empty attributes #45

forum-is · 2015-10-09T14:53:12Z

We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute.
As a convenience we also included a method to check for a non-matching regex pattern.
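
The three conditions described above (attribute missing, attribute empty, value not matching a pattern) can be sketched as plain predicates. This is only an illustration, not the code in this patch: `AttributeChecks` and its method names are hypothetical, and it assumes attributes arrive as the alternating name/value list that the sanitizer hands to an `ElementPolicy`, with names already lower-cased.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the sanitizer or of this patch. */
final class AttributeChecks {
  private AttributeChecks() {}

  /**
   * Returns the value of the first attribute called name, or null if there is none.
   * Assumes attrs alternates names and values: [name0, value0, name1, value1, ...].
   */
  static String valueOf(List<String> attrs, String name) {
    for (int i = 0; i + 1 < attrs.size(); i += 2) {
      if (name.equals(attrs.get(i))) {
        return attrs.get(i + 1);
      }
    }
    return null;
  }

  /** True if no attribute called name is present. */
  static boolean isMissing(List<String> attrs, String name) {
    return valueOf(attrs, name) == null;
  }

  /** True if the attribute is absent or its value is the empty string. */
  static boolean isMissingOrEmpty(List<String> attrs, String name) {
    String value = valueOf(attrs, name);
    return value == null || value.isEmpty();
  }

  /** True if the attribute is absent or its whole value does not match pattern. */
  static boolean failsPattern(List<String> attrs, String name, Pattern pattern) {
    String value = valueOf(attrs, name);
    return value == null || !pattern.matcher(value).matches();
  }
}
```

An element would be dropped when the chosen predicate returns true for its required attribute; how that decision is wired into the builder is what the rest of this thread discusses.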

jmanico · 2015-10-09T14:56:43Z

Thank you very, very much for submitting a patch with your report. More from Mike soon!

Aloha,

Jim Manico
@manicode

mikesamuel · 2015-10-09T17:40:22Z

src/main/java/org/owasp/html/HtmlPolicyBuilder.java

+  /**
+   * Disallows the given element from appearing without the given attribute.
+   */
+  public HtmlPolicyBuilder disallowWithoutAttribute(String elementName, final String attributeName) {


Right now, we can say

    myPolicyBuilder.allowAttributes("src", ...).onElements("img")

and if I understand your goal, that is problematic because it allows but does not require src="..." on <img ...>.

I'd prefer

    myPolicyBuilder.withAttributes("src", ...).required().onElements("img")

which allows mixing required() into the existing flow by which elements are associated with attributes instead of creating a new API.

mikesamuel · 2015-10-09T17:51:27Z

This change seems to be missing any tests. Could you at least check that it works with the kind of policy you want it to work with?

mikesamuel · 2015-10-09T17:54:30Z

> We recently came across a problem where an img-element with missing src-attribute caused our PDF engine to break. As the current version of the html sanitizer only contains a method for disallowing elements without any attributes, we suggest adding methods to target a specific missing/empty attribute.
> As a convenience we also included a method to check for a non-matching regex pattern.

What PDF engine is this? If you don't maintain it then it might be worth filing a bug with them as well.

Should this change also require src="..." on <img> in Sanitizers.IMAGES?

forum-is · 2015-10-27T11:32:23Z

Hello Mike,

sorry for the wait.

Our goal was indeed to be able to require certain attributes when their element would make no sense without them or, as in this case, would even cause harm. That was not possible with the current allowAttributes() implementation.

Your proposed chainable required() idiom does seem a cleaner approach. However, my understanding of the project code is currently not sufficient to change the implementation to that with reasonable effort. I will try to provide a test case shortly to show the validity of the current implementation. Maybe you could then refactor it to the required() idiom as you find the time.

We use Apache FOP to export PDFs from our web application.

As you mention, it might actually be useful to require the attribute in the standard sanitizer so that others do not run into this kind of issue unsuspectingly. At least, I cannot think of a valid use case for an img without a src that would rule out requiring it.

Best regards,
Sebastian

forum-is · 2015-11-09T14:30:07Z

Hello Mike,

I added a TestCase for the disallowWithoutAttribute() functionality that represents the combination that caused us problems (arbitrary alt, missing src).

Best regards,
Sebastian

jmanico · 2016-07-17T20:41:29Z

It looks like this is mostly resolved. @mikesamuel - can we merge in the test case or resolve this pull request somehow? It does not look like major work is needed to resolve this.

OWASP#206)

* Do not lcase element or attribute names that match SVG or MathML names exactly

> Currently all names are converted to lowercase which is ok when
> you're using it for HTML only, but if there is an SVG image nested
> inside the HTML it breaks. For example, when `viewBox` attribute is
> converted to `viewbox` the image is not displayed correctly.

This commit splits *HtmlLexer*.*canonicalName* into variants which preserve items on whitelists derived from the SVG and MathML specifications, and adjusts callers of *canonicalName* to use the appropriate variant.

Fixes OWASP#182

* add unittests for mixed-case SVG names

jmanico · 2020-09-17T04:50:38Z

Bump @mikesamuel ?

mikesamuel · 2020-09-18T21:24:27Z

Sorry for ghosting. I'll take a look shortly.

mikesamuel · 2020-09-22T15:06:02Z

{dis,}allowWithoutAttributes were reworked recently. https://github.com/OWASP/java-html-sanitizer/blob/main/change_log.md says

Release 20200615.1

Change .and when combining two policies to respect explicit skipIfEmpty decisions.

which affected

java-html-sanitizer/src/main/java/org/owasp/html/HtmlPolicyBuilder.java

Lines 314 to 343 in ca40697

    
    /**
     * Assuming the given elements are allowed, allows them to appear without
     * attributes.
     *
     * @see #DEFAULT_SKIP_TAG_MAP_IF_EMPTY_ATTR
     * @see #disallowWithoutAttributes
     */
    public HtmlPolicyBuilder allowWithoutAttributes(String... elementNames) {
      invalidateCompiledState();
      for (String elementName : elementNames) {
        elementName = HtmlLexer.canonicalElementName(elementName);
        skipIssueTagMap.put(elementName, HtmlTagSkipType.DO_NOT_SKIP);
      }
      return this;
    }

    /**
     * Disallows the given elements from appearing without attributes.
     *
     * @see #DEFAULT_SKIP_TAG_MAP_IF_EMPTY_ATTR
     * @see #allowWithoutAttributes
     */
    public HtmlPolicyBuilder disallowWithoutAttributes(String... elementNames) {
      invalidateCompiledState();
      for (String elementName : elementNames) {
        elementName = HtmlLexer.canonicalElementName(elementName);
        skipIssueTagMap.put(elementName, HtmlTagSkipType.SKIP);
      }
      return this;
    }

IIUC, what you need is a way to drop specific elements that are missing some core attributes.

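The following passes for me. It uses an ElementPolicy to reject <img> tags that don't have "src" in a key position, i.e. at an even index of the alternating name/value attrs list. It may spuriously drop <img alt="src" src="foo.png">, since indexOf finds the alt value "src" at an odd index first, but it works as a POC.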

    String input = "<img src=foo.png><img alt=bar>";
    PolicyFactory imgOkWithSrc = new HtmlPolicyBuilder()
            .allowElements(
                    ((elementName, attrs) -> (attrs.indexOf("src") & 1) == 0 ? elementName : null),
                    "img"
            )
            .allowAttributes("alt", "src").onElements("img")
            .toFactory();
    assertEquals("<img src=\"foo.png\" />", imgOkWithSrc.sanitize(input));
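
One way to avoid the spurious drop is to look only at name positions. A minimal sketch, assuming the same alternating name/value `attrs` list as in the lambda above; `RequireSrc` is a hypothetical name, and the method merely mirrors the shape of `ElementPolicy.apply` without depending on the library.

```java
import java.util.List;

/** Hypothetical helper for illustration; mirrors the shape of ElementPolicy.apply. */
final class RequireSrc {
  private RequireSrc() {}

  /**
   * Returns elementName if attrs contains an attribute named "src",
   * or null to signal that the element should be dropped.
   */
  static String apply(String elementName, List<String> attrs) {
    // Names sit at even indices and values at odd ones, so step by two
    // and never compare against a value such as alt="src".
    for (int i = 0; i < attrs.size(); i += 2) {
      if ("src".equals(attrs.get(i))) {
        return elementName;
      }
    }
    return null;
  }
}
```

If the signature matches as assumed, `RequireSrc::apply` should be usable in place of the lambda passed to `allowElements` above; I have not run it against the library.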

Bumps [junit](https://github.com/junit-team/junit4) from 4.12 to 4.13.1.
- [Release notes](https://github.com/junit-team/junit4/releases)
- [Changelog](https://github.com/junit-team/junit4/blob/main/doc/ReleaseNotes4.12.md)
- [Commits](junit-team/junit4@r4.12...r4.13.1)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

This addresses a vulnerability where policies that allow `<style>` elements with text in `<option>` elements are vulnerable to XSS as disclosed in https://docs.google.com/document/d/11SoX296sMS0XoQiQbpxc5pNxSdbJKDJkm5BDv0zrX50/edit?usp=sharing

This changes behavior for rendering of `<style>` element text so may change behavior. Specifically, `<style>` element text that includes the strings `-->` or `]]>` will no longer sanitize.

As described in issue OWASP#254 `&para` is a full complete character reference when decoding text node content, but not when decoding attribute content which causes problems for URL attribute values like

/test?param1=foo&param2=bar

As shown via JS test code in that issue, a small set of next characters prevent a character reference name match from being considered complete.

This commit:
- modifies the decode functions to take an extra parameter `boolean inAttribute`, and modifies the Trie traversal loops to not store a longest match so far based on that parameter and some next character tests
- modifies the HTML lexer to pass that attribute appropriately
- for backwards compat, leaves the old APIs in place but `@deprecated`
- adds unit tests for the decode functions
- adds a unit test for the specific input from the issue

This change should make us more conformant with observed browser behaviour so is not expected to cause compatibility problems for existing users.

Fixes OWASP#254

Sebastian Uecker added 3 commits October 9, 2015 15:48

+Added HtmlPolicyBuilder methods for excluding elements with specific…

73b68bc

… missing or emtpy attributes +Removed AutoCloseableHtmlStreamRenderer for Java SE 6 compatibility

Reverted changes

e75d980

added methods for excluding elements with specific empty or missing a…

ea02e71

…ttributes or elements that do NOT match a pattern

mikesamuel reviewed Oct 9, 2015

Added TestCase for disallowWithoutAttribute()

2537933

mikesamuel added 8 commits June 15, 2020 11:44

s/master/main/ for default branch

e6dd2ea

Release candidate 20200615.1

f3f56d4

Bumped dev version

fd6b2dd

Release candidate 20200713.1

25c3d64

Bumped dev version

ffe5cfa

we use spotbugs now instead of findbugs

c7db2d4

s/master/main/ in doc URLs

ca40697

mikesamuel self-assigned this Sep 18, 2020

dependabot bot and others added 5 commits December 7, 2020 14:30

hsl and hsla (OWASP#216)

acaf3f2

Fix code formatting lint checks (OWASP#217)

33d319f

Fixed allowAtributes("style").globally() (OWASP#218)

020d5d0

* allowAtributes("style") * Global style test

Upgrade to a modern guava dependency

ad287c3

This may still be overridden with `-Dguava-version=...`.

mikesamuel and others added 13 commits October 18, 2021 09:23

Release candidate 20211018.1

374ea2f

Bumped dev version

7d76ba9

Update vulnerabilities.md

e2b29e8

Recognize that <style> is not really workable inside <select>

14f84fd

Rather than mucking with `<style>` tag content in all cases, this is a more tailored fix to the recent vulnerability that just closes `<style>` elements when we realize they're in a dodgy parsing context.

Release candidate 20211018.2

62a0715

Bumped dev version

06b299c

Fix missing null checks in uses of consumeIdentOrUrlOrFunctions (OWAS…

c2c74fc

…P#266) CssTokens code assumed that consumeIdentOrUrlOrFunctions always returned a token type and consumed characters. This commit audits all uses of that function and checks that they make progress.

Release candidate 20220608.1

e35ef4f

Bumped dev version

3756979

Merge branch 'OWASP:master' into master

0372f4f

Merge remote-tracking branch 'upstream/main'

ccb4c18
