-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more examples. #196
Add more examples. #196
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -96,6 +96,8 @@ that navigate. | |||||
The 'unsafe' methods will not apply any filtering if no explicit config is | ||||||
supplied. | ||||||
|
||||||
> [!Note] The 'unsafe' methods are being worked on here: https://github.com/whatwg/html/pull/9538 | ||||||
|
||||||
## Major differences to previously proposed APIs: | ||||||
|
||||||
The currently proposed API differs in a number of aspects: | ||||||
|
@@ -106,6 +108,7 @@ The currently proposed API differs in a number of aspects: | |||||
- Enforcement of a security baseline depends on the method. The filter/sanitizer | ||||||
config can now be used differently, either in a guaranteed-secure way or in | ||||||
use-config-as-written way. | ||||||
- The configuration dictionary differs substantially in syntax. | ||||||
|
||||||
## Open questions: | ||||||
|
||||||
|
@@ -115,10 +118,6 @@ The currently proposed API differs in a number of aspects: | |||||
dictionary? (As-is, it should probably be a dictionary. An object would | ||||||
require either compelling performance numbers, or a compelling operation that | ||||||
would only work with a pre-processed dictionary.) | ||||||
- Exact filter options syntax. I'm assuming this will follow the discussion in | ||||||
#181. | ||||||
- Naming is TBD. Here I'm trying to follow the preferences expressed in the | ||||||
recent 'sync' meeting. | ||||||
|
||||||
## Examples | ||||||
|
||||||
|
@@ -148,6 +147,8 @@ Document.parseHTML(example_tr); // <html><head></head><body>A table row.</body> | |||||
All of these would have had identical results if the "unsafe" variants had | ||||||
been used. | ||||||
|
||||||
### Parsing XML | ||||||
|
||||||
Parsing follows HTML parsing rules, unlike `innerHTML`, where it depends on the | ||||||
document type: | ||||||
```js | ||||||
|
@@ -161,39 +162,42 @@ element_xml.setHTML(example_not_xml); // <div xmlns="http://www.w3.org/1999/xht | |||||
element.setHTML(example_not_xml); // Same as above. | ||||||
``` | ||||||
|
||||||
### Safe vs Unsafe methods | ||||||
|
||||||
The "safe" methods remove all script-y content defined by the platform and | ||||||
let the rest pass: | ||||||
```js | ||||||
element.setHTML(`<a href=about:blank onclick=alert(1) onload=alert(2) id=myid class=something><script>alert(3);</script>`); | ||||||
// <div><a href="about:blank" id="myid" class="something"></a></div> | ||||||
``` | ||||||
|
||||||
### Configuration Options: Basic use and namespaces | ||||||
|
||||||
The operation of the built-in sanitizer can be configured to suit your | ||||||
applications' needs. Both "safe" and "unsafe" versions can take a configuration. | ||||||
(Please note that naming and structure here is rather preliminary, | ||||||
but we expect these capabilities to be in the final standard.) | ||||||
|
||||||
The "safe" version will ignore configuration items that break its security | ||||||
guarantees: | ||||||
```js | ||||||
const an_unsafe_config = { 'allowElements': [ { name: 'script' } ] }; | ||||||
const an_unsafe_config = { 'elements': [ { name: 'script' } ] }; | ||||||
element.setHTML("<script>", { sanitizer: an_unsafe_config }); // <div></div> | ||||||
element.setHTMLUnsafe("<script>", { sanitizer: an_unsafe_config }); // You now have a script. Congrats. | ||||||
``` | ||||||
|
||||||
For elements, the HTML namespace is default. For attributes, the null namespace. | ||||||
For elements the HTML namespace is default. For attributes, the null namespace. | ||||||
Other namespaces can be supported. A string entry stands for a dictionary with | ||||||
only the name, in the HTML/null namespace (for elements/attributes, | ||||||
respectively). | ||||||
|
||||||
``` js | ||||||
const config_with_namespaces = { | ||||||
allowElements: [ | ||||||
elements: [ | ||||||
'a', // The HTML anchor element. | ||||||
{ name: 'a' }, // Also the HTML anchor element. | ||||||
{ name: 'a', namespace: 'http://www.w3.org/1999/xhtml' }, // Another one. | ||||||
{ name: 'a', namespace: 'http://www.w3.org/2000/svg' } // SVG's anchor element | ||||||
], | ||||||
allowAttributes: [ | ||||||
attributes: [ | ||||||
'href', // An href attribute. The one you'd expect on an HTML anchor. | ||||||
{ name: 'href' }, // The very same. | ||||||
{ name: 'href', namespace: '' }, // There it is again. | ||||||
|
@@ -205,21 +209,213 @@ const config_with_namespaces = { | |||||
}; | ||||||
``` | ||||||
|
||||||
> [!NOTE] | ||||||
> The `config_with_namespaces` example contains multiple entries for the same | ||||||
> element or attribute, to illustrate the syntax. Note that this isn't actually | ||||||
> allowed. | ||||||
|
||||||
### Configuration Options: Allowing or removing (blocking) or flattening | ||||||
|
||||||
There are two ways you can build up a config: Specify the elements & attributes | ||||||
you wish to allow. This is easy to read and makes it easy to understand what | ||||||
to expect in the sanitizer output. Or you can specify what elements & attributes | ||||||
you wish to block. This effectively specifies the sanitizer output relative to | ||||||
the built-in list. This can be useful if you wish to mostly retain the built-in | ||||||
defaults. | ||||||
you wish to remove. Or to block, as other sanitizer libraries might call it. | ||||||
This effectively specifies the sanitizer output relative to the built-in list. | ||||||
This can be useful if you wish to mostly retain the built-in defaults. | ||||||
|
||||||
```js | ||||||
const config_allow = { | ||||||
allowElements: [ "div", "p", "em", "b" ] // Allows only those four elements. | ||||||
// Output with "safe" and "unsafe" methods should be the same. | ||||||
const config_allow_some_formatting = { | ||||||
elements: [ "div", "p", "em", "b", "img" ], // Allows only 5 elements. | ||||||
attributes: [ "class" ] // Allows only class attributes. | ||||||
// Output with "safe" and "unsafe" methods are the same for this config. | ||||||
}; | ||||||
const config_block = { | ||||||
blockElements: [ "style" ] // Allows a lot of things. But not <style>. | ||||||
const config_disallow_style_definitions = { | ||||||
removeElements: [ "style" ], // Allows the defaults, but without <style>. | ||||||
removeAttributes: [ "class", "style" ] // No style or class attribute either. | ||||||
// And not XSS-y stuff, either, if used with a "safe" method. | ||||||
// Output with "safe" and "unsafe" methods might be quite different. | ||||||
}; | ||||||
``` | ||||||
|
||||||
You may also wish to remove elements, but retain their contents. This is | ||||||
chiefly useful to remove unwanted formatting from user input, while | ||||||
preserving its textual content. | ||||||
|
||||||
```js | ||||||
const config_that_removes_elements_but_preserves_their_text_content = { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||||||
flattenElements: ["span", "em", "u", "s", "i", "b"] | ||||||
}; | ||||||
|
||||||
element.setHTML( | ||||||
"Fancy <b>text</b> with <span style='color:blue'>pizzazz</span>.", | ||||||
otherdaniel marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
{ sanitizer: config_that_removes_elements_but_preserves_their_text_content }); | ||||||
// <div>Fancy text with pizzazz.</div> | ||||||
``` | ||||||
|
||||||
There is no `flattenAttribute` because attribute nodes do not have children. | ||||||
|
||||||
|
||||||
### Configuring attributes per element | ||||||
|
||||||
A common use case is to allow or remove all instances of a given attribute, | ||||||
but this isn't always sufficient. Attribute interpretation depends on the | ||||||
element they are attached to, and so one may also want to act on attributes | ||||||
on specific elements. | ||||||
|
||||||
In the example `config_allow_some_formatting` in the previous chapter | ||||||
we have allowed the `class` attribute on any of allowed elements. | ||||||
If one wanted to allow `class` everywhere, but `src` only on `<img>`, the | ||||||
following would do: | ||||||
|
||||||
```js | ||||||
const config_with_element_specific_attributes = { | ||||||
elements: [ | ||||||
"div", "p","em", "b", | ||||||
{ name: "img", attributes: [ "src" ] } | ||||||
mozfreddyb marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
], | ||||||
attributes: ["class"], | ||||||
}; | ||||||
``` | ||||||
|
||||||
If you want to remove `src` attributes from `<input>` elements but retain them | ||||||
elsewhere, you can use: | ||||||
|
||||||
```js | ||||||
const remove_src_attribute_from_input = { | ||||||
elements: [{ name: "input", removeAttributes: ["src"]}], | ||||||
} | ||||||
``` | ||||||
|
||||||
Note that the `removeAttributes` key is on an allowed element, since removing | ||||||
the element itself would also remove all the attributes that are part of that | ||||||
element. | ||||||
|
||||||
### Matching any or no attribute on a given element | ||||||
|
||||||
For an element that allows any of the default-allowed attributes, you can | ||||||
use the special string `"*"`. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this goes into the discussion of how you combine configurations, which we haven't really explored recently. I guess what this is saying is that if the default configuration allows an element E with attributes A1, A2, and A3, and you create a configuration with just E, none of the attributes are allowed. However, if you specify * they all are. However, what if the default configuration listed A3 as a global attribute? Or would the default configuration not have those as we can enumerate all the elements? (Can we though with custom elements?) I think this needs some more exploration or at least agreement on what the default configuration is going to be, which #188 plays a fairly big role in. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. I'm no longer sure whether we need this at all. I think for the allow-case, we don't even need the And I think for the remove-case, we don't really need this either: Remove-nothing is taken care of by the default; and remove-everything is taken care of by allowing nothing. So... I guess we don't even need this section, or the |
||||||
|
||||||
```js | ||||||
const config_div_without_anything = { | ||||||
elements: [ { name: "div", attributes: [] } ] | ||||||
}; | ||||||
const config_div_with_everything = { | ||||||
elements: [ { name: "div", attributes: "*" } ] | ||||||
}; | ||||||
|
||||||
// I guess one could also write `removeAttributes: []`. Not sure if that's nicer. | ||||||
``` | ||||||
|
||||||
### Comments | ||||||
|
||||||
Handling of HTML comment nodes can be controlled by an option. Setting | ||||||
`allowComments` to `true` allows them: | ||||||
|
||||||
```js | ||||||
const config_comments: { allow_comments: true }; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Maybe this should be just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||||||
element.setHTML("XXX<!-- Hello world! -->XXX", {sanitizer: config_comments}); | ||||||
// <div>XXX<!-- Hello world! -->XXX</div> | ||||||
``` | ||||||
|
||||||
### Shadow Roots | ||||||
|
||||||
Declarative Shadow Roots are an HTML parser feature that parses `<template>` | ||||||
elements with `shadowrootmode` attributes and attaches the result as a | ||||||
shadow root to its parent elements. If this is not desired the | ||||||
`allowShadowRoots` attribute can be set to `false`. In either case, filtering | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I forgot we were going to offer a filter option for these. Currently there's not really a way to remove a shadow root once you attach one so I'm not sure how this will work. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've removed this for now, but I do think this is important. Is this a matter of the spec infrastructure not being there - that is, we still need to define a remove operation - or is there a reason why removal is difficult? Naively, I'd guess one could always copy the node that is the shadow host, but without copying the attached root. |
||||||
rules will apply to the shadow roots as well. | ||||||
|
||||||
```js | ||||||
const example_with_shadow_root = "<template shadowrootmode=open><b onlick='alert(1)'>In the shadows.</b></template>"; | ||||||
|
||||||
element.setHTML(example_with_shadow_root); | ||||||
//<div> | ||||||
// #shadow-root: <b>In the shadows.</b> | ||||||
|
||||||
element.setHTMLUnsafe(example_with_shadow_root); | ||||||
//<div> | ||||||
// #shadow-root: <b onlick='alert(1)'>In the shadows.</b> | ||||||
|
||||||
element.setHTML(example_with_shadow_root, {sanitizer: {allowShadowRoot: false}}); | ||||||
// <div> | ||||||
|
||||||
element.setHTMLUnsafe(example_with_shadow_root, {sanitizer: {allowShadowRoot: false}}); | ||||||
// <div> | ||||||
``` | ||||||
|
||||||
### Configuration Errors | ||||||
|
||||||
The configuration allows expressing redundant or even contradictory options. | ||||||
For example, allowing and removing the same element. In cases where the | ||||||
meaning of a configuration dictionary isn't clear, we will | ||||||
throw a `TypeError` instead of making a best effort attempt at interpreting | ||||||
the configuration. A well-formed configuration has the following properties: | ||||||
|
||||||
|
||||||
* It contains either an allow-list or a remove-list, but not both. | ||||||
* This applies to both element and attribute lists, seperately. | ||||||
* Note that any config with both, an allow-list and remove-list, can be | ||||||
rewritten by removing the remove-list items from the allow-list and then | ||||||
droping the remove-list entirely. | ||||||
* A config with an allow-list and a flatten-list makes sense since the items | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about flatten-list + block-list? I think it also makes sense.
In all cases, passing a flatten-list makes sense because it's changing the implicit default. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True. Changed the text to match. |
||||||
in the flatten list preserve their child contents, while the allow-list does | ||||||
not. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They both preserve their children, no? But their children are subject to the configuration as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. :-) I can no longer figure out what I was actually trying to say there... I removed that sentence in favour of "Both allow-lists and remove-lists can be combined with flatten-lists." |
||||||
* Any allow-, remove-, or flatten-list should contain each name at most once. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Each name must not appear in a given list (allow-, flatten-, remove-) several times, but the same name shouldn't appear in several list either. Is this missing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True. I adapted the description. |
||||||
* This would apply to short forms as well. | ||||||
E.g., `["div", { name: "div", namespace: "http://www.w3.org/1999/xhtml" }]` | ||||||
contains the same name twice and would thus throw. | ||||||
* While lists with duplicate element or attribute names could be coalesced, | ||||||
it is ambiguous what the meaning of duplicate elements with different | ||||||
element-dependent attribute lists would be. | ||||||
* The name must be set. If an `attributes` or `removeAttributes` key is present, | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: s/name/element name/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was meant as applying to both element and attribute lists, so that a dictionary without a name-key set would be invalid. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough. Keep it :) |
||||||
it must be non-empty. It may contain an empty list, though. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something cannot be both present and empty so this requirement is a bit redundant. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. Removed it. |
||||||
|
||||||
```js | ||||||
// Mixing allow and block lists throws. | ||||||
const config_that_mixes_allow_and_block_lists = { | ||||||
elements: ["i", "u"], | ||||||
removeElements: ["u", "s"], | ||||||
}; | ||||||
element.setHTML("bla", {sanitizer: config_that_mixes_allow_and_block_lists}); // throws | ||||||
|
||||||
// Mixing allow and flatten lists works. | ||||||
const config_that_retains_simple_styling_but_most_text = { | ||||||
elements: ["p", "b", "i"], | ||||||
flattenElements: ["div", "span", "em", "u", "s", "li"], | ||||||
}; | ||||||
const styled_text = "<p>Some <span style='color: blue'>colourful</span> <u>styled</u> <b>text</b>"; | ||||||
|
||||||
// <div><p>Some colourful styled <b>text</b></p></div> | ||||||
element.setHTML(styled_text, {sanitizer: config_that_retains_simple_styling_but_most_text}); | ||||||
|
||||||
// Duplicate entries throw. | ||||||
const config_with_dupes = { | ||||||
elements: [ "div", { name: "div", namespace: "http://www.w3.org/1999/xhtml" } ] | ||||||
}; | ||||||
element.setHTML("bla", {sanitizer: config_with_dupes}); // throws. | ||||||
|
||||||
const config_with_dupes2 = { | ||||||
elements: [ | ||||||
{ name: "div", attributes: ["class"] }, | ||||||
{ name: "div", attributes: ["style"] } | ||||||
] }; | ||||||
element.setHTML("bla", config_with_dupes2); // throws. | ||||||
|
||||||
// Undefined attributes. What does it mean? | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is that a WebIDL questions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was meant to highlight that But.. you're right. Maybe WebIDL solves that problem for us and interprets it in a way that does make it clear. I didn't think of that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the member is optional then There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. Removed the example. |
||||||
const config_with_undefined = { elements: [ { name: "div", attributes: undefined } ] }; | ||||||
element.setHTML("bla", {sanitizer: config_with_undefined}); // throws. | ||||||
``` | ||||||
|
||||||
Listing an attribute in the "global" allow-list and in an element specific one | ||||||
is allowed. In this case, the specific action takes precedence. | ||||||
|
||||||
``` | ||||||
const config_with_local_and_global_attributes = { | ||||||
elements: [ "span", { name: "b", removeAttributes: [ "class" ] } ], | ||||||
attributes: ["class"] | ||||||
}; | ||||||
|
||||||
// <div><span class="a">abc</span> <b>def</b></div> | ||||||
element.setHTML("<span class='a'>abc</span> <b class='b'>def</b>", | ||||||
{sanitizer: config_with_local_and_global_attributes}); | ||||||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say "XML documents" instead, since
setHTML
parses as HTML?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.