Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more examples. #196

Merged
merged 5 commits into from
Nov 24, 2023
Merged

Add more examples. #196

merged 5 commits into from
Nov 24, 2023

Conversation

otherdaniel
Copy link
Collaborator

This adds additional examples to the explainer and largely reaches "feature parity" with the existing spec, but based on our current thinking.

@otherdaniel otherdaniel requested review from annevk and mozfreddyb July 14, 2023 15:34
Copy link
Collaborator

@mozfreddyb mozfreddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Let's make sure we have a shared understanding for the TODOs before we merge.

explainer.md Outdated
Comment on lines 280 to 277
How do multiple element list types interact? In order for an element to
pass, it must pass _all_ lists.

You can use allow-lists and block-lists together, but note that this doesn't
usually make much sense since any combination of allow- and block-lists can
be rewritten as an allow-lists of equal or shorter length by removing any
block-list element from the allow-list, or dropping it altogether if there
was no matching allow-list entry in the first place.

```js
// equivalent to: { allowElements: ["i"] }
const config_that_mixes_allow_and_block_lists = {
allowElements: ["i", "u"],
blockElements: ["u", "s"],
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's all correct and matches my understanding. I think it would make more sense for a reader if we defined and then pointed to the processing model (Roughly "check first drop / block first, then allow." like we already have in the "determine the sanitize action for an element" algorithm)

explainer.md Outdated
Comment on lines 345 to 349
TODO: What does `{ name: "bla", attributes: [] }` mean? My gut feeling would
be that `attributes: []` is a no-op and it allows `<bla>` plus whatever
attributes are allowed globally. But if so, the only way to describe an
element without any attributes in presence of a global `allowAttributes`
would be to add a dropElements for it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this question is for allowAttributes: { name: "bla", attributes: [] } right?
Hm, what if empty actually meant "no attributes for this element"? That would make the list look-up "just work" and allow special-casing elements where the global allowAttributes shouldn't apply (I am assuming both rules must allow the attribute).

Copy link
Collaborator Author

@otherdaniel otherdaniel Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant allowElements, not allowAttributes.

Given the input "<img src=bla srcset=blubb alt=blerk>", what do these do:

  • allowElements: [{ name: "img", attributes: [] }]
  • allowElements: ["img"]
  • allowElements: [{ name: "img", attributes: undefined }]

My gut feeling is that these three should mean the same thing. (I could certainly be convinced otherwise.)

On: "allow special-casing elements where the global allowAttributes shouldn't apply (I am assuming both rules must allow the attribute)."

  • Would allowElements: [{ name: "img", attributes: [] }], allowAttributes: ["src", "srcset"] turn the input example above into "", because the attributes are not allowed by both rules?

I'm worried that we're making the "grab bag" of elements and attributes very laborious, which was the primary use case we found when looking at existing sanitizer usage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, I think all of those three examples should resulti n <img> with no attributes.
  2. Great point. We shouldn't make this too cumbersome, I agree. I guess this could work if our step for "finding the allowed attributes for this current element" were to build the union of what allowElement and allowAttributes are specifying. But then, how would you build an instruction that is "allow an element img with no attributes at all, not even the ones specified elsewhere". Can't think of something but happy to go with a suggestion from you.

explainer.md Outdated
Comment on lines 351 to 356
TODO: I find the syntax weird. In the allowElements case, the `img` entry
matches (and hence allows) both the element and the attribute. In the drop
case it matches only the attribute (but not the element). If we were to
define this identically in both cases - always match element and attributes -
then it simply wouldn't work for drop, because a drop-rule with attributes
would automatically drop the element (and hence all attributes with it).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand? I'm not sure I fully understand the example again.
My understanding:

  • We go through all elements and if it is allowed, the list of attributes
  • We go through all attributes and keep them if allowed

These "keep if allowed" steps would go through drop/block/allow for elements and block/allow for attributes in that order. If all of those 3 (for elements) or 2 (for attributes) lists allow the node, then it should be kept.

If the attribute-list is a list of attributes, we would match as "the attribute is in list" just according to the attribute. If the attribute-list is a list of attributes+elements, then it would only match as "is in list" if both element & attribute fit.

Does that make sense? Did I misunderstand the example?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does blockElements: [{ name: "img", attributes: ["srcset"] }] mean? Does it mean - given the sample input from the earlier comment - an empty tree, or <img src=bla alt=blerk>? If the former, why have the attributes: key at all? If the latter, then what does blockElements: [{ name: "img", attributes: [] }] mean?

The currently proposed syntax allows multiple clauses for one element. What do these mean:

  • allowElements: [{ name: "img", attributes: ["src"]}, { name: "img", attributes: ["srcset"]}]
  • blockElements: [{ name: "img", attributes: ["src"]}, { name: "img", attributes: ["srcset"]}]

(Btw, I'm not complaining; I'm just trying to sound out the edge cases.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the top of my head (still not fully back at my desk mentally), I'm not sure what the attribute-pieces even means for the blockElements case. Block the element if it has those attributes? Block the attributes but only on these elements? I guess the latter..

explainer.md Outdated
Comment on lines 358 to 360
How do multiple attribute rules interact? An attribute is allowed, if at
least one allow-rule (global or per element) allows it, and no drop
rule drops it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would go through the lists in order. If it is dropped, then we don't continue looking into the allow list. Similar to existing spec text. No?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion from today's meeting to explain the issue at hand a bit better.

Example: allowing {name: "div", attribute: "foo"}, {name: "div", attribute: "bar"}and input <div foo="" bar="">

If the algorithm would go through each entry individually, leading to <div>.
If the algorithm would collect all entries and combine it would return the input.

Copy link
Collaborator

@mozfreddyb mozfreddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for writing these cases down! Some questions above.

explainer.md Outdated
};
const config_block = {
blockElements: [ "style" ] // Allows a lot of things. But not <style>.
blockElements: [ "style" ], // Allows a lot of things. But not <style>.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Allows a lot of things/Allows the default, except <style>/, maybe?

explainer.md Outdated
If you have a specific set of functionality in mind that you wish to
retain and would like to drop everything else as a matter of caution, this
can be accomplished with allow lists. This is probably the most common case.
For example, you might want allow user comments with simple styling, but
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: they are called "formatting elements"

explainer.md Outdated
Comment on lines 301 to 287
If you want to drop `src` attributes from `<input>` elements but retain them
elsewhere, you can use:

```js
const drop_src_attribute_from_input = {
dropElements: [{ name: "input", attributes: ["src"]}],
}
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this drop the element or the attribute?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at #181, we had dropElements just accepting elements and dropAttributes to accept {name: 'src', elements: ['input'] }, which is a bit inverse of what the allow-properties do

explainer.md Outdated

```js
const config_div_without_anything = {
allow_elements: [ { name: "div", attributes: [] } ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/allow_elements/allowElements/

explainer.md Outdated

### Matching any attribute on a given element

For some use cases, one may wish to allow (or drop) any attribute:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above

explainer.md Outdated
* While lists with duplicate element or attribute names could be coalesced,
it is ambiguous what the meaning of duplicate elements with different
element-dependent attribute lists would be.
* The name must be set. If an `attributes` key is present, it must be non-empty.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-empty? We have "empty array" examples above. Maybe disambiguate

@otherdaniel otherdaniel force-pushed the examples-2 branch 2 times, most recently from 0d0fb22 to 8974ebd Compare October 5, 2023 15:54
@otherdaniel
Copy link
Collaborator Author

I updated the PR, based on discussion on #199, #198, and the recent meetings. In particular I updated the naming to now use short names for the allow lists (elements and attributes), and to use remove and flatten as the new terms for block and drop.

Would be happy if everyone could take a look.

explainer.md Outdated
Comment on lines 232 to 234
const config_anything_but_style_definitions = {
removeElements: [ "style" ], // Allows a lot of things. But not <style>.
removeAttributes: [ "class", "style" ] // No style or class attribute either.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe instead of "anything" or "a lot of things", let's say "the defaults except ..."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

explainer.md Outdated
* While lists with duplicate element or attribute names could be coalesced,
it is ambiguous what the meaning of duplicate elements with different
element-dependent attribute lists would be.
* The name must be set. If an `attributes` or `removeAttributes` key is present,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/name/element name/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was meant as applying to both element and attribute lists, so that a dictionary without a name-key set would be invalid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Keep it :)

explainer.md Outdated
] };
element.setHTML("bla", config_with_dupes2); // throws.

// Undefined attributes. What does it mean?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a WebIDL questions?
I don't even know where to look that up... @annevk can you help?

Copy link
Collaborator Author

@otherdaniel otherdaniel Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was meant to highlight that { name: "div", attributes: undefined } could mean the same as either { name: "div" } (allows the default attrs for div) or { name: "div", attributes: [] } (allows no attrs for div). At least to me it's not obvious what interpretation to pick.

But.. you're right. Maybe WebIDL solves that problem for us and interprets it in a way that does make it clear. I didn't think of that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the member is optional then undefined means the same thing as omission. And IDL takes care of that indeed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Removed the example.

Copy link
Collaborator

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the quick turnaround @otherdaniel!

This looks good and I'm happy to see this merged with these caveats:

  • We'll need to continue to discuss the "default" configuration and how it "combines" with a userland configuration.
  • I'm not sure an option around shadow roots is workable per se. It might have to be a shadowHost option that you can set to "keep" (default), "remove", or "flatten", but even that could be problematic if we ever offer this outside of these parser contexts as we don't want you to be able to remove shadow roots of arbitrary elements.

explainer.md Outdated
preserving its textual content.

```js
const config_that_removes_elements_but_preserves_their_text_content = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const config_that_removes_elements_but_preserves_their_text_content = {
const config_that_removes_elements_but_preserves_their_children = {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

explainer.md Outdated
Comment on lines 295 to 296
For an element that allows any of the default-allowed attributes, you can
use the special string `"*"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this goes into the discussion of how you combine configurations, which we haven't really explored recently.

I guess what this is saying is that if the default configuration allows an element E with attributes A1, A2, and A3, and you create a configuration with just E, none of the attributes are allowed. However, if you specify * they all are.

However, what if the default configuration listed A3 as a global attribute? Or would the default configuration not have those as we can enumerate all the elements? (Can we though with custom elements?)

I think this needs some more exploration or at least agreement on what the default configuration is going to be, which #188 plays a fairly big role in.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'm no longer sure whether we need this at all.

I think for the allow-case, we don't even need the "*" because... that's what the default is anyhow. I think for allow, we only need to be able to specify "no attributes at all", for which the empty list should do just fine.

And I think for the remove-case, we don't really need this either: Remove-nothing is taken care of by the default; and remove-everything is taken care of by allowing nothing. So... I guess we don't even need this section, or the "*" special syntax.

explainer.md Outdated
`allowComments` to `true` allows them:

```js
const config_comments: { allow_comments: true };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const config_comments: { allow_comments: true };
const config_comments: { allowComments: true };

Maybe this should be just comments to align with the other (new) names?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

explainer.md Outdated
Declarative Shadow Roots are an HTML parser feature that parses `<template>`
elements with `shadowrootmode` attributes and attaches the result as a
shadow root to its parent elements. If this is not desired the
`allowShadowRoots` attribute can be set to `false`. In either case, filtering
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot we were going to offer a filter option for these. Currently there's not really a way to remove a shadow root once you attach one so I'm not sure how this will work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this for now, but I do think this is important.

Is this a matter of the spec infrastructure not being there - that is, we still need to define a remove operation - or is there a reason why removal is difficult?

Naively, I'd guess one could always copy the node that is the shadow host, but without copying the attached root.

explainer.md Outdated
droping the remove-list entirely.
* A config with an allow-list and a flatten-list makes sense since the items
in the flatten list preserve their child contents, while the allow-list does
not.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They both preserve their children, no? But their children are subject to the configuration as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) I can no longer figure out what I was actually trying to say there... I removed that sentence in favour of "Both allow-lists and remove-lists can be combined with flatten-lists."

explainer.md Outdated
it is ambiguous what the meaning of duplicate elements with different
element-dependent attribute lists would be.
* The name must be set. If an `attributes` or `removeAttributes` key is present,
it must be non-empty. It may contain an empty list, though.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something cannot be both present and empty so this requirement is a bit redundant.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Removed it.

explainer.md Outdated
] };
element.setHTML("bla", config_with_dupes2); // throws.

// Undefined attributes. What does it mean?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the member is optional then undefined means the same thing as omission. And IDL takes care of that indeed.

explainer.md Show resolved Hide resolved
explainer.md Outdated
* Note that any config with both, an allow-list and remove-list, can be
rewritten by removing the remove-list items from the allow-list and then
droping the remove-list entirely.
* A config with an allow-list and a flatten-list makes sense since the items
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about flatten-list + block-list? I think it also makes sense.

  1. When passing an allow-list, the implicit default is to block, so passing a block-list doesn't make sense.
  2. When passing a block-list, the implicit default is to allow.
  3. When passing both, the implicit default is to block, but then it doesn't make sense to define the block-list (back to point 1).

In all cases, passing a flatten-list makes sense because it's changing the implicit default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Changed the text to match.

explainer.md Outdated
* A config with an allow-list and a flatten-list makes sense since the items
in the flatten list preserve their child contents, while the allow-list does
not.
* Any allow-, remove-, or flatten-list should contain each name at most once.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each name must not appear in a given list (allow-, flatten-, remove-) several times, but the same name shouldn't appear in several list either. Is this missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I adapted the description.

explainer.md Outdated Show resolved Hide resolved
Co-authored-by: Anne van Kesteren <[email protected]>
explainer.md Outdated
@@ -148,6 +147,8 @@ Document.parseHTML(example_tr); // <html><head></head><body>A table row.</body>
All of these would have had identical results if the "unsafe" variants had
been used.

### Parsing XML
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say "XML documents" instead, since setHTML parses as HTML?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@annevk
Copy link
Collaborator

annevk commented Oct 18, 2023

I don't see #101 addressed here. @otherdaniel is that intentional?

@annevk annevk mentioned this pull request Oct 18, 2023
@otherdaniel otherdaniel merged commit 0bcc942 into WICG:main Nov 24, 2023
1 check passed
github-actions bot added a commit that referenced this pull request Nov 24, 2023
SHA: 0bcc942
Reason: push, by otherdaniel

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@otherdaniel otherdaniel deleted the examples-2 branch April 22, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants