
<pre class="metadata">
Title: HTML Sanitizer API
Status: CG-DRAFT
Group: WICG
URL: https://wicg.github.io/sanitizer-api/
Repository: WICG/sanitizer-api
Shortname: sanitizer-api
Level: 1
Editor: Frederik Braun 68466, Mozilla, fbraun@mozilla.com, https://frederik-braun.com
Editor: Mario Heiderich, Cure53, mario@cure53.de, https://cure53.de
Editor: Daniel Vogelheim, Google LLC, vogelheim@google.com, https://www.google.com
Abstract:
  This document specifies a set of APIs which allow developers to take
  untrusted HTML input and sanitize it for safe insertion into a document's
  DOM.
Indent: 2
Work Status: exploring
Boilerplate: omit conformance
Markup Shorthands: css off, markdown on
</pre>
<pre class="link-defaults">
spec:html; type:attribute; text: innerHTML
spec:dom; type:method; text: createDocumentFragment
spec:html; type:dfn; text: template contents
</pre>
<pre class="anchors">
text: window.toStaticHTML(); type: method; url: https://msdn.microsoft.com/en-us/library/cc848922(v=vs.85).aspx
text: parse HTML from a string; type: dfn; url: https://html.spec.whatwg.org/#parse-html-from-a-string
</pre>
<pre class="biblio">
{
  "DOMPURIFY": {
    "href": "https://github.com/cure53/DOMPurify",
    "title": "DOMPurify",
    "publisher": "Cure53"
  },
  "MXSS": {
    "href": "https://cure53.de/fp170.pdf",
    "title": "mXSS Attacks: Attacking well-secured Web-Applications by using innerHTML Mutations",
    "publisher": "Ruhr-Universität Bochum"
  }
}
</pre>
<style>
/* Boxes around algorithms. */
[data-algorithm]:not(.heading) {
  padding: .5em;
  border: thin solid #ddd; border-radius: .5em;
  margin: .5em calc(-0.5em - 1px);
}
[data-algorithm]:not(.heading) > :first-child { margin-top: 0; }
[data-algorithm]:not(.heading) > :last-child { margin-bottom: 0; }
[data-algorithm] [data-algorithm] { margin: 1em 0; }
</style>


# Introduction # {#intro}

<em>This section is not normative.</em>

Web applications often need to work with strings of HTML on the client side,
perhaps as part of a client-side templating solution, perhaps as part of
rendering user-generated content, etc. It is difficult to do so in a safe way.
The naive approach of joining strings together and stuffing them into
an {{Element}}'s {{Element/innerHTML}} is fraught with risk, as it can cause
JavaScript execution in a number of unexpected ways.

Libraries like [[DOMPURIFY]] attempt to manage this problem by carefully
parsing and sanitizing strings before insertion, by constructing a DOM and
filtering its members through an allow-list. This has proven to be a fragile
approach, as the parsing APIs exposed to the web don't always map in
reasonable ways to the browser's behavior when actually rendering a string as
HTML in the "real" DOM. Moreover, the libraries need to keep on top of
browsers' changing behavior over time; things that once were safe may turn
into time-bombs based on new platform-level features.

The browser has a fairly good idea of when it is going to
execute code. We can improve upon the user-space libraries by teaching the
browser how to render HTML from an arbitrary string in a safe manner, and do
so in a way that is much more likely to be maintained and updated along with
the browser's own changing parser implementation. This document outlines an
API which aims to do just that.

## Goals ## {#goals}

*   Mitigate the risk of DOM-based cross-site scripting attacks by providing
    developers with mechanisms for handling user-controlled HTML which prevent
    direct script execution upon injection.

*   Make HTML output safe for use within the current user agent, taking into
    account its current understanding of HTML.

*   Allow developers to override the default set of elements and attributes.
    Adding certain elements and attributes can prevent
    <a href="https://github.com/google/security-research-pocs/tree/master/script-gadgets">script gadget</a>
    attacks.

## API Summary ## {#api-summary}

The Sanitizer API offers functionality to parse a string containing HTML into
a DOM tree, and to filter the resulting tree according to a user-supplied
configuration. The methods come in two by two flavours:

* <dfn>Safe and unsafe</dfn>: The "safe" methods will not generate any markup that
  executes script. That is, they should be safe from XSS. The "unsafe" methods
  will parse and filter exactly as the given configuration specifies, without
  that guarantee.
  See also: [[#security-considerations]].
* Context: Methods are defined on {{Element}} and {{ShadowRoot}} and will
  replace that {{Node}}'s children, and are largely analogous to {{Element/innerHTML}}.
  There are also static methods on the {{Document}}, which parse an entire
  document and are largely analogous to {{DOMParser}}.{{parseFromString()}}.


# Framework # {#framework}

## Sanitizer API ## {#sanitizer-api}

The {{Element}} interface defines two methods, {{Element/setHTML()}} and
{{Element/setHTMLUnsafe()}}. Both of these take a {{DOMString}} with HTML
markup, and an optional configuration.

<pre class="idl extract">
partial interface Element {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>

<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:

1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "Element setHTMLUnsafe", and "script".
1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
   {{HTMLTemplateElement|template}} element; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |compliantHTML|, |options|, and false.

</div>

<div algorithm>
{{Element}}'s <dfn for="Element" export>setHTML</dfn>(|html|, |options|) method steps are:

1. Let |target| be [=this=]'s [=template contents=] if [=this=] is a
   {{HTMLTemplateElement|template}}; otherwise [=this=].
1. [=Set and filter HTML=] given |target|, [=this=], |html|, |options|, and true.

</div>
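As a rough illustration of how a page might call the safe method, the sketch
below wraps {{Element/setHTML()}} in a feature check. The helper name
`renderUntrusted` and the fallback to `textContent` are assumptions made for
this example and are not part of this specification.

```javascript
// Hypothetical helper, not part of this specification.
function renderUntrusted(target, untrustedHtml) {
  if (typeof target.setHTML === "function") {
    // Safe flavour: parse, filter, and replace the children of target.
    target.setHTML(untrustedHtml);
    return "sanitized";
  }
  // Assumed fallback: without the API, show the markup as inert text
  // instead of parsing it.
  target.textContent = untrustedHtml;
  return "text";
}
```

Falling back to plain text is one conservative choice; a page could equally
decide to load a user-space sanitizer library at that point.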

These methods are mirrored on the {{ShadowRoot}}:

<pre class="idl extract">
partial interface ShadowRoot {
  [CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  [CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>

<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTMLUnsafe</dfn>(|html|, |options|) method steps are:

1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, [=this=]'s [=relevant global object=], |html|, "ShadowRoot setHTMLUnsafe", and "script".
1. [=Set and filter HTML=] using [=this=],
   [=this=]'s [=shadow host=] (as context element),
   |compliantHTML|, |options|, and false.

</div>

<div algorithm>
{{ShadowRoot}}'s <dfn for="ShadowRoot" export>setHTML</dfn>(|html|, |options|) method steps are:

1. [=Set and filter HTML=] using [=this=] (as target), [=this=]'s [=shadow host=] (as context element),
   |html|, |options|, and true.

</div>

The {{Document}} interface gains two new methods which parse an entire {{Document}}:

<pre class="idl extract">
partial interface Document {
  static Document parseHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
  static Document parseHTML(DOMString html, optional SetHTMLOptions options = {});
};
</pre>

<div algorithm>
The <dfn for="Document" export>parseHTMLUnsafe</dfn>(|html|, |options|) method steps are:

1. Let |compliantHTML| be the result of invoking the [$Get Trusted Type compliant string$] algorithm with
   {{TrustedHTML}}, the [=current global object=], |html|, "Document parseHTMLUnsafe", and "script".
1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".

   Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |compliantHTML|.
1. Let |sanitizer| be the result of calling [=get a sanitizer instance from options=]
   with |options|.
1. Call [=sanitize=] on |document|'s [=tree/root|root node=] with |sanitizer| and false.
1. Return |document|.

</div>


<div algorithm>
The <dfn for="Document" export>parseHTML</dfn>(|html|, |options|) method steps are:

1. Let |document| be a new {{Document}}, whose [=Document/content type=] is "text/html".

   Note: Since |document| does not have a browsing context, scripting is disabled.
1. Set |document|'s [=allow declarative shadow roots=] to true.
1. [=Parse HTML from a string=] given |document| and |html|.
1. Let |sanitizer| be the result of calling [=get a sanitizer instance from options=]
   with |options|.
1. Call [=sanitize=] on |document|'s [=tree/root|root node=] with |sanitizer| and true.
1. Return |document|.

</div>

## SetHTML options and the configuration object ## {#configobject}

The family of {{Element/setHTML()}}-like methods all accept an options
dictionary. Right now, only one member of this dictionary is defined:

<pre class=idl>
enum SanitizerPresets { "default" };
dictionary SetHTMLOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = "default";
};
dictionary SetHTMLUnsafeOptions {
  (Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = {};
};
</pre>

The {{Sanitizer}} configuration object encapsulates a filter configuration.
The same configuration can be used with both <a lt="safe and unsafe">"safe"
and "unsafe"</a> methods, where the "safe" methods perform an implicit
{{removeUnsafe}} operation on the passed-in configuration and have a default
configuration when none is passed. The intent is
that one (or a few) configurations will be built up early in a page's
lifetime, and can then be used whenever needed. This allows implementations
to pre-process configurations.

The configuration object can be queried to return a configuration dictionary.
It can also be modified directly.

<pre class=idl>
[Exposed=(Window,Worker)]
interface Sanitizer {
  constructor(optional (SanitizerConfig or SanitizerPresets) configuration = "default");

  // Query configuration:
  SanitizerConfig get();

  // Modify a Sanitizer's lists and fields:
  undefined allowElement(SanitizerElementWithAttributes element);
  undefined removeElement(SanitizerElement element);
  undefined replaceElementWithChildren(SanitizerElement element);
  undefined allowAttribute(SanitizerAttribute attribute);
  undefined removeAttribute(SanitizerAttribute attribute);
  undefined setComments(boolean allow);
  undefined setDataAttributes(boolean allow);

  // Remove markup that executes script. May modify multiple lists:
  undefined removeUnsafe();
};
</pre>

A {{Sanitizer}} has an associated <dfn for="Sanitizer">configuration</dfn>, a {{SanitizerConfig}}.

<div algorithm>
The <dfn for="Sanitizer" export>constructor</dfn>(|configuration|)
method steps are:

1. If |configuration| is a {{SanitizerPresets}} [=string=], then:
    1. [=Assert=]: |configuration| [=is=] {{SanitizerPresets/default}}.
    1. Set |configuration| to the [=built-in safe default configuration=].
1. Let |valid| be the return value of [=set a configuration|setting=] |configuration| on [=this=].
1. If |valid| is false, then throw a {{TypeError}}.

</div>

<div algorithm>
The <dfn for="Sanitizer" export>get</dfn>() method steps are to return the value of [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>allowElement</dfn>(|element|) method steps are to [=allow an element=] with |element| and [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>removeElement</dfn>(|element|) method steps are
to [=remove an element=] with |element| and [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>replaceElementWithChildren</dfn>(|element|) method steps are to [=replace an element with its children=] with |element| and [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>allowAttribute</dfn>(|attribute|) method steps are to [=allow an attribute=] with |attribute| and [=this=]'s [=Sanitizer/configuration=].
</div>


<div algorithm>
The <dfn for="Sanitizer" export>removeAttribute</dfn>(|attribute|) method steps are to [=Sanitizer/remove an attribute=] with |attribute| and [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>setComments</dfn>(|allow|) method steps are to [=set comments=] with |allow| and [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>setDataAttributes</dfn>(|allow|) method steps are to [=set data attributes=] with |allow| and [=this=]'s [=Sanitizer/configuration=].
</div>

<div algorithm>
The <dfn for="Sanitizer" export>removeUnsafe</dfn>() method steps are to
update [=this=]'s [=Sanitizer/configuration=] with the result of calling [=remove unsafe=]
on [=this=]'s [=Sanitizer/configuration=].
</div>
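The modifier methods above might be combined as in the following sketch, which
sets up a sanitizer for short user comments. The function name
`buildCommentSanitizer` and the particular elements chosen are illustrative
assumptions. The constructor is passed in as a parameter (normally it would be
the global `Sanitizer`) so that the sketch can also be exercised with a
stand-in where the API is not available.

```javascript
// Hypothetical helper: configure a sanitizer for short user comments.
// SanitizerCtor is expected to behave like the Sanitizer interface above.
function buildCommentSanitizer(SanitizerCtor) {
  // No argument: the constructor starts from the "default" preset.
  const sanitizer = new SanitizerCtor();
  sanitizer.removeElement("img");
  sanitizer.replaceElementWithChildren("span");
  // Per-element attribute allow list for links.
  sanitizer.allowElement({ name: "a", attributes: ["href"] });
  sanitizer.setComments(false);
  sanitizer.setDataAttributes(false);
  return sanitizer;
}
```

Building the object once and reusing it for every insertion matches the stated
intent that configurations are created early in a page's lifetime.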

## The Configuration Dictionary ## {#config}

<pre class=idl>
dictionary SanitizerElementNamespace {
  required DOMString name;
  DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};

// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
  sequence&lt;SanitizerAttribute> attributes;
  sequence&lt;SanitizerAttribute> removeAttributes;
};

typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;

dictionary SanitizerAttributeNamespace {
  required DOMString name;
  DOMString? _namespace = null;
};
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;

dictionary SanitizerConfig {
  sequence&lt;SanitizerElementWithAttributes> elements;
  sequence&lt;SanitizerElement> removeElements;
  sequence&lt;SanitizerElement> replaceWithChildrenElements;

  sequence&lt;SanitizerAttribute> attributes;
  sequence&lt;SanitizerAttribute> removeAttributes;

  boolean comments;
  boolean dataAttributes;
};
</pre>
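For illustration, a configuration dictionary using these members could look
like the following. The member names come from the IDL above; the specific
elements and attributes are arbitrary choices for this example.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

const exampleConfig = {
  elements: [
    "p",                                 // string shorthand, HTML namespace
    { name: "a", attributes: ["href"] }, // per-element attribute allow list
    { name: "svg", namespace: SVG_NS },  // explicit namespace
  ],
  // A global remove list for attributes, applied to every element.
  removeAttributes: ["style"],
  comments: false,
  dataAttributes: true,
};
```

The example sets `elements` without `removeElements`, and `removeAttributes`
without `attributes`, since a configuration that sets both lists of a pair is
rejected by the validity checks of [=set a configuration=].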

# Algorithms # {#algorithms}

<div algorithm>
To <dfn>set and filter HTML</dfn>, given an {{Element}} or {{DocumentFragment}}
|target|, an {{Element}} |contextElement|, a [=string=] |html|, a
[=dictionary=] |options|, and a [=boolean=] |safe|:

1. If |safe| and |contextElement|'s [=Element/local name=] is "`script`" and
   |contextElement|'s [=Element/namespace=] is the [=HTML namespace=] or the
   [=SVG namespace=], then return.
1. Let |sanitizer| be the result of calling [=get a sanitizer instance from options=]
   with |options|.
1. Let |newChildren| be the result of the HTML [=fragment parsing algorithm steps=]
   given |contextElement|, |html|, and true.
1. Let |fragment| be a new {{DocumentFragment}} whose [=node document=] is |contextElement|'s [=node document=].
1. [=list/iterate|For each=] |node| in |newChildren|, [=list/append=] |node| to |fragment|.
1. Run [=sanitize=] on |fragment| using |sanitizer| and |safe|.
1. [=Replace all=] with |fragment| within |target|.

</div>

<div algorithm>
To <dfn for="SanitizerConfig">get a sanitizer instance from options</dfn> from
a [=dictionary=] |options|, do:

Note: This algorithm works for both {{SetHTMLOptions}} and
    {{SetHTMLUnsafeOptions}}. They only differ in the defaults.

1. Let |sanitizerSpec| be "{{SanitizerPresets/default}}".
1. If |options|["{{SetHTMLOptions/sanitizer}}"] [=map/exists=], then:
   1. Set |sanitizerSpec| to |options|["{{SetHTMLOptions/sanitizer}}"].
1. [=Assert=]: |sanitizerSpec| is either a {{Sanitizer}} instance,
   a [=string=] which is a {{SanitizerPresets}} member, or a [=dictionary=].
1. If |sanitizerSpec| is a [=string=]:
   1. [=Assert=]: |sanitizerSpec| [=is=] "{{SanitizerPresets/default}}".
   1. Set |sanitizerSpec| to the [=built-in safe default configuration=].
1. [=Assert=]: |sanitizerSpec| is either a {{Sanitizer}} instance,
   or a [=dictionary=].
1. If |sanitizerSpec| is a [=dictionary=]:
   1. Let |sanitizer| be a new {{Sanitizer}} instance.
   1. Let |setConfigurationResult| be the result of [=set a configuration=]
      with |sanitizerSpec| on |sanitizer|.
   1. If |setConfigurationResult| is false, [=throw=] a {{TypeError}}.
   1. Set |sanitizerSpec| to |sanitizer|.
1. [=Assert=]: |sanitizerSpec| is a {{Sanitizer}} instance.
1. Return |sanitizerSpec|.

</div>

## Sanitization Algorithms ## {#sanitization}

<div algorithm>
For the main <dfn>sanitize</dfn> operation, using a {{ParentNode}} |node|, a
{{Sanitizer}} |sanitizer|, and a [=boolean=] |safe|, run these steps:

1. Let |configuration| be the value of |sanitizer|'s [=Sanitizer/configuration=].
1. If |safe| is true, then set |configuration| to the result of calling [=remove unsafe=] on |configuration|.
1. Call [=sanitize core=] on |node|, |configuration|, and with [=handleJavascriptNavigationUrls=] set to |safe|.

</div>

<div algorithm="sanitize core">
The <dfn>sanitize core</dfn> operation,
using a {{ParentNode}} |node|, a {{SanitizerConfig}} |configuration|, and a
[=boolean=] <var><dfn>handleJavascriptNavigationUrls</dfn></var>, iterates over the DOM tree
beginning with |node|, and may recurse to handle some special cases (e.g.
template contents). It consists of these steps:

1. Let |current| be |node|.
1. [=list/iterate|For each=] |child| in |current|'s [=tree/children=]:
  1. [=Assert=]: |child| [=implements=] {{Text}}, {{Comment}}, or {{Element}}.

     Note: Currently, this algorithm is only called on output of the HTML
           parser for which this assertion should hold. If in the future
           this algorithm will be used in different contexts, this assumption
           needs to be re-examined.
  1. If |child| [=implements=] {{Text}}:
    1. [=continue=].
  1. else if |child| [=implements=] {{Comment}}:
    1. If |configuration|["{{SanitizerConfig/comments}}"] is not true:
      1. [=/remove=] |child|.
  1. else:
    1. Let |elementName| be a {{SanitizerElementNamespace}} with |child|'s
       [=Element/local name=] and [=Element/namespace=].
    1. If |configuration|["{{SanitizerConfig/removeElements}}"] [=SanitizerConfig/contains=] |elementName|, or if |configuration|["{{SanitizerConfig/elements}}"] is not [=list/empty=] and does not [=SanitizerConfig/contain=] |elementName|:
       1. [=/remove=] |child|.
    1. If |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] [=SanitizerConfig/contains=] |elementName|:
      1. Call [=sanitize core=] on |child| with |configuration| and
          |handleJavascriptNavigationUrls|.
      1. Call [=replace all=] with |child|'s [=tree/children=] within |child|.
    1. If |elementName| [=equals=] &laquo;[ "`name`" &rightarrow; "`template`",
       "`namespace`" &rightarrow; [=HTML namespace=] ]&raquo;:
      1. Then call [=sanitize core=] on |child|'s [=template contents=] with
          |configuration| and |handleJavascriptNavigationUrls|.
    1. If |child| is a [=shadow host=]:
      1. Then call [=sanitize core=] on |child|'s [=Element/shadow root=] with
          |configuration| and |handleJavascriptNavigationUrls|.
    1. [=list/iterate|For each=] |attribute| in |child|'s [=Element/attribute list=]:
      1. Let |attrName| be a {{SanitizerAttributeNamespace}} with |attribute|'s
         [=Attr/local name=] and [=Attr/namespace=].
      1. If |configuration|["{{SanitizerConfig/removeAttributes}}"]
           [=SanitizerConfig/contains=] |attrName|:
         1. Remove |attribute| from |child|.
      1. If |configuration|["{{SanitizerConfig/elements}}"]["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]
           [=SanitizerConfig/contains=] |attrName|:
         1. Remove |attribute| from |child|.

      1. If all of the following are false, then remove |attribute| from |child|.
         - |configuration|["{{SanitizerConfig/attributes}}"] [=list/exists=] and
           [=SanitizerConfig/contains=] |attrName|
         - |configuration|["{{SanitizerConfig/elements}}"]["{{SanitizerElementNamespaceWithAttributes/attributes}}"]
           [=SanitizerConfig/contains=] |attrName|
         - "data-" is a [=code unit prefix=] of |attribute|'s [=Attr/local name=],
            |attribute|'s [=Attr/namespace=] is `null`, and
            |configuration|["{{SanitizerConfig/dataAttributes}}"] is true
      1. If |handleJavascriptNavigationUrls| and &laquo;[|elementName|, |attrName|]&raquo; matches an entry in the
         [=built-in navigating URL attributes list=], and if |attribute|'s [=protocol=] is
         "`javascript:`":
         1. Then remove |attribute| from |child|.

</div>
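The element and comment handling can be modelled on plain objects as in the
sketch below. It is a simplified model under several assumptions, not the
algorithm itself: nodes are plain objects with `type`, `name`, `namespace` and
`children` fields (a hypothetical shape), attributes, templates and shadow
roots are left out, the function returns a new list of children instead of
mutating a DOM, and it recurses into every kept element, following the prose
description of iterating over the tree.

```javascript
// True if list has an entry with the same name and namespace as item.
function contains(list, item) {
  return (list || []).some(
    (entry) => entry.name === item.name && entry.namespace === item.namespace);
}

// Simplified model: returns the filtered children of node.
function sanitizeChildren(node, config) {
  const kept = [];
  for (const child of node.children) {
    if (child.type === "text") {
      kept.push(child); // text is always kept
    } else if (child.type === "comment") {
      if (config.comments === true) kept.push(child);
    } else {
      const elementName = { name: child.name, namespace: child.namespace };
      const allowList = config.elements || [];
      if (contains(config.removeElements, elementName) ||
          (allowList.length > 0 && !contains(allowList, elementName))) {
        continue; // the element and its subtree are dropped
      }
      const children = sanitizeChildren(child, config);
      if (contains(config.replaceWithChildrenElements, elementName)) {
        kept.push(...children); // unwrap: keep only the sanitized children
      } else {
        kept.push({ ...child, children });
      }
    }
  }
  return kept;
}
```

Note that, as in the steps above, the removal check runs before the
replace-with-children check, so an element missing from a non-empty allow list
is dropped rather than unwrapped.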

## Configuration Processing ## {#configuration-processing}

<div algorithm>
To <dfn for="SanitizerConfig">allow an element</dfn> |element| with a {{SanitizerConfig}} |configuration|, do:

1. Set |element| to the result of [=canonicalize a sanitizer element with attributes=] with |element|.
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/elements}}"].
1. [=list/Append=] |element| to |configuration|["{{SanitizerConfig/elements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/removeElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"].

NOTE: Handling of [=allow an element=] is a little more complicated than the other
    operations, because the element allow list can have per-element allow- and
    remove-attribute lists. We first remove the given element from the list
    before then adding it, which has the effect of re-setting (rather than
    merging or otherwise modifying) the per-element list to whatever is passed
    in. In other words, the per-element allow- and remove-lists can only be
    set as a whole.

NOTE: [=SanitizerConfig/Remove=] matches on name and namespace, so adding an
    element with attributes would still remove the matching element from the
    {{SanitizerConfig/removeElements}} and {{SanitizerConfig/replaceWithChildrenElements}} lists.

</div>

<div algorithm>
To <dfn for="Sanitizer">remove an element</dfn> |element| from a {{SanitizerConfig}} |configuration|, do:

1. Set |element| to the result of [=canonicalize a sanitizer element=] with |element|.
1. [=SanitizerConfig/Add=] |element| to |configuration|["{{SanitizerConfig/removeElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/elements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"].

</div>

<div algorithm>
To <dfn for="Sanitizer">replace an element with its children</dfn> |element| from a {{SanitizerConfig}} |configuration|, do:

1. Set |element| to the result of [=canonicalize a sanitizer element=] with |element|.
1. [=SanitizerConfig/Add=] |element| to |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/removeElements}}"].
1. [=SanitizerConfig/Remove=] |element| from |configuration|["{{SanitizerConfig/elements}}"].

</div>
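The way the three element operations keep the lists mutually exclusive can be
sketched with plain arrays. This is an illustrative model under stated
assumptions: the configuration is a plain object whose three lists already
exist, the helper names are made up for this example, and per-element
attribute names are copied without canonicalization to keep the sketch short.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Canonical form is { name, namespace }; strings get the HTML namespace.
function canonicalizeElement(element) {
  if (typeof element === "string") return { name: element, namespace: HTML_NS };
  return {
    name: element.name,
    namespace: "namespace" in element ? element.namespace : HTML_NS,
  };
}

const sameName = (a, b) => a.name === b.name && a.namespace === b.namespace;
const without = (list, item) => list.filter((entry) => !sameName(entry, item));
const withAdded = (list, item) =>
  list.some((entry) => sameName(entry, item)) ? list : [...list, item];

function allowElement(config, element) {
  const item = canonicalizeElement(element);
  if (typeof element !== "string") {
    if (element.attributes) item.attributes = [...element.attributes];
    if (element.removeAttributes) item.removeAttributes = [...element.removeAttributes];
  }
  // Remove first, then append: per-element lists are reset, not merged.
  config.elements = [...without(config.elements, item), item];
  config.removeElements = without(config.removeElements, item);
  config.replaceWithChildrenElements = without(config.replaceWithChildrenElements, item);
}

function removeElement(config, element) {
  const item = canonicalizeElement(element);
  config.removeElements = withAdded(config.removeElements, item);
  config.elements = without(config.elements, item);
  config.replaceWithChildrenElements = without(config.replaceWithChildrenElements, item);
}

function replaceElementWithChildren(config, element) {
  const item = canonicalizeElement(element);
  config.replaceWithChildrenElements = withAdded(config.replaceWithChildrenElements, item);
  config.removeElements = without(config.removeElements, item);
  config.elements = without(config.elements, item);
}

const config = { elements: [], removeElements: [], replaceWithChildrenElements: [] };
allowElement(config, { name: "a", attributes: ["href", "title"] });
allowElement(config, { name: "a", attributes: ["href"] }); // replaces the earlier entry
removeElement(config, "b");
replaceElementWithChildren(config, "b"); // moves "b" out of removeElements
```

After these calls, an element name appears in at most one of the three lists,
and the entry for `a` carries only the attribute list from the latest call.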

<div algorithm>
To <dfn for="Sanitizer">allow an attribute</dfn> |attribute| on a {{SanitizerConfig}} |configuration|, do:

1. Set |attribute| to the result of [=canonicalize a sanitizer attribute=] with |attribute|.
1. [=SanitizerConfig/Add=] |attribute| to |configuration|["{{SanitizerConfig/attributes}}"].
1. [=SanitizerConfig/Remove=] |attribute| from |configuration|["{{SanitizerConfig/removeAttributes}}"].

</div>

<div algorithm>
To <dfn for="Sanitizer">remove an attribute</dfn> |attribute| from a {{SanitizerConfig}} |configuration|, do:

1. Set |attribute| to the result of [=canonicalize a sanitizer attribute=] with |attribute|.
1. [=SanitizerConfig/Add=] |attribute| to |configuration|["{{SanitizerConfig/removeAttributes}}"].
1. [=SanitizerConfig/Remove=] |attribute| from |configuration|["{{SanitizerConfig/attributes}}"].

</div>

<div algorithm>
To <dfn for="Sanitizer">set comments</dfn> with |allow| on a {{SanitizerConfig}} |configuration|, do:

1. Set |configuration|["{{SanitizerConfig/comments}}"] to |allow|.

</div>

<div algorithm>
To <dfn for="Sanitizer">set data attributes</dfn> with |allow| on a {{SanitizerConfig}} |configuration|, do:

1. Set |configuration|["{{SanitizerConfig/dataAttributes}}"] to |allow|.

</div>

<div algorithm>

Note: While this algorithm is called [=remove unsafe=], we use
    <a href="#security-considerations">the term "unsafe" strictly in the sense
    of this spec</a>, to denote content that will
    execute JavaScript when inserted into the document. In other words, this
    method will remove opportunities for XSS.

To <dfn for="SanitizerConfig">remove unsafe</dfn> from a |configuration|, do this:

1. [=Assert=]: The [=built-in safe baseline configuration=] has
   {{SanitizerConfig/removeElements}} and {{SanitizerConfig/removeAttributes}}
   keys set, but not {{SanitizerConfig/elements}},
   {{SanitizerConfig/replaceWithChildrenElements}}, or
   {{SanitizerConfig/attributes}}.
1. Let |result| be a copy of |configuration|.
1. [=list/For each=] |element| in
   [=built-in safe baseline configuration=][{{SanitizerConfig/removeElements}}]:
    1. Call [=remove an element=] with |element| and |result|.
1. [=list/For each=] |attribute| in
   [=built-in safe baseline configuration=][{{SanitizerConfig/removeAttributes}}]:
    1. Call [=Sanitizer/remove an attribute=] with |attribute| and |result|.
1. Return |result|.

</div>
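A rough model of this operation on plain objects follows. The two `script`
entries are taken from the baseline configuration given in this document; the
single attribute entry is a hypothetical stand-in, because the baseline
attribute list is not spelled out here. Helper names are assumptions made for
this sketch.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

const BASELINE = {
  removeElements: [
    { name: "script", namespace: HTML_NS },
    { name: "script", namespace: SVG_NS },
  ],
  // Hypothetical stand-in: the real attribute list is not given here.
  removeAttributes: [{ name: "onclick", namespace: null }],
};

const sameName = (a, b) => a.name === b.name && a.namespace === b.namespace;
const without = (list, item) => list.filter((entry) => !sameName(entry, item));
const withAdded = (list, item) =>
  list.some((entry) => sameName(entry, item)) ? list : [...list, item];

// Returns a modified copy; the configuration passed in is left untouched.
function removeUnsafe(configuration) {
  const result = {
    ...configuration,
    elements: [...(configuration.elements || [])],
    removeElements: [...(configuration.removeElements || [])],
    replaceWithChildrenElements: [...(configuration.replaceWithChildrenElements || [])],
    attributes: [...(configuration.attributes || [])],
    removeAttributes: [...(configuration.removeAttributes || [])],
  };
  for (const element of BASELINE.removeElements) {
    result.removeElements = withAdded(result.removeElements, element);
    result.elements = without(result.elements, element);
    result.replaceWithChildrenElements =
      without(result.replaceWithChildrenElements, element);
  }
  for (const attribute of BASELINE.removeAttributes) {
    result.removeAttributes = withAdded(result.removeAttributes, attribute);
    result.attributes = without(result.attributes, attribute);
  }
  return result;
}
```

Because each baseline entry goes through the same steps as the remove
operations, an unsafe element that a caller had explicitly allowed ends up in
the remove list of the returned copy.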

<div algorithm>
To <dfn for="Sanitizer">set a configuration</dfn>, given a [=dictionary=] |configuration| and a {{Sanitizer}} |sanitizer|:

1. [=list/iterate|For each=] |element| of |configuration|["{{SanitizerConfig/elements}}"] do:
  1. Call [=allow an element=] with |element| and |sanitizer|'s [=Sanitizer/configuration=].
1. [=list/iterate|For each=] |element| of |configuration|["{{SanitizerConfig/removeElements}}"] do:
  1. Call [=remove an element=] with |element| and |sanitizer|'s [=Sanitizer/configuration=].
1. [=list/iterate|For each=] |element| of |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] do:
  1. Call [=replace an element with its children=] with |element| and |sanitizer|'s [=Sanitizer/configuration=].
1. [=list/iterate|For each=] |attribute| of |configuration|["{{SanitizerConfig/attributes}}"] do:
  1. Call [=allow an attribute=] with |attribute| and |sanitizer|'s [=Sanitizer/configuration=].
1. [=list/iterate|For each=] |attribute| of |configuration|["{{SanitizerConfig/removeAttributes}}"] do:
  1. Call [=Sanitizer/remove an attribute=] with |attribute| and |sanitizer|'s [=Sanitizer/configuration=].
1. Call [=set comments=] with |configuration|["{{SanitizerConfig/comments}}"] and |sanitizer|'s [=Sanitizer/configuration=].
1. Call [=set data attributes=] with |configuration|["{{SanitizerConfig/dataAttributes}}"] and |sanitizer|'s [=Sanitizer/configuration=].
1. Return whether all of the following are true:
    - [=list/size=] of |configuration|["{{SanitizerConfig/elements}}"] equals
      [=list/size=] of |sanitizer|'s [=Sanitizer/configuration=]["{{SanitizerConfig/elements}}"].
    - [=list/size=] of |configuration|["{{SanitizerConfig/removeElements}}"] equals
      [=list/size=] of |sanitizer|'s [=Sanitizer/configuration=]["{{SanitizerConfig/removeElements}}"].
    - [=list/size=] of |configuration|["{{SanitizerConfig/replaceWithChildrenElements}}"] equals
      [=list/size=] of |sanitizer|'s [=Sanitizer/configuration=]["{{SanitizerConfig/replaceWithChildrenElements}}"].
    - [=list/size=] of |configuration|["{{SanitizerConfig/attributes}}"] equals
      [=list/size=] of |sanitizer|'s [=Sanitizer/configuration=]["{{SanitizerConfig/attributes}}"].
    - [=list/size=] of |configuration|["{{SanitizerConfig/removeAttributes}}"] equals
      [=list/size=] of |sanitizer|'s [=Sanitizer/configuration=]["{{SanitizerConfig/removeAttributes}}"].
    - Either |configuration|["{{SanitizerConfig/elements}}"] or
      |configuration|["{{SanitizerConfig/removeElements}}"] [=map/exist=],
      or neither, but not both.
    - Either |configuration|["{{SanitizerConfig/attributes}}"] or
      |configuration|["{{SanitizerConfig/removeAttributes}}"] [=map/exist=],
      or neither, but not both.

Note: Previous versions of this spec had elaborate definitions of how to
  canonicalize a config. This has now effectively been moved into the method
  definitions.

Note: This operation is defined in terms of the manipulation methods on the
    {{Sanitizer}}. Those methods remove matching entries from other lists
    and do not add duplicates.
    The size equality checks in the last step then catch such conflicts.
    For example:
    `{ elements: ["div", "div"] }` would create a Sanitizer with one element in
    the allow list. The final test would then return false, which would cause
    the caller to throw an exception.

Issue: This is still missing error checks for the per-element attribute lists
    and syntax errors.

</div>
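The validity check can be illustrated with a small model that rebuilds the five
lists and then compares sizes. It is a sketch under assumptions: it works on
plain objects, the helper names are made up for this example, per-element
attribute lists are ignored, and every list uses "drop any matching entry, then
append", which yields the same list sizes as the individual operations.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// For each list: the lists its entries are removed from, in processing order.
const CLEARS = {
  elements: ["removeElements", "replaceWithChildrenElements"],
  removeElements: ["elements", "replaceWithChildrenElements"],
  replaceWithChildrenElements: ["removeElements", "elements"],
  attributes: ["removeAttributes"],
  removeAttributes: ["attributes"],
};
const DEFAULT_NS = {
  elements: HTML_NS, removeElements: HTML_NS, replaceWithChildrenElements: HTML_NS,
  attributes: null, removeAttributes: null,
};

function canonicalize(name, defaultNamespace) {
  if (typeof name === "string") return { name, namespace: defaultNamespace };
  return {
    name: name.name,
    namespace: "namespace" in name ? name.namespace : defaultNamespace,
  };
}

const sameName = (a, b) => a.name === b.name && a.namespace === b.namespace;

function setConfiguration(configuration) {
  const built = {
    elements: [], removeElements: [], replaceWithChildrenElements: [],
    attributes: [], removeAttributes: [],
  };
  for (const key of Object.keys(CLEARS)) {
    for (const entry of configuration[key] || []) {
      const item = canonicalize(entry, DEFAULT_NS[key]);
      built[key] = [...built[key].filter((e) => !sameName(e, item)), item];
      for (const other of CLEARS[key]) {
        built[other] = built[other].filter((e) => !sameName(e, item));
      }
    }
  }
  // Any duplicate or cross-list conflict shows up as a size mismatch.
  const sizesMatch = Object.keys(CLEARS).every(
    (key) => (configuration[key] || []).length === built[key].length);
  const notBoth = (a, b) => !(a in configuration && b in configuration);
  const valid = sizesMatch &&
    notBoth("elements", "removeElements") &&
    notBoth("attributes", "removeAttributes");
  return { built, valid };
}
```

With this model, `setConfiguration({ elements: ["div", "div"] })` builds a
one-entry allow list and reports the configuration as invalid, which
corresponds to the case where the caller throws.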

<div algorithm>
In order to <dfn>canonicalize a sanitizer element with attributes</dfn> a {{SanitizerElementWithAttributes}} |element|, do this:

1. Let |result| be the result of [=canonicalize a sanitizer element=] with |element|.
1. If |element| is a [=dictionary=]:
   1. [=list/iterate|For each=] |attribute| in
      |element|["{{SanitizerElementNamespaceWithAttributes/attributes}}"]:
      1. [=SanitizerConfig/Add=] the result of [=canonicalize a sanitizer attribute=] with |attribute| to |result|["{{SanitizerElementNamespaceWithAttributes/attributes}}"].
   1. [=list/iterate|For each=] |attribute| in
      |element|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"]:
      1. [=SanitizerConfig/Add=] the result of [=canonicalize a sanitizer attribute=] with |attribute| to |result|["{{SanitizerElementNamespaceWithAttributes/removeAttributes}}"].
1. Return |result|.

</div>


<div algorithm>
In order to <dfn>canonicalize a sanitizer element</dfn> a
{{SanitizerElement}} |element|,
return the result of [=canonicalize a sanitizer name=] with |element| and the [=HTML namespace=] as the default namespace.
</div>

<div algorithm>
In order to <dfn>canonicalize a sanitizer attribute</dfn> a
{{SanitizerAttribute}} |attribute|, 
return the result of [=canonicalize a sanitizer name=] with |attribute| and `null` as the default namespace.
</div>

<div algorithm>
In order to <dfn>canonicalize a sanitizer name</dfn> |name|, with a default
namespace |defaultNamespace|, run the following steps:

1. [=Assert=]: |name| is either a {{DOMString}} or a [=dictionary=].
1. If |name| is a {{DOMString}}, then return &laquo;[ "`name`" &rightarrow; |name|, "`namespace`" &rightarrow; |defaultNamespace|]&raquo;.
1. [=Assert=]: |name| is a [=dictionary=] and |name|["name"] [=map/exists=].
1. Return &laquo;[ <br>
  "`name`" &rightarrow; |name|["name"], <br>
  "`namespace`" &rightarrow; ( |name|["namespace"] if it [=map/exists=], otherwise |defaultNamespace| ) <br>
  ]&raquo;.

</div>
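The canonicalization steps translate almost directly into code. The following
sketch assumes plain JavaScript strings and objects as input; the function
name is chosen for this example.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";

// Returns the canonical { name, namespace } form of a sanitizer name.
function canonicalizeSanitizerName(name, defaultNamespace) {
  if (typeof name === "string") {
    return { name, namespace: defaultNamespace };
  }
  return {
    name: name.name,
    // An explicit namespace, including null, overrides the default.
    namespace: "namespace" in name ? name.namespace : defaultNamespace,
  };
}

const element = canonicalizeSanitizerName("div", HTML_NS);
const attribute = canonicalizeSanitizerName("href", null);
```

Elements are canonicalized with the HTML namespace as the default, attributes
with `null`, so the string shorthand covers the common case for both.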

## Supporting Algorithms ## {#alg-support}

For the [=canonicalize a sanitizer name|canonicalized=]
{{SanitizerElementNamespace|element}} and {{SanitizerAttributeNamespace|attribute name}} lists
used in this spec, list membership is based on matching both "`name`" and "`namespace`"
entries:

<div algorithm>
A Sanitizer name |list| <dfn for="SanitizerConfig">contains</dfn> an |item|
if there exists an |entry| of |list| that is an [=ordered map=], and where
|item|["name"] [=equals=] |entry|["name"] and
|item|["namespace"] [=equals=] |entry|["namespace"].
</div>

<div algorithm>
To <dfn for="SanitizerConfig">remove</dfn> an |item| from a |list| whose
entries are [=ordered maps=], [=list/remove=] every |entry| from |list|
where |item|["name"] [=equals=] |entry|["name"] and
|item|["namespace"] [=equals=] |entry|["namespace"].
</div>

<div algorithm>
To <dfn for="SanitizerConfig">add</dfn> a |name| to a |list|, where |name| is
[=canonicalize a sanitizer name|canonicalized=] and the entries of |list| are [=ordered maps=]:

1. If |list| [=SanitizerConfig/contains=] |name|, then return.
1. [=list/Append=] |name| to |list|.

</div>
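
<em>The following sketch is not normative.</em> It shows how the three list operations above could behave, assuming that a Sanitizer name list is a JavaScript array of canonicalized `{ name, namespace }` objects. The helper names are hypothetical.

```js
// Two canonicalized names match when both "name" and "namespace" are equal.
function sameSanitizerName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// Sketch of "contains": true if some entry matches the item.
function listContains(list, item) {
  return list.some((entry) => sameSanitizerName(item, entry));
}

// Sketch of "remove": drops every matching entry, in place.
function listRemove(list, item) {
  // Iterate backwards so that splicing does not skip entries.
  for (let i = list.length - 1; i >= 0; i--) {
    if (sameSanitizerName(item, list[i])) {
      list.splice(i, 1);
    }
  }
}

// Sketch of "add": appends the name unless a matching entry already exists.
function listAdd(list, name) {
  if (listContains(list, name)) {
    return;
  }
  list.push(name);
}
```

Because matching compares both entries, an HTML `script` and an SVG `script` are distinct items in such a list.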

<div algorithm>
Equality for [=ordered sets=] is equality of their members, but without
regard to order:
[=Ordered sets=] |A| and |B| are <dfn for=set>equal</dfn> if both |A| is a
[=superset=] of |B| and |B| is a [=superset=] of |A|.
</div>
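
<em>The following sketch is not normative.</em> It expresses the mutual-superset definition above with JavaScript `Set` objects. For simplicity it assumes members that compare by value, such as strings; the function names are hypothetical.

```js
// True if every member of b is also a member of a.
function isSuperset(a, b) {
  for (const member of b) {
    if (!a.has(member)) {
      return false;
    }
  }
  return true;
}

// Sets are equal if each is a superset of the other; order is irrelevant.
function setsAreEqual(a, b) {
  return isSuperset(a, b) && isSuperset(b, a);
}
```

With this definition, `new Set(["a", "b"])` and `new Set(["b", "a"])` compare as equal even though their insertion order differs.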

## Defaults ## {#sanitization-defaults}

There are three builtins:

* The [=built-in safe default configuration=],
* the [=built-in safe baseline configuration=], and
* the [=built-in navigating URL attributes list=].

The <dfn>built-in safe default configuration</dfn> is as follows:
```
{
  elements: [ ... ],
  attributes: [ ... ],
}
```

The <dfn>built-in safe baseline configuration</dfn> is meant to block only
script-content, and nothing else. It is as follows:
```
{
  removeElements: [
    { name: "script", namespace: "http://www.w3.org/1999/xhtml" },
    { name: "script", namespace: "http://www.w3.org/2000/svg" }
  ],
  removeAttributes: [....],
}
```

<div>
The <dfn>built-in navigating URL attributes list</dfn>, for which "`javascript:`"
navigations are "unsafe", is as follows:

&laquo;[
  <br>
  [
    { "`name`" &rightarrow; "`a`", "`namespace`" &rightarrow; [=HTML namespace=] },
    { "`name`" &rightarrow; "`href`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`area`", "`namespace`" &rightarrow; [=HTML namespace=] },
    { "`name`" &rightarrow; "`href`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`form`", "`namespace`" &rightarrow; [=HTML namespace=] },
    { "`name`" &rightarrow; "`action`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`input`", "`namespace`" &rightarrow; [=HTML namespace=] },
    { "`name`" &rightarrow; "`formaction`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
  [
    { "`name`" &rightarrow; "`button`", "`namespace`" &rightarrow; [=HTML namespace=] },
    { "`name`" &rightarrow; "`formaction`", "`namespace`" &rightarrow; `null` }
  ],
  <br>
]&raquo;
</div>
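
<em>The following sketch is not normative.</em> It mirrors the list above as a JavaScript array of (element, attribute) pairs and shows one way a lookup against it might be written. The constant and function names are hypothetical, and the sketch only covers the pair lookup, not the handling of the attribute's URL value.

```js
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Each entry pairs a canonicalized element name with a canonicalized
// attribute name, in the same order as the list above.
const NAVIGATING_URL_ATTRIBUTES = [
  [{ name: "a", namespace: HTML_NAMESPACE }, { name: "href", namespace: null }],
  [{ name: "area", namespace: HTML_NAMESPACE }, { name: "href", namespace: null }],
  [{ name: "form", namespace: HTML_NAMESPACE }, { name: "action", namespace: null }],
  [{ name: "input", namespace: HTML_NAMESPACE }, { name: "formaction", namespace: null }],
  [{ name: "button", namespace: HTML_NAMESPACE }, { name: "formaction", namespace: null }],
];

// Names match when both "name" and "namespace" are equal.
function sameSanitizerName(a, b) {
  return a.name === b.name && a.namespace === b.namespace;
}

// True if the (element, attribute) pair appears in the list.
function isNavigatingUrlAttribute(element, attribute) {
  return NAVIGATING_URL_ATTRIBUTES.some(
    ([listedElement, listedAttribute]) =>
      sameSanitizerName(element, listedElement) &&
      sameSanitizerName(attribute, listedAttribute)
  );
}
```

Note that the lookup is per pair: `href` is listed for `a` and `area`, but not for `button`.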


# Security Considerations # {#security-considerations}

The Sanitizer API is intended to prevent DOM-based Cross-Site Scripting
by traversing supplied HTML content and removing elements and attributes
according to a configuration. The specified API must not support
the construction of a Sanitizer object that leaves script-capable markup in
place; doing so would be a bug in the threat model.

That being said, there are security issues that even correct usage of the
Sanitizer API cannot protect against. These scenarios are laid out in the
following sections.

## Server-Side Reflected and Stored XSS ## {#server-side-xss}

<em>This section is not normative.</em>

The Sanitizer API operates solely in the DOM and adds a capability to traverse
and filter an existing DocumentFragment. The Sanitizer does not address
server-side reflected or stored XSS.

## DOM clobbering ## {#dom-clobbering}

<em>This section is not normative.</em>

DOM clobbering describes an attack in which malicious HTML confuses an
application by naming elements through `id` or `name` attributes such that
properties like `children` of an HTML element in the DOM are overshadowed by
the malicious content.

The Sanitizer API does not protect against DOM clobbering attacks in its
default state, but can be configured to remove `id` and `name` attributes.
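
As a minimal sketch of such a configuration, the dictionary below lists both attributes under `removeAttributes`. The variable name is hypothetical, and the way the dictionary is passed to `Element.setHTML()` in the comment is an assumption for illustration rather than a definitive usage.

```js
// Hypothetical configuration that strips the attributes commonly used
// for DOM clobbering.
const clobberingHardenedConfig = {
  removeAttributes: ["id", "name"],
};

// Assumed usage, shown as a comment because it needs a browser DOM:
//   element.setHTML(untrustedHtml, { sanitizer: clobberingHardenedConfig });
```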

## XSS with Script gadgets ## {#script-gadgets}

<em>This section is not normative.</em>

Script gadgets are a technique in which an attacker uses existing application
code from popular JavaScript libraries to cause their own code to execute.
This is often done by injecting innocent-looking code or seemingly inert
DOM nodes that are only parsed and interpreted by a framework, which then
executes JavaScript based on that input.

The Sanitizer API cannot prevent these attacks. It does, however, require page
authors to explicitly allow unknown elements in general, and to additionally
explicitly configure unknown attributes, elements, and markup known
to be widely used for templating and framework-specific code,
such as `data-` and `slot` attributes and elements like `<slot>` and `<template>`.
We believe that these restrictions are not exhaustive and encourage page
authors to examine their third-party libraries for this behavior.

## Mutated XSS ## {#mutated-xss}

<em>This section is not normative.</em>

Mutated XSS or mXSS describes an attack based on parser context mismatches
when parsing an HTML snippet without the correct context. In particular,
when a parsed HTML fragment has been serialized to a string, the string is
not guaranteed to be parsed and interpreted exactly the same when inserted
into a different parent element. One way to carry out such an attack
is to rely on the change of parsing behavior for foreign content or
mis-nested tags.

The Sanitizer API offers only functions that turn a string into a node tree.
The context is supplied implicitly by all sanitizer functions:
`Element.setHTML()` uses the current element; `Document.parseHTML()` creates a
new document. Therefore, the Sanitizer API is not directly affected by mutated XSS.

If a developer were to retrieve a sanitized node tree as a string, e.g. via
`.innerHTML`, and then parse it again, mutated XSS may occur.
We discourage this practice. If processing or passing HTML as a
string is necessary after all, then any such string should be considered
untrusted and should be sanitized (again) when it is inserted into the DOM. In
other words, an HTML tree that has been sanitized and then serialized can no
longer be considered sanitized.

A more complete treatment of mXSS can be found in [[MXSS]].

# Acknowledgements # {#ack}

Cure53's [[DOMPURIFY]] is a clear inspiration for the API this document
describes, as is Internet Explorer's {{window.toStaticHTML()}}.