Commit

utils/html: Provide full control of allowed HTML elements
Added new configuration option "strictly-allowed-html-elements" to specify only allowed HTML tags in the generated output.

Fixes #751
pkvach committed May 6, 2024
1 parent 796357a commit b53d496
Showing 6 changed files with 86 additions and 28 deletions.
3 changes: 1 addition & 2 deletions CHANGES.rst
@@ -47,8 +47,7 @@ Bugfixes & Improvements
- Change logging to include datetime and loglevel (`#1023`_, ix5)
- Make 'text' field in 'comments' table NOT NULL and handling data migration (`#1019`_, pkvach)
- Python 3.12 support (`#1015`_, ix5)
- Provide full control of allowed-elements and allowed-attributes via the configuration
file (`#1007`_, pkvach)
- Provide full control of allowed HTML elements via the configuration file (`#1007`_, pkvach)

.. _#951: https://github.com/posativ/isso/pull/951
.. _#967: https://github.com/posativ/isso/pull/967
6 changes: 3 additions & 3 deletions contrib/isso-dev.cfg
@@ -38,9 +38,9 @@ reply-to-self = true
[markup]
options = autolink, fenced-code, no-intra-emphasis, strikethrough, superscript
flags =
allowed-elements = a, p, hr, br, ol, ul, li, pre, code, blockquote, del, ins,
strong, em, h1, h2, h3, h4, h5, h6, sub, sup, table, thead, tbody, tr, th, td
allowed-attributes = align, href
allowed-elements =
strictly-allowed-html-elements =
allowed-attributes =

[hash]
salt = Eech7co8Ohloopo9Ol6baimi
39 changes: 30 additions & 9 deletions docs/docs/reference/server-config.rst
@@ -428,29 +428,50 @@ flags
.. versionadded:: 0.12.4

allowed-elements
HTML tags to allow in the generated output, comma-separated.
**Additional** HTML tags to allow in the generated output, comma-separated.

By default, only ``a``, ``blockquote``, ``br``, ``code``, ``del``, ``em``,
``h1``, ``h2``, ``h3``, ``h4``, ``h5``, ``h6``, ``hr``, ``ins``, ``li``,
``ol``, ``p``, ``pre``, ``strong``, ``table``, ``tbody``, ``tr``, ``td``, ``th``,
``thead`` and ``ul`` are allowed.

For a more detailed explanation, see :doc:`/docs/reference/markdown-config`.

Default: ``a, p, hr, br, ol, ul, li, pre, code, blockquote, del, ins, strong, em, h1, h2, h3, h4, h5, h6, sub, sup, table, thead, tbody, tr, th, td``
.. warning::

This option (together with ``allowed-attributes``) is frequently
misunderstood. Setting e.g. this list to only ``a, blockquote`` will
mean that ``br, code, del, ...`` and all other default allowed tags are
still allowed. You can only add *additional* elements here.

To specify a list of *only* allowed elements, use the
``strictly-allowed-html-elements`` option instead.

Default: (empty)

strictly-allowed-html-elements

.. versionchanged:: 0.14
Prior to this version, the setting worked as additional allowed elements to the predefined ones.
**Only** allow the specified HTML tags in the generated output, comma-separated.
If this option is set, the ``allowed-elements`` option is ignored.

Default: (empty)

.. versionadded:: 0.13.1

allowed-attributes
HTML attributes (independent from elements) to allow in the
**Additional** HTML attributes (independent from elements) to allow in the
generated output, comma-separated.

By default, only ``align`` and ``href`` are allowed (same caveats as for
``allowed-elements`` above apply)

For a more detailed explanation, see :doc:`/docs/reference/markdown-config`.

Default: ``align, href``
Default: (empty)

.. note:: To allow images in comments, you need to add
``allowed-elements = img`` and *also* ``allowed-attributes = src``.

.. versionchanged:: 0.14
Prior to this version, the setting worked as additional allowed attributes to the predefined ones.

Hash
----
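A minimal sketch of how the two element options combine, as the documentation above describes them: a non-empty `strictly-allowed-html-elements` replaces the element list outright, otherwise `allowed-elements` can only extend the defaults. The default list is copied from `isso/utils/html.py` (which, unlike the prose above, also includes `sub` and `sup`). `getlist` and `effective_elements` are hypothetical helpers, not isso's API, and parsing uses the standard-library `configparser` rather than isso's own config wrapper.

```python
import configparser

# Defaults copied from the hard-coded list in isso/utils/html.py
DEFAULT_ELEMENTS = ["a", "p", "hr", "br", "ol", "ul", "li",
                    "pre", "code", "blockquote",
                    "del", "ins", "strong", "em",
                    "h1", "h2", "h3", "h4", "h5", "h6", "sub", "sup",
                    "table", "thead", "tbody", "tr", "th", "td"]


def getlist(section, key):
    """Hypothetical helper: split a comma-separated option, trim whitespace
    and drop empty entries (an unset or empty option yields [])."""
    raw = section.get(key, fallback="")
    return [item.strip() for item in raw.split(",") if item.strip()]


def effective_elements(section):
    """Hypothetical helper: resolve the final list of allowed HTML tags."""
    strict = getlist(section, "strictly-allowed-html-elements")
    if strict:
        # A non-empty strict list wins; allowed-elements and the defaults are ignored
        return strict
    # Otherwise allowed-elements can only add to the defaults
    return DEFAULT_ELEMENTS + getlist(section, "allowed-elements")


parser = configparser.ConfigParser()
parser.read_string("""
[markup]
allowed-elements = a, p
strictly-allowed-html-elements = p
""")
print(effective_elements(parser["markup"]))  # ['p']
```

The values in the demo mirror `test_render_with_strictly_allowed_elements`: with both options set, only `p` survives.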

20 changes: 13 additions & 7 deletions isso/isso.cfg
@@ -211,13 +211,19 @@ options = autolink, fenced-code, no-intra-emphasis, strikethrough, superscript
# Per Misaka's defaults, no flags are set.
flags =

# HTML tags to allow in the generated output, comma-separated.
allowed-elements = a, p, hr, br, ol, ul, li, pre, code, blockquote, del, ins,
strong, em, h1, h2, h3, h4, h5, h6, sub, sup, table, thead, tbody, tr, th, td

# HTML attributes (independent from elements) to allow in the generated output,
# comma-separated.
allowed-attributes = align, href
# Additional HTML tags to allow in the generated output, comma-separated. By
# default, only a, blockquote, br, code, del, em, h1, h2, h3, h4, h5, h6, hr,
# ins, li, ol, p, pre, strong, table, tbody, tr, td, th, thead and ul are allowed.
allowed-elements =

# Only allow the specified HTML tags in the generated output, comma-separated.
# If this option is set, the "allowed-elements" option is ignored.
strictly-allowed-html-elements =

# Additional HTML attributes (independent from elements) to allow in the
# generated output, comma-separated. By default, only align and href are
# allowed.
allowed-attributes =


[hash]
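Attribute rules apply per element, so "by default, only align and href" means `align` on `table` and `href` on `a`. The sketch below is a simplified model of the mapping built in `Sanitizer.__init__` in `isso/utils/html.py`: per-element defaults, user-supplied `allowed-attributes` under the `"*"` key for every tag, and `src` appended automatically when `img` is an allowed element. `allow_language_class` is a hypothetical stand-in for `Sanitizer.allow_attribute_class`, and `is_allowed` only models how a dict-based attribute filter such as bleach's is consulted; it is not bleach's code.

```python
def allow_language_class(tag, name, value):
    """Hypothetical stand-in for Sanitizer.allow_attribute_class: only permit
    class="language-*" (used for syntax highlighting)."""
    return name == "class" and value.startswith("language-")


def build_attribute_map(allowed_elements, allowed_attributes):
    """Assemble a tag -> allowed-attributes mapping like the one in html.py."""
    attrs = list(allowed_attributes)
    # html.py appends "src" on its own when <img> is allowed
    if "img" in allowed_elements and "src" not in attrs:
        attrs.append("src")
    return {
        "table": ["align"],            # default, per element
        "a": ["href"],                 # default, per element
        "code": allow_language_class,  # callable filter
        "*": attrs,                    # user-supplied, any element
    }


def is_allowed(attr_map, tag, name, value=""):
    """Simplified model of a dict-based attribute filter: consult the tag's
    own entry first, then fall back to "*". A callable entry decides alone."""
    for key in (tag, "*"):
        rule = attr_map.get(key)
        if rule is None:
            continue
        if callable(rule):
            return rule(tag, name, value)
        if name in rule:
            return True
    return False


rules = build_attribute_map(["p", "a", "img"], ["title"])
print(is_allowed(rules, "a", "href"))   # True
print(is_allowed(rules, "img", "src"))  # True, added automatically
print(is_allowed(rules, "p", "align"))  # False, align is table-only
```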
20 changes: 18 additions & 2 deletions isso/tests/test_html.py
@@ -95,22 +95,38 @@ def test_render(self):
"options": "autolink",
"flags": "",
"allowed-elements": "a, p",
"allowed-attributes": "href"
"allowed-attributes": "href",
"strictly-allowed-html-elements": ""
}
})
renderer = html.Markup(conf.section("markup")).render
self.assertIn(renderer("http://example.org/ and sms:+1234567890"),
['<p><a href="http://example.org/" rel="nofollow noopener">http://example.org/</a> and sms:+1234567890</p>',
'<p><a rel="nofollow noopener" href="http://example.org/">http://example.org/</a> and sms:+1234567890</p>'])

def test_render_with_strictly_allowed_elements(self):
conf = config.new({
"markup": {
"options": "autolink",
"flags": "",
"allowed-elements": "a, p",
"strictly-allowed-html-elements": "p",
"allowed-attributes": "href"
}
})
renderer = html.Markup(conf.section("markup")).render
self.assertEqual(renderer("http://example.org/ and sms:+1234567890"),
'<p>http://example.org/ and sms:+1234567890</p>')

def test_sanitized_render_extensions(self):
"""Options should be normalized from both dashed-case or snake_case (legacy)"""
conf = config.new({
"markup": {
"options": "no_intra_emphasis", # Deliberately snake_case
"flags": "",
"allowed-elements": "p",
"allowed-attributes": ""
"allowed-attributes": "",
"strictly-allowed-html-elements": ""
}
})
renderer = html.Markup(conf.section("markup")).render
26 changes: 21 additions & 5 deletions isso/utils/html.py
@@ -21,13 +21,12 @@ def __init__(self, elements, attributes):

# allowed attributes for tags
self.attributes = {
"table": ["align"],
"a": ["href"],
"code": Sanitizer.allow_attribute_class,
"*": attributes
}

# If "code" elements are allowed, allow "language-*" CSS classes for syntax highlighting
if "code" in self.elements:
self.attributes["code"] = Sanitizer.allow_attribute_class

def sanitize(self, text):
clean_html = bleach.clean(text, tags=self.elements, attributes=self.attributes, strip=True)

@@ -100,9 +99,26 @@ def __init__(self, conf):
parser = Markdown(extensions=self.extensions,
flags=self.flags)
# Filter out empty strings:
allowed_elements = [x for x in conf.getlist("allowed-elements") if x]
strictly_allowed_html_elements = [x for x in conf.getlist("strictly-allowed-html-elements") if x]
allowed_attributes = [x for x in conf.getlist("allowed-attributes") if x]

# if "strictly-allowed-html-elements" option is set, use it instead of "allowed-elements"
if strictly_allowed_html_elements:
allowed_elements = strictly_allowed_html_elements
else:
allowed_elements = [x for x in conf.getlist("allowed-elements") if x]

# attributes found in Sundown's HTML serializer [1]
# - except for <img> tag, because images are not generated anyways.
# - sub and sup added
#
# [1] https://github.com/vmg/sundown/blob/master/html/html.c
allowed_elements = ["a", "p", "hr", "br", "ol", "ul", "li",
"pre", "code", "blockquote",
"del", "ins", "strong", "em",
"h1", "h2", "h3", "h4", "h5", "h6", "sub", "sup",
"table", "thead", "tbody", "tr", "th", "td"] + allowed_elements

# If images are allowed, source element should be allowed as well
if 'img' in allowed_elements and 'src' not in allowed_attributes:
allowed_attributes.append('src')
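`test_render_with_strictly_allowed_elements` expects the autolinked URL to come back as plain text. That follows from `sanitize` calling `bleach.clean(..., strip=True)`: disallowed tags are removed while their text content is kept. Since this page view drops Python indentation, note that the test can only pass if the hard-coded default list is prepended inside the `else` branch; if it were prepended after the `if`/`else`, `a` would be allowed and the link would survive. The sketch below reproduces the stripping behaviour with the standard-library `html.parser` instead of bleach. `StripDisallowed` and `sanitize` are hypothetical names, attribute filtering is reduced to a flat allow-list, and it is not a security-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

VOID_ELEMENTS = {"br", "hr", "img"}  # tags that never get a closing tag here


class StripDisallowed(HTMLParser):
    """Hypothetical toy sanitizer: drops tags that are not allowed but keeps
    their text, similar in spirit to bleach.clean(..., strip=True)."""

    def __init__(self, elements, attributes):
        super().__init__(convert_charrefs=True)
        self.elements = set(elements)
        self.attributes = set(attributes)  # flat allow-list, simplified
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.elements:
            return  # strip the tag itself; its text still reaches handle_data
        kept = "".join(f' {name}="{escape(value or "", quote=True)}"'
                       for name, value in attrs if name in self.attributes)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.elements and tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text, since convert_charrefs turned entities into characters
        self.out.append(escape(data, quote=False))


def sanitize(html_text, elements, attributes):
    parser = StripDisallowed(elements, attributes)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


rendered = ('<p><a href="http://example.org/" rel="nofollow noopener">'
            'http://example.org/</a> and sms:+1234567890</p>')
print(sanitize(rendered, ["p"], ["href"]))
# <p>http://example.org/ and sms:+1234567890</p>
```

Passing `["a", "p"]` instead keeps the link (minus the `rel` attribute, which this flat allow-list does not include).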
