[GHSA-5jfw-gq64-q45f] HTML Cleaner allows crafted scripts in special contexts like svg or math to pass through #5031

Open
wants to merge 1 commit into
base: byt3n33dl3/advisory-improvement-5031

Conversation

byt3n33dl3

Updates

  • CVSS v3
  • Description
  • References
  • Summary

Comments
This update addresses a critical Cross-Site Scripting (XSS) vulnerability in the lxml-html-clean library, affecting versions < 0.4.0. The vulnerability arises from improper handling of special HTML tags such as <svg>, <math>, and <noscript>, allowing malicious scripts to bypass the HTML cleaning process.

The proposed improvement includes additional context about the exploit scenario, mitigation techniques, and real-world implications of the vulnerability. It also provides actionable examples and references, making the advisory more comprehensive and user-friendly. These enhancements ensure developers understand the risks and adopt best practices to secure their applications effectively.

This contribution aligns with the goal of the GitHub Security Advisory to provide detailed, actionable, and accurate information to the developer community for maintaining software security.

@github
Collaborator

github commented Nov 22, 2024

Hi there @frenzymadness! A community member has suggested an improvement to your security advisory. If approved, this change will affect the global advisory listed at github.com/advisories. It will not affect the version listed in your project repository.

This change will be reviewed by our Security Curation Team. If you have thoughts or feedback, please share them in a comment here! If this PR has already been closed, you can start a new community contribution for this advisory

@github-actions github-actions bot changed the base branch from main to byt3n33dl3/advisory-improvement-5031 November 22, 2024 06:38
@darakian
Contributor

I'm not sure I agree that this is an improvement. This reads to me as a fluffing up of the text which degrades readability. The two references you add are also duplicative with what we already have on record. Maybe I'm missing it, but can you point out what new context you're adding to the advisory?

@byt3n33dl3
Author

I'm adding detail on lxml_html_clean and on further exploitation possibilities related to the cross-site scripting vulnerability.

Phishing Attacks through SVG Payloads

Scenario: An attacker crafts an HTML payload containing an <svg> element with malicious JavaScript embedded in a <script> tag. This payload is passed through `lxml_html_clean`, which fails to sanitize it effectively due to improper handling of <svg> context-switching. When the sanitized output is rendered in a browser, the JavaScript executes.

Reflected XSS in Web Applications and DOM-Based XSS through JavaScript Integration

Scenario: A web application accepts untrusted input from query parameters or form submissions and sanitizes it using lxml_html_clean. An attacker crafts a payload with <math> tags containing event handlers such as onclick or onmouseover, which bypass the sanitizer and execute in the browser.

Scenario: An attacker embeds <svg> or <noscript> elements containing scripts that interact with client-side JavaScript. When the sanitized HTML is dynamically injected into the DOM via JavaScript, the browser interprets the malicious scripts embedded in these tags, bypassing the intended sanitization.
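The scenarios above all come down to a script element or an inline event handler surviving in the cleaned output. A minimal sketch of how such output could be audited is shown here, using only Python's standard `html.parser`. The names `ActiveContentScanner` and `find_active_content` are hypothetical and not part of lxml_html_clean, and the stdlib parser does not reproduce a browser's context switching inside <svg>, <math>, or <noscript>, so this is a coarse second check rather than proof that output is safe.

```python
from html.parser import HTMLParser


class ActiveContentScanner(HTMLParser):
    """Collects <script> elements and inline event-handler attributes."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            self.findings.append(("element", tag, None))
        for name, _value in attrs:
            # Conservative assumption: any attribute starting with "on"
            # is treated as an event handler (onclick, onmouseover, ...).
            if name.startswith("on"):
                self.findings.append(("attribute", tag, name))


def find_active_content(markup):
    """Return a list of (kind, tag, attribute) tuples found in markup."""
    scanner = ActiveContentScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.findings
```

Running `find_active_content` on the sanitizer's output, rather than on the raw input, is the point of the sketch: an empty list is the expected result for cleaned HTML.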

Mostly I want to add the other ways this vulnerability could be exploited. In short:

  • Stored XSS
  • Denial of Service (DoS) via Resource Exhaustion
  • DOM Clobbering
  • Open Redirects via Sanitized Links
  • Code Execution via Polyglot Payloads
  • Cross-Origin Data Exfiltration
  • Arbitrary Code Execution in Legacy Browsers
  • Bypassing Content Security Policy (CSP)

Summary

While XSS is the most prominent vulnerability due to the mismanagement of these tags, the improper handling of <svg>, <math>, and <noscript> elements in lxml_html_clean creates opportunities for various exploits, from DoS and DOM clobbering to sophisticated bypass techniques. These scenarios emphasize the importance of upgrading to the patched version of lxml_html_clean and implementing robust additional validation techniques when handling untrusted HTML content.

@byt3n33dl3
Author

Exploitation Scenarios

HTML Injection

Attackers exploit the vulnerability to inject untrusted HTML content that appears sanitized but retains harmful structure due to context-switching issues. For example:

  • Misused <math> or <svg> elements with unexpected attributes.
  • Embedded malicious iframes or forms disguised in legitimate-looking content.

Impact: Enables phishing attacks or tricks users into submitting sensitive data to malicious endpoints.

Stored XSS

In applications that persist sanitized HTML in databases or logs, malicious content can bypass sanitization and remain dormant until displayed in a vulnerable context. For instance:

  • Injected <noscript> tags might trigger scripts in browser contexts with JavaScript disabled.
  • Hidden scripts in SVG animation elements may activate under certain conditions.

Impact: Persistent execution of malicious scripts whenever the compromised content is viewed, amplifying the attack surface.

Denial of Service (DoS) via Resource Exhaustion

Scenario: An attacker creates complex nested <svg> or <math> elements with recursive attributes or oversized payloads designed to consume excessive parsing resources. Since lxml may not handle such payloads efficiently, this could lead to:

  • High memory or CPU consumption on the server during sanitization.
  • Application crashes or reduced availability.

Impact: DoS attacks could disrupt services relying on lxml for processing untrusted HTML inputs.
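One common mitigation for this class of concern, independent of which sanitizer is used, is to reject oversized or deeply nested input before it reaches the cleaner. The sketch below is an assumption about how such a guard might look. The names `max_nesting_depth` and `within_limits` and the default limits are hypothetical and should be tuned per application. Unclosed tags inflate the measured depth, which errs on the side of rejecting.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they do not add nesting.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DepthMeter(HTMLParser):
    """Tracks the deepest element nesting seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.depth > 0:
            self.depth -= 1


def max_nesting_depth(markup):
    meter = DepthMeter()
    meter.feed(markup)
    meter.close()
    return meter.max_depth


def within_limits(markup, max_bytes=100_000, max_depth=50):
    """Return True if markup is small and shallow enough to sanitize."""
    if len(markup.encode("utf-8")) > max_bytes:
        return False
    return max_nesting_depth(markup) <= max_depth
```

The size check runs first so that very large inputs are rejected without being parsed at all.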

DOM Clobbering

An attacker leverages improperly sanitized HTML to insert elements with unexpected IDs or names that overwrite critical DOM properties. For example:

<svg id="submit" onclick="maliciousFunction()"> could override a legitimate form’s submit functionality.

Impact: Manipulation of the client-side DOM behavior, potentially hijacking user actions or breaking application functionality.

Open Redirects via Sanitized Links

Attackers inject tags that survive sanitization with event handlers or redirection payloads hidden within elements such as <svg>. For instance:

<svg><a href="javascript:evilFunction()">Click here</a></svg>.

Impact: Exploitation of open redirects to conduct phishing or malware distribution campaigns.
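A link like the one in the example above can be detected in cleaned output by checking URL-bearing attributes for a `javascript:` scheme. The following is a rough sketch using the standard library. The attribute set and the name `find_script_urls` are assumptions for illustration, and the normalization (dropping ASCII whitespace and control characters, lowercasing) only approximates how browsers treat URL schemes.

```python
import re
from html.parser import HTMLParser

# Attributes assumed to carry URLs; extend as needed for a given application.
URL_ATTRIBUTES = {"href", "src", "action", "formaction", "xlink:href"}

# ASCII control characters and space, which may be used to disguise a scheme.
_IGNORED_CHARS = re.compile(r"[\x00-\x20]")


class ScriptUrlScanner(HTMLParser):
    """Collects URL attributes whose value uses the javascript: scheme."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRIBUTES and value:
                normalized = _IGNORED_CHARS.sub("", value).lower()
                if normalized.startswith("javascript:"):
                    self.hits.append((tag, name, value))


def find_script_urls(markup):
    """Return (tag, attribute, raw value) for each javascript: URL found."""
    scanner = ScriptUrlScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.hits
```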

Code Execution via Polyglot Payloads

Context-switching behavior may allow injection of polyglot payloads that are interpreted differently depending on the parser or runtime environment. For example, mixed HTML, SVG, and JavaScript elements that produce varied behaviors when sanitized, rendered, or executed.
Impact: Bypasses both sanitization and execution safeguards, leading to remote code execution (RCE) in some contexts.

Cross-Origin Data Exfiltration

Malicious <svg> or <math> elements exploit context-switching to bypass same-origin policies indirectly. For example, embedding <svg><foreignObject> to extract sensitive data by manipulating browser rendering behavior.

Impact: Sensitive user data could be leaked to an attacker-controlled domain.

@frenzymadness

As @darakian pointed out, the added references are duplicates.

I don't think all the examples of exploits you mentioned fit here. If you can provide examples of crafted HTML that pass through lxml_html_clean and cause the problems you mentioned, it would be great and we can use them to improve the tests of lxml_html_clean.

Thank you!
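For anyone wanting to act on this request, a small harness along the following lines could help triage candidate payloads before turning them into proper test cases. It is only a sketch. `surviving_payloads`, `SUSPICIOUS`, and `naive_sanitize` are hypothetical names, the regular expression is a deliberately coarse marker that can produce false positives, and the toy sanitizer exists only so the example runs without third-party packages.

```python
import re

# Coarse, assumed markers of active content in sanitizer output.
SUSPICIOUS = re.compile(r"<script\b|\bon\w+\s*=|javascript:", re.IGNORECASE)


def surviving_payloads(sanitize, payloads):
    """Return (payload, cleaned) pairs whose cleaned output still looks active.

    `sanitize` is any callable that maps an HTML string to a cleaned string.
    """
    survivors = []
    for payload in payloads:
        cleaned = sanitize(payload)
        if SUSPICIOUS.search(cleaned):
            survivors.append((payload, cleaned))
    return survivors


def naive_sanitize(markup):
    # Toy stand-in that only strips <script> blocks; NOT a real sanitizer.
    return re.sub(r"<script\b.*?</script>", "", markup,
                  flags=re.IGNORECASE | re.DOTALL)


candidates = [
    "<p>hi</p>",
    "<script>alert(1)</script>",
    '<svg onload="x()"></svg>',
]
report = surviving_payloads(naive_sanitize, candidates)
```

To exercise the real library instead of the toy function, the same harness could be given `clean_html` imported from `lxml_html_clean`, assuming that package is installed. Any pair it reports would still need manual confirmation in a browser before being filed as a bypass.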
