Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify policies that require constraints on a URL based on its protocol #22

Open
GoogleCodeExporter opened this issue Apr 27, 2015 · 6 comments

Comments

@GoogleCodeExporter
Copy link

Once a: protocol is allowed, policy authors often want to place additional 
constraints: e.g. a data protocol with an image/... mime-type for use with <img 
src>, or a tel: protocol that contains a valid telephone number.

Right now, policy authors are tempted to do

allowUrlProtocols("data", "https", "http", "mailto")

allowAttributes("src").matching(Pattern.compile("^(data:image/(gif|png|jpeg)[,;]
|http|https|mailto|//)", Pattern.CASE_INSENSITIVE)

which requires duplicative effort.

We should provide good alternatives to writing regular expressions to match 
URLs as it is error prone.

Perhaps a URL policy that recognizes structure in URLs.

Original issue reported on code.google.com by [email protected] on 21 Jan 2014 at 4:09

@jmanico
Copy link
Member

jmanico commented Apr 7, 2016

Bump to revisit.

@cnsgithub
Copy link

cnsgithub commented Sep 27, 2017

I would love an abbreviation for this, too.

Could you please help me out getting this to work:

	private static final PolicyFactory htmlSanitizer = new HtmlPolicyBuilder()
	        .allowUrlProtocols("data", "https", "http", "mailto")
	        .allowAttributes("src")
	        .matching(Pattern.compile("^(data:image/(gif|png|jpeg)[,;]|http|https|mailto|//)", Pattern.CASE_INSENSITIVE))
	        .onElements("img")
	        .toFactory()
	        .and(Sanitizers.IMAGES)
	        .and(Sanitizers.BLOCKS);

	public static void main(String[] args) {
		System.out.println(HtmlSanitize("<img src=\"data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o\" /><p>test</p>"));
	}

I get <p>test</p> here, but the embedded image is missing. Changing src="data to src="http however works. What am I doing wrong?

@cnsgithub
Copy link

cnsgithub commented Sep 27, 2017

Got it to work:

	private static final PolicyFactory htmlImageSanitizer = new HtmlPolicyBuilder()
	        .allowUrlProtocols("data", "http", "https")
	        .allowElements("img")
	        .allowAttributes("src")
	        .matching(Pattern.compile("^(data:image/(gif|png|jpeg)[,;]|http|https|mailto|//).+", Pattern.CASE_INSENSITIVE))
	        .onElements("img")
	        .toFactory();
	private static final PolicyFactory htmlSanitizer = htmlImageSanitizer.and(Sanitizers.BLOCKS).and(Sanitizers.FORMATTING).and(Sanitizers.LINKS).and(Sanitizers.STYLES).and(Sanitizers.TABLES);

@mikesamuel
Copy link
Contributor

Your solution seems complicated but the API doesn't obviously allow for a better way, so it's probably the API's fault.

There seem to be some separable concerns here:

  • It's hard to restrict data URLs to particular mime-types.
  • It's hard to restrict URLs to some attributes and not others.

I think the first problem is a symptom of a larger problem: it's hard to match URLs.
If we had a way to specify a concisely specify a set of URLs, then we could solve the second problem via an API like

    allowAttributes("src")
    .matchingUrls(...)
    .onElements("img")

where the ... encapsulates (http, https, or data with content-type in image/(gif|png|jpeg)).


What do you think of https://gist.github.com/mikesamuel/e9720a0acc0601372deba3bf0896f33a as a proposed API for solving the larger problem?

Note to self: I'm finding excuses to write specifications, so I should probably figure out what work I'm subconsciously avoiding and do it.

@cnsgithub
Copy link

Your API would be great at this point and definitely its implementation would be worth the effort.

@mikesamuel
Copy link
Contributor

https://github.com/OWASP/url-classifier is an experimental URL classifier API based on that gist.

#126 integrates it into java-html-sanitizer.

Neither is ready for prime-time yet, but you can play around.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants