-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent bidi phishing #291
Conversation
The old URL regex had a few issues which were revealed by fuzzing, the biggest being that it accepted non-printable characters (e.g. 0x00 or 0x01) as part of the URL. This created the scenario where a url of https://example.com/[0x00] would be rendered as %2 (and attempting to open the link would give a value like "https://example.com https://example.com " due to some odd iteraction with the regex that I haven't quite worked out. The new regex appears to work with all the iterations I have tried and rejects non-printable characters.
Realized this morning while thinking about this that my original solution was completely off. Updated with a much better solution which forces the actual url within the anchor text. |
@@ -64,7 +64,7 @@ QString LinkedText::parsed(const QString &input) | |||
|
|||
if (start > p) | |||
re.append(input.mid(p, start - p).toHtmlEscaped().replace(QLatin1Char('\n'), QStringLiteral("<br/>"))); | |||
re.append(QStringLiteral("<a href=\"%1\">%2</a>").arg(QString::fromLatin1(url.toEncoded()).toHtmlEscaped()).arg(match.capturedRef().toString().toHtmlEscaped())); | |||
re.append(QStringLiteral(" <a href=\"%1\">‪%2</a>").arg(QString::fromLatin1(url.toEncoded()).toHtmlEscaped()).arg(match.capturedRef().toString().toHtmlEscaped())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the URL is in the middle of a block of right-to-left script? I tried a URL in the middle of arabic text, and with this patch the URL somehow appeared to the left of all of the text, instead of its correct position.
Is that character added at the beginning a space? If so, why?
Also, could you add a comment explaining what this is and why it's necessary?
I found http://unicode.org/reports/tr36/#Bidirectional_Text_Spoofing - interesting reading. Wow, unicode is crazy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide the sample text for the right-to-left scenario? I will take a look to see what's going on there.
That space shouldn't be there, looks like miscommited somewhere, will fix that.
And yep. Will comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took a random sentence from wikipedia and typed a URL somewhere in it:
هي لغة برمج http://www.google.com/ ة تستخدم...
The result is at least substantially different. I don't know enough to say if it's worse.
Prevents Bidi Phishing attempts by forcing left-to-right display for links through html entity ‪ Adding this as a seperate commit #275 so that the regex changes can be judged seperately. This is a fairly minor risk as a victim would have to go through many hoops and not see the obvious url issues. But better fixed than not.
Updated the diff, which fixes the inline rtl issue by adding a pop directional formatting blob at the end...testing with all relevant text strings, seems to work as expected now e.g. the link is embedded in the text as written. I'll take a look at the URL pull request tomorrow RE: not catching at the end. If this one looks good I'll cancel these 2 PR's and merge them into one since I also noticed '%' was missing from the regex. |
Thanks. The bidi parts are looking great. |
Closing in favor of new PR #302 |
Adding this is a separate PR, hence the duplicate url-regex commit.
Basically, this change is to prevent phishing attacks utilizing the unicode control characters to affect the rendering of the text so while the user sees https://ricochet.im the link actually goes to http://evil.com - it's a fairly low level risk due to the fact a user has to add the person as a contact, and ultimately copy and paste the link etc.
Sample of how the old code worked:
And the new code: