Documentation parser fixes #4853
var encoded = HttpUtility.HtmlEncode(readMeMd);
var encodedLines = encoded.Replace("\r\n", "\n").Split('\n');

var blockQuotePattern = new Regex("^ {0,3}>");
Would it be better if `blockQuotePattern` was a static readonly property?
Also, this `HtmlEncodeExceptBlockquotes` method seems rather expensive. Would it be worthwhile to check if `readMeMd` has the `>` substring, and if it doesn't, simply return `encoded`?
Will play with this to see if I can optimize it.
Changed to a single `Multiline` regex applied to the entire markdown.
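For readers following the thread, here is a minimal sketch of what the single-regex approach could look like. It is not the PR's actual code: it assumes the pattern has to match the encoded marker `&gt;` (the diff as rendered here shows `>`), it uses `WebUtility.HtmlEncode` in place of `HttpUtility.HtmlEncode` to stay self-contained, and the class name is hypothetical. It also includes the fast-path check suggested above.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical sketch, not the NuGetGallery implementation.
public static class BlockquoteEncodingSketch
{
    // Assumption: after HTML encoding, a blockquote marker appears as "&gt;"
    // preceded by at most three spaces at the start of a line.
    private static readonly Regex EncodedBlockQuotePattern =
        new Regex("^( {0,3})&gt;", RegexOptions.Multiline);

    public static string HtmlEncodeExceptBlockquotes(string readMeMd)
    {
        var encoded = WebUtility.HtmlEncode(readMeMd);

        // Fast path suggested in review: no '>' means no marker to restore.
        if (readMeMd.IndexOf('>') < 0)
        {
            return encoded;
        }

        // Restore only the leading marker; every other '>' stays encoded.
        return EncodedBlockQuotePattern.Replace(encoded, "$1>");
    }
}
```

With `RegexOptions.Multiline`, `^` matches at the start of every line, so one pass over the whole string replaces the earlier split-and-loop over `encodedLines`.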
{
    CommonMarkConverter.ProcessStage3(document, htmlWriter, settings);

    var regex = new Regex("<a href=([\"\']).*?\\1");
Consider making this regex also a static readonly.
Also, is `CommonMarkConverter` guaranteed to always generate links with the format `<a href=`? If not, this regex may not match all links. Would a simple string replace of `<a` with `<a rel="nofollow"` work too?
Yes, the current version always emits the href attribute first, followed by an optional title attribute. There's an open issue to add support for other link attributes, but it won't be added until the CommonMark spec includes it.
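To illustrate the point about attribute order, the sketch below applies the regex from the diff to the converter's HTML output and appends `rel="nofollow"` right after the `href` attribute. The replacement string and the class and method names are assumptions; the thread does not show how the PR performs the substitution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of the post-processing step discussed above.
public static class NoFollowSketch
{
    // Pattern copied from the diff: matches <a href="..." or <a href='...'.
    private static readonly Regex CommonMarkLinkPattern =
        new Regex("<a href=([\"\']).*?\\1");

    public static string AddNoFollow(string html)
    {
        // $0 is the whole match, so the attribute lands directly after href.
        return CommonMarkLinkPattern.Replace(html, "$0 rel=\"nofollow\"");
    }
}
```

This only works while `href` is the first attribute emitted. A link rendered with any other attribute ahead of `href` would be skipped, which is the risk raised in the question above.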
{
    // Block javascript in links.
    if ((inline.Tag == InlineTag.Link) &&
        (inline.TargetUrl.IndexOf("javascript", StringComparison.InvariantCultureIgnoreCase) >= 0))
Will change to only allow target urls that use http/https.
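A possible shape for that allow-list check is sketched below. The substring test in the diff would also reject a legitimate URL such as `https://example.com/javascript-guide` and would let other schemes (for example `data:`) through, so checking the parsed scheme is stricter in both directions. The class and method names are hypothetical, and this sketch rejects relative links, which may or may not match what the PR ended up doing.

```csharp
using System;

// Hypothetical sketch of a scheme allow-list for markdown link targets.
public static class LinkTargetSketch
{
    public static bool IsAllowedTargetUrl(string targetUrl)
    {
        Uri uri;
        // Assumption: only absolute URLs are accepted.
        if (!Uri.TryCreate(targetUrl, UriKind.Absolute, out uri))
        {
            return false;
        }

        // Uri.Scheme is normalized to lowercase, so "HTTP://..." passes too.
        return uri.Scheme == Uri.UriSchemeHttp
            || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```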
@blowdart Can you have a look?
LGTM
namespace NuGetGallery
{
    internal class ReadMeService : IReadMeService
    {
        private static readonly Regex EncodedBlockQuotePattern = new Regex("^ {0,3}>", RegexOptions.Multiline);
        private static readonly Regex CommonMarkLinkPattern = new Regex("<a href=([\"\']).*?\\1");
I feel uneasy about using regexes to insert the `rel=nofollow`. Are you sure CommonMark always generates links that start with `<a href=`?
This regex was added in PR #4841. I discussed with @ryuyu and we agreed the regex was probably better than forking CommonMark.NET.
For CommonMark.NET link formatting see HtmlFormatterSlim.
CommonMark.NET had a request to support link attributes, but it won't be added unless attribute extensions are added to the underlying commonmark spec.
I should add a timeout to these regexes, as pointed out in other PRs.
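A rough sketch of that follow-up, using the `Regex` constructor overload that takes a match timeout. The one-second value, the class name, and the `TryReplace` helper are placeholders, not values taken from this PR or the others it mentions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: the same two patterns, bounded by a match timeout.
public static class RegexTimeoutSketch
{
    // Placeholder value; declared first so the initializers below can use it.
    private static readonly TimeSpan RegexTimeout = TimeSpan.FromSeconds(1);

    public static readonly Regex EncodedBlockQuotePattern =
        new Regex("^ {0,3}>", RegexOptions.Multiline, RegexTimeout);

    public static readonly Regex CommonMarkLinkPattern =
        new Regex("<a href=([\"\']).*?\\1", RegexOptions.None, RegexTimeout);

    // Returns false instead of throwing when matching exceeds the timeout,
    // leaving it to the caller to decide how to handle the README.
    public static bool TryReplace(Regex regex, string input, string replacement, out string result)
    {
        try
        {
            result = regex.Replace(input, replacement);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            result = null;
            return false;
        }
    }
}
```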
Fixes #4783 and #4816, plus restricts markdown links to http/s