Parse GFM Extended Autolinks #57
Comments
@dillonkearns I'll work on this one if it's up for grabs!
It's all yours, thank you @stephenreddek! 👌 💯
@dillonkearns What are your thoughts on how to handle multiple trailing "entity references" per https://github.github.com/gfm/#example-626? It only explicitly mentions handling a single, trailing reference, but it sure feels like it should remove multiple of them if they exist.
One piece of supporting evidence for the idea of trimming all: the parentheses rule removes all trailing unmatched parentheses.
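To make the parentheses rule concrete, here is a minimal Python sketch of it as I read the GFM spec: when a candidate link ends in `)` and has more closing than opening parentheses overall, the unmatched trailing ones are excluded. The function name `trim_unmatched_closing_parens` is hypothetical and is not taken from elm-markdown or cmark-gfm.

```python
from typing import Tuple


def trim_unmatched_closing_parens(link: str) -> Tuple[str, str]:
    """Split a candidate autolink into (link, trailing_text).

    Hypothetical helper: paraphrases the GFM rule for trailing ")".
    """
    end = len(link)
    opening = link.count("(")
    closing = link.count(")")
    # Drop one trailing ")" at a time while closers still outnumber openers.
    while end > 0 and link[end - 1] == ")" and closing > opening:
        end -= 1
        closing -= 1
    return link[:end], link[end:]
```

Because the loop keeps going until the counts balance, every unmatched trailing parenthesis is dropped, not just the last one. Parentheses in the interior of the link are left alone.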
Another question! The spec for URL autolinks only mentions support for the protocols Thanks for any guidance you have!
Hey @stephenreddek! Good questions. So for the URL schemes, my thinking is that it should either 1) be very specific (only the explicit ones mentioned, http and https), or 2) be completely general (anything in the form of I don't like the idea of hardcoding a specific set when there are so many possible schemes: https://en.wikipedia.org/wiki/List_of_URI_schemes. And indeed, many different possible valid URLs. Babelmark tends to treat general schemes, like a slack:// link, as plain text (not autolinks): So let's go with option (1) on this, and only handle the specific cases of
Regarding trailing entity references, it seems right to me that we should remove multiple references. What happens on babelmark? I often let that be the tie breaker when I'm not sure, with a little extra weight given to the results from the official C implementation for the
Yep, the official implementation drops them all so I'll just go with that!
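For reference, a minimal Python sketch of the "drop them all" behavior agreed on above. It assumes the GFM definition of something that resembles an entity reference (`&`, one or more alphanumeric characters, then `;`) and strips such sequences from the end of the candidate link repeatedly. The name `trim_trailing_entity_refs` is hypothetical, and this is not how cmark-gfm or elm-markdown actually implement it.

```python
import re
from typing import Tuple

# "&" + one or more alphanumerics + ";" anchored at the very end of the string.
TRAILING_ENTITY = re.compile(r"&[A-Za-z0-9]+;\Z")


def trim_trailing_entity_refs(link: str) -> Tuple[str, str]:
    """Split a candidate autolink into (link, trailing_text).

    Hypothetical helper: removes every trailing entity-like reference,
    not only the last one.
    """
    end = len(link)
    while True:
        match = TRAILING_ENTITY.search(link[:end])
        if match is None:
            break
        # Exclude this reference and look for another one right before it.
        end = match.start()
    return link[:end], link[end:]
```

A link that merely contains `&name` in the middle, or that does not end in `;`, is returned unchanged.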
We currently handle the CommonMark autolinks, which are links with explicit surrounding <>'s. However, we are not parsing the GitHub-Flavored Markdown's extended autolinks, which are bare links with no explicit token. The fact that it should be parsed as a link is inferred by the format, for example content starting with https:// and followed by a valid domain.
You can see the current end-to-end spec failures here:
https://github.com/dillonkearns/elm-markdown/blob/master/test-results/failing/GFM/%5Bextension%5D%20Autolinks.md
This issue will be complete when we've made those end-to-end tests pass.
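As a rough illustration of "inferred by the format", here is a Python sketch of recognizing a bare URL autolink at the start of a string. It assumes only the http:// and https:// schemes, and paraphrases the GFM valid-domain rule: period-separated segments of alphanumerics, underscores and hyphens, with at least one period and no underscores in the last two segments. The name `find_url_autolink` is hypothetical. The sketch does not handle trailing punctuation, `www.` links, or email autolinks, and it is not the elm-markdown implementation.

```python
import re
from typing import Optional

# Scheme, then a domain with at least one period, then everything up to
# the next whitespace or "<". The domain is captured for validation.
URL_AUTOLINK = re.compile(
    r"https?://([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)+)[^\s<]*"
)


def find_url_autolink(text: str) -> Optional[str]:
    """Return the bare URL at the start of `text`, or None (hypothetical helper)."""
    match = URL_AUTOLINK.match(text)
    if match is None:
        return None
    segments = match.group(1).split(".")
    # No underscores are allowed in the last two domain segments.
    if any("_" in segment for segment in segments[-2:]):
        return None
    return match.group(0)
```

For example, `find_url_autolink("https://example.com/path and more")` returns `"https://example.com/path"`, while a `slack://` link or a domain without a period yields `None`.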
Existing Inline Parsing Code
Note that the inline parsing code does not use elm/parser, because Markdown inline parsing uses a very different algorithm than the block parsing, and it's not well-suited to elm/parser. The details of why are not important in this issue, but it's worth being aware that this code is based on Regex processing.
Here's the current area where CommonMark-style autolinks are handled:
elm-markdown/src/Markdown/InlineParser.elm
Lines 1101 to 1107 in 40f9dc4
Note that it is only applying this in the context of angleBracketsToMatch. We can likely reuse some of the autolinkToMatch code, but outside of the context of an angle brackets match.