spaces before closing inline pairs don't parse #124

Open
BrianHicks opened this issue Sep 16, 2022 · 3 comments

Comments

@BrianHicks

Here's an example:

this is a **strong **word.

I'd expect this to be formatted as "this is a strong word" (with "strong" in bold), but it comes out as unparsed source.

Context: I have some source coming in from Airtable, which is a "markdown-inspired" markup. It's close, but there are times like this where it does weird stuff. GitHub's parser correctly parses this, however, so I think it's reasonable.

@jhbrown-veradept

CommonMark does not permit whitespace before a “right flanking delimiter run”.

So I believe **strong ** should not be emphasized, but should instead render as the literal string **strong **

“A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.” https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis
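
To make the quoted rule concrete, here is a minimal Python sketch of the right-flanking test. It is only an illustration of the spec text above, not the code used by this library or by cmark-gfm. The function names are hypothetical, and it simplifies by looking only at runs of `*` and ignoring backslash escapes and `_`.

```python
import re
import string
import unicodedata


def is_unicode_whitespace(ch: str) -> bool:
    # CommonMark 0.30: any Zs code point, or tab, line feed, form feed, carriage return.
    return ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_unicode_punctuation(ch: str) -> bool:
    # CommonMark 0.30: ASCII punctuation, or any general category starting with "P".
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def is_right_flanking(text: str, start: int, end: int) -> bool:
    """Return True if the delimiter run text[start:end] is right-flanking."""
    # The beginning and end of the line count as Unicode whitespace.
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    if is_unicode_whitespace(before):
        return False  # fails condition (1)
    if not is_unicode_punctuation(before):
        return True  # condition (2a)
    # condition (2b)
    return is_unicode_whitespace(after) or is_unicode_punctuation(after)


def star_runs(text: str) -> list[tuple[int, bool]]:
    """List (start index, is right-flanking) for each maximal run of '*'."""
    return [
        (m.start(), is_right_flanking(text, m.start(), m.end()))
        for m in re.finditer(r"\*+", text)
    ]


if __name__ == "__main__":
    print(star_runs("this is a **strong **word."))
    print(star_runs("this is a **strong** word."))
```

For the example from this issue, neither `**` run is right-flanking (both are preceded by a space), so there is nothing that can close the strong emphasis. Moving the space after the second `**` makes that run right-flanking.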

@dillonkearns
Owner

Thank you both for the context here! I think @jhbrown-veradept is correct about the spec. In fact, Brian, if I write out your example directly (not in a code block), it doesn't result in formatted strong text; rather, it shows literal *'s:

this is a **strong **word.

Babelmark is a really handy tool for this kind of corner case. You can see that the most definitive markdown parsers get that same result, including the one GitHub maintains and uses on their site, cmark-gfm (labelled GitHub Flavored Markdown in the results here):

https://babelmark.github.io/?text=this+is+a+**strong+**word.%0A

@dillonkearns
Owner

So it seems like a possible fix here would be to provide the inline parser with a fallback to give back its raw text:

    parse : References -> String -> List Inline
    parse refs rawText_ =
        let
            rawText =
                String.trim rawText_

            tokens =
                tokenize rawText
        in
        tokensToMatches tokens [] refs rawText
            |> organizeMatches
            |> parseTextMatches rawText []
            |> matchesToInlines

I'm not sure if that would cause any other failures in the test suite, but the tests are quite robust, so we'd find out pretty quickly whether it introduces new issues or can be done safely to get the same behavior that GitHub's gfm parser has.
