spaces before closing inline pairs don't parse #124

BrianHicks · 2022-09-16T13:41:40Z

Here's an example:

```
this is a **strong **word.
```

I'd expect this to be formatted as "this is a strong word" (with "strong" in bold), but it comes out as unparsed source.

Context: I have some source coming in from Airtable, which uses a "markdown-inspired" markup. It's close, but there are cases like this where it does weird stuff. GitHub's parser parses this correctly, however, so I think it's a reasonable thing to expect.

jhbrown-veradept · 2022-09-16T14:44:14Z

CommonMark does not permit whitespace before a “right flanking delimiter run”.

So I believe **strong ** should not be emphasized; it should instead be rendered as the literal string **strong **.

“A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.” https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis
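To make the quoted rule concrete, here is a minimal Elm sketch of the right-flanking test. It is hypothetical and not part of elm-markdown: the module and the names `isRightFlanking` and `neighbors` are made up for illustration, and it assumes ASCII approximations of the spec's "Unicode whitespace" and "Unicode punctuation" classes, which a real parser would need to replace with the full Unicode categories.

```elm
module RightFlanking exposing (isRightFlanking, neighbors)

-- Hypothetical sketch of the CommonMark 0.30 right-flanking test.
-- Assumption: "Unicode whitespace" and "Unicode punctuation" are
-- approximated by their ASCII subsets; a real parser must use the
-- full Unicode categories.


-- `Nothing` stands for the start or end of the line, which the spec
-- counts as whitespace.
isWhitespace : Maybe Char -> Bool
isWhitespace maybeChar =
    case maybeChar of
        Nothing ->
            True

        Just c ->
            List.member c [ ' ', '\t', '\n', '\u{000C}', '\u{000D}' ]


-- ASCII punctuation only (assumption; see the note above).
isPunctuation : Maybe Char -> Bool
isPunctuation maybeChar =
    case maybeChar of
        Nothing ->
            False

        Just c ->
            String.contains (String.fromChar c) "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"


-- (1) not preceded by whitespace, and either
-- (2a) not preceded by punctuation, or
-- (2b) preceded by punctuation and followed by whitespace or punctuation.
-- (2b) only matters when (2a) fails, so the disjunction reduces to
-- "not preceded by punctuation, or followed by whitespace or punctuation".
isRightFlanking : Maybe Char -> Maybe Char -> Bool
isRightFlanking before after =
    not (isWhitespace before)
        && (not (isPunctuation before) || isWhitespace after || isPunctuation after)


-- The characters just before and just after a delimiter run of `len`
-- characters starting at character index `start`.
neighbors : String -> Int -> Int -> ( Maybe Char, Maybe Char )
neighbors source start len =
    let
        chars =
            String.toList source

        before =
            if start == 0 then
                Nothing

            else
                List.head (List.drop (start - 1) chars)

        after =
            List.head (List.drop (start + len) chars)
    in
    ( before, after )
```

In `elm repl`, the closing `**` in `this is a **strong **word.` starts at index 19, so `neighbors "this is a **strong **word." 19 2` gives `( Just ' ', Just 'w' )`, and `isRightFlanking (Just ' ') (Just 'w')` is `False`: the run is preceded by a space, fails condition (1), and therefore cannot close the strong span.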

dillonkearns · 2022-09-16T15:09:35Z

Thank you both for the context here! I think @jhbrown-veradept is correct about the spec. In fact, if I write out your example, Brian, directly (not in a code block), it doesn't produce strong text; instead, it shows the literal asterisks:

this is a **strong **word.

Babelmark is a really handy tool for this kind of corner case. You can see that the most authoritative markdown parsers produce the same result, including the one GitHub maintains and uses on its site, cmark-gfm, labelled "GitHub Flavored Markdown" in the results here:

https://babelmark.github.io/?text=this+is+a+**strong+**word.%0A

dillonkearns · 2022-09-16T15:13:06Z

So it seems like a possible fix here would be to provide the inline parser with a fallback to give back its raw text:

elm-markdown/src/Markdown/InlineParser.elm

Lines 16 to 28 in a483add

```elm
parse : References -> String -> List Inline
parse refs rawText_ =
    let
        rawText =
            String.trim rawText_

        tokens =
            tokenize rawText
    in
    tokensToMatches tokens [] refs rawText
        |> organizeMatches
        |> parseTextMatches rawText []
        |> matchesToInlines
```

I'm not sure whether that would cause other failures in the test suite, but the tests are quite robust, so we'd find out quickly whether it introduces new issues or can be done safely to match the behavior of GitHub's GFM parser.
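A rough Elm sketch of the fallback idea, for discussion only. Everything in it is hypothetical and not elm-markdown's API: the simplified `Inline` type, the `strictParse` argument, and `parseWithFallback` are made up, and the sketch assumes the strict parser signals "can't parse this" by returning `Nothing`.

```elm
module InlineFallback exposing (Inline(..), parseWithFallback)

-- Hypothetical, simplified inline AST for illustration only; it is
-- not elm-markdown's actual `Inline` type.


type Inline
    = Text String
    | Strong (List Inline)


-- Run a strict inline parser; if it gives up (`Nothing`), return the
-- whole input as one literal `Text` node instead of failing.
parseWithFallback : (String -> Maybe (List Inline)) -> String -> List Inline
parseWithFallback strictParse rawText =
    strictParse rawText
        |> Maybe.withDefault [ Text rawText ]
```

For example, with a strict parser that returns `Nothing` on the unclosed `**`, `parseWithFallback strictParse "this is a **strong **word."` yields `[ Text "this is a **strong **word." ]`. Falling back for the whole string is the coarsest option; the CommonMark emphasis algorithm instead leaves only the unmatched delimiter run as literal text, so any other formatting on the same line still renders.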
