Inline HTML blocks with < and > chars don't get parsed as HTML #123

Open
thomasin opened this issue Sep 6, 2022 · 6 comments
Comments

@thomasin
Contributor

thomasin commented Sep 6, 2022

Pretty sure this is a bug!

Failing test: (note: first test passes, second inline test fails)
master...thomasin:elm-markdown:bug/html-attributes

I think it has something to do with the inline logic, because it only affects inline HTML blocks, but all of that code is a bit more opaque to me.

@dillonkearns
Owner

Shouldn't it be escaped?

When I enter something similar here it gives an error:

https://validator.w3.org/nu/#textarea

Screenshot 2022-10-30 at 11 31 25 AM

So instead of this:

<img src="https://avatars2.githubusercontent.com/<u>/1384166" />

Wouldn't the correct way to write that raw HTML be something like this? <img src="https://avatars2.githubusercontent.com/&lt;u&gt;/1384166" />

@thomasin
Contributor Author

I think that error is specific to path attributes like src/href, which need to contain a valid URL.
Something like <div data-path="<here>"></div> passes fine:
https://html.spec.whatwg.org/#attributes
Sorry, I should have provided more background info!

@dillonkearns
Owner

Oh okay, thank you for clarifying! Then it does seem like a bug indeed.

The HTML parser is forked from https://github.com/jinjor/elm-xml-parser. I searched through there but couldn't find a related issue unfortunately.

But the bug must be somewhere within this code path, I think:

attributeValue : Parser String
attributeValue =
    oneOf
        [ succeed identity
            |. symbol "\""
            |= textString '"'
            |. symbol "\""
        , succeed identity
            |. symbol "'"
            |= textString '\''
            |. symbol "'"
        ]

Thanks for reporting the issue. I don't have bandwidth to look at it now, but this would be a good contribution if anybody is interested in giving it a go!

@LutSa
Collaborator

LutSa commented Dec 1, 2022

This issue only occurs in inline HTML and doesn't occur in HTML blocks, so the bug is less likely to be in HtmlParser.elm.

@LutSa
Collaborator

LutSa commented Dec 1, 2022

I can fix this by allowing < and > inside "" or inside <> in inline HTML. In other words, by dropping the "<" and ">" tokens in InlineParser.elm when they fall inside <> or "". Does that align with the spec?

tokenize : String -> List Token
tokenize rawText =
    findCodeTokens rawText
        |> mergeByIndex (findAsteriskEmphasisTokens rawText)
        |> mergeByIndex (findUnderlineEmphasisTokens rawText)
        |> mergeByIndex (findStrikethroughTokens rawText)
        |> mergeByIndex (findLinkImageOpenTokens rawText)
        |> mergeByIndex (findLinkImageCloseTokens rawText)
        |> mergeByIndex (findHardBreakTokens rawText)
        |> mergeByIndex (findAngleBracketLTokens rawText)
        |> mergeByIndex (findAngleBracketRTokens rawText)

@dillonkearns
Owner

That sounds right to me @LutSa! Within the context of "'s or ''s within HTML tags, <> should be regarded as part of the attribute and not special characters.
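
For anyone picking this up, here is a rough sketch of the idea discussed above: scan the raw text once and report only the < and > positions that are not inside a quoted attribute value. This is not the code in InlineParser.elm. The module name, the State type, and significantAngleBrackets are all hypothetical names made up for illustration, and the sketch assumes that every < outside quotes starts a tag, which is not true for ordinary Markdown text such as "a < b". A real fix would need to plug into the existing token finders and handle that case.

```elm
module AngleBracketSketch exposing (significantAngleBrackets)

{-| Hypothetical helper, not part of elm-markdown.
-}


type State
    = Outside -- not inside a tag
    | InTag -- between the '<' and '>' of a tag, outside any quotes
    | InQuote Char -- inside a quoted attribute value; Char is the opening quote


{-| Return the indices of '<' and '>' that are not inside a quoted
attribute value of a tag.
-}
significantAngleBrackets : String -> List Int
significantAngleBrackets rawText =
    String.toList rawText
        |> List.indexedMap Tuple.pair
        |> List.foldl step ( Outside, [] )
        |> Tuple.second
        |> List.reverse


step : ( Int, Char ) -> ( State, List Int ) -> ( State, List Int )
step ( index, char ) ( state, found ) =
    case state of
        Outside ->
            if char == '<' then
                ( InTag, index :: found )

            else if char == '>' then
                ( Outside, index :: found )

            else
                ( Outside, found )

        InTag ->
            if char == '>' then
                ( Outside, index :: found )

            else if char == '"' || char == '\'' then
                -- Entering an attribute value: remember which quote opened it.
                ( InQuote char, found )

            else if char == '<' then
                ( InTag, index :: found )

            else
                ( InTag, found )

        InQuote quote ->
            if char == quote then
                -- Matching quote closes the attribute value.
                ( InTag, found )

            else
                -- Everything else, including '<' and '>', is attribute text.
                ( InQuote quote, found )
```

With the example from earlier in the thread, significantAngleBrackets "<div data-path=\"<here>\"></div>" evaluates to [ 0, 23, 24, 29 ]: the brackets around "here" at indices 16 and 21 are skipped because they sit inside the double-quoted value.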
