Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline HTML Code Blocks #50

Open
hayesgm opened this issue Jul 13, 2020 · 7 comments
Open

Inline HTML Code Blocks #50

hayesgm opened this issue Jul 13, 2020 · 7 comments

Comments

@hayesgm
Copy link

hayesgm commented Jul 13, 2020

I'm trying to use the extensible aspect of elm-markdown to add a <sup> tag without forking the library.

For example:

Hello <sup>world</sup>!

This is the standard GFM style, or so I've been told. I added a custom HTML extension as such:

htmlRender : Markdown.Html.Renderer (List (Html msg) -> Html msg)
htmlRender =
    Markdown.Html.tag "sup"
        (\renderedChildren ->
            Html.sup [] renderedChildren
        )

However, this is being parsed and rendered as:

Paragraph [Text "Hello"],
HtmlBlock (HtmlElement "sup" [] [Paragraph [Text "world"]]),
Paragraph [Text "!"]
<p>Hello</p>
<sup><p>world</p></sup>
<p>!</p>

I am not sure this falls under standard GFM failures as it depends largely on the specifically extensibility features of this library. My expectation is that this would have been parsed and rendered as:

Paragraph
  [ Text "Hello"
  , HtmlElement "sup" [] [Text "world"]
  , Text "!"
  ]
<p>Hello <sup>world</sup>!</p>

However, it appears in the current parsing data types (Markdown.Block), this data cannot be represented directly (e.g. we cannot nest an HtmlBlock below a Paragraph). Would the right approach here be to fork and add sup directly to the parser? At the end of the day, this limitation makes adding new extensible features a little difficult since they must always be top-level blocks.

Thanks for making an awesome, pure Elm parser!

@dillonkearns
Copy link
Owner

Hello! It's probably worth clarifying what the defaultHtmlRenderer does in this case and how it compares to standard GFM.

My initial thought is that it could take in an explicit set of html handlers. Right now, it doesn't accept any HTML tags. But that's just that particular function, the library itself is fully extensible in you can handle HTML tags. Here's an example that builds off of the defaultHtmlRenderer and adds a <sup> tag.

https://ellie-app.com/9qnwgJFwWLha1

Does that do what you're looking for? Let me know how that goes. I think it's worth documenting which HTML tags (if any) are rendered in the default HTML renderer, and considering adding an explicit argument of HTML handlers, or perhaps accepted tags to render directly as HTML.

@hayesgm
Copy link
Author

hayesgm commented Jul 14, 2020

Yes, that sample is similar to what I had been building, and my only real concern there is that it renders as <ul><li>2<sup><p>2</p></sup></li></ul>, and that extra Html.p tag is impossible to remove (since you can't effectively edit Elm's HTML msg types as the constructors are not exposed to pattern match over, as far as I know). The solution may just be to make sure the css treats the paragraph as an inline element.

Additionally, I think I surfaced another issue diagnosing this one. Specifically, if I wrap this ul in a custom <section> tag and that ul list items contain a custom HTML tag (e.g. <sup>), we end up rendering two separate uls. For example, with the same Elm code, adding a section tag like below:

https://ellie-app.com/9qpztJSgwhWa1

<section>
* 1
* 2<sup>2</sup>
* 2
</section>

This unexpectedly renders to:

<section>
  <ul>
    <li>1</li>
    <li>2</li>
  </ul>
  <sup><p>2</p></sup>
  <ul>
    <li>2</li>
  </ul>
</section>

Notice the two ul tags. The expectation is that we should have one ul with three list items. The parser, in fact, produces two UnorderedLists when it should only have produced one in this case.

By the way, thanks for your quick reply, and happy to help dive in to solve these issues!

@hayesgm
Copy link
Author

hayesgm commented Jul 14, 2020

Digging into this a bit, when we parse a node like section in HtmlParser, we end up breaking out the children as such:

Element "section" [] [Text "\n* 1", Text "\n\n* 2",Element "sup" [] [Text "2"],Text "\n* 3\n\n"]

Since that element breaks the unordered list when parsing blocks, whereas usually those elements would be a single unordered list and the html would be parsed as an inline.

The solution would be to follow the GFM Spec 4.6 Subsection 6 "end condition" and break the HTML parsing after a blank line (as demonstrated in Example 118 which is failing in the current specs). That said, as you prefer good error handling instead of full correctness, I'm not sure if that approach is what you want (since we would do a worse job at explaining mismatched tags, etc). Or, alternatively, we could continue parsing full HTML but recombine the nodes back into larger text nodes after the HTML parse step. Or we could punt on this issue entirely.

For what it's worth, this comes up in my project since I want to wrap large <section>s with custom HTML. The problem is that this is causing parsing to break in strange ways (like this example).

@supermario
Copy link

supermario commented Aug 5, 2020

I've run into this same issue, in my case meaning I'm unable to use this package as I need lots of rich inline HTML.

Example:

Screenshot 2020-08-05 08 14 51

Becomes

Screenshot 2020-08-05 08 16 35

Note the way strong/** and emphasis/_ parsers works fine – I've tried to use them instead to achieve the styling I need but unfortunately there are only a few and no way to define more it seems 😅

@dillonkearns
Copy link
Owner

Yeah, it's been on my radar that I'll need some way to take in raw text. Or at least that's one solution.

Current Behavior

For example, the current behavior is :

Markdown.Html.tag "sup"
                (\renderedChildren -> Html.sup [] renderedChildren)
            ]
<p>2<sup><p>2</p></sup></p>

Option 1

We could add a function that gives you direct access to the raw text within the HTML, like this:

Markdown.Html.tagWithRawText "sup"
                (\rawText -> Html.sup [] [ Html.text rawText ])
            ]
<p>2<sup>2</sup></p>

Option 2

I guess the other way to approach it could be to not wrap the rendered children in a paragraph tag.

Markdown.Html.tag "sup"
                (\renderedChildren -> Html.sup [] renderedChildren)
            ]
<p>2<sup>2</sup></p>

There are a lot of implications to think through for that case. It may turn out that the semantics are more intuitive and fit into the spec better. Or it may be that Option 2 isn't viable because it conflicts with the Markdown spec.

I'd be curious to hear thoughts on that, though. Any other possibilities? Any initial impressions of those two options?

@supermario
Copy link

supermario commented Aug 5, 2020

@dillonkearns not sure if I'm just not understanding your proposed options, but it seems to me that nested tags should be parsed within their context, rather than always breaking out to the top level.

I think @hayesgm articulated it nicely in his first post – it seems that any parsed HtmlBlock lifts all the way up to the top (as evidenced by the <sup> example becoming a top level HtmlBlock).

Would it be possible to have it not do that, and nest just like * and _ usages do? Or is there some conflict with the HTML parser that wouldn't allow parsing nested HTML? I could see how perhaps there are finicky contexts with nested HTML across multiple lines, but personally I think single-line nested HTML would be a great start.

Edit: re-reading the options you've proposed, it seems you're addressing the children of <sup>. I'm not sure this is the issue – the contents seem okay, the problem I'm witnessing is the entire tag gets pulled up to the top level, i.e. the following:

# This should be <sup>one</sup> line!

Gets parsed as if you had typed

# This should be
<sup>one</sup>
line!

@hayesgm
Copy link
Author

hayesgm commented Aug 10, 2020

I believe there are actually two issues going on here:

Issue 1 - Paragraph Wrapping

renderedChildren are always wrapped in a <p>, which can break certain renders.

Example: * 2<sup>2</sup>
Renders to <p>Hi <sup><p>there</p></sup>.</p>

The paragraph tag inside of <sup> breaks rendering without custom styling (e.g. sup p { display: inline; }).

Issue 2 - Invalid Parsing of Nested HTML Tags

When nesting HTML tags, we parse HTML too early and significantly break rendering.

Example:

<section>
* 1
* 2<sup>2</sup>
* 3
</section>

Renders to:

<section>
  <ul>
    <li>1</li>
    <li>2</li>
  </ul>
  <sup><p>2</p></sup>
  <ul>
    <li>3</li>
  </ul>
</section>

Notice the two uls that are created. This is because when we're parsing HTML, we parse the tags and then render the contents separately, which lead to this error. The hard part about fixing this issue is that Markdown solves tags by basically ignoring tag-matching. For us, getting smart tag matching mixed with Markdown is going to be tricky (but worth it, since this library, I believe, encourage you to use smart custom tags).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants