Code blocks buggy in markdown renderer #3592

jmsmkn · 2024-10-08T12:25:17Z

MarkDown such as:

Should render as:

shouldn't be a code block

first_line = 1
for foo in bar:
  print(foo)

But actual result is:

Note:

The first line should not be a code block
The first line of the python block for some reason has a space
Multiple blank lines at the start of a MarkDown block should not result in a code block

chrisvanrun · 2024-11-04T13:56:41Z

+1

Wanted to report this:

But it's captured by the above issue post.

chrisvanrun · 2024-12-18T10:50:19Z

Using this:

This line shouldn't be a code block

A single space should not be a code block

This line should be a code block: two spaces in front

Now for the fenced style of coding, using ```:

first_line = 1
for foo in bar:
  print(foo)

And another one:

# Typed with Python
first_line = 1
for foo in bar:
  print(foo)

Using three tildes (~~~) also works:

first_line = 1
for foo in bar:
  print(foo)

How about we manually introduce an inline code HTML tag:

This should work: Hello world

Inline, using encapsulating single back-ticks (`): var and foo=True

This currently results in:

chrisvanrun · 2024-12-18T10:51:50Z

The first line should not be a code block

Possibly it had two leading spaces? I prefer explicit fenced blocks for code, so the indentation-based code block should go.

Multiple blank lines at the start of a MarkDown block should not result in a code block

Could not reproduce this.

ammar257ammar · 2024-12-18T10:53:55Z

I checked this one too. The text only becomes a code block from two leading spaces onward.

chrisvanrun · 2024-12-18T10:55:31Z

The cause is that we have three extensions in place to handle code when converting Markdown to HTML:

BS4Treeprocessor which adds "codehilite" class to any <code> tags. Not always correct since the inline use of explicit <code> seems broken.
markdown.extensions.codehilite
markdown.extensions.fenced_code

They interact, sometimes weirdly.

chrisvanrun · 2024-12-19T08:11:30Z

Quick update.

Guh, quite the rabbit hole. Trying to fix the last blemish on the <code> rendering:

The BS4Treeprocessor is messing up some HTML snippets that the Markdown core generates. For instance, the HTMLExtractor parses "<code>Hello</code>" as two snippets: <code> and </code>. The BeautifulSoup parser, used internally by the BS4Treeprocessor, accepts incomplete HTML as input but outputs fully corrected versions. Post-cleaning then removes any orphaned end tags (I think?). So <code class="codehilite"></code>Hello is the final result.
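To illustrate the kind of tokenization described above, here is a minimal sketch using only the standard library's html.parser.HTMLParser. It is not the monkey-patched HTMLExtractor from Python-Markdown, and EventLogger and tokenize are hypothetical names made up for this example. It only shows that an event-based HTML parser reports the opening tag, the text, and the closing tag of <code>Hello</code> as separate events, so a stage that keeps just the tag events is left with <code> and </code> and nothing in between.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record every parser event in order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source
        self.events.append(("start", self.get_starttag_text()))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_endtag(self, tag):
        self.events.append(("end", f"</{tag}>"))


def tokenize(source):
    """Feed the source to the parser and return the recorded events."""
    parser = EventLogger()
    parser.feed(source)
    parser.close()
    return parser.events


for kind, text in tokenize("<code>Hello</code>"):
    print(kind, text)
```

If only the "start" and "end" events are treated as raw HTML snippets, the "data" event has to be reattached to them later, which is where an auto-correcting parser can close the empty tag too early.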

Figuring out why the HTMLExtractor decides that it's two separate snippets is one solution. However, it's based on a monkey-patched HTML parser and hence quite complex. Marrying MD and HTML in a single parse tree is a recipe for overly complex things, so I like the alternative better:

Add a 'final' pure-HTML postprocessor that adds the BS4 classes. It's quite simple to set up, with some added bells and whistles for making things 'safe':

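from bs4 import BeautifulSoup
from django.utils.html import escape
from django.utils.safestring import SafeString, mark_safe


class ExtendTagClasses:
    """Postprocessor that appends CSS classes to the given HTML tags."""

    def __init__(self, tag_classes):
        # Maps a tag name to the classes to add, e.g. {"code": ["codehilite"]}
        self.tag_class_dict = tag_classes

    def __call__(self, html):
        # Remember whether the caller already marked the input as safe
        input_is_safe = isinstance(html, SafeString)

        # No parser is named here, so BeautifulSoup picks its default
        soup = BeautifulSoup(html)
        for tag, classes in self.tag_class_dict.items():

            # Make extensions safe
            classes = [escape(c).strip() for c in classes]

            # Add extension to the class attribute
            for element in soup.find_all(tag):
                current_classes = element.get("class", [])
                element["class"] = [*current_classes, *classes]

        new_html = str(soup)

        # Only re-mark the output as safe if the input was safe to begin with
        if input_is_safe:
            new_html = mark_safe(new_html)

        return new_html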

It's a nice construct but has the side effect that obfuscated email links (e.g. <a href="mailto:fake.email@email.com">fake.email@email.com</a>) are being de-obfuscated because of the way BeautifulSoup's default parser reads the HTML source.
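The de-obfuscation can be reproduced without BeautifulSoup. This is a minimal sketch, assuming the obfuscation writes each character as a numeric character reference and that the parser decodes references on input and escapes only &, < and > on output. The names obfuscate and decode_and_reserialize are hypothetical stand-ins, not functions from Markdown or BeautifulSoup.

```python
import html


def obfuscate(text):
    """Write every character as a numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def decode_and_reserialize(markup_text):
    """Stand-in for a parse/serialize round trip on a text node.

    Assumption: references are decoded while parsing, and only the
    characters that must be escaped are escaped again on output.
    """
    decoded = html.unescape(markup_text)
    return html.escape(decoded, quote=False)


obfuscated = obfuscate("fake.email@email.com")
roundtripped = decode_and_reserialize(obfuscated)

print(obfuscated)    # a run of &#NNN; references
print(roundtripped)  # the readable address again
```

Nothing in the round trip is wrong from an HTML point of view, since both strings render identically in a browser. The obfuscation is simply lost because it lives in the source encoding and not in the parsed content.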

Not sure if at this point it is better to:

Add a small email-obfuscation postprocessor.
Spend time trying to get the original tree processor to work.
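For comparison only, a different technique from either option: a pass-through rewriter built on the standard library's html.parser.HTMLParser with convert_charrefs=False, which reports character and entity references as their own events so they can be written back unchanged. This is a sketch under stated assumptions, not something proposed in this thread. ClassAppender and add_classes are hypothetical names, and the sketch does not re-emit doctype declarations or processing instructions.

```python
from html import escape
from html.parser import HTMLParser


class ClassAppender(HTMLParser):
    """Re-emit HTML as-is, adding classes to selected tags (hypothetical)."""

    def __init__(self, tag_classes):
        # convert_charrefs=False makes the parser report "&#102;" and "&amp;"
        # as separate events instead of decoding them into the text.
        super().__init__(convert_charrefs=False)
        self.tag_classes = tag_classes
        self.parts = []

    def _emit_open_tag(self, tag, attrs, closer):
        extra = self.tag_classes.get(tag)
        if extra is None:
            # Untargeted tags are copied from the source verbatim.
            self.parts.append(self.get_starttag_text())
            return
        # Targeted tags are rebuilt. Their attribute values were already
        # decoded by the parser, so references inside them are not preserved.
        attr_map = dict(attrs)
        current = (attr_map.get("class") or "").split()
        attr_map["class"] = " ".join([*current, *extra])
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attr_map.items()
        )
        self.parts.append(f"<{tag}{rendered}{closer}")

    def handle_starttag(self, tag, attrs):
        self._emit_open_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def add_classes(source, tag_classes):
    """Return source with the extra classes added to the targeted tags."""
    parser = ClassAppender(tag_classes)
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)


print(add_classes('<p>&#102;oo <code class="x">Hello</code></p>',
                  {"code": ["codehilite"]}))
```

The trade-off is that this does no HTML correction at all, so malformed input comes out just as malformed. As the comment in the code notes, an obfuscated href on a tag that is itself targeted would still be decoded.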

chrisvanrun · 2024-12-19T08:17:39Z

Do note: the emails are already being de-obfuscated by the original tree processor if they happen to be inside a tag that is targeted for class addition!

jmsmkn · 2024-12-19T08:17:55Z

Err...

        return mark_safe(new_html)

chrisvanrun · 2024-12-19T08:18:31Z

Err...

        return mark_safe(new_html)

Work in progress! It will have tests but indeed: that should not go there!

jmsmkn · 2024-12-19T08:30:29Z

You can remove a for loop I think:

        for element in soup.find_all([*self.tag_class_dict.keys()]):
            current_classes = element.get("class", [])
            element["class"] = [*current_classes, *self.tag_class_dict[element.name]]

chrisvanrun · 2024-12-20T16:24:20Z

After several different approaches I had the hunch to pre-escape the character/entity references. A working PR is out.
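The PR itself is not quoted in this thread, so the following is only a guess at what pre-escaping could look like, written against the standard library. The names protect_charrefs, restore_charrefs and decode_and_reserialize are hypothetical. The last one stands in for any parser that decodes references on input and re-escapes &, < and > on output, and the restore step is an assumption that may not be needed depending on how the real serializer is configured.

```python
import html
import re

# Matches numeric (&#102; / &#x66;) and named (&amp;) references
CHARREF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
PROTECTED = re.compile(r"&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def protect_charrefs(source):
    """Escape the leading '&' of every reference: '&#102;' -> '&amp;#102;'."""
    return CHARREF.sub(r"&amp;\1;", source)


def restore_charrefs(source):
    """Undo protect_charrefs after the round trip."""
    return PROTECTED.sub(r"&\1;", source)


def decode_and_reserialize(markup_text):
    """Stand-in for a parser round trip on a text node (assumption)."""
    return html.escape(html.unescape(markup_text), quote=False)


source = "&#102;&#97;&#107;&#101; &amp; co"

naive = decode_and_reserialize(source)
protected = restore_charrefs(decode_and_reserialize(protect_charrefs(source)))

print(naive)      # numeric references are gone
print(protected)  # identical to the source
```

Because the parser now sees an escaped ampersand followed by plain text, it has nothing to decode into a letter, and the original references survive the round trip.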

miriam-groeneveld added the bug squash candidate label Dec 4, 2024

chrisvanrun self-assigned this Dec 18, 2024

chrisvanrun linked a pull request Dec 20, 2024 that will close this issue

Fix buggy code blocks in markdown renderer #3764

Open
