Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent parsing of indented bullet lists/unordered lists #107

Open
Viir opened this issue Sep 22, 2021 · 7 comments
Open

Inconsistent parsing of indented bullet lists/unordered lists #107

Viir opened this issue Sep 22, 2021 · 7 comments

Comments

@Viir
Copy link

Viir commented Sep 22, 2021

I was surprised by how version 7.0.0 maps this markdown string:

## elm-markdown 7.0.0

   + 0
   + 1
   + 2
   + 4

   - 0
   - 1
   - 2
   - 3

Here is a screenshot of parsed and rendered, based on one of the example ellies:

image

https://ellie-app.com/fnbWFCZhzyTa1

There are three spaces in front of each list item. Is it desirable/intended to have this distinction between minus/plus character?

@Viir
Copy link
Author

Viir commented Sep 22, 2021

Reading about the goals of this markdown package in the readme, I guess we would look at https://github.github.com/gfm/#lists for a specification.

Since that is GitHub Flavored Markown, a faster way seems to add that string literally here to see how it renders in the comment:


elm-markdown 7.0.0

  • 0
  • 1
  • 2
  • 4
  • 0
  • 1
  • 2
  • 3

The rendering in the GitHub issue above looks like there is no difference between using minus and plus character for this list.

@Viir Viir changed the title Case of inconsistent list parsing? Inconsistent handling of unordered indented lists Sep 22, 2021
@Viir Viir changed the title Inconsistent handling of unordered indented lists Inconsistent handling of indented bullet lists Sep 22, 2021
@Viir
Copy link
Author

Viir commented Sep 22, 2021

I went searching for a change in the program code that would fix this inconsistency.
I'd start that search by adding a test for the scenario from this thread.
But running elm-test-rs on 1b0b3dc (from the master branch) produces this error, no test results at all:

Running 1 tests. To reproduce these results later,
run elm-test-rs with --seed 3729730344 and --fuzz 100


    A test group contains multiple tests named 'list parsing'. Do some renaming so that tests have unique names.


TEST RUN FAILED

Duration: 1 ms
Passed:   0
Failed:   1

@Viir
Copy link
Author

Viir commented Sep 22, 2021

I found the test modules call them 'unordered lists' (not 'bullet lists'), so maybe the issue title is not ideal yet. 🤔

@Viir Viir changed the title Inconsistent handling of indented bullet lists Inconsistent handling of indented bullet/unordered lists Sep 22, 2021
@Viir Viir changed the title Inconsistent handling of indented bullet/unordered lists Inconsistent handling of indented bullet lists/unordered lists Sep 22, 2021
@Viir
Copy link
Author

Viir commented Sep 22, 2021

If the indent before the list items is to be discarded, following might be a appropriate implementation for an automated test:

        , test "basic indented list with '+'" <|
            \() ->
                String.trimLeft """
   + Item 1
   + Item 2
   + Item 3
"""
                    |> Markdown.Parser.parse
                    |> Expect.equal
                        (Ok
                            [ UnorderedList Tight
                                [ ListItem NoTask [ Paragraph [ Text "Item 1" ] ]
                                , ListItem NoTask [ Paragraph [ Text "Item 2" ] ]
                                , ListItem NoTask [ Paragraph [ Text "Item 3" ] ]
                                ]
                            ]
                        )

This test yields following output with elm-test-rs:

Running 1 tests. To reproduce these results later,
run elm-test-rs with --seed 1296081860 and --fuzz 100

↓ list parsing
✗ basic indented list with '+'

    Ok [UnorderedList Tight [ListItem NoTask [Paragraph [Text "Item 1\n + Item 2\n + Item 3"]]]]
    ╷
    │ Expect.equal
    ╵
    Ok [UnorderedList Tight [ListItem NoTask [Paragraph [Text "Item 1"]],ListItem NoTask [Paragraph [Text "Item 2"]],ListItem NoTask [Paragraph [Text "Item 3"]]]]


TEST RUN FAILED

Duration: 8 ms
Passed:   0
Failed:   1

@Viir
Copy link
Author

Viir commented Sep 22, 2021

I don't have a lot of experience with parsing, but rummaging through the parsing code, I stumbled on this function that looks like a good candidate for experimenting with a change:

unorderedListMarkerParser : Parser UnorderedListMarker
unorderedListMarkerParser =
oneOf
[ succeed Minus
|. Extra.upTo 3 Whitespace.space
|. Advanced.symbol (Advanced.Token "-" (Parser.ExpectingSymbol "-"))
, succeed Plus
|. Advanced.symbol (Advanced.Token "+" (Parser.ExpectingSymbol "+"))
, succeed Asterisk
|. Advanced.symbol (Advanced.Token "*" (Parser.ExpectingSymbol "*"))
]

I tried this variant:

unorderedListMarkerParser : Parser UnorderedListMarker
unorderedListMarkerParser =
    oneOf
        [ succeed Minus
            |. Extra.upTo 3 Whitespace.space
            |. Advanced.symbol (Advanced.Token "-" (Parser.ExpectingSymbol "-"))
        , succeed Plus
            |. Extra.upTo 3 Whitespace.space
            |. Advanced.symbol (Advanced.Token "+" (Parser.ExpectingSymbol "+"))
        , succeed Asterisk
            |. Extra.upTo 3 Whitespace.space
            |. Advanced.symbol (Advanced.Token "*" (Parser.ExpectingSymbol "*"))
        ]

Guess that would have been too easy. I see no change in the test result.

My next guess is something else consumes that part with a higher priority.

@Viir Viir changed the title Inconsistent handling of indented bullet lists/unordered lists Inconsistent parsing of indented bullet lists/unordered lists Sep 22, 2021
@Viir
Copy link
Author

Viir commented Sep 22, 2021

Taking a break helped. I found an error in the test code. This one looks better:

        , test "basic list with '+' indented by 3" <|
            \() ->
                """
   + Item 1
   + Item 2
   + Item 3
"""
                    |> Markdown.Parser.parse
                    |> Expect.equal
                        (Ok
                            [ UnorderedList Tight
                                [ ListItem NoTask [ Paragraph [ Text "Item 1" ] ]
                                , ListItem NoTask [ Paragraph [ Text "Item 2" ] ]
                                , ListItem NoTask [ Paragraph [ Text "Item 3" ] ]
                                ]
                            ]
                        )

@Viir
Copy link
Author

Viir commented Sep 22, 2021

With more experimenting, I learned that I need to add backtracking here. This variant lets the test pass:

unorderedListMarkerParser : Parser UnorderedListMarker
unorderedListMarkerParser =
    oneOf
        [ backtrackable
            (succeed Minus
                |. Extra.upTo 3 Whitespace.space
                |. Advanced.symbol (Advanced.Token "-" (Parser.ExpectingSymbol "-"))
            )
        , succeed Plus
            |. Extra.upTo 3 Whitespace.space
            |. Advanced.symbol (Advanced.Token "+" (Parser.ExpectingSymbol "+"))
        , succeed Asterisk
            |. Extra.upTo 3 Whitespace.space
            |. Advanced.symbol (Advanced.Token "*" (Parser.ExpectingSymbol "*"))
        ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant