Skip to content
This repository has been archived by the owner on Jun 8, 2024. It is now read-only.

[bug] Line length checking is not accurate #86

Open
Andrew15-5 opened this issue Sep 1, 2023 · 12 comments
Open

[bug] Line length checking is not accurate #86

Andrew15-5 opened this issue Sep 1, 2023 · 12 comments

Comments

@Andrew15-5
Copy link
Contributor

Andrew15-5 commented Sep 1, 2023

This is a sub-issue from #82 (see examples there). In a9e6249 line length can exceed the configured line limit. Specifically, if the text (content block) is used, then it can be chopped into smaller lines which won't alter the output.

EDIT by @astrale-sharp
The first line of a nested block also doesn't respect max_line_length (it's unaware of it's parent)

@astrale-sharp
Copy link
Owner

astrale-sharp commented Sep 2, 2023

Ex. 1

#very-long-long-long-long-long-function-name[Lorem ipsum dolor sit amet, consectetur adipiscing
  elit, sed do eiusmod tempor incididunt ut labore
  et dolore magnam aliquam quaerat voluptatem. Ut
  enim aeque doleamus animo, cum corpore dolemus,
  fieri.]

Ex. 2

/ Term : Some interesting words that are veeeeeeeeeeeeeeeeeery long

@astrale-sharp
Copy link
Owner

Ex 1 can be solved with your smart proposition which i'm feeling more on board with now even tho it make me feel uneasy still.
It also involves a hack: the parent tries to understand the child had to broke by scanning the child length and comparing it to max length minus 7 (it's used elsewhere too, see the todos in the code)

The alternative to the hack would be to have more context, maybe context could contain a hashmap of span -> info and info could hold a needed to break flag

@astrale-sharp
Copy link
Owner

Example 2 happens because \ Term : is not part of the markup, so the checking of the length only concerns it's sibling, it could be solved by checking if the parent is a term item and adding the length of the left siblings the the first line of size comparison

I'm going to take care of this in a few days but in the meanwhile if someone wants to do it, say so and I'll let you, I welcome PRs!

When i start working on this I'll say so

@Andrew15-5
Copy link
Contributor Author

If ex. 2 is indeed a Typst term, then the slash must be forward, not backward.

@astrale-sharp
Copy link
Owner

Edited, thank you :)

@astrale-sharp
Copy link
Owner

One reason I'm hesitant to take the provided solution is this:
f(arg)[too long][short][too long again][ long but can be broken cause spaced ]
What should be done then, everything in parenthesis except the last one? I mean probably cause it's prettier that way but I feel bad modifying that much code of the users

@Andrew15-5
Copy link
Contributor Author

The question is hard, because:

  • Typst has a unique syntax for required parameters: you can put them before/after parameters with default values
  • Typst has an option to omit () if the required parameter is content and just use [] without ()
  • LaTeX (typesetting language) doesn't have formatters, except https://github.com/cmhughes/latexindent.pl (which is basically a Perl wrapper with regex rules).
  • everyone has a unique taste in formatting

But regardless, as I said, the KISS principle is good as it can improve readability. There is already a nice example typst/typst#311 (comment):

show heading: it =>  {
    it
    par()[#text(size:0.5em)[#h(0.0em)]]
}

Specifically, par()[#text(size:0.5em)[#h(0.0em)]] can be greatly simplified to par(text(size:0.5em, h(0.0em))).

I already mentioned (somewhere) that I want text to be as packed as possible, like any Markdown file should be. Another example is (AGPL) license file: https://www.gnu.org/licenses/agpl-3.0.txt.

So, f(arg)[too long][short][too long again][ long but can be broken cause spaced ] IMO should be formatted as:

f(arg)[too long][short][too long again][ long but
  can be broken cause spaced ]

(limit is 50 columns)

...is what I would say, if I didn't think for a while. To eliminate multistep formatting that in nested cases can slow down the formatting time (maybe not), A more general rule should be applied. Like a rule priority. Something like this:

  1. if there is a whitespace(s) after [ + the line exceeds the limit + the [ does not exceed the limit, then put change (all) whitespace(s) with a (single) newline after the [
  2. if in the previous rule the newline was added + there is a whitespace(s) before the ] which is a pair to the [ from the previous rule, then change (all) whitespace(s) with a (single) newline before ]

Hmm... I don't know how it could be implemented, but there is still 2 steps in my rules. Ok, I'll finish the idea. After first 2 rules we will get:

f(arg)[too long][short][too long again][
  long but can be broken cause spaced
]

Of course somewhere between the rules we have to add 1 indent, because 1st line ends with [ and 3rd line starts with the paired ].

  1. And now we only need to chop the content (2nd line) if it exceeds the limit.

But it doesn't exceed, so we finished formatting.

@Andrew15-5
Copy link
Contributor Author

Here are similar examples (limit: 50, indent: 2):

#f(arg)[too long][short][too long again][ long but can be broken cause spaced]

#f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long]

// According to my rules:

#f(arg)[too long][short][too long again][
  long but can be broken cause spaced]

#f(arg)[too long][short][too long again][
  long abc long abc long long long long long long
  long long long abc long long long long long long
  long long long]

// But this kinda looks better, especially if the content is very long:

#f(arg)[too long][short][too long again][ long but
  can be broken cause spaced]

#f(arg)[too long][short][too long again][ long abc
  long abc long long long long long long long long
  long abc long long long long long long long long
  long]

So maybe change rules to this:

  1. if there is whitespace(s) after [ and before ] + the [ does not exceed the limit, then put the content which is inside the [] on a separate line.
  2. if there is whitespace(s) only after [ , then break line at the most right whitespace.

I think the rules are more of a flowchart written linear (in one line), so maybe it's easier to follow a flowchart (with nested if statements and things like that).

@Andrew15-5
Copy link
Contributor Author

Here are all 4 variants:

Whitespace after [ and before ]

[ & ]:

#f(arg)[too long][short][too long again][ long but can be broken cause spaced ]

#f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long ]
#f(arg)[too long][short][too long again][
  long but can be broken cause spaced
]

#f(arg)[too long][short][too long again][
  long abc long abc long long long long long long
  long long long abc long long long long long long
  long long long
]
Whitespace only after [

[ & ]:

#f(arg)[too long][short][too long again][ long but can be broken cause spaced]

#f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long]
#f(arg)[too long][short][too long again][ long but
  can be broken cause spaced]

#f(arg)[too long][short][too long again][ long abc
  long abc long long long long long long long long
  long abc long long long long long long long long
  long]
Whitespace only before ]

[ & ]

#f(arg)[too long][short][too long again][long but can be broken cause spaced ]

#f(arg)[too long][short][too long again][long abc long abc long long long long long long long long long abc long long long long long long long long long ]
#f(arg)[too long][short][too long again][long but
  can be broken cause spaced ]

#f(arg)[too long][short][too long again][long abc
  long abc long long long long long long long long
  long abc long long long long long long long long
  long ]
No whitespaces

[ & ]:

#f(arg)[too long][short][too long again][long but can be broken cause spaced]

#f(arg)[too long][short][too long again][long abc long abc long long long long long long long long long abc long long long long long long long long long]
#f(arg)[too long][short][too long again][long but
  can be broken cause spaced]

#f(arg)[too long][short][too long again][long abc
  long abc long long long long long long long long
  long abc long long long long long long long long
  long]

@Andrew15-5
Copy link
Contributor Author

Here are all 4 variants if 2nd to last argument exceeds the limit:

Whitespace after [ and before ]

[ & ]:

#f(arg)[too long][short][ long but can be broken cause spaced ][too long again]

#f(arg)[too long][short][ long abc long abc long long long long long long long long long abc long long long long long long long long long ][too long again]
#f(arg)[too long][short][
  long but can be broken cause spaced
][too long again]

#f(arg)[too long][short][
  long abc long abc long long long long long long
  long long long abc long long long long long long
  long long long
][too long again]
Whitespace only after [

[ & ]:

#f(arg)[too long][short][ long but can be broken cause spaced][too long again]

#f(arg)[too long][short][ long abc long abc long long long long long long long long long abc long long long long long long long long long][too long again]
#f(arg)[too long][short][ long but can be broken
  cause spaced][too long again]

#f(arg)[too long][short][ long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long][too
  long again]
Whitespace only before ]

[ & ]

#f(arg)[too long][short][long but can be broken cause spaced ][too long again]

#f(arg)[too long][short][long abc long abc long long long long long long long long long abc long long long long long long long long long ][too long again]
#f(arg)[too long][short][long but can be broken
  cause spaced ][too long again]

#f(arg)[too long][short][long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long ][too
  long again]
No whitespaces

[ & ]:

#f(arg)[too long][short][long but can be broken cause spaced][too long again]

#f(arg)[too long][short][long abc long abc long long long long long long long long long abc long long long long long long long long long][too long again]
#f(arg)[too long][short][long but can be broken
  cause spaced][too long again]

#f(arg)[too long][short][long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long][too
  long again]

One thing that could be a problem is something like this:

#f(arg)[too long][short][long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long][too
  long again]

Maybe an easier approach is to only format things until the ]:

#f(arg)[too long][short][long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long][too long again]

This way we can reapply rules to the last still exceeding the limit line. I
think all the rules will work very nicely the second time too.

But this again makes formatting in multiple steps. But since Rust's binaries are
superfast, it may not be a problem. What do you think, @astrale-sharp?

@astrale-sharp
Copy link
Owner

So, I don't think perf would be an issue, what I'm unsure about
is that we would then be reimplementing something very similar to how content block format themselves, but the parent would take care of it. It's an exception and isn't pretty but I guess we don't have a choice here

@astrale-sharp
Copy link
Owner

in lorem ipsum #loooooooooooooooong_var, long_var might go above the length and yet the line will be broken after it, not before

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants