Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Typst Language Support #302

Merged
merged 58 commits into from
Jan 13, 2025

Conversation

grantlemons
Copy link
Collaborator

@grantlemons grantlemons commented Dec 4, 2024

Re-opening of #289 from my fork instead of a branch

@grantlemons
Copy link
Collaborator Author

grantlemons commented Dec 18, 2024

Mostly just needs more tests. I'm no expert in Typst syntax, so if someone wants to help it would be much appreciated.

@Andrew15-5
Copy link

If I can understand the core concepts mentioned above (what should/could be checked), then I maybe be able to add some tests.

@grantlemons
Copy link
Collaborator Author

I decided to move typst parsing into its own crate, since it simplifies the feature dependency and establishes good precident that can be followed for unofficial/external parsers in the future.

Copy link
Collaborator

@elijah-potter elijah-potter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some stuff I didn't see last time. Sorry again, but I just need some clarification.

harper-cli/Cargo.toml Outdated Show resolved Hide resolved
harper-cli/src/main.rs Outdated Show resolved Hide resolved
harper-ls/Cargo.toml Show resolved Hide resolved
harper-ls/src/backend.rs Show resolved Hide resolved
harper-typst/src/lib.rs Outdated Show resolved Hide resolved
harper-typst/src/lib.rs Outdated Show resolved Hide resolved
harper-typst/src/lib.rs Outdated Show resolved Hide resolved
@elijah-potter elijah-potter merged commit 6d8904c into Automattic:master Jan 13, 2025
17 checks passed
@elijah-potter
Copy link
Collaborator

@grantlemons, don't forget to make the necessary changes to nvim-lspconfig and the Visual Studio Code plugin as well as the documentation.

You may open another PR to do those, or push to master. Up to you.

Copy link

@Andrew15-5 Andrew15-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eyyy! It was already merged, kewl. I guess I'm late to the party, but I still have a few things I want to discuss about the final patch.

P.S. I was really busy and only now found time to look into this.

Comment on lines +44 to +63
#[test]
fn contraction() {
let document = Document::new_curated("doesn't", &Typst);
let token_kinds = document.tokens().map(|t| t.kind).collect_vec();
dbg!(&token_kinds);

assert_eq!(token_kinds.len(), 1);
assert!(!token_kinds.into_iter().any(|t| {
matches!(
t,
TokenKind::Word(WordMetadata {
noun: Some(NounData {
is_possessive: Some(true),
..
}),
..
})
)
}))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhh, why is contraction identified a "possessive"? To me, it looks obviously wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that must have happened when I was removing the contraction logic from the typst parser itself. I'm actually going to remove the contraction tests entirely, since that's document logic.

Comment on lines +191 to +214
#[test]
fn non_adjacent_spaces_not_condensed() {
let source = r#"#authors_slice.join(", ", last: ", and ") bob"#;

let document = Document::new_curated(source, &Typst);
let token_kinds = document.tokens().map(|t| t.kind).collect_vec();
dbg!(&token_kinds);

assert!(matches!(
&token_kinds.as_slice(),
&[
TokenKind::Unlintable, // authors_slice.join
TokenKind::Punctuation(Punctuation::Comma),
TokenKind::Space(1),
TokenKind::Unlintable, // Ident
TokenKind::Punctuation(Punctuation::Comma),
TokenKind::Space(1),
TokenKind::Word(_), // and
TokenKind::Space(1),
TokenKind::Space(2),
TokenKind::Word(_),
]
))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how would harper work in this case? Will it complain that there are 2 consecutive commas in the "sentence"? (And also that there is not 2, but 3 consecutive spaces, even though the 1st of 3 spaces won't go directly next to the last 2 spaces.) Or are Unlintable tokens split lintable tokens into isolated sentences/blocks?

, , and   bob

And also, are 2 spaces intentional?

P.S. the // bob comment is missing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the unlintable tokens split it up because our matching patterns don't expect them.

The two spaces are intentional, they are included because of some changes I was making to how Space Spans are condensed. Really, I don't think it's relevent to typst though, so I'm going to remove it from the test

Comment on lines +216 to +238
#[test]
fn header_parsing() {
let source = r"= Header
Paragraph";

let document = Document::new_curated(source, &Typst);
let token_kinds = document.tokens().map(|t| t.kind).collect_vec();
dbg!(&token_kinds);

let charslice = source.chars().collect_vec();
let tokens = document.tokens().collect_vec();
assert_eq!(tokens[0].span.get_content_string(&charslice), "Header");
assert_eq!(tokens[2].span.get_content_string(&charslice), "Paragraph");

assert!(matches!(
&token_kinds.as_slice(),
&[
TokenKind::Word(_),
TokenKind::Newline(1),
TokenKind::Word(_)
]
))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 concerns:

  1. Are all the leading spaces in "= Header\n Paragraph" string ignored by the harper?
  2. The LF between two text lines are treated as spaces (Space: "\n") in Typst's AST, which are replaced by whitespaces so that you can wrap your long paragraphs after 80's column. But in this case, the first line is a heading, which means that there is heading() + par() and Newline(1) seems appropriate in this specific use case. So will it still print Newline for wrapped long paragraph? And most importantly, what exactly does Newline mean to harper?

Copy link
Collaborator Author

@grantlemons grantlemons Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all the leading spaces in "= Header\n Paragraph" string ignored by the harper?

I believe leading spaces are ignored, yes. @elijah-potter correct me if I'm wrong.

what exactly does Newline mean to harper?

TokenKind::Newline(count) represents count \n newline characters.

I believe they act to break up sequences of tokens between lines. In this case, for instance, you don't want Harper to consider the text to be Header Paragraph, you want it to be Header and Paragraph seperetely.

Comment on lines +260 to +280
#[test]
fn label_unlintable() {
let source = r"= Header
<label>
Paragraph";

let document = Document::new_curated(source, &Typst);
let token_kinds = document.tokens().map(|t| t.kind).collect_vec();
dbg!(&token_kinds);

assert!(matches!(
&token_kinds.as_slice(),
&[
TokenKind::Word(_),
TokenKind::Newline(1),
TokenKind::Unlintable,
TokenKind::Newline(1),
TokenKind::Word(_),
]
))
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, the LF after the label won't add any new line to the output, as labels by themselves aren't visible (usually they are attached to some element in the markup mode). So this again raises concern if Newline token is really a hard newline that I'm thinking of (that breaks paragraphs into separate paragraphs or something similar).

Comment on lines +316 to +317
let source = r#"group’s
writing"#;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is written differently from previous examples? Why not write:

        let source = r#"
            group's
            writing
        "#;

There is dedent crate that can prettify this into "group's\nwriting", but that's a new dependency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because this test exists to test the offsetting functionality, converting between byte spans and char spans. I wanted to keep the string as similar as possible to the string that had issues, hence why I didn't change the formatting at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if an overly complex example is the intent here, so if it isn't, then I would like to simplify it and make it both: use best practices and be more practical. The way the sugar syntax for emph is used, is really cursed and isn't the intended way. And in addition, fix some bugs that were not caught because the visual output isn't present, so not audited.

Simplified example
#let template(
  title: "Default Title",
  authors: ("Author 1", "Author 2"),
  abstract: [*This is content*],
  body,
) = {
  set par(justify: true)
  set page(
    paper: "us-letter",
    columns: 2,
    number-align: top,
    numbering: (..n) => if n.pos().first() > 1 {
      n.pos().map(str).join(" of ") + h(1fr) + title
    },
  )

  place(
    top + center,
    float: true,
    scope: "parent",
    clearance: 2em,
  )[
    #show heading: set text(17pt)
    = #title

    #let authors-line = if authors.len() > 3 {
      // "et al." isn't parsed properly, but this isn't the fault of the Typst
      // parser.
      // authors-max3.push("et al.")
      authors => authors.join(", ")
    } else {
      authors => authors.join(", ", last: ", and ")
    }
    #emph(authors-line(authors.slice(0, calc.min(authors.len(), 3))))

    #par(justify: false)[
      *Abstract* \
      #abstract
    ]
  ]

  body
}

#show: template.with(
  title: "A fluid dynamic model for glacier flow",
  authors: ("Grant Lemons", "John Doe", "Jane Doe"),
  abstract: lorem(80),
)

= Introduction
#lorem(300)

= Related Work
#lorem(200)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the intent was to make it rather complicated, to see if any issues would arise from nesting statements and such. If you'd like to add more examples that demonstrate a more idiomatic typst document, that would be fantastic.

@elijah-potter
Copy link
Collaborator

Eyyy! It was already merged, kewl. I guess I'm late to the party, but I still have a few things I want to discuss about the final patch.

Sorry about that! I'm sure @grantlemons would be happy to open a new PR once he has time to look at your comments. Thanks for catching the stuff you've pointed out! I've been quite busy with web stuff, so I admittedly rushed this one.

I see that you're a significant contributor on Typst. I'm impressed. It would be cool to get Harper plugin made for upstream.

@Andrew15-5
Copy link

Sorry about that!

If you thought it was a negative "Eyyy!", then it wasn't, the hint is "kewl", so it's fine, no need to apologize. I also wouldn't want to delay the merge just because I'm busy and can't reply for a long time.

I'm sure @grantlemons would be happy to open a new PR once he has time to look at your comments.

Actually, perhaps I can open the new PR. I'll add the Typst fixes, then anyone can request changes or open the PR into my PR and I will accept it (considering we agree on the changes).

Thanks for catching the stuff you've pointed out!

You're welcome! As a perfectionist, this is what I do, hehe. And I'm really interested in both Typst and a tool that can deliver the best spellcheck experience into mine editor. I use Neovim btw.

I see that you're a significant contributor on Typst. I'm impressed.

Thank you! 💚 Actually, I'm not really a significant contributor yet (most of the changes so far were docs changes), but I'm intending to fix that. I have some things that are piling up, so I just need to finish other stuff so I can focus on making Typst even better! 🚀

It would be cool to get Harper plugin made for upstream.

Huh? plugin? What do you mean by that? Harper is already a "plugin" by itself that just needs to be added to an IDE. I'm open to collaboration.

P.S. The biggest blocker for me for the long time was finishing the NixOS config so I can ditch the Pop!_OS 22.04 (with utterly broken GNOME, at times). So this is my top priority right now.

@grantlemons grantlemons mentioned this pull request Jan 15, 2025
@grantlemons
Copy link
Collaborator Author

@Andrew15-5 I've created a new PR #391, let me know if you have any further suggestions. (Idiomatic tests would probably be best suited by you creating a new pr though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants