-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Typst Language Support #302
Conversation
9a7ae81
to
6b5d78a
Compare
…of an environment's span
… possessives or conjunctions
…ove dict parsing to fit better
…ntent in span so quotes aren't escaped
6b5d78a
to
550cf20
Compare
Mostly just needs more tests. I'm no expert in Typst syntax, so if someone wants to help it would be much appreciated. |
If I can understand the core concepts mentioned above (what should/could be checked), then I maybe be able to add some tests. |
I decided to move typst parsing into its own crate, since it simplifies the feature dependency and establishes good precident that can be followed for unofficial/external parsers in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some stuff I didn't see last time. Sorry again, but I just need some clarification.
@grantlemons, don't forget to make the necessary changes to You may open another PR to do those, or push to master. Up to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eyyy! It was already merged, kewl. I guess I'm late to the party, but I still have a few things I want to discuss about the final patch.
P.S. I was really busy and only now found time to look into this.
#[test] | ||
fn contraction() { | ||
let document = Document::new_curated("doesn't", &Typst); | ||
let token_kinds = document.tokens().map(|t| t.kind).collect_vec(); | ||
dbg!(&token_kinds); | ||
|
||
assert_eq!(token_kinds.len(), 1); | ||
assert!(!token_kinds.into_iter().any(|t| { | ||
matches!( | ||
t, | ||
TokenKind::Word(WordMetadata { | ||
noun: Some(NounData { | ||
is_possessive: Some(true), | ||
.. | ||
}), | ||
.. | ||
}) | ||
) | ||
})) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhh, why is contraction identified a "possessive"? To me, it looks obviously wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that must have happened when I was removing the contraction logic from the typst parser itself. I'm actually going to remove the contraction tests entirely, since that's document logic.
#[test] | ||
fn non_adjacent_spaces_not_condensed() { | ||
let source = r#"#authors_slice.join(", ", last: ", and ") bob"#; | ||
|
||
let document = Document::new_curated(source, &Typst); | ||
let token_kinds = document.tokens().map(|t| t.kind).collect_vec(); | ||
dbg!(&token_kinds); | ||
|
||
assert!(matches!( | ||
&token_kinds.as_slice(), | ||
&[ | ||
TokenKind::Unlintable, // authors_slice.join | ||
TokenKind::Punctuation(Punctuation::Comma), | ||
TokenKind::Space(1), | ||
TokenKind::Unlintable, // Ident | ||
TokenKind::Punctuation(Punctuation::Comma), | ||
TokenKind::Space(1), | ||
TokenKind::Word(_), // and | ||
TokenKind::Space(1), | ||
TokenKind::Space(2), | ||
TokenKind::Word(_), | ||
] | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how would harper work in this case? Will it complain that there are 2 consecutive commas in the "sentence"? (And also that there is not 2, but 3 consecutive spaces, even though the 1st of 3 spaces won't go directly next to the last 2 spaces.) Or are Unlintable
tokens split lintable tokens into isolated sentences/blocks?
, , and bob
And also, are 2 spaces intentional?
P.S. the // bob
comment is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the unlintable tokens split it up because our matching patterns don't expect them.
The two spaces are intentional, they are included because of some changes I was making to how Space Spans are condensed. Really, I don't think it's relevent to typst though, so I'm going to remove it from the test
#[test] | ||
fn header_parsing() { | ||
let source = r"= Header | ||
Paragraph"; | ||
|
||
let document = Document::new_curated(source, &Typst); | ||
let token_kinds = document.tokens().map(|t| t.kind).collect_vec(); | ||
dbg!(&token_kinds); | ||
|
||
let charslice = source.chars().collect_vec(); | ||
let tokens = document.tokens().collect_vec(); | ||
assert_eq!(tokens[0].span.get_content_string(&charslice), "Header"); | ||
assert_eq!(tokens[2].span.get_content_string(&charslice), "Paragraph"); | ||
|
||
assert!(matches!( | ||
&token_kinds.as_slice(), | ||
&[ | ||
TokenKind::Word(_), | ||
TokenKind::Newline(1), | ||
TokenKind::Word(_) | ||
] | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 concerns:
- Are all the leading spaces in
"= Header\n Paragraph"
string ignored by the harper? - The LF between two text lines are treated as spaces (
Space: "\n"
) in Typst's AST, which are replaced by whitespaces so that you can wrap your long paragraphs after 80's column. But in this case, the first line is a heading, which means that there isheading()
+par()
andNewline(1)
seems appropriate in this specific use case. So will it still printNewline
for wrapped long paragraph? And most importantly, what exactly doesNewline
mean to harper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all the leading spaces in
"= Header\n Paragraph"
string ignored by the harper?
I believe leading spaces are ignored, yes. @elijah-potter correct me if I'm wrong.
what exactly does Newline mean to harper?
TokenKind::Newline(count)
represents count
\n
newline characters.
I believe they act to break up sequences of tokens between lines. In this case, for instance, you don't want Harper to consider the text to be Header Paragraph
, you want it to be Header
and Paragraph
seperetely.
#[test] | ||
fn label_unlintable() { | ||
let source = r"= Header | ||
<label> | ||
Paragraph"; | ||
|
||
let document = Document::new_curated(source, &Typst); | ||
let token_kinds = document.tokens().map(|t| t.kind).collect_vec(); | ||
dbg!(&token_kinds); | ||
|
||
assert!(matches!( | ||
&token_kinds.as_slice(), | ||
&[ | ||
TokenKind::Word(_), | ||
TokenKind::Newline(1), | ||
TokenKind::Unlintable, | ||
TokenKind::Newline(1), | ||
TokenKind::Word(_), | ||
] | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, the LF after the label won't add any new line to the output, as labels by themselves aren't visible (usually they are attached to some element in the markup mode). So this again raises concern if Newline
token is really a hard newline that I'm thinking of (that breaks paragraphs into separate paragraphs or something similar).
let source = r#"group’s | ||
writing"#; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is written differently from previous examples? Why not write:
let source = r#"
group's
writing
"#;
There is dedent
crate that can prettify this into "group's\nwriting"
, but that's a new dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because this test exists to test the offsetting functionality, converting between byte spans and char spans. I wanted to keep the string as similar as possible to the string that had issues, hence why I didn't change the formatting at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if an overly complex example is the intent here, so if it isn't, then I would like to simplify it and make it both: use best practices and be more practical. The way the sugar syntax for emph
is used, is really cursed and isn't the intended way. And in addition, fix some bugs that were not caught because the visual output isn't present, so not audited.
Simplified example
#let template(
title: "Default Title",
authors: ("Author 1", "Author 2"),
abstract: [*This is content*],
body,
) = {
set par(justify: true)
set page(
paper: "us-letter",
columns: 2,
number-align: top,
numbering: (..n) => if n.pos().first() > 1 {
n.pos().map(str).join(" of ") + h(1fr) + title
},
)
place(
top + center,
float: true,
scope: "parent",
clearance: 2em,
)[
#show heading: set text(17pt)
= #title
#let authors-line = if authors.len() > 3 {
// "et al." isn't parsed properly, but this isn't the fault of the Typst
// parser.
// authors-max3.push("et al.")
authors => authors.join(", ")
} else {
authors => authors.join(", ", last: ", and ")
}
#emph(authors-line(authors.slice(0, calc.min(authors.len(), 3))))
#par(justify: false)[
*Abstract* \
#abstract
]
]
body
}
#show: template.with(
title: "A fluid dynamic model for glacier flow",
authors: ("Grant Lemons", "John Doe", "Jane Doe"),
abstract: lorem(80),
)
= Introduction
#lorem(300)
= Related Work
#lorem(200)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the intent was to make it rather complicated, to see if any issues would arise from nesting statements and such. If you'd like to add more examples that demonstrate a more idiomatic typst document, that would be fantastic.
Sorry about that! I'm sure @grantlemons would be happy to open a new PR once he has time to look at your comments. Thanks for catching the stuff you've pointed out! I've been quite busy with web stuff, so I admittedly rushed this one. I see that you're a significant contributor on Typst. I'm impressed. It would be cool to get Harper plugin made for upstream. |
If you thought it was a negative "Eyyy!", then it wasn't, the hint is "kewl", so it's fine, no need to apologize. I also wouldn't want to delay the merge just because I'm busy and can't reply for a long time.
Actually, perhaps I can open the new PR. I'll add the Typst fixes, then anyone can request changes or open the PR into my PR and I will accept it (considering we agree on the changes).
You're welcome! As a perfectionist, this is what I do, hehe. And I'm really interested in both Typst and a tool that can deliver the best spellcheck experience into mine editor. I use Neovim btw.
Thank you! 💚 Actually, I'm not really a significant contributor yet (most of the changes so far were docs changes), but I'm intending to fix that. I have some things that are piling up, so I just need to finish other stuff so I can focus on making Typst even better! 🚀
Huh? plugin? What do you mean by that? Harper is already a "plugin" by itself that just needs to be added to an IDE. I'm open to collaboration. P.S. The biggest blocker for me for the long time was finishing the NixOS config so I can ditch the Pop!_OS 22.04 (with utterly broken GNOME, at times). So this is my top priority right now. |
@Andrew15-5 I've created a new PR #391, let me know if you have any further suggestions. (Idiomatic tests would probably be best suited by you creating a new pr though) |
Re-opening of #289 from my fork instead of a branch