Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve AST with respect to estree #2854

Open
1 of 12 tasks
Boshen opened this issue Mar 29, 2024 · 6 comments
Open
1 of 12 tasks

Improve AST with respect to estree #2854

Boshen opened this issue Mar 29, 2024 · 6 comments
Assignees
Labels
A-ast Area - AST P-high Priority - High

Comments

@Boshen
Copy link
Member

Boshen commented Mar 29, 2024

This is not exhaustive but should cover all 'types' of differences to resolve:

  • Part of the AST is still based on the "import assertion" spec and should be updated. I want to tackle this I think it's a good next for me to do.
  • setProp(node, "typeArguments", node.typeParameters) is not a mistake, it's because TSESLint deprecated typeParameters in multiple nodes for typeArguments but keep both for now for backward compatibility. I'm not sure what is the right move for OXC here. For ref: fix: rename typeParameters to typeArguments where needed typescript-eslint/typescript-eslint#5384
  • Sometimes node are missing boolean, like prefix: true on UnaryExpression. I don't yet know how to do that in serde
  • StaticMemberExpression, ComputedMemberExpression & PrivateFieldExpression need to be serialized to MemberExpression with few changes
  • StringLiteral, BooleanLiteral, ... need to be serialized as Literal with raw value set. For BigInt, because it's not in the JSON spec, it would still requires some work on the JS side but this can be done as part of the reviver function when desrializing.
  • modifiers is not an ESTree concept. So anytime there is this prop in OXC it probably means you need to set one or two boolean on the equivalent node. Examples are VariableDeclaration.declare, ClassExpression.abstract, TSEnumDeclaration.const. My take here is that it will make the OXC AST more clean and easy to explore to have named booleans on nodes instead of a comment to say which one are 'allowed'. But I don't know the implication of this.
  • Some types that are nullable in OXC default to empty array in ESTree, like TSInterfaceDeclaration.extends, ClassExpression.implements
  • FunctionBody is a BlockStatement in ESTree
  • FormalParameters is closer to ESTree now, but should be inlined directly into <node>.params in multiple nodes
  • TSMappedTypeModifierOperator is boolean | '-' | '+' | undefined in TSESTree, I think my conversion is incomplete at the moment
  • UsingDeclaration is a VariableDeclaration (Not done in my wrapper)
  • Not sure yet, but the structure of hashbang, comments & directives needs some updates

Some changes could be seen as rename only, but as soon we touch the name of the nodes the type generation became complex to handle that why I didn't push more PRs for that for now.

For info the TSUnionType hack is really prettier specific.

But the biggest blocker to make my repo public is the utf16 conversion. The time to print some large files goes in seconds with the dump implementation, and because I saw you plan to solve that I don't want to optimize for it.

Some good news is that my repo successfully gave the same output of prettier on ~500 TSX files, ~8k dts files & 10k files inside my node_modules. This is obviously a very biased dataset, but still promising!

Originally posted by @ArnaudBarre in #2463 (comment)

@Boshen Boshen changed the title Improve AST in respect to estree Improve AST with respect to estree Mar 29, 2024
@Boshen
Copy link
Member Author

Boshen commented Mar 29, 2024

modifiers is not an ESTree concept. So anytime there is this prop in OXC it probably means you need to set one or two boolean on the equivalent node. Examples are VariableDeclaration.declare, ClassExpression.abstract, TSEnumDeclaration.const. My take here is that it will make the OXC AST more clean and easy to explore to have named booleans on nodes instead of a comment to say which one are 'allowed'. But I don't know the implication of this.

The "modifiers" concept was ported from a contributor from a long time ago. We may need to reevaluate this decision and make larger changes. The underlying problem is how recoverable our parsing should be.

Boshen pushed a commit that referenced this issue Apr 22, 2024
Don't include `trailing_comma` fields in JSON AST, for compat with
ESTree (#2854).
@sxzz

This comment was marked as resolved.

@sapphi-red
Copy link
Contributor

sapphi-red commented Jan 16, 2025

@overlookmotel
Copy link
Contributor

overlookmotel commented Jan 17, 2025

To surface all the obscure differences between our JS-side AST and ESTree, I think we ideally need conformance tests. Initially we'll have tons of failures, but we can chip away at them until there are none. I suggest:

  • Parse all Test262 fixtures with Acorn and write the resulting ASTs as JSON files.
  • Run Oxc parser on same fixtures and serialize to JSON with serde.
  • Conformance runner compare the two, and generate snapshot files showing which fixtures fail.

The Acorn AST JSON files will take time to compute and will be many many files. We could:

  • Create a new repo with the code to run Acorn.
  • Check in the resulting JSON files to that repo.
  • Link that repo into main Oxc repo as a git submodule (same as we do with Test262, Babel, etc fixtures).

Then the conformance tests could be written purely in Rust, and we only need to run Acorn once, not every time conformance runs.

That'll only cover us for JS syntax. I guess we could do similar for TS ASTs using fixtures from TypeScript repo and using typescript-eslint's parser to generate the JSON files. But personally I think we should concentrate on the JS AST first.

Does that sound like a good idea? If so, does anyone have an interest in building the infrastructure?

@ArnaudBarre
Copy link
Contributor

ArnaudBarre commented Jan 17, 2025

I think that a tool like this would be useful for interfacing with other tools that manipulate source code (my initial goal that I still want to pursue at some point is having a fast prettier parser). And for me source code is TS code and I don't know anyone still thinking that people write mainly JS (expect ESLint). So I think the infra should target TS code in priority, but I don't have the bandwidth to do it so maybe Test262 is better than nothing.

@overlookmotel
Copy link
Contributor

Just to clarify: The target is very much TS. But as TS is a superset of JS, in my opinion it'd be easier if we tackle JS first, and then go on to TS.

I am also keen to make sure that there aren't any places where JS and TS ASTs diverge from each other (aside from TS AST adding new types, and new properties to existing JS types). i.e. I want to make sure that we can satisfy both ESTree and TS AST compatibility with a single AST shape. If we can't, that will inform how the serializer needs to be designed (which is a work in progress - #6347).

@Boshen Boshen added P-high Priority - High A-ast Area - AST labels Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST P-high Priority - High
Projects
None yet
Development

No branches or pull requests

5 participants