Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize partiql-parser package with partiql-ast IR #1142

Merged
merged 10 commits into from
Jul 13, 2023
Merged

Conversation

RCHowell
Copy link
Contributor

@RCHowell RCHowell commented Jul 3, 2023

Relevant Issues

Rebase of #1110

Description

This PR introduces the new PartiQL AST and PartiQL Parser built on the latest IR classes. Both the partiql-ast and partiql-parser packages can be consumed as artifacts independently from partiql-lang-kotlin.

About the AST

The AST package is simply the type definitions file with a default factory. The easiest way to review is by looking at the definitions. https://github.com/partiql/partiql-lang-kotlin/blob/ast-rebase/partiql-ast/src/main/resources/partiql_ast.ion

Notably this AST is able to use PIG 1.0 features such as sets, maps, enums, imported types, and additional scalar types. We are able to better model the AST without concern for V0 AST compatibility as there is a layer of indirection between the AST's data structures and the serialized s-expression format. It is possible to convert to the legacy AST via the ToLegacyAst.kt helper. We also get to drop the dependency on pig-runtime and its workarounds.

Where possible, node definitions are nested (see DML and GPML as an example) and redundancy is eliminated (see unary and binary expressions as an example). Because we can now model sums of sum types, we can fix the inheritance of certain nodes, details in #1110.

About the Parser

The parser package is effectively the existing PartiQLPigParser implementation (credit @johnedquinn) updated to produce the new AST rather than the PIG AST. One notable difference is that metadata (ie source locations) is not tagged onto nodes, rather it is kept in a metadata map exactly like the PartiQL Rust implementation. A benefit of this approach is that we decouple ANTLR parse tree nodes from AST nodes because we can build a source location from the start-end of a grammar rule rather than from a single token as done in org.partiql.lang.syntax parsers.

Note also that the partiql-parser package and the existing AST use the same grammar found in partiql-parser/src/main/antlr/PartiQL.g4. This means the ANTLR-based implementations of both the old and new interfaces use the same grammar.

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES/NO]
    TODO

  • Any backward-incompatible changes? [YES/NO]
    No. This change includes additions with the ability to produce the

  • Any new external dependencies? [YES/NO]
    No

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES/NO]
    Yes

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions
Copy link

github-actions bot commented Jul 3, 2023

Conformance comparison report

Base (44e8a8d) 6d2c28b +/-
% Passing 92.40% 92.40% 0.00%
✅ Passing 5376 5376 0
❌ Failing 442 442 0
🔶 Ignored 0 0 0
Total Tests 5818 5818 0

Number passing in both: 5376

Number failing in both: 442

Number passing in Base (44e8a8d) but now fail: 0

Number failing in Base (44e8a8d) but now pass: 0

@codecov-commenter
Copy link

codecov-commenter commented Jul 3, 2023

Codecov Report

Patch coverage: 85.24% and project coverage change: +0.04 🎉

Comparison is base (fe56425) 73.19% compared to head (606efd3) 73.24%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1142      +/-   ##
============================================
+ Coverage     73.19%   73.24%   +0.04%     
+ Complexity     2376     2371       -5     
============================================
  Files           237      224      -13     
  Lines         17261    17417     +156     
  Branches       3133     3155      +22     
============================================
+ Hits          12635    12757     +122     
- Misses         3653     3682      +29     
- Partials        973      978       +5     
Flag Coverage Δ
CLI 14.28% <ø> (ø)
EXAMPLES 80.28% <ø> (ø)
LANG 79.09% <85.24%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...kotlin/org/partiql/lang/eval/EvaluatingCompiler.kt 80.70% <0.00%> (-0.14%) ⬇️
...iql/lang/eval/physical/PhysicalPlanCompilerImpl.kt 83.86% <0.00%> (-0.26%) ⬇️
.../eval/visitors/SubqueryCoercionVisitorTransform.kt 96.70% <0.00%> (-2.19%) ⬇️
...n/org/partiql/lang/prettyprint/ASTPrettyPrinter.kt 77.91% <0.00%> (-0.15%) ⬇️
...artiql/lang/types/PartiqlPhysicalTypeExtensions.kt 90.00% <0.00%> (-2.76%) ⬇️
...org/partiql/lang/prettyprint/QueryPrettyPrinter.kt 78.37% <50.00%> (-0.29%) ⬇️
.../org/partiql/lang/syntax/impl/PartiQLShimParser.kt 78.57% <78.57%> (ø)
...tlin/org/partiql/lang/syntax/util/DateTimeUtils.kt 79.54% <79.54%> (ø)
.../org/partiql/lang/syntax/impl/PartiQLPigVisitor.kt 89.47% <88.37%> (+0.16%) ⬆️
...c/main/kotlin/org/partiql/lang/errors/ErrorCode.kt 85.21% <100.00%> (+0.24%) ⬆️
... and 1 more

... and 16 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some great work. I have one additional comment (beyond the small comments I left): we need to update partiql-lang/build.gradle.kts to use api instead of libs(project()) for the parser.

Beyond my small comments, I think this PR is ready to be shipped. Thanks for all the effort!

partiql-parser/build.gradle.kts Outdated Show resolved Hide resolved
partiql-ast/build.gradle.kts Outdated Show resolved Hide resolved
Copy link
Contributor

@yliuuuu yliuuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, only have some small nit.
One high level comment is, the new Parser that parse to the new AST package does not have its own test suit. I understand that the ShimParser is working as expected, but do we planning on port all those parser tests to test directly on the new AST.
Given some of the key distinguishes between the new ast and old one, (using partiQL value, different source location, etc), may worth to have those test cases some points in the future. (huge work for sure, not in scope of this PR for sure, just want to understand the road map here).

Another high level comment is: would be nice to have some of the PR description ported to package level README.

@RCHowell RCHowell requested review from johnedquinn and yliuuuu July 6, 2023 21:16
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update partiql-lang/build.gradle.kts to use api instead of the libs(project())

partiql-ast/src/main/kotlin/org/partiql/ast/Ast.kt Outdated Show resolved Hide resolved
johnedquinn
johnedquinn previously approved these changes Jul 7, 2023
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Copy link
Contributor

@vgapeyev vgapeyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I continue studying this work, but thought of posting a few disjoint comments that have accumulated so far.

partiql-ast/README.adoc Show resolved Hide resolved
public interface AstNode {

// Every node gets an _id for associating any metadata
public val _id: String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this in AstFactoryImpl (generated code):

  public override val _id: () -> String = { """Ast-${"%08x".format(Random.nextInt())}""" }

Is this _id meant to be unique identification, or more like some sort of hash key for spreading things around? For the former, I'd be concerned that Random.nextInt() is not a reliable method to get uniqueness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be concerned that Random.nextInt() is not a reliable method to get uniqueness.

Correct, you can override if you require uniqueness. I considered this good enough, but I can make it truly unique if necessary. Related, I wanted the factory to be stateless

partiql-ast/src/main/kotlin/org/partiql/ast/Ast.kt Outdated Show resolved Hide resolved
}
}
}
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is MyFactory with ComparableIdentifier a good example for this facility (ability to define alternative AST factories)? In this sense: in our PartiQL implementation we deal with case sensitivity, but not in the way presented here; so why is this being (implicitly) recommended to others?

I might be touching a sensitive design issue by asking this question, but not sure which one. Could this be whether it is even useful to offer the ability of defining alternative AST factories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this as an example because we had discussed this exact limitation last fall. This example shows how the current design is able to address situations in which generated sources do not handle edge cases.

I believe it is useful to provide customers with the facility to produce their own concrete implementations of AST nodes (if they want, hence a default). Here we are indeed eating our own dogfood while overcoming basic limitations that are trivial when hand-writing ASTs. The default factory is our default factory, but not the default factory.

Use of [the abstract factory] pattern enables interchangeable concrete implementations without changing the code that uses them, even at run. wikipedia

Is your primary concern that this is a bad example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My foremost concern is understanding what ready-to-roll uses we have for the significant extra complexity that Abstract Factory pattern introduces. Not nice-to-haves that a future customer might find useful, but 2-3 usecases (that is, more than 1!) that we intend to roll out ourselves -- otherwise we won't prove the dog food is edible.

The prototypical usecase for Abstract Factory is making it possible to develop a GUI application, such as a game, while abstracting away from a specific windowing toolkit, so that the application can run with with the look and feel of any toolkit for which someone would provide a concrete factory.

PartiQL AST (the interfaces in partiql-ast/org.partiql.ast.Types) is the analog of a "windowing toolkit abstraction", while PartiQL parser and PartiQL compiler are analogues of a "GUI game". What would be useful analogues of the concrete windowing toolkits that we are going to roll out as alternatives to the one concrete AST in partiql-ast/org.partiql.ast.impl.Types ?

I can vaguely imagine something on a scale of an AST variant that contains something like instrumentation to count uses of nodes during compilation or optimization transformations -- but I don't have a defensible case for this and don't recall us planning for anything of a kind. The case-sensitivity example is not convincing because it's something we should just provide (or not!) in the default AST implementation -- case sensitivity ought to be part of the PartiQL AST contract (or not! -- taken care of outside the AST, as currently).

To put it differently, what do we give up if org.partiql.ast.Types and org.partiql.ast.impl.Types are one and the same thing (just concrete classes, no interfaces)?

This probably requires a discussion more sprawling than can fit in PR comments...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of the pattern background, and I see your concern now. The motivation for this was to have the flexibility of hand-written AST nodes, but the convenience of generated ones. This pattern enables us to get everything generated, but hand-write when we desire. The default GUI AST toolkit is what we provide, but customers can provide their own look and feel nodes when desired/required.

As you'll see in the PIG AST and the Physical Plan. Various mechanisms have been added for customer extension such as the metadata <string, object> map as well as the impl string on the partiql_physical domain (which is a point of customer extension, but is also related to the "physical"-ness of the domain).

My effort here was to provide a more powerful mechanism for both ourselves and customers to extend AST nodes. That is, they can use the generated identifiers to store metadata in a richer-typed structure than the object map (see SourceLocations) for an example. And they can handwrite implementations of our AST node interfaces.

My opinion is that, if I could go back in time, the AST would be internal and hand-written. It is the driving cause of major-version churn due to its constant evolving and surface area, as we as its inflexibility due to the generator limitations. I don't this PR solves that issue, but it gives us some better defaults and flexibility. If the AST were internal, then the separation of interfaces and indirection of instantiation would be superfluous. But I found this pattern to be a reasonable compromise given our situation.

partiql-ast/README.adoc Show resolved Hide resolved
johnedquinn
johnedquinn previously approved these changes Jul 12, 2023
vgapeyev
vgapeyev previously approved these changes Jul 12, 2023
Copy link
Contributor

@vgapeyev vgapeyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remove these comments or make it more clear what is the point?

partiql-ast/README.adoc Outdated Show resolved Hide resolved
partiql-ast/README.adoc Outdated Show resolved Hide resolved
vgapeyev
vgapeyev previously approved these changes Jul 12, 2023
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@RCHowell RCHowell merged commit b1d67b2 into main Jul 13, 2023
@RCHowell RCHowell deleted the partiql-parser branch July 13, 2023 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants