-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initialize partiql-parser package with partiql-ast IR #1142
Conversation
Conformance comparison report
Number passing in both: 5376 Number failing in both: 442 Number passing in Base (44e8a8d) but now fail: 0 Number failing in Base (44e8a8d) but now pass: 0 |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1142 +/- ##
============================================
+ Coverage 73.19% 73.24% +0.04%
+ Complexity 2376 2371 -5
============================================
Files 237 224 -13
Lines 17261 17417 +156
Branches 3133 3155 +22
============================================
+ Hits 12635 12757 +122
- Misses 3653 3682 +29
- Partials 973 978 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some great work. I have one additional comment (beyond the small comments I left): we need to update partiql-lang/build.gradle.kts
to use api
instead of libs(project())
for the parser.
Beyond my small comments, I think this PR is ready to be shipped. Thanks for all the effort!
partiql-lang/src/test/kotlin/org/partiql/lang/syntax/PartiQLParserWindowTests.kt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, only have some small nit.
One high level comment is, the new Parser that parse to the new AST package does not have its own test suit. I understand that the ShimParser is working as expected, but do we planning on port all those parser tests to test directly on the new AST.
Given some of the key distinguishes between the new ast and old one, (using partiQL value, different source location, etc), may worth to have those test cases some points in the future. (huge work for sure, not in scope of this PR for sure, just want to understand the road map here).
Another high level comment is: would be nice to have some of the PR description ported to package level README.
partiql-ast/src/main/kotlin/org/partiql/lang/ast/IsTransformedOrderByAliasMeta.kt
Show resolved
Hide resolved
partiql-parser/src/main/kotlin/org/partiql/parser/PartiQLParserBuilder.kt
Show resolved
Hide resolved
partiql-lang/src/main/kotlin/org/partiql/lang/syntax/PartiQLParserBuilder.kt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update partiql-lang/build.gradle.kts
to use api
instead of the libs(project())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I continue studying this work, but thought of posting a few disjoint comments that have accumulated so far.
public interface AstNode { | ||
|
||
// Every node gets an _id for associating any metadata | ||
public val _id: String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this in AstFactoryImpl
(generated code):
public override val _id: () -> String = { """Ast-${"%08x".format(Random.nextInt())}""" }
Is this _id
meant to be unique identification, or more like some sort of hash key for spreading things around? For the former, I'd be concerned that Random.nextInt() is not a reliable method to get uniqueness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be concerned that Random.nextInt() is not a reliable method to get uniqueness.
Correct, you can override if you require uniqueness. I considered this good enough, but I can make it truly unique if necessary. Related, I wanted the factory to be stateless
partiql-parser/src/main/kotlin/org/partiql/parser/impl/PartiQLParserDefault.kt
Show resolved
Hide resolved
} | ||
} | ||
} | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is MyFactory
with ComparableIdentifier
a good example for this facility (ability to define alternative AST factories)? In this sense: in our PartiQL implementation we deal with case sensitivity, but not in the way presented here; so why is this being (implicitly) recommended to others?
I might be touching a sensitive design issue by asking this question, but not sure which one. Could this be whether it is even useful to offer the ability of defining alternative AST factories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this as an example because we had discussed this exact limitation last fall. This example shows how the current design is able to address situations in which generated sources do not handle edge cases.
I believe it is useful to provide customers with the facility to produce their own concrete implementations of AST nodes (if they want, hence a default). Here we are indeed eating our own dogfood while overcoming basic limitations that are trivial when hand-writing ASTs. The default factory is our default factory, but not the default factory.
Use of [the abstract factory] pattern enables interchangeable concrete implementations without changing the code that uses them, even at run. wikipedia
Is your primary concern that this is a bad example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My foremost concern is understanding what ready-to-roll uses we have for the significant extra complexity that Abstract Factory pattern introduces. Not nice-to-haves that a future customer might find useful, but 2-3 usecases (that is, more than 1!) that we intend to roll out ourselves -- otherwise we won't prove the dog food is edible.
The prototypical usecase for Abstract Factory is making it possible to develop a GUI application, such as a game, while abstracting away from a specific windowing toolkit, so that the application can run with with the look and feel of any toolkit for which someone would provide a concrete factory.
PartiQL AST (the interfaces in partiql-ast/org.partiql.ast.Types) is the analog of a "windowing toolkit abstraction", while PartiQL parser and PartiQL compiler are analogues of a "GUI game". What would be useful analogues of the concrete windowing toolkits that we are going to roll out as alternatives to the one concrete AST in partiql-ast/org.partiql.ast.impl.Types ?
I can vaguely imagine something on a scale of an AST variant that contains something like instrumentation to count uses of nodes during compilation or optimization transformations -- but I don't have a defensible case for this and don't recall us planning for anything of a kind. The case-sensitivity example is not convincing because it's something we should just provide (or not!) in the default AST implementation -- case sensitivity ought to be part of the PartiQL AST contract (or not! -- taken care of outside the AST, as currently).
To put it differently, what do we give up if org.partiql.ast.Types and org.partiql.ast.impl.Types are one and the same thing (just concrete classes, no interfaces)?
This probably requires a discussion more sprawling than can fit in PR comments...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware of the pattern background, and I see your concern now. The motivation for this was to have the flexibility of hand-written AST nodes, but the convenience of generated ones. This pattern enables us to get everything generated, but hand-write when we desire. The default GUI AST toolkit is what we provide, but customers can provide their own look and feel nodes when desired/required.
As you'll see in the PIG AST and the Physical Plan. Various mechanisms have been added for customer extension such as the metadata <string, object>
map as well as the impl
string on the partiql_physical
domain (which is a point of customer extension, but is also related to the "physical"-ness of the domain).
My effort here was to provide a more powerful mechanism for both ourselves and customers to extend AST nodes. That is, they can use the generated identifiers to store metadata in a richer-typed structure than the object map (see SourceLocations) for an example. And they can handwrite implementations of our AST node interfaces.
My opinion is that, if I could go back in time, the AST would be internal and hand-written. It is the driving cause of major-version churn due to its constant evolving and surface area, as we as its inflexibility due to the generator limitations. I don't this PR solves that issue, but it gives us some better defaults and flexibility. If the AST were internal, then the separation of interfaces and indirection of instantiation would be superfluous. But I found this pattern to be a reasonable compromise given our situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove these comments or make it more clear what is the point?
partiql-types/src/main/kotlin/org/partiql/value/datetime/impl/SqlDate.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
Relevant Issues
Rebase of #1110
toLegacyAst()
without synthetic locationsorg.partiql.lang.syntax.PartiQLParser
interface via shimDescription
This PR introduces the new PartiQL AST and PartiQL Parser built on the latest IR classes. Both the partiql-ast and partiql-parser packages can be consumed as artifacts independently from partiql-lang-kotlin.
About the AST
The AST package is simply the type definitions file with a default factory. The easiest way to review is by looking at the definitions. https://github.com/partiql/partiql-lang-kotlin/blob/ast-rebase/partiql-ast/src/main/resources/partiql_ast.ion
Notably this AST is able to use PIG 1.0 features such as sets, maps, enums, imported types, and additional scalar types. We are able to better model the AST without concern for V0 AST compatibility as there is a layer of indirection between the AST's data structures and the serialized s-expression format. It is possible to convert to the legacy AST via the ToLegacyAst.kt helper. We also get to drop the dependency on pig-runtime and its workarounds.
Where possible, node definitions are nested (see DML and GPML as an example) and redundancy is eliminated (see unary and binary expressions as an example). Because we can now model sums of sum types, we can fix the inheritance of certain nodes, details in #1110.
About the Parser
The parser package is effectively the existing PartiQLPigParser implementation (credit @johnedquinn) updated to produce the new AST rather than the PIG AST. One notable difference is that metadata (ie source locations) is not tagged onto nodes, rather it is kept in a metadata map exactly like the PartiQL Rust implementation. A benefit of this approach is that we decouple ANTLR parse tree nodes from AST nodes because we can build a source location from the start-end of a grammar rule rather than from a single token as done in org.partiql.lang.syntax parsers.
Note also that the partiql-parser package and the existing AST use the same grammar found in partiql-parser/src/main/antlr/PartiQL.g4. This means the ANTLR-based implementations of both the old and new interfaces use the same grammar.
Other Information
Updated Unreleased Section in CHANGELOG: [YES/NO]
TODO
Any backward-incompatible changes? [YES/NO]
No. This change includes additions with the ability to produce the
Any new external dependencies? [YES/NO]
No
Do your changes comply with the Contributing Guidelines
and Code Style Guidelines? [YES/NO]
Yes
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.