-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initialize partiql-parser package with partiql-ast IR #1142
Changes from 7 commits
a9f361e
d353514
ee82b7a
4c3e880
fa1eeee
6c60fad
62a4dab
ed4625d
ac2f411
606efd3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,364 @@ | ||
= PartiQL AST | ||
|
||
The PartiQL AST package contains interfaces, data classes, and utilities for manipulating a syntax tree. | ||
|
||
NOTE: If you are on an older version of PartiQL, you can convert to the old AST via `.toLegacyAst()` in `org.partiql.ast.helpers`. | ||
|
||
== Interfaces | ||
|
||
The interfaces are generated from `resources/partiql_ast.ion` (details in `lib/sprout/README`) | ||
|
||
=== Node | ||
|
||
[source,kotlin] | ||
---- | ||
public interface AstNode { | ||
|
||
// Every node gets an _id for associating any metadata | ||
public val _id: String | ||
|
||
public val children: List<AstNode> | ||
|
||
public fun <R, C> accept(visitor: AstVisitor<R, C>, ctx: C): R | ||
} | ||
---- | ||
|
||
=== Example | ||
|
||
.Example Definition | ||
[source,ion] | ||
---- | ||
expr::[ | ||
// ... | ||
binary::{ | ||
op: [ | ||
PLUS, MINUS, TIMES, DIVIDE, MODULO, CONCAT, | ||
AND, OR, | ||
EQ, NE, GT, GTE, LT, LTE, | ||
], | ||
lhs: expr, | ||
rhs: expr, | ||
}, | ||
// ... | ||
] | ||
---- | ||
|
||
.Generated Interface | ||
[source,kotlin] | ||
---- | ||
// Note: `Expr:AstNode` is a sealed interface of all expr variants | ||
|
||
public interface Binary : Expr { | ||
public val op: Op | ||
public val lhs: Expr | ||
public val rhs: Expr | ||
|
||
public fun copy( | ||
op: Op = this.op, | ||
lhs: Expr = this.lhs, | ||
rhs: Expr = this.rhs, | ||
): Binary | ||
|
||
public enum class Op { | ||
PLUS, | ||
MINUS, | ||
TIMES, | ||
DIVIDE, | ||
MODULO, | ||
CONCAT, | ||
AND, | ||
OR, | ||
EQ, | ||
NE, | ||
GT, | ||
GTE, | ||
LT, | ||
LTE, | ||
} | ||
} | ||
---- | ||
|
||
== Factory, DSL, and Builders | ||
|
||
The PartiQL AST library provides several creational patterns in `org.partiql.ast.builder` such as an abstract base factory, Kotlin DSL, and Java fluent-builders. | ||
These patterns enable customers to extend the AST to fit their needs, while providing a base which can be decorated appropriately. | ||
|
||
=== Factory Usage | ||
|
||
The factory is how you instantiate a node. The default factory can be called directly like, | ||
|
||
[source,kotlin] | ||
---- | ||
import org.partiql.ast.Ast | ||
|
||
Ast.exprLit(int32Value(1)) // expr.lit | ||
RCHowell marked this conversation as resolved.
Show resolved
Hide resolved
|
||
---- | ||
|
||
==== Custom Nodes | ||
|
||
Additionally, you can extend the abstract base factory and use it in builders as well as the DSL. This gives you full | ||
control over how your nodes are instantiated. If you are ambitious, you can implement your own versions of AST node interfaces and implement a base factory. This | ||
will allow you to create custom behaviors. For example, generated equals functions do not consider semantics. Perhaps | ||
we want to improve how we compare nodes? Here's an example that considers the case-sensitivity of identifiers. | ||
|
||
.Custom Node and Factory Example | ||
[source,kotlin] | ||
---- | ||
public abstract class MyFactory : AstBaseFactory() { | ||
|
||
override fun identifierSymbol(symbol: String, caseSensitivity: Identifier.CaseSensitivity): Identifier.Symbol { | ||
return ComparableIdentifier(_id(), symbol, caseSensitivity) | ||
} | ||
} | ||
|
||
class ComparableIdentifier( | ||
override val _id: String, | ||
override val symbol: String, | ||
override val caseSensitivity: Identifier.CaseSensitivity, | ||
) : Identifier.Symbol { | ||
|
||
// override copy, children | ||
|
||
override fun equals(other: Any?): Boolean { | ||
if (other == null || other !is Identifier.Symbol) return false // different type | ||
if (other === this) return true // same object | ||
return when (caseSensitivity) { | ||
Identifier.CaseSensitivity.SENSITIVE -> this.symbol == other.symbol | ||
Identifier.CaseSensitivity.INSENSITIVE -> this.symbol.lowercase() == other.symbol.lowercase() | ||
} | ||
} | ||
} | ||
---- | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is I might be touching a sensitive design issue by asking this question, but not sure which one. Could this be whether it is even useful to offer the ability of defining alternative AST factories? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I used this as an example because we had discussed this exact limitation last fall. This example shows how the current design is able to address situations in which generated sources do not handle edge cases. I believe it is useful to provide customers with the facility to produce their own concrete implementations of AST nodes (if they want, hence a default). Here we are indeed eating our own dogfood while overcoming basic limitations that are trivial when hand-writing ASTs. The default factory is our default factory, but not the default factory.
Is your primary concern that this is a bad example? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My foremost concern is understanding what ready-to-roll uses we have for the significant extra complexity that Abstract Factory pattern introduces. Not nice-to-haves that a future customer might find useful, but 2-3 usecases (that is, more than 1!) that we intend to roll out ourselves -- otherwise we won't prove the dog food is edible. The prototypical usecase for Abstract Factory is making it possible to develop a GUI application, such as a game, while abstracting away from a specific windowing toolkit, so that the application can run with with the look and feel of any toolkit for which someone would provide a concrete factory. PartiQL AST (the interfaces in partiql-ast/org.partiql.ast.Types) is the analog of a "windowing toolkit abstraction", while PartiQL parser and PartiQL compiler are analogues of a "GUI game". What would be useful analogues of the concrete windowing toolkits that we are going to roll out as alternatives to the one concrete AST in partiql-ast/org.partiql.ast.impl.Types ? I can vaguely imagine something on a scale of an AST variant that contains something like instrumentation to count uses of nodes during compilation or optimization transformations -- but I don't have a defensible case for this and don't recall us planning for anything of a kind. The case-sensitivity example is not convincing because it's something we should just provide (or not!) in the default AST implementation -- case sensitivity ought to be part of the PartiQL AST contract (or not! -- taken care of outside the AST, as currently). To put it differently, what do we give up if org.partiql.ast.Types and org.partiql.ast.impl.Types are one and the same thing (just concrete classes, no interfaces)? This probably requires a discussion more sprawling than can fit in PR comments... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm aware of the pattern background, and I see your concern now. The motivation for this was to have the flexibility of hand-written AST nodes, but the convenience of generated ones. This pattern enables us to get everything generated, but hand-write when we desire. The default As you'll see in the PIG AST and the Physical Plan. Various mechanisms have been added for customer extension such as the metadata My effort here was to provide a more powerful mechanism for both ourselves and customers to extend AST nodes. That is, they can use the generated identifiers to store metadata in a richer-typed structure than the object map (see SourceLocations) for an example. And they can handwrite implementations of our AST node interfaces. My opinion is that, if I could go back in time, the AST would be internal and hand-written. It is the driving cause of major-version churn due to its constant evolving and surface area, as we as its inflexibility due to the generator limitations. I don't this PR solves that issue, but it gives us some better defaults and flexibility. If the AST were internal, then the separation of interfaces and indirection of instantiation would be superfluous. But I found this pattern to be a reasonable compromise given our situation. |
||
|
||
=== DSL Usage | ||
|
||
The DSL is useful from Kotlin and is some syntax sugar over fluent builders. Here is how its used: | ||
|
||
.Default Factory DSL Example | ||
[source,kotlin] | ||
---- | ||
import org.partiql.ast.builder.ast | ||
|
||
// Tree for PartiQL `VALUES (1, 2)` | ||
ast { | ||
RCHowell marked this conversation as resolved.
Show resolved
Hide resolved
|
||
exprCollection(Expr.Collection.Type.VALUES) { | ||
values += exprLit(int32Value(1)) | ||
values += exprLit(int32Value(2)) | ||
} | ||
} | ||
|
||
// Tree for `SELECT a FROM T` | ||
ast { | ||
exprSFW { | ||
select = selectProject { | ||
items += selectProjectItemExpression { | ||
expr = exprVar { | ||
identifier = identifierSymbol("a", Identifier.CaseSensitivity.INSENSITIVE) | ||
scope = Expr.Var.Scope.DEFAULT | ||
} | ||
} | ||
} | ||
from = fromValue { | ||
expr = v(symbol) | ||
type = From.Value.Type.SCAN | ||
} | ||
} | ||
} | ||
---- | ||
|
||
.Fancier DSL Usage | ||
[source,kotlin] | ||
---- | ||
import org.partiql.ast.builder.ast | ||
import org.partiql.ast.builder.AstBuilder | ||
|
||
// define some helpers | ||
private fun AstBuilder.select(vararg s: String) = selectProject { | ||
s.forEach { | ||
items += selectProjectItemExpression(v(it)) | ||
} | ||
} | ||
|
||
private fun AstBuilder.table(symbol: String) = fromValue { | ||
expr = v(symbol) | ||
type = From.Value.Type.SCAN | ||
} | ||
|
||
private fun AstBuilder.v(symbol: String) = this.exprVar { | ||
identifier = id(symbol) | ||
scope = Expr.Var.Scope.DEFAULT | ||
} | ||
|
||
|
||
// Tree for `SELECT x, y, z FROM T` | ||
|
||
ast { | ||
exprSFW { | ||
select = select("x", "y", "z") | ||
from = table("T") | ||
} | ||
} | ||
---- | ||
|
||
.Custom Factory DSL Example | ||
[source,kotlin] | ||
---- | ||
import org.partiql.ast.builder.ast | ||
|
||
// This will instantiate your custom `ComparableIdentifier`. Nice! | ||
ast(myFactory) { | ||
exprSFW { | ||
select = select("x", "y", "z") | ||
from = table("T") | ||
} | ||
} | ||
---- | ||
|
||
IMPORTANT: The last examples works because the DSL block closes over the factory with an AstBuilder. This means that | ||
the helper extensions or any DSL usage will use the provided factory! | ||
|
||
=== Builder Usage | ||
|
||
The DSL is not much more than Kotlin syntactic sugar over traditional fluent-builder classes. If you are coming from Java, these will be useful. | ||
Every node defines a static `builder()` function. Keeping with the previous example, let's see how we can inject our custom | ||
factory. | ||
|
||
[source,kotlin] | ||
---- | ||
// instance of default IdentifierSymbolImpl | ||
val a = Identifier.Symbol.builder() | ||
.symbol("HELLO") | ||
.caseSensitivity(Identifier.CaseSensitivity.INSENSITIVE) | ||
.build() // empty, build with default factory | ||
|
||
// instance of ComparableIdentifier | ||
val b = Identifier.Symbol.builder() | ||
.symbol("hello") | ||
.caseSensitivity(Identifier.CaseSensitivity.INSENSITIVE) | ||
.build(myFactory) // nice! | ||
|
||
assert(b == a) // TRUE | ||
assert(a == b) // !! FALSE !! consider always using the same type of factory | ||
---- | ||
|
||
== Visitor and Rewriter | ||
|
||
The PartiQL AST is a set of interfaces, so how might we extend these for our own purposes? We do not have pattern matching in Kotlin/Java, so we use the visitor pattern. | ||
|
||
The visitor pattern is effectively adding methods to each object with some compile safety. You define a behavior and use the node `accept` the behavior. The visitor provides an additional parameter `ctx: C` which is the equivalent of arguments to each method for your behavior. | ||
|
||
[source,kotlin] | ||
---- | ||
public abstract class AstBaseVisitor<R, C> : AstVisitor<R, C> { | ||
|
||
public override fun visit(node: AstNode, ctx: C): R = node.accept(this, ctx) | ||
|
||
public open fun defaultVisit(node: AstNode, ctx: C): R { | ||
for (child in node.children) { | ||
child.accept(this, ctx) | ||
} | ||
return defaultReturn(node, ctx) | ||
} | ||
|
||
public abstract fun defaultReturn(node: AstNode, ctx: C): R | ||
} | ||
---- | ||
|
||
For example, let's implement a `toString(case: Case)` function on some basic nodes. | ||
|
||
[source,kotlin] | ||
---- | ||
// | ||
// Usage: | ||
// node.accept(ToString, Case.UPPER) // ~ node.toString(Case.UPPER) | ||
RCHowell marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// | ||
object ToString : AstBaseVisitor<String, Case>() { | ||
|
||
override fun defaultVisit(node: AstNode, ctx: Case) = defaultReturn(node, ctx) | ||
|
||
override fun defaultReturn(node: AstNode, ctx: Case): String = when (ctx) { | ||
Case.UPPER -> node::class.simpleName.uppercase() | ||
Case.LOWER -> node::class.simpleName.lowercase() | ||
Case.PASCAL -> node::class.simpleName | ||
Case.SNAKE -> snakeCaseHelper(node::class.simpleName) | ||
RCHowell marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
// Any other overrides you want! | ||
} | ||
---- | ||
|
||
|
||
=== Folding | ||
|
||
Folding is straightforward by using either mutable context or an immutable accumulators. The structure you fold to is | ||
entirely dependent on your use case, but here is a simple example with a mutable list that you can generalize. Often times you may fold to an entirely new domain — or fold to the same domain which we'll cover in the rewriter. | ||
|
||
.Example "ClassName" Collector | ||
[source,kotlin] | ||
---- | ||
// Traverse the tree collecting all node names | ||
object AstClassNameCollector { | ||
|
||
// Public static entry for Java style consumption | ||
@JvmStatic | ||
fun collect(node: AstNode): List<String> { | ||
val acc = mutableListOf<String>() | ||
node.accept(ToString, acc) // recall, we have node.toString() | ||
RCHowell marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return acc | ||
} | ||
|
||
// Private implementation | ||
private object ToString : AstBaseVisitor<String?, MutableList<String>>() { | ||
|
||
override fun defaultVisit(node: AstNode, ctx: MutableList<String>): String? { | ||
node.children.forEach { child -> child.accept(this, ctx) } // traverse | ||
defaultReturn(this, ctx)?.let { ctx.add(it) } | ||
} | ||
|
||
override fun defaultReturn(node: AstNode, ctx: MutableList<String>) = node::class.simpleName | ||
|
||
// Any other overrides you want! | ||
} | ||
} | ||
---- | ||
|
||
=== Rewriter | ||
|
||
See `org.partiql.ast.util.AstRewriter`. This class facilitates rewriting an AST; you need only override the relevant methods for your rewriter. | ||
|
||
=== Tips | ||
|
||
- Each `visit` is a function call; adding state to a visitor is akin to global variables. _Consider keeping state in the context parameter_. This is beneficial because you state is naturally scoped via the call stack. | ||
- Sometimes state in a visitor makes an implementation much cleaner (go for it!). Just remember that the visitor might not be re-usable or idempotent. | ||
- Consider using singletons/objects for stateless visitors | ||
- Consider making your visitors private with a single public static entry point. | ||
- When you make a private visitor, you can rename the ctx parameter to something relevant. Use the `Suppress("PARAMETER_NAME_CHANGED_ON_OVERRIDE")` to make the linter to relax. | ||
- If writing and using Kotlin, consider adding an extension method to the base class. This _really_ makes it look like you've opened the classes (but really it's just a static method). | ||
|
||
=== Understanding Visitors | ||
|
||
I believe Robert Nystrom captured the misunderstanding of visitors quite well: | ||
|
||
[quote] | ||
____ | ||
The Visitor pattern is the most widely misunderstood pattern in all of Design Patterns, which is really saying something when you look at the software architecture excesses of the past couple of decades. | ||
|
||
The trouble starts with terminology. The pattern isn’t about “visiting”, and the “accept” method in it doesn’t conjure up any helpful imagery either. Many think the pattern has to do with traversing trees, which isn’t the case at all. We are going to use it on a set of classes that are tree-like, but that’s a coincidence. As you’ll see, the pattern works as well on a single object. | ||
|
||
The Visitor pattern is really about approximating the functional style within an OOP language. It lets us add new columns to that table easily. We can define all of the behavior for a new operation on a set of types in one place, without having to touch the types themselves. It does this the same way we solve almost every problem in computer science: by adding a layer of indirection. | ||
|
||
-- Robert Nystrom, Crafting Interpreters | ||
____ | ||
|
||
Additionally, see how the wiki page explicitly mentions pattern matching. Kotlin is interesting because we have something _like_ pattern matching, but the PartiQL AST library is intended for consumption from both Kotlin and Java. | ||
|
||
[quote] | ||
____ | ||
A visitor pattern is a software design pattern and separates the algorithm from the object structure. Because of this separation new operations can be added to existing object structures without modifying the structures. It is one way to follow the open/closed principle in object-oriented programming and software engineering. | ||
|
||
In essence, the visitor allows adding new virtual functions to a family of classes, without modifying the classes. Instead, a visitor class is created that implements all of the appropriate specializations of the virtual function. The visitor takes the instance reference as input, and implements the goal through double dispatch. | ||
|
||
Programming languages with sum types and pattern matching obviate many of the benefits of the visitor pattern, as the visitor class is able to both easily branch on the type of the object and generate a compiler error if a new object type is defined which the visitor does not yet handle. | ||
|
||
https://en.wikipedia.org/wiki/Visitor_pattern | ||
____ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this in
AstFactoryImpl
(generated code):Is this
_id
meant to be unique identification, or more like some sort of hash key for spreading things around? For the former, I'd be concerned that Random.nextInt() is not a reliable method to get uniqueness.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, you can override if you require uniqueness. I considered this good enough, but I can make it truly unique if necessary. Related, I wanted the factory to be stateless