Initialize partiql-parser package with partiql-ast IR #1142

Merged: 10 commits, merged Jul 13, 2023
4 changes: 3 additions & 1 deletion .github/workflows/build.yml
@@ -6,11 +6,13 @@ on:
       - '**'
       - '!docs/**'
       - '!**/*.md'
+      - '!**/*.adoc'
   pull_request:
     paths:
       - '**'
       - '!docs/**'
-      - '!**.*.md'
+      - '!**/*.md'
+      - '!**/*.adoc'
 
 jobs:
   test:
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -31,9 +31,12 @@ Thank you to all who have contributed!
### Added
- Adds `org.partiql.value` (experimental) package for reading/writing PartiQL
values
- Adds `org.partiql.ast` package and usage documentation
- Adds `org.partiql.parser` package and usage documentation
- Adds PartiQL's Timestamp Data Model.
- Adds support for Timestamp constructor call in Parser.


### Changed

### Deprecated
364 changes: 364 additions & 0 deletions partiql-ast/README.adoc
@@ -0,0 +1,364 @@
= PartiQL AST

The PartiQL AST package contains interfaces, data classes, and utilities for manipulating a syntax tree.

NOTE: If you are on an older version of PartiQL, you can convert to the old AST via `.toLegacyAst()` in `org.partiql.ast.helpers`.

== Interfaces

The interfaces are generated from `resources/partiql_ast.ion` (details in `lib/sprout/README`).

=== Node

[source,kotlin]
----
public interface AstNode {

    // Every node gets an _id for associating any metadata
    public val _id: String

    public val children: List<AstNode>

    public fun <R, C> accept(visitor: AstVisitor<R, C>, ctx: C): R
}
----

Review comment on the `_id` line (Contributor):

I see this in `AstFactoryImpl` (generated code):

  public override val _id: () -> String = { """Ast-${"%08x".format(Random.nextInt())}""" }

Is this `_id` meant to be a unique identifier, or more like some sort of hash key for spreading things around? If the former, I'd be concerned that `Random.nextInt()` is not a reliable way to get uniqueness.

Reply (Contributor Author):

"I'd be concerned that Random.nextInt() is not a reliable method to get uniqueness."

Correct, you can override it if you require uniqueness. I considered this good enough, but I can make it truly unique if necessary. Relatedly, I wanted the factory to be stateless.
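The `_id` question in the review above comes down to trade-offs between a few id schemes. Below is a rough sketch of three `() -> String` suppliers with the same shape as the generated `_id` property. The names `randomId`, `sequentialId`, and `uuidId` are hypothetical and are not part of `partiql-ast`. The first copies the random 32-bit format quoted in the comment. By the birthday bound, it reaches roughly a 50% chance of a collision after about 77,000 ids. The second is unique within one JVM but makes the supplier stateful. The third stays stateless while making collisions negligible in practice.

```kotlin
import java.util.UUID
import java.util.concurrent.atomic.AtomicLong
import kotlin.random.Random

// Same format as the generated default quoted above: only 32 random bits, so collisions can happen.
val randomId: () -> String = { "Ast-${"%08x".format(Random.nextInt())}" }

// Unique within one JVM, but the supplier now carries state (this counter).
val counter = AtomicLong()
val sequentialId: () -> String = { "Ast-%016x".format(counter.incrementAndGet()) }

// Stateless with 122 random bits: collisions are negligible in practice, though not strictly impossible.
val uuidId: () -> String = { "Ast-${UUID.randomUUID()}" }
```

A custom factory could assign any of these to its `_id`, depending on whether uniqueness or statelessness matters more.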

=== Example

.Example Definition
[source,ion]
----
expr::[
  // ...
  binary::{
    op: [
      PLUS, MINUS, TIMES, DIVIDE, MODULO, CONCAT,
      AND, OR,
      EQ, NE, GT, GTE, LT, LTE,
    ],
    lhs: expr,
    rhs: expr,
  },
  // ...
]
----

.Generated Interface
[source,kotlin]
----
// Note: `Expr : AstNode` is a sealed interface of all expr variants

public interface Binary : Expr {
    public val op: Op
    public val lhs: Expr
    public val rhs: Expr

    public fun copy(
        op: Op = this.op,
        lhs: Expr = this.lhs,
        rhs: Expr = this.rhs,
    ): Binary

    public enum class Op {
        PLUS,
        MINUS,
        TIMES,
        DIVIDE,
        MODULO,
        CONCAT,
        AND,
        OR,
        EQ,
        NE,
        GT,
        GTE,
        LT,
        LTE,
    }
}
----
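The `copy` function declares defaults of `this.op`, `this.lhs`, and `this.rhs`, so callers can change one field and keep the rest. Here is a hypothetical, trimmed-down stand-in for the generated code. It keeps only `PLUS` and `TIMES`, and it invents a `Lit` node and a `BinaryImpl` class. It shows how a hand-written implementation could satisfy such an interface. An override may not redeclare default values; instead, it inherits them from the interface.

```kotlin
// In the generated code Expr is sealed; a plain interface keeps this sketch short.
interface Expr

// Hypothetical literal node, for illustration only.
data class Lit(val value: Int) : Expr

interface Binary : Expr {
    val op: Op
    val lhs: Expr
    val rhs: Expr

    // Defaults refer to the current values, so callers only pass what changes.
    fun copy(
        op: Op = this.op,
        lhs: Expr = this.lhs,
        rhs: Expr = this.rhs,
    ): Binary

    enum class Op { PLUS, TIMES }
}

// Hand-written implementation (a plain class, since a data class would generate a clashing copy).
// The override inherits the interface's default arguments.
class BinaryImpl(
    override val op: Binary.Op,
    override val lhs: Expr,
    override val rhs: Expr,
) : Binary {
    override fun copy(op: Binary.Op, lhs: Expr, rhs: Expr): Binary = BinaryImpl(op, lhs, rhs)
}
```

For example, `BinaryImpl(Binary.Op.PLUS, Lit(1), Lit(2)).copy(op = Binary.Op.TIMES)` keeps both operands and changes only the operator.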

== Factory, DSL, and Builders

The PartiQL AST library provides several creational patterns in `org.partiql.ast.builder` such as an abstract base factory, Kotlin DSL, and Java fluent-builders.
These patterns enable customers to extend the AST to fit their needs, while providing a base which can be decorated appropriately.

=== Factory Usage

The factory is how you instantiate a node. The default factory can be called directly:

[source,kotlin]
----
import org.partiql.ast.Ast
import org.partiql.value.int32Value

Ast.exprLit(int32Value(1)) // expr.lit
----

==== Custom Nodes

Additionally, you can extend the abstract base factory and use it with the builders as well as the DSL. This gives you full control over how your nodes are instantiated. If you are ambitious, you can write your own implementations of the AST node interfaces and return them from your factory, which lets you add custom behavior. For example, the generated `equals` functions do not consider semantics. Perhaps we want to improve how we compare nodes? Here's an example that considers the case sensitivity of identifiers.

.Custom Node and Factory Example
[source,kotlin]
----
public abstract class MyFactory : AstBaseFactory() {

    // Return our own node implementation instead of the generated default
    override fun identifierSymbol(symbol: String, caseSensitivity: Identifier.CaseSensitivity): Identifier.Symbol {
        return ComparableIdentifier(_id(), symbol, caseSensitivity)
    }
}

class ComparableIdentifier(
    override val _id: String,
    override val symbol: String,
    override val caseSensitivity: Identifier.CaseSensitivity,
) : Identifier.Symbol {

    // override copy, children, and accept (omitted for brevity)

    override fun equals(other: Any?): Boolean {
        if (other === this) return true // same object
        if (other !is Identifier.Symbol) return false // different type (also covers null)
        return when (caseSensitivity) {
            Identifier.CaseSensitivity.SENSITIVE -> this.symbol == other.symbol
            Identifier.CaseSensitivity.INSENSITIVE -> this.symbol.lowercase() == other.symbol.lowercase()
        }
    }

    // equals and hashCode must agree: symbols that compare equal always match case-insensitively
    override fun hashCode(): Int = symbol.lowercase().hashCode()
}
----
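Here is a self-contained sketch of the same mechanism that does not depend on generated code. All names here (`Symbol`, `DefaultSymbol`, `CaseAwareSymbol`, `BaseFactory`, `CaseAwareFactory`, `buildPair`) are hypothetical stand-ins for `Identifier.Symbol`, `AstBaseFactory`, and related types. The point is that client code which only calls the factory produces different node implementations depending on which factory it receives.

```kotlin
enum class CaseSensitivity { SENSITIVE, INSENSITIVE }

interface Symbol {
    val symbol: String
    val caseSensitivity: CaseSensitivity
}

// Default implementation: structural (data class) equality, no semantics.
data class DefaultSymbol(
    override val symbol: String,
    override val caseSensitivity: CaseSensitivity,
) : Symbol

// Case-aware implementation, analogous to ComparableIdentifier.
class CaseAwareSymbol(
    override val symbol: String,
    override val caseSensitivity: CaseSensitivity,
) : Symbol {
    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is Symbol) return false
        return when (caseSensitivity) {
            CaseSensitivity.SENSITIVE -> symbol == other.symbol
            CaseSensitivity.INSENSITIVE -> symbol.lowercase() == other.symbol.lowercase()
        }
    }

    // Consistent with equals: equal symbols always share the same lowercase form.
    override fun hashCode(): Int = symbol.lowercase().hashCode()
}

// The base factory decides which implementation is instantiated; subclasses override that choice.
open class BaseFactory {
    open fun symbol(symbol: String, caseSensitivity: CaseSensitivity): Symbol =
        DefaultSymbol(symbol, caseSensitivity)
}

object CaseAwareFactory : BaseFactory() {
    override fun symbol(symbol: String, caseSensitivity: CaseSensitivity): Symbol =
        CaseAwareSymbol(symbol, caseSensitivity)
}

// Client code depends only on BaseFactory, never on a concrete node class.
fun buildPair(f: BaseFactory): Pair<Symbol, Symbol> =
    f.symbol("Hello", CaseSensitivity.INSENSITIVE) to f.symbol("HELLO", CaseSensitivity.INSENSITIVE)
```

With `BaseFactory()` the two symbols compare unequal. With `CaseAwareFactory` they compare equal, and `buildPair` itself is unchanged.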
Review comment (Contributor):

Is `MyFactory` with `ComparableIdentifier` a good example for this facility (the ability to define alternative AST factories)? In this sense: in our PartiQL implementation we deal with case sensitivity, but not in the way presented here, so why is this being (implicitly) recommended to others?

I might be touching a sensitive design issue by asking this question, but I'm not sure which one. Could this be whether it is even useful to offer the ability to define alternative AST factories?

Reply (Contributor Author):

I used this as an example because we had discussed this exact limitation last fall. This example shows how the current design is able to address situations in which generated sources do not handle edge cases.

I believe it is useful to provide customers with the facility to produce their own concrete implementations of AST nodes (if they want, hence a default). Here we are indeed eating our own dogfood while overcoming basic limitations that are trivial when hand-writing ASTs. The default factory is our default factory, but not the default factory.

"Use of [the abstract factory] pattern enables interchangeable concrete implementations without changing the code that uses them, even at runtime." (Wikipedia)

Is your primary concern that this is a bad example?

Reply (Contributor):

My foremost concern is understanding what ready-to-roll uses we have for the significant extra complexity that the Abstract Factory pattern introduces. Not nice-to-haves that a future customer might find useful, but 2-3 use cases (that is, more than 1!) that we intend to roll out ourselves; otherwise we won't prove the dog food is edible.

The prototypical use case for Abstract Factory is making it possible to develop a GUI application, such as a game, while abstracting away from a specific windowing toolkit, so that the application can run with the look and feel of any toolkit for which someone would provide a concrete factory.

PartiQL AST (the interfaces in partiql-ast/org.partiql.ast.Types) is the analog of a "windowing toolkit abstraction", while the PartiQL parser and PartiQL compiler are analogues of a "GUI game". What would be useful analogues of the concrete windowing toolkits that we are going to roll out as alternatives to the one concrete AST in partiql-ast/org.partiql.ast.impl.Types?

I can vaguely imagine something on the scale of an AST variant that contains instrumentation to count uses of nodes during compilation or optimization transformations, but I don't have a defensible case for this and don't recall us planning for anything of the kind. The case-sensitivity example is not convincing because it's something we should just provide (or not!) in the default AST implementation: case sensitivity ought to be part of the PartiQL AST contract (or not! taken care of outside the AST, as currently).

To put it differently, what do we give up if org.partiql.ast.Types and org.partiql.ast.impl.Types are one and the same thing (just concrete classes, no interfaces)?

This probably requires a discussion more sprawling than can fit in PR comments...

Reply (Contributor Author):

I'm aware of the pattern's background, and I see your concern now. The motivation for this was to have the flexibility of hand-written AST nodes with the convenience of generated ones. This pattern lets us generate everything, but hand-write when we desire. The default "GUI AST toolkit" is what we provide, but customers can provide their own "look and feel" nodes when desired or required.

As you'll see in the PIG AST and the physical plan, various mechanisms have been added for customer extension, such as the metadata <string, object> map as well as the impl string on the partiql_physical domain (which is a point of customer extension, but is also related to the "physical"-ness of the domain).

My effort here was to provide a more powerful mechanism for both ourselves and customers to extend AST nodes. That is, they can use the generated identifiers to store metadata in a richer-typed structure than the object map (see SourceLocations for an example), and they can hand-write implementations of our AST node interfaces.

My opinion is that, if I could go back in time, the AST would be internal and hand-written. It is the driving cause of major-version churn due to its constant evolution and large surface area, as well as its inflexibility due to generator limitations. I don't think this PR solves that issue, but it gives us some better defaults and flexibility. If the AST were internal, then the separation of interfaces and the indirection of instantiation would be superfluous. But I found this pattern to be a reasonable compromise given our situation.

=== DSL Usage

The DSL is meant for Kotlin and is syntactic sugar over the fluent builders. Here is how it's used:

.Default Factory DSL Example
[source,kotlin]
----
import org.partiql.ast.Expr
import org.partiql.ast.From
import org.partiql.ast.Identifier
import org.partiql.ast.builder.ast
import org.partiql.value.int32Value

// Tree for PartiQL `VALUES (1, 2)`
ast {
    exprCollection(Expr.Collection.Type.VALUES) {
        values += exprLit(int32Value(1))
        values += exprLit(int32Value(2))
    }
}

// Tree for `SELECT a FROM T`
ast {
    exprSFW {
        select = selectProject {
            items += selectProjectItemExpression {
                expr = exprVar {
                    identifier = identifierSymbol("a", Identifier.CaseSensitivity.INSENSITIVE)
                    scope = Expr.Var.Scope.DEFAULT
                }
            }
        }
        from = fromValue {
            expr = exprVar {
                identifier = identifierSymbol("T", Identifier.CaseSensitivity.INSENSITIVE)
                scope = Expr.Var.Scope.DEFAULT
            }
            type = From.Value.Type.SCAN
        }
    }
}
----

.Fancier DSL Usage
[source,kotlin]
----
import org.partiql.ast.Expr
import org.partiql.ast.From
import org.partiql.ast.Identifier
import org.partiql.ast.builder.AstBuilder
import org.partiql.ast.builder.ast

// define some helpers as AstBuilder extensions
private fun AstBuilder.select(vararg s: String) = selectProject {
    s.forEach {
        items += selectProjectItemExpression { expr = v(it) }
    }
}

private fun AstBuilder.table(symbol: String) = fromValue {
    expr = v(symbol)
    type = From.Value.Type.SCAN
}

private fun AstBuilder.v(symbol: String) = exprVar {
    identifier = identifierSymbol(symbol, Identifier.CaseSensitivity.INSENSITIVE)
    scope = Expr.Var.Scope.DEFAULT
}

// Tree for `SELECT x, y, z FROM T`
ast {
    exprSFW {
        select = select("x", "y", "z")
        from = table("T")
    }
}
----

.Custom Factory DSL Example
[source,kotlin]
----
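import org.partiql.ast.builder.ast

// Reuses the `select` and `table` helpers from the previous example.
// `myFactory` is an instance of your custom factory (e.g., a concrete `MyFactory`).
// This will instantiate your custom `ComparableIdentifier`. Nice!
ast(myFactory) {
    exprSFW {
        select = select("x", "y", "z")
        from = table("T")
    }
}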
----

IMPORTANT: The last example works because the DSL block closes over the given factory via an `AstBuilder`. This means that the helper extensions, and any other DSL usage inside the block, use the provided factory!
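The mechanism behind this is a function that takes a lambda with receiver. `ast(factory) { ... }` runs the block with a builder bound to that factory, so every call inside the block goes through it, including calls made by helper extensions. The sketch below models this with hypothetical names (`Node`, `Factory`, `Builder`, `ast`, `select`) and a toy node type. It is not the generated `AstBuilder`.

```kotlin
// Toy node type standing in for the real AST; `madeBy` records which factory created it.
data class Node(val kind: String, val madeBy: String, val children: List<Node> = emptyList())

interface Factory {
    val name: String
    fun node(kind: String, children: List<Node> = emptyList()): Node = Node(kind, name, children)
}

object DefaultFactory : Factory { override val name = "default" }
object MyFactory : Factory { override val name = "mine" }

// The builder closes over the factory it was created with.
class Builder(private val factory: Factory) {
    fun exprVar(name: String): Node = factory.node("var:$name")
    fun selectProject(items: List<Node>): Node = factory.node("select", items)
}

// Lambda with receiver: inside `block`, `this` is a Builder bound to `factory`.
fun <T> ast(factory: Factory = DefaultFactory, block: Builder.() -> T): T = Builder(factory).block()

// A helper extension never names a factory, yet uses whichever one the enclosing block supplied.
fun Builder.select(vararg names: String): Node = selectProject(names.map { exprVar(it) })
```

`ast { select("x", "y") }` yields nodes made by the default factory, while `ast(MyFactory) { select("x", "y") }` yields nodes made by `MyFactory`, all the way down.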

=== Builder Usage

The DSL is little more than Kotlin syntactic sugar over traditional fluent-builder classes. If you are coming from Java, these builders are what you will use.
Every node defines a static `builder()` function. Continuing the previous example, let's see how we can inject our custom
factory.

[source,kotlin]
----
// instance of the default IdentifierSymbolImpl
val a = Identifier.Symbol.builder()
    .symbol("HELLO")
    .caseSensitivity(Identifier.CaseSensitivity.INSENSITIVE)
    .build() // no argument: built with the default factory

// instance of ComparableIdentifier
val b = Identifier.Symbol.builder()
    .symbol("hello")
    .caseSensitivity(Identifier.CaseSensitivity.INSENSITIVE)
    .build(myFactory) // nice!

assert(b == a) // TRUE: ComparableIdentifier.equals ignores case here
assert(a == b) // !! FALSE !! the default equals does not; consider always using the same type of factory
----
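The `build()` / `build(myFactory)` split can be sketched as a plain fluent builder whose terminal method takes an optional factory. This is a hypothetical, self-contained model: `SymbolNode`, `DefaultNode`, `CaseAwareNode`, `SymbolBuilder`, and the function-typed factories are invented here. It reproduces the asymmetric equality noted above. The case-aware node's `equals` accepts the default node, but the default node's `equals` rejects the case-aware one.

```kotlin
interface SymbolNode {
    val symbol: String
    val insensitive: Boolean
}

// Default node: data class equality, so it only equals another DefaultNode with identical fields.
data class DefaultNode(override val symbol: String, override val insensitive: Boolean) : SymbolNode

class CaseAwareNode(override val symbol: String, override val insensitive: Boolean) : SymbolNode {
    override fun equals(other: Any?): Boolean =
        other is SymbolNode &&
            (if (insensitive) symbol.lowercase() == other.symbol.lowercase() else symbol == other.symbol)

    override fun hashCode(): Int = symbol.lowercase().hashCode()
}

// A "factory" here is just a function from field values to a node implementation.
val defaultFactory: (String, Boolean) -> SymbolNode = { s, i -> DefaultNode(s, i) }
val caseAwareFactory: (String, Boolean) -> SymbolNode = { s, i -> CaseAwareNode(s, i) }

class SymbolBuilder {
    private var symbol: String? = null
    private var insensitive: Boolean = false

    fun symbol(value: String) = apply { symbol = value }
    fun insensitive(value: Boolean) = apply { insensitive = value }

    // With no argument, fall back to the default factory.
    fun build(factory: (String, Boolean) -> SymbolNode = defaultFactory): SymbolNode =
        factory(requireNotNull(symbol) { "symbol is required" }, insensitive)
}
```

Building `"HELLO"` with `build()` and `"hello"` with `build(caseAwareFactory)` gives `b == a` but `a != b`, which is why mixing factories is best avoided.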

== Visitor and Rewriter

The PartiQL AST is a set of interfaces, so how might we add our own behavior to them? Neither Kotlin nor Java gives us pattern matching that works from both languages, so we use the visitor pattern.

The visitor pattern effectively adds methods to each node type with some compile-time safety. You define a behavior as a visitor and have the node `accept` it. The visitor provides an additional parameter `ctx: C`, which is the equivalent of the arguments you would pass to each method implementing your behavior.

[source,kotlin]
----
public abstract class AstBaseVisitor<R, C> : AstVisitor<R, C> {

    public override fun visit(node: AstNode, ctx: C): R = node.accept(this, ctx)

    public open fun defaultVisit(node: AstNode, ctx: C): R {
        for (child in node.children) {
            child.accept(this, ctx)
        }
        return defaultReturn(node, ctx)
    }

    public abstract fun defaultReturn(node: AstNode, ctx: C): R
}
----
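To see how `accept`, `visit`, and `defaultVisit` fit together, here is a self-contained miniature with two hypothetical node types (`Lit`, `Plus`) and a hypothetical `Visitor` interface. Each node's `accept` calls the visitor method for its own type; this is the double dispatch. The base visitor routes every type-specific method to `defaultVisit` unless you override it. This mirrors the shape of `AstBaseVisitor` above, but it is not the generated `AstVisitor` interface.

```kotlin
interface Node {
    val children: List<Node>
    fun <R, C> accept(visitor: Visitor<R, C>, ctx: C): R
}

class Lit(val value: Int) : Node {
    override val children: List<Node> = emptyList()
    override fun <R, C> accept(visitor: Visitor<R, C>, ctx: C): R = visitor.visitLit(this, ctx)
}

class Plus(val lhs: Node, val rhs: Node) : Node {
    override val children: List<Node> = listOf(lhs, rhs)
    override fun <R, C> accept(visitor: Visitor<R, C>, ctx: C): R = visitor.visitPlus(this, ctx)
}

interface Visitor<R, C> {
    fun visit(node: Node, ctx: C): R
    fun visitLit(node: Lit, ctx: C): R
    fun visitPlus(node: Plus, ctx: C): R
}

abstract class BaseVisitor<R, C> : Visitor<R, C> {
    override fun visit(node: Node, ctx: C): R = node.accept(this, ctx)

    // Unless overridden, every node type falls through to defaultVisit.
    override fun visitLit(node: Lit, ctx: C): R = defaultVisit(node, ctx)
    override fun visitPlus(node: Plus, ctx: C): R = defaultVisit(node, ctx)

    open fun defaultVisit(node: Node, ctx: C): R {
        for (child in node.children) child.accept(this, ctx)
        return defaultReturn(node, ctx)
    }

    abstract fun defaultReturn(node: Node, ctx: C): R
}

// A new "method" on all nodes, written in one place: evaluate the expression.
object Eval : BaseVisitor<Int, Unit>() {
    override fun visitLit(node: Lit, ctx: Unit): Int = node.value
    override fun visitPlus(node: Plus, ctx: Unit): Int = visit(node.lhs, ctx) + visit(node.rhs, ctx)
    override fun defaultReturn(node: Node, ctx: Unit): Int = error("unhandled node: $node")
}
```

Adding another behavior means writing another visitor, with no changes to `Lit` or `Plus`.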

For example, let's implement a `toString(case: Case)` function on some basic nodes.

[source,kotlin]
----
// A hypothetical Case enum for this example
enum class Case { UPPER, LOWER, PASCAL, SNAKE }

//
// Usage:
// node.accept(ToString, Case.UPPER) // ~ node.toString(Case.UPPER)
//
object ToString : AstBaseVisitor<String, Case>() {

    // Don't recurse into children; just name this node
    override fun defaultVisit(node: AstNode, ctx: Case) = defaultReturn(node, ctx)

    override fun defaultReturn(node: AstNode, ctx: Case): String {
        val name = node::class.simpleName ?: "AstNode" // simpleName is null for anonymous classes
        return when (ctx) {
            Case.UPPER -> name.uppercase()
            Case.LOWER -> name.lowercase()
            Case.PASCAL -> name
            Case.SNAKE -> snakeCase(name)
        }
    }

    // PascalCase -> snake_case, e.g. "ExprBinary" -> "expr_binary"
    private fun snakeCase(name: String): String =
        name.replace(Regex("([a-z0-9])([A-Z])"), "\$1_\$2").lowercase()

    // Any other overrides you want!
}
----


=== Folding

Folding is straightforward with either a mutable context or an immutable accumulator. The structure you fold to is
entirely dependent on your use case, but here is a simple example with a mutable list that you can generalize. Often you may fold to an entirely new domain, or fold to the same domain, which we'll cover in the rewriter.

.Example "ClassName" Collector
[source,kotlin]
----
// Traverse the tree collecting all node class names
object AstClassNameCollector {

    // Public static entry for Java-style consumption
    @JvmStatic
    fun collect(node: AstNode): List<String> {
        val acc = mutableListOf<String>()
        node.accept(Collector, acc)
        return acc
    }

    // Private implementation; the context is the mutable accumulator
    private object Collector : AstBaseVisitor<String?, MutableList<String>>() {

        override fun defaultVisit(node: AstNode, ctx: MutableList<String>): String? {
            node.children.forEach { child -> child.accept(this, ctx) } // traverse children first
            val name = defaultReturn(node, ctx) // then record this node
            name?.let { ctx.add(it) }
            return name
        }

        override fun defaultReturn(node: AstNode, ctx: MutableList<String>): String? = node::class.simpleName

        // Any other overrides you want!
    }
}
----
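The text mentions immutable accumulators as an alternative to a mutable context. One way to sketch that is to thread the accumulator through the traversal and return a new value from each step, so nothing is mutated. The sketch uses a hypothetical two-node tree (`Leaf`, `Branch`) rather than the PartiQL AST. The traversal order (children first, then the node) matches the collector above.

```kotlin
interface Node { val children: List<Node> }
class Leaf(val name: String) : Node { override val children: List<Node> = emptyList() }
class Branch(override val children: List<Node>) : Node

// Post-order fold: children first, then the node itself. Each step returns a new accumulator.
fun <A> Node.fold(initial: A, step: (A, Node) -> A): A {
    val afterChildren = children.fold(initial) { acc, child -> child.fold(acc, step) }
    return step(afterChildren, this)
}

// The same "class name collector", using an immutable List instead of a MutableList.
fun classNames(root: Node): List<String> =
    root.fold(emptyList<String>()) { names, node -> names + (node::class.simpleName ?: "?") }
```

Copying a list at every step costs more than appending to a mutable one, so this style suits small trees or persistent data structures.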

=== Rewriter

See `org.partiql.ast.util.AstRewriter`. This class facilitates rewriting an AST; you need only override the relevant methods for your rewriter.
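`AstRewriter`'s exact methods are not described here, so this does not guess at its API. Instead, here is a hypothetical, self-contained sketch of the general idea: a fold back into the same domain. Each method returns a possibly new node, unchanged subtrees are returned as-is, and you override only the cases you care about. In this sketch the override constant-folds a `Plus` of two literals. `Expr`, `Lit`, `Plus`, `Rewriter`, and `ConstantFolder` are illustrative names.

```kotlin
sealed class Expr
data class Lit(val value: Int) : Expr()
data class Plus(val lhs: Expr, val rhs: Expr) : Expr()

open class Rewriter {
    open fun rewrite(node: Expr): Expr = when (node) {
        is Lit -> rewriteLit(node)
        is Plus -> rewritePlus(node)
    }

    open fun rewriteLit(node: Lit): Expr = node

    // Default: rewrite children, and allocate a new node only if something changed.
    open fun rewritePlus(node: Plus): Expr {
        val lhs = rewrite(node.lhs)
        val rhs = rewrite(node.rhs)
        return if (lhs === node.lhs && rhs === node.rhs) node else node.copy(lhs = lhs, rhs = rhs)
    }
}

// Override only what you need: fold `Lit + Lit` into a single literal.
object ConstantFolder : Rewriter() {
    override fun rewritePlus(node: Plus): Expr {
        val rewritten = super.rewritePlus(node)
        return if (rewritten is Plus && rewritten.lhs is Lit && rewritten.rhs is Lit) {
            Lit(rewritten.lhs.value + rewritten.rhs.value)
        } else {
            rewritten
        }
    }
}
```

Returning the original node when nothing changed keeps unaffected subtrees shared, so a no-op rewrite allocates nothing.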

=== Tips

- Each `visit` is a function call; adding state to a visitor is akin to using global variables. _Consider keeping state in the context parameter_. This is beneficial because your state is naturally scoped via the call stack.
- Sometimes state in a visitor makes an implementation much cleaner (go for it!). Just remember that such a visitor might not be reusable or idempotent.
- Consider using singletons/objects for stateless visitors.
- Consider making your visitors private with a single public static entry point.
- When you make a private visitor, you can rename the ctx parameter to something relevant. Use `@Suppress("PARAMETER_NAME_CHANGED_ON_OVERRIDE")` to silence the resulting warning.
- If writing and using Kotlin, consider adding an extension function on the base node interface. This _really_ makes it look like you've opened the classes (but really it's just a static method).
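Several of these tips can be combined in one small example. This sketch uses a hypothetical minimal node type; `Node`, `Visitor`, `Tree`, and `Depth` are stand-ins, not the generated interfaces. It has a stateless private `object` visitor, a context parameter renamed to `depth` with the suppression, a single `@JvmStatic` entry point for Java callers, and a Kotlin extension that makes the behavior read like a method.

```kotlin
interface Node {
    val children: List<Node>
    fun <R, C> accept(visitor: Visitor<R, C>, ctx: C): R = visitor.visitNode(this, ctx)
}

interface Visitor<R, C> {
    fun visitNode(node: Node, ctx: C): R
}

class Tree(override val children: List<Node> = emptyList()) : Node

object Depth {
    // Single public entry point; Java callers can use Depth.of(node).
    @JvmStatic
    fun of(node: Node): Int = node.accept(Impl, 1)

    // Stateless, so one shared instance is safe; the context carries the current depth.
    private object Impl : Visitor<Int, Int> {
        @Suppress("PARAMETER_NAME_CHANGED_ON_OVERRIDE")
        override fun visitNode(node: Node, depth: Int): Int =
            node.children.maxOfOrNull { it.accept(this, depth + 1) } ?: depth
    }
}

// Kotlin extension: reads like node.depth(), but compiles to a static call.
fun Node.depth(): Int = Depth.of(this)
```

The depth lives in the context parameter rather than in a field, so `Impl` stays reusable and safe to share.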

=== Understanding Visitors

I believe Robert Nystrom captured the misunderstanding of visitors quite well:

[quote]
____
The Visitor pattern is the most widely misunderstood pattern in all of Design Patterns, which is really saying something when you look at the software architecture excesses of the past couple of decades.

The trouble starts with terminology. The pattern isn’t about “visiting”, and the “accept” method in it doesn’t conjure up any helpful imagery either. Many think the pattern has to do with traversing trees, which isn’t the case at all. We are going to use it on a set of classes that are tree-like, but that’s a coincidence. As you’ll see, the pattern works as well on a single object.

The Visitor pattern is really about approximating the functional style within an OOP language. It lets us add new columns to that table easily. We can define all of the behavior for a new operation on a set of types in one place, without having to touch the types themselves. It does this the same way we solve almost every problem in computer science: by adding a layer of indirection.

-- Robert Nystrom, Crafting Interpreters
____

Additionally, note that the Wikipedia article explicitly mentions pattern matching. Kotlin is interesting because it has something _like_ pattern matching (`when` over sealed types), but the PartiQL AST library is intended for consumption from both Kotlin and Java.

[quote]
____
A visitor pattern is a software design pattern and separates the algorithm from the object structure. Because of this separation new operations can be added to existing object structures without modifying the structures. It is one way to follow the open/closed principle in object-oriented programming and software engineering.

In essence, the visitor allows adding new virtual functions to a family of classes, without modifying the classes. Instead, a visitor class is created that implements all of the appropriate specializations of the virtual function. The visitor takes the instance reference as input, and implements the goal through double dispatch.

Programming languages with sum types and pattern matching obviate many of the benefits of the visitor pattern, as the visitor class is able to both easily branch on the type of the object and generate a compiler error if a new object type is defined which the visitor does not yet handle.

https://en.wikipedia.org/wiki/Visitor_pattern
____
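For Kotlin-only code over a sealed hierarchy, a `when` expression gives the branch-on-type behavior and the exhaustiveness checking that the quote describes. Here is a small hypothetical sketch; `Expr`, `Lit`, `Neg`, `Add`, and `eval` are illustrative, not AST types. Adding a new subtype makes this `when` fail to compile until the new case is handled. Java consumers of the library cannot rely on this, which is the trade-off discussed above.

```kotlin
sealed interface Expr
data class Lit(val value: Int) : Expr
data class Neg(val operand: Expr) : Expr
data class Add(val lhs: Expr, val rhs: Expr) : Expr

// Exhaustive `when` over a sealed type: no `else` branch is needed, and adding a new
// Expr subtype turns this into a compile error until the new case is handled.
fun eval(e: Expr): Int = when (e) {
    is Lit -> e.value
    is Neg -> -eval(e.operand)
    is Add -> eval(e.lhs) + eval(e.rhs)
}
```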