-
Notifications
You must be signed in to change notification settings - Fork 0
Calcite Notes
See Glossary
for concepts, and the set of patterns which Calcite uses.
-
RexNode
: scalar expression -
RelNode
: Relational algebra node, AKA relational operator. - Adapter:
- A custom schema is created with a
SchemaFactory
. - How is the schema factory registered from code?
- The schema returns an instance of
Table
. If updatable, thenModifiableTable
. - Also, to push operations,
FilterableTable
orProjectableFilterableTable
. - Table gives rise to
TableScan
. - Optimization done via planner rules.
- Each engine implements its own variation of the operators.
- Operators have "conventions" which is the calling convention. The built-in convention is the enumerable convention.
See slide 23 of this presentation:
- SQL is parsed to 'SqlNode'. A converter produces
RelNode
s.
Type inference happens in SqlValidatorImpl
. The validator walks the tree to, among other things, validate the type of each node. Basics:
- The planner invoked the
CanclitePlanner.validate()
method. - The validator invokes
validateQuery()
to walk theSELECT
nodes. - If our expression is in the
SELECT
list (the most common case), thenvalidateSelectItem()
does the work. -
deriveType()
computes the type of that one item. - Most expressions are functions. The
accept()
method of the function does the function-specific validation. - The
DeriveTpyeVisitor
drives the process by validating operands. The visitor itself is not recursive: a different instance is used for each expression level. - Each function has a
deriveType
implementation which invokesvaldateOperands
and theninferReturnType
.
The result is that each function is responsible for its own type inference. Along the way:
- The validator builds up a map from node to type. The recursive process will repeatedly ask the type of a node. The validator uses the cache to avoid running the type inference multiple times. Note that this means that node type is not an attribute of the
SqlNode
, it is kept in a separate work space. -
checkOperandTypes
checks if an operand can be cast from its actual type to the desired argument type. - While one might expect
deriveType()
to build up a list of operand types, it does so indirectly: first putting the types in the cache, then pulling them out again. -
deriveType()
says that "type coercion" is disabled "for builtin operator operands". Since "they are handled by the TypeCoercion specifically". -
deriveType()
sets the operator on the call, then lets the call validate the operands. - An elaborate mechanism exists to infer types, based on a set of rules. For plus, the rule in question is
ReturnTypes.LEAST_RESTRICTIVE
, based on a call toSqlTypeFactory.leastRestrictive()
, which computes the widest of the operand types.
Each SqlCall
provides a type inference algorithm in the form of SqlOperandTypeInference
. InferTypes
provides some standard forms: FIRST_KNOWN
, RETURN_TYPE
, etc. Functions (presumably operators) can provide a custom version. Oddly, the addition operator uses FIRST_KNOWN
. (Question, when is widening done?)
Setup:
- A Calcite
Schema
can hold aFunction
, which may be a compile-time function (AKA "macro".) - A
TableMacro
is a kind ofFunction
that produces aTranslatableTable
. - The parser reduces the table call to a
SqlStdOperatorTable.COLLECTION_TABLE.createCall()
, which isSqlCollectionTableOperator
. - On the other hand,
EXPLICIT_TABLE
"allows an identifier to be explicitly flagged as a table, For example,select * from (TABLE t)
orTABLE t
." (If used, theTABLE
prefix form can't literally be used? Would clash with current usage?) - Thus, the parser invokes
SqlCollectionTableOperator
to create the table function call. At this point, all the parser has is names and operands. -
ExternalOperatorConversion
is the bridge from function name to function definition (e.g.ExternalOperator
.) It is registered viaSqlBindings.addOperatorConversion()
in a Guice module. - The
Set
ofSqlOperatorConversion
items is injected into theDruidOperatorTable
.
In validate()
step:
-
SqlValidatorImpl.performUnconditionalRewrites
looks up the function by name using the operator table, with the result as aSqlBasicCall
. -
DruidOperatorTable.lookupOperatorOverloads
looks up the function (by name). Gets theSqlOperatorConversion
from the definition by callingcalciteOperator()
, which becomes the result above. (There is only one overload in this case.) - Above is repeated a second time during validation.
-
SqlUtil.lookupSubjectRoutesByName
does the second call, and filters by type which, for our use, isSqlSyntax
of eitherFUNCTION
orSPECIAL
. -
ProcedureNamespace.validate()
callsvaliateImpl()
which callsgetTable()
on the UDF (theExternalOperatorConversion
) to get a ? on which it then callsgetRowType()
. Here, it seems each procedure defines a "namespace", of nameCATALOG
. -
ExternalOperatorConversion.getTable()
is in the base class. It callsSqlUserDefinedTableMacro.convertArguments()
to get the list of arguments, which are computed from a call toExternalTableMacro.getParameters()
compared against the actual operand list. -
ExternalOperatorConversion.getTable
: “Returns the table in this UDF, or null if there is no table.” The rest is details of how this is done by default. -
convertArguments()
converts arguments to Java format. At this point, there is a 1:1 mapping between actual arguments and parameter definitions. The type of the definition is used to coerce the actual argument to the desired type. The list of parameters is obtained fromExternalTableMacro
. -
ExternalOperatorConversion.getTable()
after converting the arguments, callsExternalTableMacro.apply()
to “apply” the arguments and return aTranslatableTable
table. That is, we go from call + arguments to a table node. In this case, aDruidTable
.
Once we have the Druid table, then the table resolution process kicks in.
ExternalTableMacro
-
ExternalTableScan
- Rel node for the table scan. Holds the Druid table and JSON mapper. -
ExternalTableScanRule
- Converts theExternalTableScan
rel node into aDruidQueryRel
. -
ExternalDataSource
- Attached to aDruidTable
to define the external data source.
From Drill:
-
SqlKind.COLLECTION_TABLE
is said to supportTABLE(udx(x, y, z))
, which is close to what we want. - Specifically,
TABLE(<file name>(<pluginProp> => <value>, ...))
- Drill has a way have "udx" above reference a format plugin.
- Used in
DrillTableInfo
to do the parameter/arg conversion. - Inside, the only argument must be a
SqlUserDefinedTableMacro
, with the first argument as an identifier. - Can also be an
SqlKind.IDENTIFIER
directly. - From plugins, get the list of fields, convert to args, use plugin name (?) to define a table within the proper schema namespace.
- This means that the set of arguments is known ahead of time: by inspecting the format plugin.
Other notes:
-
OperandTypes.VARIADIC
: "Operand type-checking strategy for an operator which takes no operands." -
TranslatableTable
: "Extension to Table that specifies how it is to be translated to a relational expression. It is optional for a Table to implement this interface. If Table does not implement this interface, it will be converted to an EnumerableTableScan. Generally a Table will implement this interface to create a particular subclass of RelNode, and also register rules that act on that particular subclass of RelNode." - The operator's
deriveType()
method is called withSqlValidator
andSqlValidatorScope
, which is used to look up names, such as table names.
Calcite provides a planner, Planner
and PlannerImpl
which appears more of an example. Druid works around the fact that the validator is not visible, but has lots of useful information. Drill, for example, incorporates the functionality .
Calcite provides this table syntax, but the syntax seems incomplete relative to the actual parser.
Calcite provides a way to extend the table ref syntax.
From Calcite docs:
tableReference:
tablePrimary
[ FOR SYSTEM_TIME AS OF expression ]
[ pivot ]
[ unpivot ]
[ matchRecognize ]
[ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]
tablePrimary:
[ [ catalogName . ] schemaName . ] tableName
'(' TABLE [ [ catalogName . ] schemaName . ] tableName ')'
| tablePrimary [ hintComment ] [ EXTEND ] '(' columnDecl [, columnDecl ]* ')'
| [ LATERAL ] '(' query ')'
| UNNEST '(' expression ')' [ WITH ORDINALITY ]
| [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]* ')' ')'
Inferred syntax from Parser.jj
:
FromClause: FROM <TableRef> ...
TableRef:
<TableRefDetails>
[ <Pivot> ] [ <Unpivot> ]
[ [ AS ] SimpleIdentifier [ ParenthesizedSimpleIdentifierList ] ]
[ TABLESAMPLE ... ]
TableRefDetails:
<TableRefWithHintsOpt> <TableOverOpt>
| [LATERAL] <ParenthesizedExpression> <TableOverOpt>
| UNNEST <ParenthesizedQueryOrCommaList>
| [LATERAL] TABLE '(' <TableFunctionCall> ')'
| <ExtendedTableRef>
TableRefWithHintsOpt:
<CompoundTableIdentifier> [ <CommaSeparatedSqlHints> ]
CompoundTableIdentifier:
<TableIdentifierSegment> ( . <TableIdentifierSegment> )*
The AST is that produced by the Calcite SQL planner. "Rel" (relational element) nodes are a set of relational operators and related items on the path to a physical query plan. The CalcitePlanner
(Druid's version of PlannerImpl
), rel(.)
method does the deed.
- The
SqlToRelConverter
class does the conversion. - It is parameterized with the validator, the catalog reader, the config and a few other items.
- The
convertQuery()
method takes the validated SQL node and produces the root rel. - A few follow-on operations occur.
In the planner test framework the rel returned from conversion is visible in the plan
section of a test case.