Calcite Table Functions
Paul Rogers edited this page Nov 17, 2022 · 4 revisions
- `MSQSqlModule` provides a binding to `ExternalOperatorConversion`: `SqlBindings.addOperatorConversion(binder, ExternalOperatorConversion.class);`
- `ExternalOperator` is created via Guice. It is not registered in Guice, it is just created as needed, which is...
- `ExternalOperatorConversion` is created via Guice, passing in the `ExternalOperator` instance.
- `ExternalOperatorConversion` holds an instance of `SqlOperator`, specifically `SqlUserDefinedTableMacro`, which is created via the `ExternalOperatorConversion` (called from Guice).
- In this case, the Druid-specific class is `ExternalOperator`, which extends `SqlUserDefinedTableMacro`.
- The `ExternalOperator` constructor causes the parameters to be created so they can be passed to the super constructor.
- The `ExternalOperator` macro is given the `ExternalTableMacro` instance, and calls `ExternalTableMacro.getParameters()` to get the list of parameters.
Relationships:

    CalcitePlannerModule          MSQSqlModule
             |                          |
           Guice                      Guice                     Guice
             |                          |                         |
     DruidOperatorTable o-- ExternalOperatorConversion o-- ExternalOperator o-- ExternalTableMacro
                                        |                         |                     |
                                        v                         v                     v
                              SqlOperatorConversion   SqlUserDefinedTableMacro     TableMacro
                                                                  |
                                                                  v
                                                             SqlFunction
This means:
- `ExternalOperatorConversion` instances are statically defined, via Guice.
- Each `ExternalOperatorConversion` holds onto the Calcite operator, in this case, `ExternalOperator extends SqlUserDefinedTableMacro`.
- So, `ExternalOperator` is also a singleton, created at startup.
- `ExternalOperator` is an operator definition, which holds onto an `ExternalTableMacro`, which is also a definition, in its `tableMacro` field.
- The `ExternalTableMacro` parameters are created once, via the Guice-created instance.
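
The ownership chain above can be sketched with simplified stand-in classes. None of these are the real Druid or Calcite types: `MacroSketch`, `OperatorSketch`, and `ConversionSketch` are hypothetical names that only mirror the roles of `ExternalTableMacro`, `ExternalOperator`, and `ExternalOperatorConversion`, and the three parameter names are assumed for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified stand-ins for the definition chain; not the real Druid or Calcite classes. */
final class DefinitionChainSketch {

  /** Plays the role of ExternalTableMacro: a definition that owns its parameter list. */
  static final class MacroSketch {
    private final List<String> parameters;

    MacroSketch() {
      // Built once per macro instance. The names are assumed for illustration.
      this.parameters = Collections.unmodifiableList(
          Arrays.asList("inputSource", "inputFormat", "signature"));
    }

    List<String> getParameters() {
      return parameters;
    }
  }

  /** Plays the role of ExternalOperator: reads the parameters in its constructor. */
  static final class OperatorSketch {
    final MacroSketch tableMacro;
    final List<String> operandNames;

    OperatorSketch(MacroSketch tableMacro) {
      this.tableMacro = tableMacro;
      this.operandNames = tableMacro.getParameters();
    }
  }

  /** Plays the role of ExternalOperatorConversion: holds the one operator instance. */
  static final class ConversionSketch {
    private final OperatorSketch operator;

    ConversionSketch(OperatorSketch operator) {
      this.operator = operator;
    }

    OperatorSketch calciteOperator() {
      return operator;
    }
  }

  /** Mimics the one-time wiring done at startup. */
  static ConversionSketch wireAtStartup() {
    return new ConversionSketch(new OperatorSketch(new MacroSketch()));
  }

  public static void main(String[] args) {
    ConversionSketch conversion = wireAtStartup();
    // Every lookup returns the same operator, macro, and parameter list.
    System.out.println(conversion.calciteOperator() == conversion.calciteOperator());
    System.out.println(conversion.calciteOperator().operandNames);
  }
}
```

Because everything hangs off one conversion object, any per-query state (such as a resolved table) has no natural home in this chain, which is the constraint the later caching questions run into.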
Questions:
- Why is `ExternalTableMacro` created via Guice, other than for completeness? It is only ever used by `ExternalOperatorConversion` and probably could have been created directly within the constructor. The answer is probably the `ObjectMapper` required by the constructor. (Changed to not create through Guice.)
- `BaseDruidSqlValidator extends SqlValidatorImpl`: `validateNamespace(.)` calls
- `ProcedureNamespace.validateImpl(.)`, which special-cases `SqlUserDefinedTableMacro`.
- The special case calls `udf.getTable(.)`, where `udf` is the `ExternalOperator extends SqlUserDefinedTableMacro` instance.
- `getTable(.)` retrieves the `TableMacro tableMacro` instance, in this case, `ExternalTableMacro`.
- `SqlUserDefinedTableMacro.getTable(.)` calls `convertArguments(.)`.
- `convertArguments()` calls `ExternalTableMacro extends TableMacro` `getParameters()` (which creates another instance of the parameters).
- `SqlUserDefinedTableMacro.getTable()` then calls `ExternalOperator extends SqlUserDefinedTableMacro` `apply(.)` to apply the arguments.
- The arguments are given as a list of Java objects which match up to the parameters by position. The values are coerced to Java types using the `TypeFactory` associated with the planner.
- `ExternalTableMacro.apply()` grabs the three `String` arguments, converts the values to JSON, and returns an instance of `ExternalTable` that has an `ExternalDataSource` that holds the converted arguments.
- The `ExternalTable` then becomes the "real" table referenced in the `FROM` clause.
- `ProcedureNamespace.validateImpl(.)` then calls `ExternalTable extends TranslatableTable` `getRowType()` to get the row signature.
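
The call chain above can be compressed into a small sketch. Every type here is a hypothetical stand-in (plain strings replace Calcite's types and the JSON conversion is omitted), the parameter names are assumed, and treating the third argument as the signature is an assumption made only so the row type has something to derive from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-ins for the validation call chain; not the real Calcite or Druid classes. */
final class ValidationFlowSketch {

  /** Role of ExternalTable: keeps the converted arguments and reports a row type. */
  static final class TableSketch {
    final List<String> convertedArgs;

    TableSketch(List<String> convertedArgs) {
      this.convertedArgs = convertedArgs;
    }

    String getRowType() {
      // Assumption: the third argument carries the signature.
      return "ROW(" + convertedArgs.get(2) + ")";
    }
  }

  /** Role of ExternalTableMacro. */
  static final class MacroSketch {
    List<String> getParameters() {
      // A fresh list on every call, mirroring the repeated creation noted above.
      return new ArrayList<>(Arrays.asList("inputSource", "inputFormat", "signature"));
    }

    TableSketch apply(List<Object> arguments) {
      List<String> converted = new ArrayList<>();
      for (Object argument : arguments) {
        converted.add((String) argument);
      }
      return new TableSketch(converted);
    }
  }

  /** Role of getTable(.): match operands to parameters by position, then apply the macro. */
  static TableSketch getTable(MacroSketch macro, List<Object> operands) {
    List<String> parameters = macro.getParameters();
    if (operands.size() != parameters.size()) {
      throw new IllegalArgumentException(
          "Expected " + parameters.size() + " arguments but got " + operands.size());
    }
    // Coerce each operand to the Java type its parameter expects (always String here).
    List<Object> coerced = new ArrayList<>();
    for (Object operand : operands) {
      coerced.add(String.valueOf(operand));
    }
    return macro.apply(coerced);
  }

  /** Role of validateImpl(.): resolve the table, then ask it for the row type. */
  static String validateImpl(MacroSketch macro, List<Object> operands) {
    return getTable(macro, operands).getRowType();
  }
}
```

Each call to `validateImpl` in this sketch builds a new parameter list and a new table, which is the repeated work the notes that follow try to remove.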
Basic structure:
Validator
|
| (calls)
|
ProcedureNamespace
|
| (is given instance of)
|
RelDataType o-- SqlUserDefinedTableMacro o-- TableMacro
| |
| | (creates)
| |
ProcedureNamespace ExternalTable o-- ExternalDataSource
Notes:
- It would seem that we can create the `ExternalTableMacro` parameters once, and reuse them: no need to create them over and over. (Done, use `DruidTypeSystem.TYPE_FACTORY` as the type factory.)
- `SqlResourceCollectorShuttle` gets the `SqlOperator` from the `SqlCall` node when walking the tree.
- The `SqlCall.getOperator()` method returns the associated operator, here `ExternalOperator`.
- After casting to `AuthorizableOperator`, the shuttle calls `ExternalOperator.computeResources(.)` to return the resource, which is `EXTERNAL_RESOURCE_ACTION`.
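
A rough sketch of that walk, using hypothetical stand-in types rather than the real Calcite shuttle API. The resource is reduced to a placeholder string (`"EXTERNAL:READ"` is an assumed label, not the actual value of `EXTERNAL_RESOURCE_ACTION`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-ins for the resource-collection walk; not the real Calcite or Druid classes. */
final class ResourceCollectorSketch {

  /** Role of SqlOperator. */
  interface OperatorSketch {}

  /** Role of AuthorizableOperator: an operator that can report the resources it touches. */
  interface AuthorizableSketch extends OperatorSketch {
    Set<String> computeResources(CallSketch call);
  }

  /** Role of SqlCall: an operator applied to nested calls. */
  static final class CallSketch {
    final OperatorSketch operator;
    final List<CallSketch> operands = new ArrayList<>();

    CallSketch(OperatorSketch operator) {
      this.operator = operator;
    }

    CallSketch with(CallSketch operand) {
      operands.add(operand);
      return this;
    }
  }

  /** Walks the call tree and pools the resources reported by authorizable operators. */
  static Set<String> collect(CallSketch root) {
    Set<String> resources = new LinkedHashSet<>();
    visit(root, resources);
    return resources;
  }

  private static void visit(CallSketch call, Set<String> out) {
    // Only operators that opt in via the interface contribute resources.
    if (call.operator instanceof AuthorizableSketch) {
      out.addAll(((AuthorizableSketch) call.operator).computeResources(call));
    }
    for (CallSketch operand : call.operands) {
      visit(operand, out);
    }
  }

  /** A small tree: a plain call wrapping one external-table call and one plain call. */
  static CallSketch demoTree() {
    OperatorSketch plain = new OperatorSketch() {};
    AuthorizableSketch extern = call -> Collections.singleton("EXTERNAL:READ");
    return new CallSketch(plain).with(new CallSketch(extern)).with(new CallSketch(plain));
  }
}
```

The walk only sees operators that are present in the tree and implement the interface, so an operator resolved by some other route contributes nothing.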
- `SqlToRelConverter.convertCollectionTable(.)` obtains the `SqlOperator` from `SqlCall.getOperator()`.
- The operator here is `ExternalOperatorConversion extends SqlOperatorConversion`.
- `convertCollectionTable(.)` special-cases `SqlUserDefinedTableMacro` and again calls `getTable()`.
- `getTable()` repeats the process above: again creating the parameters and again creating an instance of `ExternalTable`.
- `convertCollectionTable(.)` calls `RelOptTableImpl.toRel(.)`, which calls `ExternalTable.toRel(.)`.
- `ExternalTable.toRel(.)` creates an `ExternalTableScan` instance to represent the scan.
- `ExternalTableScan.deriveRowType()` again calls `ExternalTable.getRowType()` to convert the row type.
Questions:
- Can the row type be cached in `ExternalTable` to avoid multiple conversions? (Yes, this works.)
- Can the `ExternalTable` be cached to avoid multiple conversions? (Probably not as coded, since the table is created from the table macro, which is a singleton. There is no place to hang a cached instance that I can easily see.)
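
Caching the row type amounts to lazy memoization inside the table. The sketch below uses a hypothetical `TableSketch` with a `String` in place of Calcite's row type, and it assumes every caller would use the same type factory, so a single cached value is valid for all of them.

```java
import java.util.function.Supplier;

/** Simplified stand-in for row-type caching; not the real ExternalTable. */
final class CachedRowTypeSketch {

  /** Role of ExternalTable; the String row type stands in for Calcite's row type. */
  static final class TableSketch {
    private final Supplier<String> converter;
    private String cachedRowType; // null until first requested
    private int conversions;

    TableSketch(Supplier<String> converter) {
      this.converter = converter;
    }

    /** Converts the signature on first use and returns the cached value afterwards. */
    synchronized String getRowType() {
      if (cachedRowType == null) {
        conversions++;
        cachedRowType = converter.get();
      }
      return cachedRowType;
    }

    /** Number of times the conversion actually ran. */
    synchronized int conversions() {
      return conversions;
    }
  }

  public static void main(String[] args) {
    TableSketch table = new TableSketch(() -> "ROW(x VARCHAR)");
    table.getRowType();
    table.getRowType();
    System.out.println("conversions = " + table.conversions());
  }
}
```

The cache lives on the table instance, so it only helps within one `ExternalTable`. It does nothing about the table itself being created more than once.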
- `ExternalTableScan` calls `ExternalTable.getDataSource()` multiple times.
See `UserDefinedTableMacroFunction` for details.
- The parser creates an instance of a `SqlCall` with `ExtendsOperator` as the operator.
- The first argument to the above call is the table function, the second is the schema.
- `ExtendsOperator.rewriteCall(.)` gets the first argument, which must be an instance of `UserDefinedTableMacroFunction`.
- It then calls `UserDefinedTableMacroFunction.rewriteCall(tableFnCall, schema)`, where `tableFnCall` is a `SqlBasicCall` to the `UserDefinedTableMacroFunction`.
- `UserDefinedTableMacroFunction.rewriteCall(.)` passes the schema into an ad-hoc copy of the `InputTableMacro`, which now holds onto the schema for later use.
- Calcite obtains the table macro from the macro function, so both are cloned.
- The above also creates a new call as an instance of `ExtendedCall` that also holds the schema, primarily for use in `unparse(.)`.
- From here on, the flow is like that described earlier.
Note that the need to make a copy of the macro provides an opportunity to cache the `ExternalTable` instance.
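
The copy-on-rewrite step can be sketched as follows. `MacroSketch` and `FunctionSketch` are hypothetical stand-ins for the table macro and the macro function, and the schema is reduced to a list of column strings. The point is only that the shared definitions stay untouched while a per-call copy carries the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified stand-ins for the EXTEND rewrite; not the real Druid classes. */
final class ExtendRewriteSketch {

  /** Role of the table macro; schema is null on the shared, statically defined instance. */
  static final class MacroSketch {
    final List<String> schema;

    MacroSketch(List<String> schema) {
      this.schema = schema;
    }

    /** Returns an ad-hoc copy that holds the schema for later use. */
    MacroSketch withSchema(List<String> newSchema) {
      return new MacroSketch(Collections.unmodifiableList(new ArrayList<>(newSchema)));
    }
  }

  /** Role of the macro function that wraps the macro. */
  static final class FunctionSketch {
    final MacroSketch macro;

    FunctionSketch(MacroSketch macro) {
      this.macro = macro;
    }

    /** The function is cloned too, because callers reach the macro through the function. */
    FunctionSketch rewriteCall(List<String> schema) {
      return new FunctionSketch(macro.withSchema(schema));
    }
  }

  public static void main(String[] args) {
    FunctionSketch shared = new FunctionSketch(new MacroSketch(null));
    FunctionSketch perCall = shared.rewriteCall(Arrays.asList("x VARCHAR", "y BIGINT"));
    System.out.println(shared.macro.schema + " vs " + perCall.macro.schema);
  }
}
```

Since the per-call copy exists for the lifetime of one query, it is also a place where a field holding the created table could live.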
To create an external-table-like table function:
- Define a subclass of `TableMacro` to define function parameters and convert arguments to a `TranslatableTable`.
- Define a subclass of `SqlUserDefinedTableMacro` which defines the above macro.
- Define a subclass of `SqlOperatorConversion` to hold the above function definition.
- Add the above operator conversion to `MSQSqlModule`.
- The bridge is `ExternalTableSpec`: produced by an `ExternalTableDefn`, used to construct an `ExternalDataSource` and `ExternalTable`.
- `ExternalTableDefn.applyParameters()` converts from a resolved table and parameter map to an `ExternalTableSpec`.
To do:
- Need a way to do the above without a resolved table.
- When unparameterized, no merging. Instead, use table properties.
- Must validate the resulting properties.
- Easiest to create a `ResolvedTable`, but with properties from SQL.
- Need to merge in the extends schema.
- Filter the list of properties to get the SQL arguments for "raw" function. Probably just a list of names, in preferred order.
- Custom macros must be defined statically, which means they need access to a table defn statically or via Guice.
- Code to translate Table Defn properties to parameters. Control ordering so pass-by-position is stable.
So:
- The `TableMacro` takes an injected table defn or registry.
- The `TableMacro` converts the defn properties to Calcite parameters (once, statically, using a fixed type factory).
- `TableMacro.apply(.)` converts positional args to a map (using param definitions).
- Then uses the `TableDefn` to create an `ExternalTableDefn`.
- From there, create an `ExternalTable` as in the existing code.
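
The positional-to-named step is simple enough to sketch directly. The helper name `toNamedArgs` and the property names in the usage are hypothetical. The only assumption is that parameter names arrive in a fixed, preferred order so pass-by-position stays stable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of mapping positional arguments onto ordered parameter definitions. */
final class PositionalArgsSketch {

  /**
   * Pairs each positional argument with the parameter name at the same index.
   * Trailing parameters may be omitted; null arguments are treated as "not set".
   */
  static Map<String, Object> toNamedArgs(List<String> parameterNames, List<?> args) {
    if (args.size() > parameterNames.size()) {
      throw new IllegalArgumentException(
          "Got " + args.size() + " arguments for " + parameterNames.size() + " parameters");
    }
    // LinkedHashMap keeps the declared parameter order.
    Map<String, Object> named = new LinkedHashMap<>();
    for (int i = 0; i < args.size(); i++) {
      Object value = args.get(i);
      if (value != null) {
        named.put(parameterNames.get(i), value);
      }
    }
    return named;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("source", "format", "delimiter");
    System.out.println(toNamedArgs(names, Arrays.<Object>asList("data.csv", "csv")));
  }
}
```

The resulting map is the shape that a parameter-merging step such as `applyParameters()` is described as taking, which is why the ordering of the name list matters more than the map type.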
A dynamic table function is one that is defined via metadata, rather than via a static definition at compile time. In our case, we want external tables to appear as table functions so the user can parameterize them.
Resolution:
- `CalcitePlanner` is Druid's version of the Calcite planner, with customizations.
- `CalcitePlanner` defines the operator table to be used to resolve functions.
- `CalcitePlanner` creates an instance of `ChainedSqlOperatorTable` to hold both the usual `DruidOperatorTable` and a representation of dynamic table functions.
- The chained table also holds an instance of `CalciteCatalogReader` which can retrieve functions from a Calcite schema.
- `BaseDruidSqlValidator extends SqlValidatorImpl`: `validateNamespace(.)` calls
- `SqlValidatorImpl.validate(.)` calls `validateScopedExpression(.)` to resolve our table function.
- After several more steps, the above calls `ChainedSqlOperatorTable.lookupOperatorOverloads(.)` to find the function.
- `ChainedSqlOperatorTable` first calls `DruidOperatorTable.lookupOperatorOverloads(.)`. Of course, our table-specific function isn't found there.
- Then, `ChainedSqlOperatorTable` calls `CalciteCatalogReader.lookupOperatorOverloads(.)` to resolve. This code checks that the category is `FUNCTION` and, in particular, a table function. Our reference is a table function.
- `CalciteCatalogReader.getFunctionsFrom()` determines the schemas to use to resolve the function: both the current (default) schema, which is `druid`, and the root. Since external tables reside in the `ext` schema, we must explicitly reference them this way: `ext.myTable`.
- `getFunctionsFrom(.)`, once it finds the `ext` schema, calls `getFunctions()` on that schema.
- That schema is represented by a `CalciteSchema` which maps the `Schema` class that Druid provides.
- `CalciteSchema.getFunctions()` loads "implicit" functions from the schema by calling `Schema.getFunctions(String name)`.
- Druid schemas derive from the Druid `AbstractTableSchema`, which returns an empty list of functions by default.
- `ExternalSchema` overrides this method to return a `ParameterizedTableMacro` for the table. This class extends `TableMacro`, which matches the predicate that `getFunctionsFrom(.)` uses to match functions.
- Calcite wraps the macro in a `SqlUserDefinedTableMacro`.
- Function processing then continues as described earlier.
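
The lookup order above can be sketched with stand-in types. Nothing here is the real Calcite API: `OperatorTableSketch`, `BaseSchemaSketch`, and `ExternalSchemaSketch` are hypothetical, matches are plain strings, and schema resolution is reduced to a single qualified lookup instead of Calcite's path search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Simplified stand-ins for chained function lookup; not the real Calcite or Druid classes. */
final class DynamicLookupSketch {

  /** Role of an operator table: appends any matches to the output list. */
  interface OperatorTableSketch {
    void lookup(String schema, String name, List<String> out);
  }

  /** Role of AbstractTableSchema: no functions by default. */
  static class BaseSchemaSketch {
    List<String> getFunctions(String name) {
      return Collections.emptyList();
    }
  }

  /** Role of ExternalSchema: exposes each known table as a table macro. */
  static final class ExternalSchemaSketch extends BaseSchemaSketch {
    private final List<String> tables;

    ExternalSchemaSketch(List<String> tables) {
      this.tables = tables;
    }

    @Override
    List<String> getFunctions(String name) {
      return tables.contains(name)
          ? Collections.singletonList("macro:" + name)
          : Collections.<String>emptyList();
    }
  }

  /** Role of the catalog reader: finds the named schema, then asks it for functions. */
  static OperatorTableSketch catalogReader(Map<String, BaseSchemaSketch> schemas) {
    return (schema, name, out) -> {
      BaseSchemaSketch found = schemas.get(schema);
      if (found != null) {
        out.addAll(found.getFunctions(name));
      }
    };
  }

  /** Role of the chained table: asks each table in order and pools the matches. */
  static List<String> lookupChained(
      List<OperatorTableSketch> chain, String schema, String name) {
    List<String> matches = new ArrayList<>();
    for (OperatorTableSketch table : chain) {
      table.lookup(schema, name, matches);
    }
    return matches;
  }
}
```

With a static table that never matches placed first and the catalog reader second, a lookup of `myTable` in `ext` yields one macro while the same name in `druid` yields nothing, which mirrors why the reference has to be written as `ext.myTable`.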
Notes:
- `ChainedSqlOperatorTable` "implements the `SqlOperatorTable` interface by chaining together any number of underlying operator table instances."
- `CalciteCatalogReader` is a "`SqlOperatorTable` based on tables and functions defined schemas."
Issues:
- Function resolution happens twice. How can we cache the values?
- The visitor misses the dynamic function, thus omitting security checks for the external table.