Calcite Table Functions

Paul Rogers edited this page Nov 9, 2022 · 4 revisions

Planning Flow

Startup

  • MSQSqlModule provides a binding to ExternalOperatorConversion:
    SqlBindings.addOperatorConversion(binder, ExternalOperatorConversion.class);
  • ExternalOperator is created via Guice. It is not registered in Guice; it is just created as needed, which is...
  • ExternalOperatorConversion is created via Guice, passing in the ExternalOperator instance.
  • ExternalOperatorConversion holds an instance of SqlOperator, specifically SqlUserDefinedTableMacro, which is created by the ExternalOperatorConversion constructor (called from Guice).
  • In this case, the Druid-specific class is ExternalOperator, which extends SqlUserDefinedTableMacro.
  • The ExternalOperator constructor causes the parameters to be created so they can be passed to the super constructor.
  • The ExternalOperator macro is given the ExternalTableMacro instance, and calls ExternalTableMacro.getParameters() to get the list of parameters.

Relationships:

CalcitePlannerModule      MSQSqlModule
      |                        |
    Guice                    Guice                                             Guice
      |                        |                                                 |
DruidOperatorTable o-- ExternalOperatorConversion o-- ExternalOperator o-- ExternalTableMacro
                               |                            |                    |
                               v                            v                    v
                        SqlOperatorConversion     SqlUserDefinedTableMacro   TableMacro
                                                            |
                                                            v
                                                       SqlFunction

This means:

  • ExternalOperatorConversion instances are statically defined, via Guice.
  • Each ExternalOperatorConversion holds onto the Calcite operator, in this case, ExternalOperator extends SqlUserDefinedTableMacro.
  • So, ExternalOperator is also a singleton, created at startup.
  • ExternalOperator is an operator definition, which holds onto an ExternalTableMacro, which is also a definition, in its tableMacro field.
  • The ExternalTableMacro parameters are created once, via the Guice-created instance.

It is not clear why ExternalTableMacro is created via Guice, other than for completeness. It is only ever used by ExternalOperatorConversion and probably could have been created directly within the constructor.
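
The ownership chain above can be sketched in plain Java. The classes below are simplified stand-ins that only borrow names from this page; they are not the Druid or Calcite classes, and the parameter names (arg0, arg1, arg2) are placeholders. The sketch assumes the macro is built directly inside the conversion's constructor, as suggested above, rather than injected.

```java
import java.util.List;

// Simplified stand-ins: names echo this page, but none of this is the real Druid/Calcite code.
class TableMacroSketch {
    int parameterBuilds = 0; // counts how often the parameter list is built

    List<String> getParameters() {
        parameterBuilds++;
        // Placeholder parameter names, chosen for illustration only.
        return List.of("arg0", "arg1", "arg2");
    }
}

class OperatorSketch {
    final TableMacroSketch tableMacro;
    final List<String> parameters;

    OperatorSketch(TableMacroSketch tableMacro) {
        this.tableMacro = tableMacro;
        // Parameters are requested up front, mirroring the hand-off to the super constructor.
        this.parameters = tableMacro.getParameters();
    }
}

class OperatorConversionSketch {
    final OperatorSketch operator;

    OperatorConversionSketch() {
        // Build the macro here instead of asking an injector for it.
        this.operator = new OperatorSketch(new TableMacroSketch());
    }
}

class WiringSketch {
    public static void main(String[] args) {
        OperatorConversionSketch conversion = new OperatorConversionSketch();
        System.out.println(conversion.operator.parameters);                 // [arg0, arg1, arg2]
        System.out.println(conversion.operator.tableMacro.parameterBuilds); // 1
    }
}
```

Because the conversion is created once at startup, everything it reaches through its fields is effectively a singleton as well.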

AST

Resolution

  • BaseDruidSqlValidator extends SqlValidatorImpl validateNamespace(.) calls
  • ProcedureNamespace.validateImpl(.), which special-cases SqlUserDefinedTableMacro.
  • The special case calls udf.getTable(.) where udf is the ExternalOperator extends SqlUserDefinedTableMacro instance.
  • getTable(.) retrieves the TableMacro tableMacro instance, in this case, ExternalTableMacro.
  • SqlUserDefinedTableMacro.getTable(.) calls convertArguments(.)
  • convertArguments(.) calls ExternalTableMacro extends TableMacro getParameters(), which creates another instance of the parameters.
  • SqlUserDefinedTableMacro.getTable() then calls ExternalOperator extends SqlUserDefinedTableMacro apply(.) to apply the arguments.
  • The arguments are given as a list of Java objects which match up to the parameters by position. The values are coerced to Java types using the TypeFactory associated with the planner.
  • ExternalTableMacro.apply(.) grabs the three String arguments, parses each value as JSON, and returns an instance of ExternalTable that has an ExternalDataSource that holds the converted arguments.
  • The ExternalTable then becomes the "real" table referenced in the FROM clause.
  • ProcedureNamespace.validateImpl(.) then calls ExternalTable extends TranslatableTable getRowType() to get the row signature.
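
The argument hand-off in the steps above can be sketched as follows. These are stand-in classes, not the Druid ones. The field names (inputSource, inputFormat, signature) and the comma-separated signature format are assumptions made for illustration, and the sketch keeps the raw strings rather than converting them.

```java
import java.util.Arrays;
import java.util.List;

// Stand-ins only: names echo this page, bodies are illustrative and not Druid code.
class DataSourceSketch {
    final String inputSource;
    final String inputFormat;
    final String signature;

    DataSourceSketch(String inputSource, String inputFormat, String signature) {
        this.inputSource = inputSource;
        this.inputFormat = inputFormat;
        this.signature = signature;
    }
}

class TableSketch {
    final DataSourceSketch dataSource;

    TableSketch(DataSourceSketch dataSource) {
        this.dataSource = dataSource;
    }

    // Toy row type: column names taken from a comma-separated signature (assumed format).
    List<String> getRowType() {
        return Arrays.asList(dataSource.signature.split(","));
    }
}

class MacroSketch {
    // Arguments arrive as plain Java objects, matched to parameters by position.
    TableSketch apply(List<Object> arguments) {
        if (arguments.size() != 3) {
            throw new IllegalArgumentException("expected 3 arguments, got " + arguments.size());
        }
        String inputSource = (String) arguments.get(0);
        String inputFormat = (String) arguments.get(1);
        String signature = (String) arguments.get(2);
        return new TableSketch(new DataSourceSketch(inputSource, inputFormat, signature));
    }
}

class ResolutionSketch {
    public static void main(String[] args) {
        TableSketch table = new MacroSketch().apply(List.<Object>of("src", "fmt", "x,y"));
        System.out.println(table.getRowType()); // [x, y]
    }
}
```

The validator only ever sees the returned table, which is why the table, not the macro, supplies the row type.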

Basic structure:

Validator
   |
   | (calls)
   |
ProcedureNamespace
   |
   | (is given instance of)
   |
RelDataType o-- SqlUserDefinedTableMacro o-- TableMacro
   |                                             |
   |                                             | (creates)
   |                                             |
ProcedureNamespace                          ExternalTable o-- ExternalDataSource

Notes:

  • It would seem that we can create the ExternalTableMacro parameters once, and reuse them: no need to create them over and over.
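
A minimal sketch of that idea, assuming the parameter objects hold no per-query state and can be shared safely. The class is a hypothetical stand-in, not ExternalTableMacro, and the parameter names are placeholders.

```java
import java.util.List;

// Hypothetical stand-in for a table macro; not the Druid ExternalTableMacro.
class CachedParamsMacroSketch {
    // Built once when the macro is constructed, then shared by every caller.
    private final List<String> parameters = List.of("arg0", "arg1", "arg2");

    List<String> getParameters() {
        return parameters; // immutable list, so handing out the same instance is safe
    }
}

class CachedParamsDemo {
    public static void main(String[] args) {
        CachedParamsMacroSketch macro = new CachedParamsMacroSketch();
        // Both calls return the very same list instance.
        System.out.println(macro.getParameters() == macro.getParameters()); // true
    }
}
```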

Authorization

  • SqlResourceCollectorShuttle gets the SqlOperator from the SqlCall node when walking the tree.
  • The SqlCall.getOperator() method returns the associated operator, here ExternalOperator.
  • After casting to AuthorizableOperator, the shuttle calls ExternalOperator.computeResources(.) to return the resource, which is EXTERNAL_RESOURCE_ACTION.
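
The cast-and-collect pattern in the steps above can be sketched like this. All types are illustrative stand-ins rather than the Druid or Calcite ones, and the resource is represented by a plain string label.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Stand-ins: interface and class names are illustrative, not the Druid or Calcite types.
interface OperatorNode {}

interface AuthorizableSketch {
    Set<String> computeResources();
}

class ExternalOperatorSketch implements OperatorNode, AuthorizableSketch {
    @Override
    public Set<String> computeResources() {
        // Placeholder label standing in for the resource action named on this page.
        return Set.of("EXTERNAL_RESOURCE_ACTION");
    }
}

class PlainOperatorSketch implements OperatorNode {}

class CallSketch {
    private final OperatorNode operator;

    CallSketch(OperatorNode operator) {
        this.operator = operator;
    }

    OperatorNode getOperator() {
        return operator;
    }
}

class ResourceCollectorSketch {
    final Set<String> resources = new LinkedHashSet<>();

    void visit(CallSketch call) {
        OperatorNode operator = call.getOperator();
        // Only operators that opt in to authorization contribute resources.
        if (operator instanceof AuthorizableSketch) {
            resources.addAll(((AuthorizableSketch) operator).computeResources());
        }
    }
}

class AuthorizationSketch {
    public static void main(String[] args) {
        ResourceCollectorSketch collector = new ResourceCollectorSketch();
        collector.visit(new CallSketch(new PlainOperatorSketch()));
        collector.visit(new CallSketch(new ExternalOperatorSketch()));
        System.out.println(collector.resources); // [EXTERNAL_RESOURCE_ACTION]
    }
}
```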

Conversion

  • SqlToRelConverter.convertCollectionTable(.) obtains the SqlOperator from SqlCall.getOperator().
  • The operator here is ExternalOperator extends SqlUserDefinedTableMacro.
  • convertCollectionTable(.) special-cases SqlUserDefinedTableMacro and again calls getTable().
  • getTable() repeats the process above: again creating the parameters and again creating an instance of ExternalTable.
  • convertCollectionTable(.) calls RelOptTableImpl.toRel(.) which calls ExternalTable.toRel(.).
  • ExternalTable.toRel(.) creates an ExternalTableScan instance to represent the scan.
  • ExternalTableScan.deriveRowType() again calls ExternalTable.getRowType() to convert the row type.

Questions:

  • Can the row type be cached in ExternalTable to avoid multiple conversions?
  • Can the ExternalTable be cached to avoid multiple conversions?
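
One way the first question could be answered is lazy memoization inside the table. The sketch below uses a hypothetical stand-in class and a toy comma-separated signature; it is not thread-safe and would need synchronization if a table were shared across threads.

```java
import java.util.List;

// Hypothetical stand-in for a table whose row type is costly to derive.
class CachingTableSketch {
    private final String signature;
    private List<String> rowType; // filled in on the first request
    int conversions = 0;          // counts how often the conversion actually runs

    CachingTableSketch(String signature) {
        this.signature = signature;
    }

    List<String> getRowType() {
        if (rowType == null) {
            conversions++;
            // Toy conversion: split an assumed comma-separated signature into column names.
            rowType = List.of(signature.split(","));
        }
        return rowType;
    }
}

class RowTypeCacheDemo {
    public static void main(String[] args) {
        CachingTableSketch table = new CachingTableSketch("x,y");
        table.getRowType();
        table.getRowType();
        System.out.println(table.conversions); // 1
    }
}
```

A cache like this only helps while the same table instance is reused. Since getTable() creates a new ExternalTable on each call, the second question would need to be resolved first for the saving to span both validation and conversion.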

What?

(Something happened after the above.)

Optimization