
Operators Additional Analysis

Paul Rogers edited this page Sep 4, 2022 · 2 revisions

Very rough notes on moving the operator conversion forward from the initial prototypes.

Operator Conversion

Goals:

  • Replace QueryRunner with native query "planners" (first draft).
  • Replace Sequence with operators.
  • Try to leave the rest of the structure unchanged. That is, walk the QueryRunner hierarchy, but divert calls one by one to the above. Operators are wrapped in sequences so the two systems can interoperate.
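The interop idea above can be sketched as follows. This is purely illustrative: the names here do not match Druid's actual Operator or Sequence APIs; the point is only that an operator protocol can be wrapped so Sequence-style (iterable) consumers read from it unchanged.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal operator protocol: open to start, close to release.
interface Operator<T> {
    Iterator<T> open();   // start execution, return the result iterator
    void close();         // release resources
}

// Trivial leaf operator over an in-memory list.
class ListOperator<T> implements Operator<T> {
    private final List<T> rows;
    ListOperator(List<T> rows) { this.rows = rows; }
    @Override public Iterator<T> open() { return rows.iterator(); }
    @Override public void close() { /* nothing to release */ }
}

// Wraps an operator so legacy Sequence-style code can consume it as an Iterable.
class OperatorSequence<T> implements Iterable<T> {
    private final Operator<T> op;
    OperatorSequence(Operator<T> op) { this.op = op; }
    @Override public Iterator<T> iterator() { return op.open(); }
}

public class InteropDemo {
    public static void main(String[] args) {
        Operator<String> op = new ListOperator<>(Arrays.asList("a", "b"));
        StringBuilder sb = new StringBuilder();
        for (String row : new OperatorSequence<>(op)) {
            sb.append(row);
        }
        op.close();
        System.out.println(sb);  // prints "ab"
    }
}
```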

Query types:

  • Scan
    • Simple. Well under way. Must test in server environment.
  • Timeseries
    • First steps done in the Broker; tests and Historicals remain.

Native Query "Planner"

This area will be the hardest to untangle.

Current Design

See Query Lifecycle for the current design. Challenges include:

  • Multiple query runner factories & tool chests.
  • Multiple segment walkers which correspond to multiple execution environments.
  • Multiple query processing pool implementations.

Revised Design

Goals:

  • Convert a native query to an operator structure in all environments:
    • Tests
    • Broker
    • Historical
    • Real-time
  • Core set of capabilities, operators, row formats, fragment runners, etc.
  • Query-specific "query kit" translator.
  • Eventually, direct SQL-to-operator translation (via a physical plan).

Crude steps:

  • Leaving all else unchanged, convert QueryRunners to use operators.
  • Base conversion structure (whatever that turns out to be; perhaps from the op-step1 branch).
    • Segment resolution
    • Overall scatter/gather structure
    • Query-specific bits
  • Query-specific translators.
    • Translate the native query to a logical plan. (Query-specific with common helpers.)
    • Analyze the logical plan to produce a physical plan. (Mostly independent of query type, perhaps with helpers?)
    • Translate the physical plan to use operators. (Hopefully mechanical.)

A key question is whether there should be one translator with subclasses for each query type (more or less the existing model), or query-type-specific translators that use common components for shared bits. The current hunch is to prefer the latter, since each query type is fairly complex.
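The latter option might look roughly like this hypothetical sketch, where each query type owns its translator and the shared planning work lives in common helpers rather than a base-class hierarchy. All names here are illustrative, not Druid's actual classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical per-query-type translator interface.
interface QueryTranslator {
    List<String> translate();  // returns the planned operator names, in order
}

// Shared helper used by every translator for the common scatter/gather shape.
final class CommonPlanning {
    static List<String> scatterGather(String leaf) {
        return Arrays.asList("scatter", leaf, "gather");
    }
}

// One translator per query type; each composes the shared helpers.
class ScanTranslator implements QueryTranslator {
    @Override public List<String> translate() {
        return CommonPlanning.scatterGather("scan-segment");
    }
}

class TimeseriesTranslator implements QueryTranslator {
    @Override public List<String> translate() {
        return CommonPlanning.scatterGather("timeseries-segment");
    }
}

public class TranslatorDemo {
    public static void main(String[] args) {
        System.out.println(new ScanTranslator().translate());
        // prints [scatter, scan-segment, gather]
    }
}
```

The design choice: common structure is reused by composition, so a complex query type can diverge freely without fighting an inherited template method.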

Approach

General thought: spiral outward. Once a large enough section is "wrapped", refactor to make it cleaner.

  • Individual QueryRunners.
  • QuerySegmentWalkers. Harder because the API must change, and we want to factor out the segment stuff from the query planning stuff.
  • QueryToolChests. Individual functions change. Ideally, by now, there are "query planner" stubs for each.
  • QueryRunnerFactories. Evolve them to create a native query planner, or create a new class and add it to the same registry somehow.

Changing this part of the stack will break backward compatibility: extensions will no longer work once the underlying mechanism changes. So, a goal is to preserve the old mechanism and run the new one in parallel. That was done for QueryRunners; thought is needed on how to do it for other areas. The split might be: if a query class provides a new "factory", use it; else go the old route.
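The "new factory if present, else old route" split could be sketched like this (hypothetical names; not Druid's actual API):

```java
import java.util.Optional;

// Hypothetical operator-based factory that a converted query type would register.
interface OperatorFactory {
    String plan();  // stands in for producing an operator pipeline
}

// Dispatch: prefer the new operator path when a query type registers a
// factory; otherwise fall back to the legacy QueryRunner mechanism.
class QueryDispatcher {
    static String execute(Optional<OperatorFactory> newFactory) {
        return newFactory.map(OperatorFactory::plan)
                         .orElse("legacy QueryRunner path");
    }
}

public class DispatchDemo {
    public static void main(String[] args) {
        System.out.println(QueryDispatcher.execute(Optional.empty()));
        System.out.println(QueryDispatcher.execute(
            Optional.of(() -> "operator path")));
    }
}
```

Unconverted query types (and existing extensions) never see the new code path, which is what keeps the two mechanisms safely in parallel during the migration.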

Possible approach:

  • Convert scan and timeseries queries to use operators.
    • Query profile shows no "holes".
    • Ideally have a set of tests for the various native query variations.
  • Create a native query translator for the above, leaving others unchanged.
    • QueryLifecycle.execute() chooses the path.
    • Materialize the key bits of the query rather than the current system of lambdas, etc. (Doing so will break APIs.)
  • Convert the group-by query, and widen operators (async) and query translators.
  • Tackle other query types
    • ScanQuery (mostly done)
    • TimeseriesQuery (in progress)
    • SegmentMetadataQuery
    • GroupByQuery
    • DataSourceMetadataQuery
    • SearchQuery
    • MovingAverageQuery
    • TimeBoundaryQuery
    • TopNQuery

The query translators must also be built step by step.

  • Isolate the segment-related bits.
  • Refactor the segment walkers to assist with this bit, keeping QueryToolChest.
  • Pull logic out of QueryToolChest into the native planner.
  • Goal: native planner produces the operator set.

Row Structure

Current Design

Druid uses a variety of row formats. Each query type seems to have its own representation.

  • Scan: works with data in batches (ScanResultValue) of (typically 20K) rows.
    • "Compact arrays": each row is an Object[], batch has a schema header.
    • Map: ?
    • Array: ?
  • Timeseries: individual rows wrapped in an accessor: Result<TimeseriesResultValue>. Each value is a row of various forms, typically a Map.
  • ...

Because of the way native queries work, it is not possible to predict the schema ahead of time. Queries reference columns (in various objects, not in a row schema), and each segment may have a different set of columns, each with differing types. Thus, each QueryRunner tends to "discover" the schema of a row on the fly, and column references tend to be by name (on every row).

Revised Design

The goal is to move to a row and batch structure that is more efficient.

  • Multiple storage formats.
    • Native formats, for use on ingest.
    • Java array format, for temporary use during transition.
    • Perhaps existing formats, wrapped for use in the transition.
    • Frames, or some variation, after MSQ integration.
  • Rows arrive in batches to minimize overhead.
  • Columns can be indexed by position, for efficiency.
  • Batches carry a schema, to allow name-to-index translation.
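A minimal sketch of a schema-carrying batch with positional column access, in the spirit of the bullets above (illustrative only; this is not Druid's actual frame or ScanResultValue format):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical batch: the schema header is carried once per batch, and
// each row is a compact Object[] indexed by position, so name-to-index
// translation happens per batch rather than per row.
class RowBatch {
    final List<String> columns;   // schema header for the whole batch
    final List<Object[]> rows;    // each row is a compact array

    RowBatch(List<String> columns, List<Object[]> rows) {
        this.columns = columns;
        this.rows = rows;
    }

    // Resolve a column name to a position once per batch.
    int indexOf(String column) { return columns.indexOf(column); }
}

public class BatchDemo {
    public static void main(String[] args) {
        RowBatch batch = new RowBatch(
            Arrays.asList("__time", "page", "delta"),
            Arrays.asList(
                new Object[]{1L, "Main", 10},
                new Object[]{2L, "Talk", -3}));
        int delta = batch.indexOf("delta");   // one name lookup per batch
        long sum = 0;
        for (Object[] row : batch.rows) {
            sum += (Integer) row[delta];      // positional access per row
        }
        System.out.println(sum);  // prints 7
    }
}
```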

Cursor and Storage Adapter

Current Design

Revised Design

Goals:

  • Split the design into "metadata" and "runtime" components.
  • New design can be a wrapper over the existing design.
  • Segment-like behavior is an add-on. Basic behavior is as a supplier of rows with source-specific capabilities.

Test Framework

Druid has a robust SQL query framework, but it seems nothing similar exists on the native side. SQL does not map to all possible native query variations, leaving some areas uncovered. To address this, we will need a comprehensive set of native tests.

  • Tests are defined in files, perhaps using the proposed "planner test framework" format.
  • Test includes a native query, expected results, and metadata.
  • The same tests can be run as ITs, against a local server, or (eventually) in single-server mode.
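A file-based test case might look something like this. The field names and layout here are purely illustrative; the actual "planner test framework" format is still to be defined. (The query body follows Druid's native scan query shape: queryType, dataSource, intervals, columns.)

```json
{
  "case": "scan with interval filter",
  "query": {
    "queryType": "scan",
    "dataSource": "wikipedia",
    "intervals": ["2016-06-27/2016-06-28"],
    "columns": ["__time", "page"]
  },
  "metadata": {
    "runOn": ["broker", "historical", "local"]
  },
  "expectedResults": [
    {"__time": 1466985600000, "page": "Main_Page"}
  ]
}
```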