Operators Additional Analysis
Very rough notes on moving the operator conversion ahead from the initial prototypes.
Goals:
- Replace `QueryRunner` with native query "planners" (first draft).
- Replace `Sequence` with operators.
- Try to leave the rest of the structure unchanged. That is, walk the `QueryRunner` hierarchy, but divert calls one-by-one to the above. Operators are wrapped in sequences to allow interoperation of both systems.
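The exact operator protocol is still open; the sketch below shows the general shape assumed in these notes. The `Operator` interface and `Operators.toIterable()` helper are illustrative placeholders, not Druid's actual `Sequence` API: the idea is that an operator is an open/next/close pipeline stage that can be wrapped so that existing sequence-style consumers can still pull rows from it.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of a pull-based operator protocol. Not Druid's actual API.
interface Operator<T>
{
  void open();        // acquire resources, open child operators
  boolean hasNext();  // is another row (or batch) available?
  T next();           // return the next row (or batch)
  void close();       // release resources, close child operators
}

final class Operators
{
  private Operators() {}

  // Wrap an operator as an Iterable so sequence-style consumers can drive it;
  // the operator is opened lazily and closed when exhausted.
  static <T> Iterable<T> toIterable(final Operator<T> op)
  {
    return () -> {
      op.open();
      return new Iterator<T>()
      {
        private boolean closed;

        @Override
        public boolean hasNext()
        {
          if (closed) {
            return false;
          }
          if (op.hasNext()) {
            return true;
          }
          op.close();
          closed = true;
          return false;
        }

        @Override
        public T next()
        {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          return op.next();
        }
      };
    };
  }
}
```

Going the other direction, wrapping an existing sequence as a leaf operator, would let unconverted `QueryRunner`s sit underneath converted ones, which is the point of the interoperation goal above.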
Query types:
- Scan
  - Simple. Well under way. Must test in a server environment.
- Timeseries
  - First steps in the Broker. Tests and Historicals still to do.
The query stack will be the hardest area to untangle.
See Query Lifecycle for the current design. Challenges include:
- Multiple query runner factories & tool chests.
- Multiple segment walkers which correspond to multiple execution environments.
- Multiple query processing pool implementations.
Goals:
- Convert a native query to an operator structure in all environments:
  - Tests
  - Broker
  - Historical
  - Real-time
- Core set of capabilities, operators, row formats, fragment runners, etc.
- Query-specific "query kit" translator.
- Eventually, direct SQL-to-operator translation (via a physical plan)
Crude steps:
- Leaving all else unchanged, convert `QueryRunner`s to use operators.
- Base conversion structure (whatever that turns out to be; maybe from the `op-step1` branch?).
  - Segment resolution
  - Overall scatter/gather structure
  - Query-specific bits
- Query-specific translators.
  - Translate the native query to a logical plan. (Query-specific with common helpers.)
  - Analyze the logical plan to produce a physical plan. (Mostly independent of query type, perhaps with helpers?)
  - Translate the physical plan to use operators. (Hopefully mechanical.)
A key question is whether there is one translator, with subclasses for each query type (more or less the existing model), or query-type-specific translators that use common components for shared bits. The current hunch is to prefer the latter, since each query type is fairly complex.
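As a sketch of the latter option, each query type could get its own planner implementing the three stages listed above, with the shared work (segment resolution, scatter/gather wiring) living in helper components that each planner composes. All names here (`NativeQueryPlanner`, `LogicalPlan`, `PhysicalPlan`) are hypothetical placeholders, and `Operator` refers to the earlier sketch, not existing Druid classes.

```java
// Hypothetical placeholders for the two plan stages.
class LogicalPlan { /* query-type-independent description of the work to do */ }
class PhysicalPlan { /* logical plan bound to segments, workers, etc. */ }

// One implementation per query type (Scan, Timeseries, ...), each built from
// shared helper components rather than a common superclass.
interface NativeQueryPlanner<Q>
{
  // Query-specific, with common helpers.
  LogicalPlan toLogicalPlan(Q query);

  // Mostly independent of query type.
  PhysicalPlan toPhysicalPlan(LogicalPlan logicalPlan);

  // Hopefully mechanical.
  Operator<?> toOperators(PhysicalPlan physicalPlan);
}
```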
General thought: spiral outward. Once a large enough section is "wrapped", refactor to make it cleaner.
- Individual `QueryRunner`s.
- `QuerySegmentWalker`s. Harder because the API must change, and we want to factor the segment logic out of the query planning logic.
- `QueryToolChest`s. Individual functions change. Ideally, by now, there are "query planner" stubs for each.
- `QueryRunnerFactory`s. Evolve these to create a native query planner, or create a new class and put it into the same registry somehow.
Changing this part of the stack will break backward compatibility: extensions will no longer work since the underlying mechanism changes. So, a goal is to preserve the old mechanism and run the new one in parallel. That was done for `QueryRunner`s; thought is needed for how to do it in other areas. The split might be: if a query class provides a new "factory", use it; otherwise, go the old route.
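A minimal sketch of that split, using simplified placeholder types (the real decision point would presumably sit in `QueryLifecycle` or the factory registry):

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: dispatch to the new planner when one is registered for
// the query type, otherwise fall back to the legacy QueryRunner path.
// Both maps use simplified placeholder signatures.
class QueryDispatcher
{
  private final Map<String, Function<Object, Iterable<Object>>> planners;      // new path
  private final Map<String, Function<Object, Iterable<Object>>> legacyRunners; // old path

  QueryDispatcher(
      Map<String, Function<Object, Iterable<Object>>> planners,
      Map<String, Function<Object, Iterable<Object>>> legacyRunners
  )
  {
    this.planners = planners;
    this.legacyRunners = legacyRunners;
  }

  Iterable<Object> run(String queryType, Object query)
  {
    Function<Object, Iterable<Object>> planner = planners.get(queryType);
    if (planner != null) {
      return planner.apply(query);                    // new operator-based path
    }
    return legacyRunners.get(queryType).apply(query); // existing QueryRunner path
  }
}
```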
Possible approach:
- Convert scan and timeseries queries to use operators.
  - Query profile shows no "holes".
  - Ideally have a set of tests for the various native query variations.
- Create a native query translator for the above, leaving others unchanged.
  - `QueryLifecycle.execute()` chooses the path.
  - Materialize the key bits of the query rather than the current system of lambdas, etc. (Doing so will break APIs.)
- Convert the group-by query; widen operators (async) and query translators.
- Tackle the other query types:
  - `ScanQuery` (mostly done)
  - `TimeseriesQuery` (in progress)
  - `SegmentMetadataQuery`
  - `GroupByQuery`
  - `DataSourceMetadataQuery`
  - `SearchQuery`
  - `MovingAverageQuery`
  - `TimeBoundaryQuery`
  - `TopNQuery`
The query translators must also work step-by-step.
- Isolate the segment-related bits.
- Refactor the segment walkers to assist with this bit, keeping `QueryToolChest`.
- Pull logic out of `QueryToolChest` into the native planner.
- Goal: the native planner produces the operator set.
Druid uses a variety of row formats. Each query type seems to have its own representation.
- Scan: works with data in batches (`ScanResultValue`) of (typically 20K) rows.
  - "Compact arrays": each row is an `Object[]`; the batch has a schema header (illustrated below).
  - Map: ?
  - Array: ?
- Timeseries: individual rows wrapped in an accessor: `Result<TimeseriesResultValue>`. Each value is a row of various forms, typically a `Map`.
- ...
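For illustration only, the "compact arrays" shape amounts to a schema header plus positional `Object[]` rows. The example below uses plain Java collections and made-up Wikipedia-style column names rather than the actual `ScanResultValue` class.

```java
import java.util.Arrays;
import java.util.List;

class CompactArrayBatchExample
{
  public static void main(String[] args)
  {
    // Batch-level schema header: column names, referenced by position below.
    List<String> columns = Arrays.asList("__time", "page", "added");

    // Each row is an Object[]; values line up with the header positions.
    List<Object[]> rows = Arrays.asList(
        new Object[]{1442018818771L, "Wikipedia:Vandalismusmeldung", 29L},
        new Object[]{1442018820496L, "Schiedsgerichtsbarkeit", -31L}
    );

    System.out.println(columns + ": " + rows.size() + " rows");
  }
}
```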
Because of the way native queries work, it is not possible to predict the schema ahead of time. Queries reference columns (but in various objects, not in a row schema), and each segment may have different sets of columns, each with differing types. Thus, each `QueryRunner` tends to "discover" the schema of a row on the fly. Column references tend to be by name (on every row).
The goal is to move to a more efficient row and batch structure:
- Multiple storage formats:
  - Native formats, for use on ingest.
  - Java array format, for temporary use during transition.
  - Perhaps existing formats, wrapped for use in transition.
  - Frames, or some variation, after MSQ integration.
- Rows arrive in batches to minimize overhead.
- Columns can be indexed by position, for efficiency.
- Batches carry a schema, to allow name-to-index translation.
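A rough sketch of what such a batch abstraction could look like (all names hypothetical, not existing Druid interfaces): the schema travels with the batch, names are resolved to positions once per batch, and per-row access is purely positional.

```java
// Hypothetical batch/schema sketch; not an existing Druid interface.
interface RowSchema
{
  int columnCount();
  String columnName(int index);
  int columnIndex(String name);  // -1 when the column is absent in this batch
}

interface RowBatch
{
  RowSchema schema();
  int rowCount();
  Object valueAt(int row, int column);  // positional access, no per-row name lookup
}

final class BatchExamples
{
  private BatchExamples() {}

  // Resolve the column position once per batch, then iterate positionally.
  static long sumLongColumn(RowBatch batch, String column)
  {
    final int col = batch.schema().columnIndex(column);
    if (col < 0) {
      return 0;  // column missing from this segment's schema
    }
    long sum = 0;
    for (int row = 0; row < batch.rowCount(); row++) {
      sum += ((Number) batch.valueAt(row, col)).longValue();
    }
    return sum;
  }
}
```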
Goals:
- Split the design into "metadata" and "runtime" components.
- New design can be a wrapper over the existing design.
- Segment-like behavior is an add-on; the basic behavior is that of a supplier of rows with source-specific capabilities.
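A hedged sketch of that split, reusing the `RowSchema`/`RowBatch` placeholders from the row-format notes above. Both interfaces are hypothetical; segment-specific capabilities would be layered on separately rather than baked into the base abstraction.

```java
// Hypothetical split: metadata (planning-time) vs. runtime (execution-time).
interface DataSourceMetadata
{
  String name();
  RowSchema schema();  // declared or discovered schema, if known at plan time
}

interface DataSourceRuntime
{
  // Basic behavior: a supplier of row batches. Segment-like behavior
  // (intervals, versions, etc.) would be an optional add-on capability.
  Iterable<RowBatch> open();
}
```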
Druid has a robust SQL query framework, but it seems nothing similar exists on the native side. SQL does not map to all possible native query variations, leaving some areas uncovered. To address this, we will need a comprehensive set of native tests.
- Tests are defined in files, perhaps using the proposed "planner test framework" format.
- Test includes a native query, expected results, and metadata.
- The same tests can be run as ITs, as tests against a local server, or (eventually) in single-server mode.
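As a sketch only (the actual "planner test framework" format is still a proposal), a file-based test case might bundle the native query JSON, the expected results, and run metadata, loaded by something like the following. The directory layout and file names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical layout: one directory per case, holding query.json,
// expected.tsv, and (optionally) metadata about which environments to run in.
class NativeQueryTestCase
{
  final String name;
  final String nativeQueryJson;
  final List<String> expectedRows;

  NativeQueryTestCase(String name, String nativeQueryJson, List<String> expectedRows)
  {
    this.name = name;
    this.nativeQueryJson = nativeQueryJson;
    this.expectedRows = expectedRows;
  }

  static NativeQueryTestCase load(Path caseDir) throws IOException
  {
    return new NativeQueryTestCase(
        caseDir.getFileName().toString(),
        Files.readString(caseDir.resolve("query.json")),
        Files.readAllLines(caseDir.resolve("expected.tsv"))
    );
  }
}
```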