Operators Development Plan


Paul Rogers edited this page Oct 5, 2022 · 5 revisions

Plan

PR Plan

Basics
- Operator framework: excludes actual query code
- Scan query
Timeseries query
Other queries
Insert physical plan between query runner and operators
Convert query runners to a planner: f(query, metadata) -> physical plan
Optimize away the ad-hoc bits: move toward a more typical row format
Abstract out the HTTP protocol. Add Gian's new network protocol.
Introduce multiple tiers. Requires new server type.
Push rework below the Cursor level: revisit storage adapters
- Adapters assume that Druid will do the work
- Segment adapter negotiates push-down of operations it can handle
- CSV, etc. are simple layers; they don't try to simulate segments
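To make the planner step more concrete, here is a minimal sketch of a planner as a pure function f(query, metadata) -> physical plan. It is written in Java because Druid is a Java project. Every type and method in it (`Operator`, `ScanOperator`, `LimitOperator`, `ScanQuery`, `AdapterInfo`, `Metadata`, `plan`, `countRows`) is hypothetical and is not part of Druid's API. The `supportsLimit` flag is an assumed, heavily simplified stand-in for a storage adapter negotiating which operations it can handle. The sketch assumes Java 16+ for records.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of "f(query, metadata) -> physical plan".
// None of these types are Druid's actual classes.
public class PlannerSketch {
  // A physical operator produces rows through an iterator.
  interface Operator {
    Iterator<Object[]> open();
  }

  // Leaf operator: reads rows from a (simulated) storage adapter.
  // A negative pushedLimit means no limit was pushed down to the adapter.
  record ScanOperator(List<Object[]> rows, long pushedLimit) implements Operator {
    public Iterator<Object[]> open() {
      List<Object[]> visible = pushedLimit < 0
          ? rows
          : rows.subList(0, (int) Math.min(pushedLimit, rows.size()));
      return visible.iterator();
    }
  }

  // Druid-side limit, used when the adapter cannot apply the limit itself.
  record LimitOperator(Operator child, long limit) implements Operator {
    public Iterator<Object[]> open() {
      Iterator<Object[]> input = child.open();
      return new Iterator<>() {
        private long count = 0;

        @Override
        public boolean hasNext() {
          return count < limit && input.hasNext();
        }

        @Override
        public Object[] next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          count++;
          return input.next();
        }
      };
    }
  }

  // Hypothetical query and metadata. A negative limit means "no limit".
  record ScanQuery(String dataSource, long limit) {}
  record AdapterInfo(List<Object[]> rows, boolean supportsLimit) {}
  record Metadata(Map<String, AdapterInfo> adapters) {}

  // The planner: a pure function of the query and the metadata.
  static Operator plan(ScanQuery query, Metadata metadata) {
    AdapterInfo adapter = metadata.adapters().get(query.dataSource());
    if (query.limit() < 0) {
      return new ScanOperator(adapter.rows(), -1);
    }
    if (adapter.supportsLimit()) {
      // Push the limit down: the adapter does the work.
      return new ScanOperator(adapter.rows(), query.limit());
    }
    // Otherwise Druid applies the limit above a plain scan.
    return new LimitOperator(new ScanOperator(adapter.rows(), -1), query.limit());
  }

  // Drains an operator and counts its rows, for demonstration.
  static int countRows(Operator op) {
    Iterator<Object[]> it = op.open();
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }
}
```

Consider planning `new ScanQuery("wiki", 2)` against an adapter whose `supportsLimit` is false. The result is a `LimitOperator` over a plain scan. An adapter that supports the limit instead has it pushed into the `ScanOperator` itself. Because the planner is a pure function, each plan should be testable without executing a query.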

First PR Plan

Goal:

Broaden the discussion. (There will likely be much resistance on this first go-round.)
Establish a toehold.
Make the concept more concrete to other devs.

Tasks:

Rebase op-step3 on master.
Fix merge issues.
Do a test run with operators enabled.
DruidUnionRel
- As currently designed, runs independent queries, then concatenates the results
- Change it to use a root segment with a union operator and child fragments (each fragment runs a separate query)
Test on a Broker/Historical pair.
- Ingest data
- Run queries in old mode & verify.
- Run queries in op mode & verify.
Perhaps find the benchmarks and run those.
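Here is a minimal sketch of the proposed DruidUnionRel shape. A root union operator concatenates the output of its child fragments, and each child would run its own query. The names `Operator`, `ListOperator`, `UnionOperator` and `drain` are hypothetical. `ListOperator` stands in for a child fragment by returning in-memory rows. The sketch assumes Java 16+.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a union operator at the root of a plan that
// concatenates the output of child fragments. Not Druid's actual API.
public class UnionSketch {
  // Minimal operator contract: opening an operator yields its rows.
  interface Operator<T> {
    Iterator<T> open();
  }

  // Stand-in for a child fragment that would run its own query.
  record ListOperator<T>(List<T> rows) implements Operator<T> {
    public Iterator<T> open() {
      return rows.iterator();
    }
  }

  // Concatenates children in order, opening each child only after the
  // previous one is exhausted.
  static final class UnionOperator<T> implements Operator<T> {
    private final List<Operator<T>> children;

    UnionOperator(List<? extends Operator<T>> children) {
      this.children = List.copyOf(children);
    }

    @Override
    public Iterator<T> open() {
      Iterator<Operator<T>> pending = children.iterator();
      return new Iterator<>() {
        private Iterator<T> current = null;

        @Override
        public boolean hasNext() {
          // Skip past exhausted (or empty) children.
          while ((current == null || !current.hasNext()) && pending.hasNext()) {
            current = pending.next().open();
          }
          return current != null && current.hasNext();
        }

        @Override
        public T next() {
          if (!hasNext()) {
            throw new NoSuchElementException();
          }
          return current.next();
        }
      };
    }
  }

  // Collects all rows from an operator, for demonstration.
  static <T> List<T> drain(Operator<T> op) {
    List<T> out = new ArrayList<>();
    op.open().forEachRemaining(out::add);
    return out;
  }
}
```

Children are opened lazily and one at a time, so this version keeps the existing "run, then concatenate" ordering. Running the child fragments concurrently would instead call for something like the scatter/gather queue listed under Other Tasks.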

Investigations

Prototype for group-by query

Other Tasks

Producer/consumer queue for the scatter/gather operator
Unified row format, perhaps based on frames
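One possible shape for the producer/consumer queue is sketched below, under stated assumptions:

* Each input (for example, a child fragment) runs as a producer thread that pushes rows into a shared, bounded `ArrayBlockingQueue`.
* The gathering side takes rows until every producer has sent an end-of-stream sentinel.

`ScatterGatherSketch`, `scatterGather` and the sentinel approach are illustrative choices, not Druid code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a scatter/gather exchange; not Druid code.
public class ScatterGatherSketch {
  // Sentinel each producer enqueues once its input is exhausted.
  private static final Object END_OF_STREAM = new Object();

  // Note: ArrayBlockingQueue rejects nulls, so rows must be non-null.
  static List<Object> scatterGather(List<? extends List<?>> sources, int capacity) {
    List<Object> results = new ArrayList<>();
    if (sources.isEmpty()) {
      return results;
    }
    BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
    ExecutorService pool = Executors.newFixedThreadPool(sources.size());
    try {
      // Scatter: one producer task per source.
      for (List<?> source : sources) {
        pool.submit(() -> {
          try {
            for (Object row : source) {
              queue.put(row); // blocks while the queue is full (back-pressure)
            }
            queue.put(END_OF_STREAM);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Gather: consume until every producer has signalled end-of-stream.
      int remaining = sources.size();
      while (remaining > 0) {
        Object item = queue.take(); // blocks until some producer delivers
        if (item == END_OF_STREAM) {
          remaining--;
        } else {
          results.add(item);
        }
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while gathering", e);
    } finally {
      pool.shutdownNow(); // interrupts any producer still blocked in put()
    }
  }
}
```

The bounded queue provides back-pressure: producers block in `put()` when the consumer falls behind. Rows from different producers arrive in a nondeterministic order. A real operator would also need to propagate producer failures. In this sketch, a producer that throws anything other than `InterruptedException` never sends its sentinel, and the consumer would wait forever.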

Completed

Early prototypes: scan query, timeseries query
Internal discussions about the merits of the approach