
Operators Development Plan

Paul Rogers edited this page Oct 5, 2022 · 5 revisions

Plan

PR Plan

  • Basics

    • Operator framework: excludes actual query code
    • Scan query
  • Timeseries query

  • Other queries

  • Insert physical plan between query runner and operators

  • Convert query runners to a planner: f(query, metadata) -> physical plan

  • Optimize away the ad-hoc bits: move toward a more typical row format

  • Abstract out the HTTP protocol. Add Gian's new network protocol.

  • Introduce multiple tiers. Requires new server type.

  • Push rework below the Cursor level: revisit storage adapters

    • Adapters assume that Druid will do the work
    • Segment adapter negotiates push-down of operations it can handle
    • CSV, etc. are simple layers; they don't try to simulate segments
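The planner step, f(query, metadata) -> physical plan, can be illustrated with a small Java sketch. It assumes a scan query with a column list and a limit, and segment metadata that is just a list of segment IDs. All type names here (ScanQuery, SegmentMetadata, PlanNode, ScanNode, ConcatNode, LimitNode, Planner) are hypothetical and are not Druid's actual classes. The planner only describes the work as a tree of plan nodes (one Scan per segment under a Concat, with a Limit on top). A separate step would then turn that tree into operators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner inputs; illustrative only, not Druid's real query or metadata classes.
record ScanQuery(String dataSource, List<String> columns, long limit) {}
record SegmentMetadata(String dataSource, List<String> segmentIds) {}

// A physical plan node describes an operator to build; explain() renders the tree as text.
interface PlanNode {
  String explain(String indent);
}

record ScanNode(String segmentId, List<String> columns) implements PlanNode {
  public String explain(String indent) {
    return indent + "Scan(" + segmentId + ", " + columns + ")\n";
  }
}

record ConcatNode(List<PlanNode> children) implements PlanNode {
  public String explain(String indent) {
    StringBuilder sb = new StringBuilder(indent + "Concat\n");
    for (PlanNode child : children) {
      sb.append(child.explain(indent + "  "));
    }
    return sb.toString();
  }
}

record LimitNode(long limit, PlanNode child) implements PlanNode {
  public String explain(String indent) {
    return indent + "Limit(" + limit + ")\n" + child.explain(indent + "  ");
  }
}

final class Planner {
  // f(query, metadata) -> physical plan: one Scan per segment, concatenated, then limited.
  static PlanNode plan(ScanQuery query, SegmentMetadata metadata) {
    if (!query.dataSource().equals(metadata.dataSource())) {
      throw new IllegalArgumentException("metadata is for a different data source");
    }
    List<PlanNode> scans = new ArrayList<>();
    for (String segmentId : metadata.segmentIds()) {
      scans.add(new ScanNode(segmentId, query.columns()));
    }
    PlanNode root = new ConcatNode(scans);
    // Only add a Limit node when the query actually has a limit (> 0).
    return query.limit() > 0 ? new LimitNode(query.limit(), root) : root;
  }
}
```

For example, planning a query with limit 10 over two segments yields a Limit(10) node whose child is a Concat of two Scan nodes. The runner and operators never see the metadata directly. This separation is what "insert physical plan between query runner and operators" asks for.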

First PR Plan

Goal:

  • Broaden discussion. (There will likely be much resistance this first go-round.)
  • Establish a toehold.
  • Make the concept more concrete to other devs.

Tasks:

  • Rebase op-step3 on master.
  • Fix merge issues.
  • Do a test run with operators enabled.
  • DruidUnionRel
    • As currently designed, runs independent queries, then concatenates the results
    • Change to a root segment with a union operator and child fragments (each fragment runs a separate query)
  • Test on a Broker/Historical pair.
    • Ingest data
    • Run queries in old mode & verify.
    • Run queries in op mode & verify.
  • Perhaps find the existing benchmarks and run them.
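For the DruidUnionRel change, the union operator in the root segment could simply concatenate its children's output. Here is a minimal pull-based Java sketch. The Operator, ListOperator, and UnionOperator names are hypothetical. ListOperator stands in for a child fragment, which in the proposed design would run its own query.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal operator contract: pull rows until exhausted, then close.
interface Operator<T> extends Iterator<T>, AutoCloseable {
  @Override
  default void close() {}
}

// Stand-in for a child fragment: yields rows from an in-memory list.
class ListOperator<T> implements Operator<T> {
  private final Iterator<T> rows;

  ListOperator(List<T> rows) {
    this.rows = rows.iterator();
  }

  public boolean hasNext() {
    return rows.hasNext();
  }

  public T next() {
    return rows.next();
  }
}

// Union (concatenation): drains child 0, closes it, then moves on to child 1, and so on.
class UnionOperator<T> implements Operator<T> {
  private final List<? extends Operator<T>> children;
  private int current = 0;

  UnionOperator(List<? extends Operator<T>> children) {
    this.children = children;
  }

  public boolean hasNext() {
    while (current < children.size()) {
      Operator<T> child = children.get(current);
      if (child.hasNext()) {
        return true;
      }
      child.close(); // child exhausted: release its resources before advancing
      current++;
    }
    return false;
  }

  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return children.get(current).next();
  }

  @Override
  public void close() {
    // Close any children not already closed by hasNext().
    for (int i = current; i < children.size(); i++) {
      children.get(i).close();
    }
    current = children.size();
  }
}
```

Closing each child as soon as it is exhausted, rather than at the end of the query, releases a finished fragment's resources before the next fragment is read.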

Investigations

  • Prototype for group-by query

Other Tasks

  • Producer/consumer queue for the scatter/gather operator
  • Unified row format, perhaps based on frames
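A producer/consumer queue for the scatter/gather operator might look roughly like the following Java sketch. It uses the JDK's ArrayBlockingQueue and a fixed thread pool. The ScatterGather class and its gather method are hypothetical. Each inner list stands in for the rows that one child fragment (for example, one Historical) would return. Propagating errors from a failed producer to the consumer is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical scatter/gather exchange built on a bounded producer/consumer queue.
final class ScatterGather {
  // Envelope so producers can signal end-of-stream through the same queue as the rows.
  private record Item<T>(T value, boolean endOfStream) {}

  static <T> List<T> gather(List<List<T>> childOutputs, int queueCapacity)
      throws InterruptedException {
    List<T> results = new ArrayList<>();
    if (childOutputs.isEmpty()) {
      return results;
    }
    BlockingQueue<Item<T>> queue = new ArrayBlockingQueue<>(queueCapacity);
    ExecutorService pool = Executors.newFixedThreadPool(childOutputs.size());
    try {
      // Scatter: one producer task per child.
      for (List<T> rows : childOutputs) {
        pool.submit(() -> {
          try {
            for (T row : rows) {
              queue.put(new Item<>(row, false)); // blocks while the queue is full (back-pressure)
            }
          } finally {
            queue.put(new Item<>(null, true)); // signal completion even if the loop threw
          }
          return null;
        });
      }
      // Gather: drain rows in arrival order until every producer has signaled completion.
      int finished = 0;
      while (finished < childOutputs.size()) {
        Item<T> item = queue.take();
        if (item.endOfStream()) {
          finished++;
        } else {
          results.add(item.value());
        }
      }
      return results;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

The bounded queue provides back-pressure, because fast producers block in put() until the consumer catches up. Rows arrive in whatever order the producers emit them. Where ordering matters, a merge step would be needed downstream.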

Completed

  • Early prototypes: scan query, timeseries query
  • Internal discussions around merit