-
Notifications
You must be signed in to change notification settings - Fork 0
Operators Timeseries Query
CombiningAccumulator
handles values for a single group.
-
Comparator<T> ordering
determines when two values are the same. -
BinaryOperator<T> mergeFn
merges values:-
f(x, null)
accepts the first value -
f(x, y)
combines values, where x is the result of a previous accumulation and y is the new value.
-
It seems that an Accumulator
is, essentially, a collection builder: it gathers items into an output (which may be a list, etc.)
When used as a CombiningAccumulator
, it creates a list of like values, with the values combined.
- "accumulate" - gather values
- "combine" - function to combine two values into one value
Data coming into a agg is grouped by segment? So that "by segment" simply leaves the grouping?
Row
is a representation of a row of data.
RowAdapters
is a wrapper around a row with limited functionality: returns functions to get the timestamp and a column value.
MapBasedRow
is a row comprised of a timestamp and a map.
TimeseriesResultBuilder
is a builder for a (timestamp, event-map) row, represented as a Result
over a TimeseriesResultValue
Result
is another form of (timestamp, value) object.
ColumnSelectorFactory
factory for column selectors.
ColumnValueSelector
provides access methods (and conversions) for all Druid data types.
RowBasedColumnSelectorFactory
creates column selectors for a Row
given a column name.
Results appear to contain only a timestamp and aggregator values. (No dimensions per group?)
TimeseriesQueryQueryToolChest.mergeResults
:
Base sequence values: TimeseriesResultValue
which is a wrapper over a map. (Note: would be more efficient to store data as a tuple.)
Case 1: Overall total (with nuances for skip empty buckets, by-segment): get single value or create a default value.
AggregatorFactory
opinionated factory for aggregations. This probably wants to be refactored, but will be hard because of the sheer number of items, and that that is an extension point. The factory seems to know its column name; thus it is a (type, column) pair. Should be split.
- accumulate(prev-total, cur-row) -> new-total
- mergefn.(x, null) - initializes the accumulator
- mergeFn.(tot, row) -> new-tot
Random requirements:
- By-segment
To convert merge:
- Retain the nested query runner for now
- Top flow
- Toolchest returns a query runner that will do the work
- That query runner implements the runtime flow
- Runtime flow
- Base runner of type
QueryRunner<Result<TimeseriesResultValue>>
- Aggregator
- Limit (if needed)
- Top aggregator
- Handle empty input (with conditions, see above)
- Add grand total, if requested
- Base runner of type
Runner stack:
- What?
-
TimeseriesQueryQueryToolChest.mergeResults()
-> anonymous runner - Anonymous merge runner
- Rewrite query to remove grand total flag
- Run
resultMergeQueryRunner
- Result for zero rows
- Grand total
- Anonymous
resultMergeQueryRunner
- Remove post aggregators
- Run merging runner
- Apply limit
- Merging runner (if not by segment)
- Base runner
In more detail:
-
TimeseriesQueryQueryToolChest.mergeResults()
returns a query runner - Creates an instance of
ResultMergeQueryRunner
which extendsBySegmentSkippingQueryRunner
- Provides the base runner and comparators
- Creates an anonymous runner that takes the merge runner via a closure
When the anonymous runner is run:
- Rewrites the query to remove the grand total flag
- Invokes the merging runner and obtains the results
- Special cases ALL
- If the above results are empty, provide a zero result instead
- If no grand total, return the results thus far.
- Create a sequence which accumulates grand totals
- Totals are over all rows; no grouping here
- Return a sequence which first iterates over the results
- Then inserts a grand-totals row.
When the merging runner is run:
- Calls
BySegmentSkippingQueryRunner.run()
- If by segment, just returns the base runner.
- Else, calls
doRun()
with the base runner. - For Timeseries,
doRun()
is overridden in themergeResults()
method.-
ResultMergeQueryRunner.doRun()
creates aCombiningSequence
to do the merge
-
- Removes the post-aggs
- Calls
ResultMergeQueryRunner.doRun()
, with the rewritten query - Applies a limit
- Note: this means the limit is prior to grand totals, and so the last grand total will be wrong.