doc: add doc on aggregations (#16144)
kwannoel authored Apr 19, 2024
1 parent bda3b45 commit db02083
Showing 3 changed files with 49 additions and 0 deletions.
49 changes: 49 additions & 0 deletions docs/aggregation.md
@@ -0,0 +1,49 @@
# Aggregations

This document covers the internal implementation of common aggregations.


## Frontend

TODO

## Expression Framework

TODO

## HashAggExecutor

![aggregation components](./images/aggregation/agg-components.png)

Within the `HashAggExecutor`, there are 4 main components:
1. AggCalls.
2. AggState.
3. AggGroups.
4. Persisted State.

AggCalls are the aggregation calls in the query. For instance, a query that selects `SUM(v1)` and `COUNT(v2)` has the AggCalls `SUM` and `COUNT`.

AggState is the state we use to compute the result (output) of an aggregation call.
Each aggregation group holds one AggState per AggCall.

AggGroups are created per aggregation group.
For instance, with `GROUP BY x1, x2`, there will be a group for each unique combination of `x1` and `x2`.

Whenever stream chunks come in, the executor will update the aggregation state for each group, per agg call.
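
The relationship between AggCalls, AggState and AggGroups described above can be sketched roughly as follows. This is a simplified model, not the actual `HashAggExecutor` code: the types `AggCall`, `AggState`, `AggGroup` and `HashAgg` here are hypothetical stand-ins, rows are plain `Vec<i64>`, and chunks are assumed to contain inserts only.

```rust
use std::collections::HashMap;

type GroupKey = Vec<i64>;
type Row = Vec<i64>;

/// Hypothetical stand-in for an aggregation call on one input column.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AggCall {
    Sum { col: usize },
    Count { col: usize },
}

/// Running state for a single agg call within a single group.
#[derive(Clone, Debug, PartialEq)]
enum AggState {
    Sum(i64),
    Count(i64),
}

impl AggState {
    fn new(call: &AggCall) -> Self {
        match call {
            AggCall::Sum { .. } => AggState::Sum(0),
            AggCall::Count { .. } => AggState::Count(0),
        }
    }

    /// Fold one inserted input value into the state.
    fn apply(&mut self, value: i64) {
        match self {
            AggState::Sum(s) => *s += value,
            AggState::Count(c) => *c += 1,
        }
    }
}

/// One group holds one state per agg call.
struct AggGroup {
    states: Vec<AggState>,
}

struct HashAgg {
    group_key_cols: Vec<usize>,
    calls: Vec<AggCall>,
    groups: HashMap<GroupKey, AggGroup>,
}

impl HashAgg {
    fn new(group_key_cols: Vec<usize>, calls: Vec<AggCall>) -> Self {
        HashAgg { group_key_cols, calls, groups: HashMap::new() }
    }

    /// Update every group touched by the chunk, once per agg call.
    fn apply_chunk(&mut self, chunk: &[Row]) {
        for row in chunk {
            let key: GroupKey = self.group_key_cols.iter().map(|&i| row[i]).collect();
            let calls = &self.calls;
            // Create the group with fresh states the first time its key is seen.
            let group = self.groups.entry(key).or_insert_with(|| AggGroup {
                states: calls.iter().map(AggState::new).collect(),
            });
            for (call, state) in calls.iter().zip(group.states.iter_mut()) {
                let col = match call {
                    AggCall::Sum { col } | AggCall::Count { col } => *col,
                };
                state.apply(row[col]);
            }
        }
    }
}
```

With group key columns `[0, 1]` and the calls `Sum { col: 2 }` and `Count { col: 3 }`, each distinct `(x1, x2)` pair ends up with its own `AggGroup` holding two states.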

On each barrier, the executor persists the in-memory states.
For `value` type aggregations, the state is persisted to the intermediate state table.
This state table stores all value aggregations for a group in a single row.

`MaterializedInput` type aggregations require tracking the input state. An example is non-append-only min/max.
Each of these aggregations has its own state table (`AggStateStorage::MaterializedInput`), which stores the input state for each group.
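
To illustrate the difference between the two kinds of persisted state, here is a minimal sketch under simplifying assumptions. `IntermediateStateTable`, `MaterializedInputTable` and `on_barrier` are hypothetical names that model the state tables as in-memory maps; the real state tables are backed by the storage layer. The sketch shows why a non-append-only `min` needs its inputs kept: once the current minimum is deleted, the next minimum can only be found from the remaining inputs.

```rust
use std::collections::{BTreeMap, HashMap};

type GroupKey = Vec<i64>;

/// Hypothetical model of the intermediate state table:
/// one row per group, one column per value-type agg call.
#[derive(Default)]
struct IntermediateStateTable {
    rows: HashMap<GroupKey, Vec<i64>>,
}

/// On barrier, write each group's value states as a single row.
fn on_barrier(in_memory: &HashMap<GroupKey, Vec<i64>>, table: &mut IntermediateStateTable) {
    for (group, states) in in_memory {
        table.rows.insert(group.clone(), states.clone());
    }
}

/// Hypothetical model of the state table for one `min` call:
/// every live input value per group, with a count for duplicates.
#[derive(Default)]
struct MaterializedInputTable {
    rows: BTreeMap<(GroupKey, i64), usize>,
}

impl MaterializedInputTable {
    fn insert(&mut self, group: &[i64], value: i64) {
        *self.rows.entry((group.to_vec(), value)).or_insert(0) += 1;
    }

    fn delete(&mut self, group: &[i64], value: i64) {
        let key = (group.to_vec(), value);
        let remaining = match self.rows.get_mut(&key) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return, // deleting a value that was never inserted is ignored here
        };
        if remaining == 0 {
            self.rows.remove(&key);
        }
    }

    /// The minimum is the first entry in the group's sorted key range.
    fn min(&self, group: &[i64]) -> Option<i64> {
        self.rows
            .range((group.to_vec(), i64::MIN)..=(group.to_vec(), i64::MAX))
            .next()
            .map(|((_, v), _)| *v)
    }
}
```

A value-type state such as a sum can be updated from its previous value alone, so a single row per group is enough. The materialized-input model trades extra storage for the ability to answer `min` correctly after deletes.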

### Initialization of `AggGroups`

![init-agg-group](./images/aggregation/init-agg-group.png)

AggGroups are initialized when the corresponding aggregation groups are not found in `AggGroupCache`.
This could be either because the group was evicted from the `AggGroupCache`,
or because it's a new group key.

It could take a while to initialize agg groups, hence we cache them in `AggGroupCache`.
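
The lookup-then-initialize flow can be sketched as below. This is only an illustration: the `AggGroupCache` here is a hypothetical FIFO cache, `StateTable` is an in-memory stand-in for persisted state, and the sketch assumes an evicted group's state was already flushed before eviction. The eviction policy of the real cache is not described in this document.

```rust
use std::collections::{HashMap, VecDeque};

type GroupKey = Vec<i64>;

#[derive(Clone, Debug, PartialEq)]
struct AggGroup {
    states: Vec<i64>,
}

/// Hypothetical stand-in for persisted state: the last flushed row per group.
struct StateTable {
    rows: HashMap<GroupKey, Vec<i64>>,
}

/// Hypothetical bounded cache with FIFO eviction.
struct AggGroupCache {
    capacity: usize,
    entries: HashMap<GroupKey, AggGroup>,
    order: VecDeque<GroupKey>,
    /// Number of (potentially slow) initializations performed.
    inits: usize,
}

impl AggGroupCache {
    fn new(capacity: usize) -> Self {
        AggGroupCache { capacity, entries: HashMap::new(), order: VecDeque::new(), inits: 0 }
    }

    fn get_or_init(&mut self, key: &[i64], table: &StateTable, num_calls: usize) -> &mut AggGroup {
        if !self.entries.contains_key(key) {
            // Cache miss: the group was evicted earlier, or the key is new.
            let states = match table.rows.get(key) {
                Some(row) => row.clone(),   // evicted group: restore persisted state
                None => vec![0; num_calls], // new group: start from empty states
            };
            if self.entries.len() >= self.capacity {
                if let Some(victim) = self.order.pop_front() {
                    self.entries.remove(&victim);
                }
            }
            self.entries.insert(key.to_vec(), AggGroup { states });
            self.order.push_back(key.to_vec());
            self.inits += 1;
        }
        self.entries.get_mut(key).expect("group is cached at this point")
    }
}
```

Repeated access to a cached group does not increase `inits`, which is the cost the cache is meant to avoid.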
Binary file added docs/images/aggregation/agg-components.png
Binary file added docs/images/aggregation/init-agg-group.png