doc: add doc on aggregations (#16144)
Showing 3 changed files with 49 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Aggregations | ||
|
||
This document covers the internal implementation of common aggregations. | ||
|
||
|
||
## Frontend | ||
|
||
TODO | ||
|
||
## Expression Framework | ||
|
||
TODO | ||
|
||
## HashAggExecutor | ||
|
||
![aggregation components](./images/aggregation/agg-components.png) | ||
|
||
Within the `HashAggExecutor`, there are 4 main components: | ||
1. AggCalls. | ||
2. AggState. | ||
3. AggGroups. | ||
4. Persisted State. | ||
|
||
AggCalls are the aggregation calls for the query. For instance, a query that selects `SUM(v1)` and `COUNT(v2)` has the AggCalls `SUM` and `COUNT`. | ||
|
||
AggState is the state used to compute the result (output) of an aggregation call. | ||
Each aggregation group holds one AggState per AggCall. | ||
|
||
AggGroups are created per aggregation group. | ||
For instance, with `GROUP BY x1, x2`, there will be a group for each unique combination of `x1` and `x2`. | ||
|
||
Whenever stream chunks come in, the executor will update the aggregation state for each group, per agg call. | ||
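The update path described above can be sketched as follows. This is a minimal illustration, not the actual `HashAggExecutor` code: the `AggCall`, `Op`, `AggState` and `apply_chunk` definitions are hypothetical, rows are simplified to `i64` columns, and only `SUM` and `COUNT` are modelled.

```rust
use std::collections::HashMap;

/// Hypothetical aggregation calls (illustrative names only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum AggCall {
    Sum { col: usize },
    Count,
}

/// Hypothetical change operation attached to each row of a stream chunk.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Insert,
    Delete,
}

/// In-memory state for one agg call within one group.
#[derive(Clone, Debug, PartialEq)]
enum AggState {
    Sum(i64),
    Count(i64),
}

impl AggState {
    fn new(call: &AggCall) -> Self {
        match call {
            AggCall::Sum { .. } => AggState::Sum(0),
            AggCall::Count => AggState::Count(0),
        }
    }

    /// Apply one row to this state; deletes retract the row's contribution.
    fn apply(&mut self, call: &AggCall, op: Op, row: &[i64]) {
        let sign = match op {
            Op::Insert => 1,
            Op::Delete => -1,
        };
        match (self, call) {
            (AggState::Sum(acc), AggCall::Sum { col }) => *acc += sign * row[*col],
            (AggState::Count(acc), AggCall::Count) => *acc += sign,
            _ => unreachable!("state and call kinds always match in this sketch"),
        }
    }
}

/// Group key -> one AggState per AggCall.
type AggGroups = HashMap<Vec<i64>, Vec<AggState>>;

/// Update the state of every affected group, once per agg call.
fn apply_chunk(
    groups: &mut AggGroups,
    calls: &[AggCall],
    group_cols: &[usize],
    chunk: &[(Op, Vec<i64>)],
) {
    for (op, row) in chunk {
        // The group key is the combination of the GROUP BY column values.
        let key: Vec<i64> = group_cols.iter().map(|&c| row[c]).collect();
        let states = groups
            .entry(key)
            .or_insert_with(|| calls.iter().map(AggState::new).collect());
        for (state, call) in states.iter_mut().zip(calls) {
            state.apply(call, *op, row);
        }
    }
}
```

With `group_cols = [0, 1]` this corresponds to `GROUP BY x1, x2`: each distinct pair of values gets its own vector of states, one per agg call.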
|
||
On barrier, we will persist the in-memory states. | ||
For `value` type aggregations, we will persist the state to the intermediate state table. | ||
This state table will store all value aggregations per group on a single row. | ||
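As a rough sketch of the barrier step for value-type aggregations, the snippet below flushes changed groups to a stand-in for the intermediate state table, writing one row per group. The `ValueStates` struct, the dirty-set tracking and the `BTreeMap`-backed table are assumptions made for illustration; the real state table API is not shown here.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Stand-in for the intermediate state table (an assumption of this sketch):
/// one row per group key, one column per value-type agg call.
type IntermediateTable = BTreeMap<Vec<i64>, Vec<i64>>;

#[derive(Default)]
struct ValueStates {
    /// In-memory value states, one entry per agg call, keyed by group.
    in_memory: HashMap<Vec<i64>, Vec<i64>>,
    /// Groups whose state changed since the last barrier.
    dirty: HashSet<Vec<i64>>,
}

impl ValueStates {
    /// Record the latest values of a group and mark the group as changed.
    fn update(&mut self, group: Vec<i64>, values: Vec<i64>) {
        self.dirty.insert(group.clone());
        self.in_memory.insert(group, values);
    }

    /// Called on barrier: write every changed group as a single row.
    /// Returns the number of rows written.
    fn flush(&mut self, table: &mut IntermediateTable) -> usize {
        let mut written = 0;
        for group in self.dirty.drain() {
            if let Some(values) = self.in_memory.get(&group) {
                table.insert(group, values.clone());
                written += 1;
            }
        }
        written
    }
}
```

Keeping all value states of a group on one row means a group can be restored with a single point lookup on its group key.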
|
||
`MaterializedInput` type aggregations require tracking the input state, for example non-append-only min/max. | ||
Each of these aggregations has its own state table (`AggStateStorage::MaterializedInput`), which stores the input state for each group. | ||
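To see why non-append-only min/max needs the input state, consider the sketch below. It keeps an ordered multiset of one group's input values so that deleting the current minimum still leaves the next one available. The `MaterializedInput` struct here is a hypothetical in-memory model, not the actual state table layout.

```rust
use std::collections::BTreeMap;

/// Hypothetical materialized input of one group: an ordered multiset
/// mapping each input value to the number of times it occurs.
#[derive(Default)]
struct MaterializedInput {
    counts: BTreeMap<i64, usize>,
}

impl MaterializedInput {
    fn insert(&mut self, v: i64) {
        *self.counts.entry(v).or_insert(0) += 1;
    }

    /// Retract one occurrence of `v`; unknown values are ignored.
    fn delete(&mut self, v: i64) {
        let now_empty = match self.counts.get_mut(&v) {
            Some(n) => {
                *n -= 1;
                *n == 0
            }
            None => false,
        };
        if now_empty {
            self.counts.remove(&v);
        }
    }

    /// Smallest value still present, if any.
    fn min(&self) -> Option<i64> {
        self.counts.keys().next().copied()
    }

    /// Largest value still present, if any.
    fn max(&self) -> Option<i64> {
        self.counts.keys().next_back().copied()
    }
}
```

A single stored value would not be enough here: once the current minimum is deleted, the new minimum can only be found from the remaining inputs.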
|
||
### Initialization of `AggGroups` | ||
|
||
![init-agg-group](./images/aggregation/init-agg-group.png) | ||
|
||
An AggGroup is initialized when the corresponding aggregation group is not found in `AggGroupCache`. | ||
This happens either because the group was evicted from the `AggGroupCache`, | ||
or because it is a new group key. | ||
|
||
Initializing agg groups can take a while, hence we cache them in `AggGroupCache`. |
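The two miss cases can be illustrated with a small cache sketch. The `AggGroupCache` below is a hypothetical stand-in that uses FIFO eviction for brevity (the real eviction policy is not described in this document), models persisted state as a `BTreeMap`, and assumes that evicted groups were already persisted on an earlier barrier.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

type GroupKey = Vec<i64>;
/// Simplified agg group: one value state per agg call.
type AggGroup = Vec<i64>;

/// Hypothetical cache with FIFO eviction (a simplification of this sketch).
struct AggGroupCache {
    capacity: usize,
    entries: HashMap<GroupKey, AggGroup>,
    order: VecDeque<GroupKey>,
}

impl AggGroupCache {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Return the group for `key`, initializing it on a cache miss.
    /// The boolean reports whether initialization was needed.
    fn get_or_init(
        &mut self,
        key: &GroupKey,
        persisted: &BTreeMap<GroupKey, AggGroup>,
        num_calls: usize,
    ) -> (&mut AggGroup, bool) {
        let missed = !self.entries.contains_key(key);
        if missed {
            // Make room first; the evicted group is assumed to be persisted.
            if self.entries.len() >= self.capacity {
                if let Some(victim) = self.order.pop_front() {
                    self.entries.remove(&victim);
                }
            }
            // Evicted group: restore from persisted state.
            // New group key: start from empty (zeroed) states.
            let group = persisted
                .get(key)
                .cloned()
                .unwrap_or_else(|| vec![0; num_calls]);
            self.entries.insert(key.clone(), group);
            self.order.push_back(key.clone());
        }
        let group = self
            .entries
            .get_mut(key)
            .expect("group is cached at this point");
        (group, missed)
    }
}
```

The lookup into `persisted` stands for the expensive part of initialization, which a cache hit avoids.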