feat(batch): support spill hash agg for the batch query #16771

Merged (22 commits) on May 29, 2024.

Commit 30c5cc1: fix
Task list check (task-list-completed), started 2024-05-29 14:36:35: 0 / 8 tasks completed.
Related RFC: risingwavelabs/rfcs#89
Tracking issue: #16615
Support spill hash agg for the batch query.
When HashAggExecutor is told that memory is insufficient, AggSpillManager starts partitioning the hash table and spilling it to disk. After the hash table has been spilled, AggSpillManager consumes all remaining chunks from the input executor, partitioning them and spilling them to disk with the same hash function that was used for the hash table. The result is a fixed set of partitions (e.g. 20), each holding a portion of the original hash table together with the input rows that hash to it. A sub HashAggExecutor then consumes the partitions one by one. If memory is still insufficient in a sub HashAggExecutor, it partitions its own hash table and input recursively.
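The spill flow described above can be sketched in Rust roughly as follows. This is a minimal, hypothetical illustration, not the code from this PR: it computes SUM(value) GROUP BY key, models "memory is insufficient" as the hash table reaching a max_groups limit, and uses in-memory Vecs to stand in for spill files on disk. The names spill_hash_agg, partition_of, NUM_PARTITIONS and MAX_DEPTH are invented for the example. The hash table and the remaining input are partitioned with the same function, so every row of a group lands in the same partition as that group's partial state. Spilled partial sums are replayed as ordinary input rows, which works for SUM because partial sums merge by addition; other aggregates would need to merge their states explicitly.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical number of partitions per spill round (the PR text mentions e.g. 20).
const NUM_PARTITIONS: usize = 20;
/// Guard added for this sketch: at this depth, aggregate in memory regardless of the limit.
const MAX_DEPTH: u64 = 8;

/// Hash a group key together with the recursion depth. Mixing in the depth makes a
/// recursive round split a partition; reusing the identical hash would send every
/// row of that partition back into a single sub-partition.
fn partition_of(key: u64, depth: u64) -> usize {
    let mut hasher = DefaultHasher::new();
    depth.hash(&mut hasher);
    key.hash(&mut hasher);
    (hasher.finish() % NUM_PARTITIONS as u64) as usize
}

/// SUM(value) GROUP BY key, keeping at most `max_groups` groups in the hash table.
/// Each "spilled" partition is a Vec standing in for a file on disk.
fn spill_hash_agg(input: Vec<(u64, i64)>, max_groups: usize, depth: u64) -> HashMap<u64, i64> {
    let mut table: HashMap<u64, i64> = HashMap::new();
    let mut rows = input.into_iter();

    while let Some((key, value)) = rows.next() {
        let is_new_group = !table.contains_key(&key);
        if is_new_group && table.len() >= max_groups && depth < MAX_DEPTH {
            // "Memory is insufficient": spill the hash table first...
            let mut partitions: Vec<Vec<(u64, i64)>> = vec![Vec::new(); NUM_PARTITIONS];
            for (k, partial_sum) in table.drain() {
                partitions[partition_of(k, depth)].push((k, partial_sum));
            }
            // ...then the current row and all remaining input, with the same partition function.
            partitions[partition_of(key, depth)].push((key, value));
            for (k, v) in rows.by_ref() {
                partitions[partition_of(k, depth)].push((k, v));
            }
            // A sub-aggregation consumes each partition one by one, recursing if needed.
            // Partitions hold disjoint keys, so their results can simply be merged.
            let mut result = HashMap::new();
            for part in partitions {
                result.extend(spill_hash_agg(part, max_groups, depth + 1));
            }
            return result;
        }
        *table.entry(key).or_insert(0) += value;
    }
    table
}
```

Calling spill_hash_agg(rows, 8, 0) should give the same map as an unlimited in-memory aggregation; MAX_DEPTH exists only so that the recursion in this sketch always terminates.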
SpillOp manages the spill directory of the spilling executor and removes the directory in RAII style, i.e. when the SpillOp is dropped.
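A rough sketch of the RAII pattern described for SpillOp, using a hypothetical SpillDir type rather than the actual SpillOp API: the guard creates the directory up front and removes it in its Drop implementation, so spill files are cleaned up whenever the owner goes out of scope, including on early returns and error paths.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical RAII guard for a spill directory (not the real SpillOp API):
/// the directory is created on construction and removed when the guard is dropped.
struct SpillDir {
    path: PathBuf,
}

impl SpillDir {
    /// Create the directory (and any missing parents) and take ownership of it.
    fn create(path: impl Into<PathBuf>) -> io::Result<Self> {
        let path = path.into();
        fs::create_dir_all(&path)?;
        Ok(SpillDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for SpillDir {
    fn drop(&mut self) {
        // Best effort: Drop cannot return an error, so a failed cleanup is ignored.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Because cleanup lives in Drop, callers never need an explicit "delete spill directory" call, at the cost of not being able to report a failed removal.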
The environment variable RW_BATCH_SPILL_DIR configures the spill path; it defaults to /tmp/.
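One way to read RW_BATCH_SPILL_DIR with the /tmp/ fallback, sketched with the standard library only; the actual lookup in the PR may differ. The helper names spill_dir and resolve_spill_dir are hypothetical. Splitting out resolve_spill_dir keeps the fallback logic testable without mutating the process environment. This sketch treats an unset or non-Unicode variable as absent, because std::env::var returns an error in both cases.

```rust
use std::env;
use std::path::PathBuf;

/// Default spill location described in the PR text.
const DEFAULT_SPILL_DIR: &str = "/tmp/";

/// Pure fallback logic: use the configured value if present, else the default.
fn resolve_spill_dir(configured: Option<String>) -> PathBuf {
    configured
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_SPILL_DIR))
}

/// Read RW_BATCH_SPILL_DIR from the environment (hypothetical helper).
fn spill_dir() -> PathBuf {
    resolve_spill_dir(env::var("RW_BATCH_SPILL_DIR").ok())
}
```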
[ ] I have written the necessary rustdoc comments
[ ] I have added the necessary unit tests and integration tests
[ ] I have added test labels as necessary. See details.
[ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934).
[ ] My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
[ ] All checks passed in ./risedev check (or alias, ./risedev c)
[ ] My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)
[ ] My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)
If the file doesn't exist, it is created, just as when calling write.
If the file exists, data is appended to the end of the file.
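The append behavior described above matches what Rust's standard library provides with OpenOptions::create(true) combined with append(true). The sketch below only illustrates those semantics with std::fs; the PR text does not say which I/O layer writes the spill files, and append_to_file is a hypothetical helper.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `path` with append semantics (hypothetical helper):
/// a missing file is created, like a plain write; an existing file is extended.
fn append_to_file(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true) // create the file if it does not exist
        .append(true) // every write goes to the current end of the file
        .open(path)?;
    file.write_all(data)
}
```

Appending lets a spiller write each partition's chunks to a single file incrementally instead of buffering a whole partition in memory before writing it.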