Vectorized aggregation with grouping by one fixed-size column #7341
Codecov Report
Attention: Patch coverage is

Coverage Diff

| | main | #7341 | +/- |
|---|---|---|---|
| Coverage | 80.06% | 82.34% | +2.27% |
| Files | 190 | 238 | +48 |
| Lines | 37181 | 43722 | +6541 |
| Branches | 9450 | 10970 | +1520 |
| Hits | 29770 | 36002 | +6232 |
| Misses | 2997 | 3385 | +388 |
| Partials | 4414 | 4335 | -79 |
tsl/src/nodes/vector_agg/exec.c (Outdated)

    }
    }

    /*
     * Currently the only grouping policy we use is per-batch grouping.
     * Determine which grouping policy we are going to use.
Out of curiosity: Why is the grouping policy decided at execution time and not plan time? Should it not affect the plan and cost calc?
I moved it all to plan time, although as we discussed today on call, it doesn't affect the costs yet.
Co-authored-by: Erik Nordström <[email protected]> Signed-off-by: Alexander Kuzmenkov <[email protected]>
This release contains performance improvements and bug fixes since the 2.17.2 release. We recommend that you upgrade at the next available opportunity.

**Features**
* timescale#6901: Add hypertable support for transition tables.
* timescale#7104: Hypercore table access method.
* timescale#7271: Push down `order by` in real-time continuous aggregate queries.
* timescale#7295: Support `alter table set access method` on hypertable.
* timescale#7341: Vectorized aggregation with grouping by one fixed-size by-value compressed column.
* timescale#7390: Disable custom `hashagg` planner code.
* timescale#7411: Change parameter name to enable hypercore table access method.
* timescale#7412: Add GUC for `hypercore_use_access_method` default.
* timescale#7413: Add GUC for segmentwise recompression.
* timescale#7433: Add support for merging chunks.
* timescale#7436: Add index creation on orderby columns.
* timescale#7443: Add hypercore function and view aliases.
* timescale#7455: Support `drop not null` on compressed hypertables.
* timescale#7458: Support vectorized aggregation with aggregate `filter` clauses that are also vectorizable.
* timescale#7482: Optimize recompression of partially compressed chunks.
* timescale#7486: Prevent building against postgres versions with broken ABI.
* timescale#7521: Add optional `force` argument to `refresh_continuous_aggregate`.
* timescale#7528: Transform sorting on `time_bucket` to sorting on time for compressed chunks in some cases.
* timescale#7565: Add hint when hypertable creation fails.
* timescale#7587: Add `include_tiered_data` parameter to `add_continuous_aggregate_policy` API.

**Bugfixes**
* timescale#7378: Remove obsolete job referencing `policy_job_error_retention`.
* timescale#7409: Update `bgw_job` table when altering procedure.
* timescale#7410: Fix the `aggregated compressed column not found` error on aggregation query.
* timescale#7426: Fix `datetime` parsing error in chunk constraint creation.
* timescale#7432: Verify that the heap tuple is valid before using.
* timescale#7434: Fixes the segfault when internally setting the replica identity for a given chunk.
* timescale#7488: Emit error for transition table trigger on chunks.
* timescale#7514: Fix the error: `invalid child of chunk append`.
* timescale#7517: Fixes performance regression on `cagg_migrate` procedure.
* timescale#7527: Restart scheduler on error.
* timescale#7557: Fix null handling for in-memory tuple filtering.
* timescale#7566: Improve transaction check in CAgg refresh.
* timescale#7584: Fix NaN-handling for vectorized aggregation.

**Thanks**
* @bharrisau for reporting the segfault when creating chunks.
* @k-rus for suggesting the improvement.
* @pgloader for reporting the issue in an internal background job.
* @staticlibs for sending PR to improve transaction check in CAgg refresh.
* @uasiddiqi for reporting the `aggregated compressed column not found` error.
The implementation uses the Postgres simplehash hash table for by-value fixed-size compressed columns.
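To make the idea of hash grouping on a fixed-size by-value key more concrete, here is a minimal sketch in plain C. It is not the code from this PR and does not use the Postgres simplehash template: `GroupTable`, `group_table_lookup` and `aggregate_batch` are hypothetical names, and the table is a toy open-addressing table with linear probing that never resizes. It only shows the general shape: each row's key is hashed, the matching group slot is found or created, and the aggregate state is updated in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical group slot: an int64 key stored by value plus a running sum. */
typedef struct
{
	int64_t key;
	int64_t sum;
	bool used;
} GroupEntry;

/* Hypothetical toy hash table; not simplehash and not TimescaleDB code. */
typedef struct
{
	GroupEntry *entries;
	size_t capacity; /* must be a power of two */
	size_t count;	 /* number of distinct groups seen so far */
} GroupTable;

/* Simple 64-bit mixing function so that nearby keys spread over the table. */
static uint64_t
hash_int64(int64_t key)
{
	uint64_t h = (uint64_t) key;
	h ^= h >> 33;
	h *= UINT64_C(0xff51afd7ed558ccd);
	h ^= h >> 33;
	return h;
}

static GroupTable *
group_table_create(size_t capacity)
{
	assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
	GroupTable *t = malloc(sizeof(GroupTable));
	t->entries = calloc(capacity, sizeof(GroupEntry));
	t->capacity = capacity;
	t->count = 0;
	return t;
}

static void
group_table_free(GroupTable *t)
{
	free(t->entries);
	free(t);
}

/*
 * Find the slot for a key, creating it if it does not exist yet.
 * Simplification: the table never grows, so the caller must size it
 * larger than the number of distinct keys.
 */
static GroupEntry *
group_table_lookup(GroupTable *t, int64_t key)
{
	size_t mask = t->capacity - 1;
	size_t i = hash_int64(key) & mask;

	/* Linear probing: walk forward until we hit the key or an empty slot. */
	while (t->entries[i].used && t->entries[i].key != key)
		i = (i + 1) & mask;

	if (!t->entries[i].used)
	{
		t->entries[i].used = true;
		t->entries[i].key = key;
		t->entries[i].sum = 0;
		t->count++;
	}
	return &t->entries[i];
}

/* Aggregate one batch of rows: sum(values) grouped by keys. */
static void
aggregate_batch(GroupTable *t, const int64_t *keys, const int64_t *values, size_t n)
{
	for (size_t row = 0; row < n; row++)
		group_table_lookup(t, keys[row])->sum += values[row];
}
```

Calling `aggregate_batch` repeatedly with the same table accumulates sums across batches, and the final groups can be read back by scanning `entries` for slots with `used` set. Because the key is fixed-size and stored by value, comparing keys is a single integer comparison and no per-group memory allocation is needed.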
The biggest improvement on a "sensible" query is about 90%. A couple of queries show bigger improvements, but these are very synthetic cases that don't make much sense:
https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3815&var-run2=3816&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now