Vectorized aggregation with grouping by one fixed-size column #7341

akuzm · 2024-10-14T12:07:46Z

The implementation uses the Postgres simplehash hash table for by-value fixed-size compressed columns.
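Below is a minimal standalone sketch of hash grouping on one fixed-size by-value key. It makes three assumptions: the key column is int64, the aggregate is an int64 `sum`, and a single validity bitmap already combines nulls and the filter result. The sketch does not use Postgres's `lib/simplehash.h`, which generates a table from macros and needs the Postgres headers. Instead, a hypothetical `KeyTable` with linear probing plays the same role: it maps each key to a dense group index. All names here are illustrative and are not the PR's identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical open-addressing table mapping an int64 key to a dense group
 * index. Index 0 marks an empty slot, so real groups are numbered from 1.
 * This is not the Postgres simplehash API.
 */
typedef struct
{
	int64_t *keys;
	uint32_t *group_index;
	uint32_t capacity; /* must be a power of two */
	uint32_t num_groups;
} KeyTable;

static uint64_t
hash_int64(int64_t key)
{
	/* splitmix64 finalizer, chosen for illustration only */
	uint64_t x = (uint64_t) key;
	x ^= x >> 30;
	x *= UINT64_C(0xbf58476d1ce4e5b9);
	x ^= x >> 27;
	x *= UINT64_C(0x94d049bb133111eb);
	x ^= x >> 31;
	return x;
}

static void
table_init(KeyTable *t, uint32_t capacity)
{
	assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
	t->keys = calloc(capacity, sizeof(int64_t));
	t->group_index = calloc(capacity, sizeof(uint32_t));
	assert(t->keys != NULL && t->group_index != NULL);
	t->capacity = capacity;
	t->num_groups = 0;
}

static void
table_free(KeyTable *t)
{
	free(t->keys);
	free(t->group_index);
}

/* Return the key's group index, creating a new group the first time it is seen. */
static uint32_t
table_lookup_or_insert(KeyTable *t, int64_t key)
{
	uint32_t mask = t->capacity - 1;
	uint32_t pos = (uint32_t) hash_int64(key) & mask;

	for (;;)
	{
		if (t->group_index[pos] == 0)
		{
			/* The sketch never resizes, so always keep one slot free. */
			assert(t->num_groups + 1 < t->capacity);
			t->keys[pos] = key;
			t->group_index[pos] = ++t->num_groups;
			return t->group_index[pos];
		}
		if (t->keys[pos] == key)
			return t->group_index[pos];
		pos = (pos + 1) & mask; /* linear probing */
	}
}

/*
 * Process one batch in two passes: map every valid row to a group index,
 * then add each row's input to its group's running sum. `sums` must have at
 * least `capacity` entries (slot 0 is unused); invalid rows get index 0.
 */
static void
hash_grouping_sum(KeyTable *t, const int64_t *keys, const uint64_t *validity,
				  const int64_t *agg_input, int n, uint32_t *indices, int64_t *sums)
{
	for (int row = 0; row < n; row++)
	{
		bool valid = (validity[row / 64] >> (row % 64)) & 1;
		indices[row] = valid ? table_lookup_or_insert(t, keys[row]) : 0;
	}
	for (int row = 0; row < n; row++)
		if (indices[row] != 0)
			sums[indices[row]] += agg_input[row];
}
```

The sketch first maps keys to small dense indices. This lets the per-group aggregate states live in a plain array indexed by group, and a second aggregate function could reuse the same `indices` array instead of probing the table again. The table never grows, so `capacity` must exceed the number of distinct keys. A production table would resize instead.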

The biggest improvement on a "sensible" query is about 90%. A couple of queries improve even more, but they are highly synthetic cases that don't make much sense. Benchmark comparison:
https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3815&var-run2=3816&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now

codecov · 2024-10-14T12:16:59Z

Codecov Report

Attention: Patch coverage is 93.40659% with 24 lines in your changes missing coverage. Please review.

Project coverage is 82.34%. Comparing base (59f50f2) to head (2bcef48).
Report is 672 commits behind head on main.

Files with missing lines	Patch %	Lines
tsl/src/nodes/vector_agg/grouping_policy_hash.c	91.13%	3 Missing and 11 partials ⚠️
tsl/src/nodes/vector_agg/plan.c	88.88%	2 Missing and 3 partials ⚠️
...nodes/vector_agg/function/agg_many_vector_helper.c	95.00%	0 Missing and 1 partial ⚠️
...rc/nodes/vector_agg/hashing/batch_hashing_params.h	85.71%	0 Missing and 1 partial ⚠️
...rc/nodes/vector_agg/hashing/hash_strategy_common.c	94.44%	0 Missing and 1 partial ⚠️
.../src/nodes/vector_agg/hashing/hash_strategy_impl.c	98.18%	0 Missing and 1 partial ⚠️
..._agg/hashing/hash_strategy_impl_single_fixed_key.c	95.65%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7341      +/-   ##
==========================================
+ Coverage   80.06%   82.34%   +2.27%     
==========================================
  Files         190      238      +48     
  Lines       37181    43722    +6541     
  Branches     9450    10970    +1520     
==========================================
+ Hits        29770    36002    +6232     
- Misses       2997     3385     +388     
+ Partials     4414     4335      -79

erimatnor · 2024-12-18T08:04:01Z

tsl/src/nodes/vector_agg/exec.c

 		}
 	}

 	/*
-	 * Currently the only grouping policy we use is per-batch grouping.
+	 * Determine which grouping policy we are going to use.


Out of curiosity, why is the grouping policy decided at execution time rather than at plan time? Shouldn't it affect the plan and the cost calculation?

akuzm · 2024-12-18

I moved it all to plan time, although, as we discussed on today's call, it doesn't affect the costs yet.
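The grouping policy can be chosen at plan time because the choice depends only on static properties of the grouping columns, which the planner already knows. A hypothetical planner-side helper might look like the following. The `GroupingColumnInfo` struct, the `GroupingKind` enum and `choose_grouping_kind` are invented for illustration; `by_value` and `typlen` mirror the `typbyval`/`typlen` fields of `pg_type`. The sketch also assumes that per-batch grouping (the only policy before this change, per the diff comment) covers the case where every row of a compressed batch lands in the same group, i.e. no grouping columns or only segmentby columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one GROUP BY column as seen by the planner. */
typedef struct
{
	bool is_segmentby; /* value is constant within a compressed batch */
	bool by_value;     /* mirrors pg_type.typbyval */
	int typlen;        /* mirrors pg_type.typlen: fixed size in bytes, or -1 for varlena */
} GroupingColumnInfo;

/* Hypothetical policy kinds; the names are illustrative only. */
typedef enum
{
	GROUPING_NOT_VECTORIZED = 0,
	GROUPING_PER_BATCH,
	GROUPING_HASH_SINGLE_FIXED,
} GroupingKind;

static GroupingKind
choose_grouping_kind(const GroupingColumnInfo *cols, int ncols)
{
	assert(ncols >= 0 && (ncols == 0 || cols != NULL));

	bool all_segmentby = true;
	for (int i = 0; i < ncols; i++)
		all_segmentby = all_segmentby && cols[i].is_segmentby;

	/* No grouping columns, or only segmentby ones: one group per batch. */
	if (all_segmentby)
		return GROUPING_PER_BATCH;

	/* Exactly one fixed-size by-value column: use the hash table. */
	if (ncols == 1 && cols[0].by_value && cols[0].typlen > 0)
		return GROUPING_HASH_SINGLE_FIXED;

	/* Anything else falls back to the non-vectorized aggregation path. */
	return GROUPING_NOT_VECTORIZED;
}
```

Because the result is a plain enum, it could be stored in the plan node, and the executor would only have to instantiate the matching policy.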

tsl/src/nodes/vector_agg/grouping_policy_hash.c

tsl/src/nodes/vector_agg/grouping_policy_hash.h

@bharrisau

This release contains performance improvements and bug fixes since the 2.17.2 release. We recommend that you upgrade at the next available opportunity.

**Features**

* timescale#6901: Add hypertable support for transition tables.
* timescale#7104: Hypercore table access method.
* timescale#7271: Push down `order by` in real-time continuous aggregate queries.
* timescale#7295: Support `alter table set access method` on hypertable.
* timescale#7341: Vectorized aggregation with grouping by one fixed-size by-value compressed column.
* timescale#7390: Disable custom `hashagg` planner code.
* timescale#7411: Change parameter name to enable hypercore table access method.
* timescale#7412: Add GUC for `hypercore_use_access_method` default.
* timescale#7413: Add GUC for segmentwise recompression.
* timescale#7433: Add support for merging chunks.
* timescale#7436: Add index creation on orderby columns.
* timescale#7443: Add hypercore function and view aliases.
* timescale#7455: Support `drop not null` on compressed hypertables.
* timescale#7458: Support vectorized aggregation with aggregate `filter` clauses that are also vectorizable.
* timescale#7482: Optimize recompression of partially compressed chunks.
* timescale#7486: Prevent building against Postgres versions with broken ABI.
* timescale#7521: Add optional `force` argument to `refresh_continuous_aggregate`.
* timescale#7528: Transform sorting on `time_bucket` to sorting on time for compressed chunks in some cases.
* timescale#7565: Add hint when hypertable creation fails.
* timescale#7587: Add `include_tiered_data` parameter to `add_continuous_aggregate_policy` API.

**Bugfixes**

* timescale#7378: Remove obsolete job referencing `policy_job_error_retention`.
* timescale#7409: Update `bgw_job` table when altering procedure.
* timescale#7410: Fix the `aggregated compressed column not found` error on aggregation query.
* timescale#7426: Fix `datetime` parsing error in chunk constraint creation.
* timescale#7432: Verify that the heap tuple is valid before using it.
* timescale#7434: Fix the segfault when internally setting the replica identity for a given chunk.
* timescale#7488: Emit error for transition table trigger on chunks.
* timescale#7514: Fix the error: `invalid child of chunk append`.
* timescale#7517: Fix performance regression on `cagg_migrate` procedure.
* timescale#7527: Restart scheduler on error.
* timescale#7557: Fix null handling for in-memory tuple filtering.
* timescale#7566: Improve transaction check in CAgg refresh.
* timescale#7584: Fix NaN-handling for vectorized aggregation.

**Thanks**

* @bharrisau for reporting the segfault when creating chunks.
* @k-rus for suggesting the improvement.
* @pgloader for reporting the issue in an internal background job.
* @staticlibs for sending PR to improve transaction check in CAgg refresh.
* @uasiddiqi for reporting the `aggregated compressed column not found` error.

akuzm added 23 commits October 2, 2024 10:32

Vectorized hash grouping on one column

b92e622

some experiments

Merge remote-tracking branch 'origin/main' into HEAD

4ce0e99

benchmark vectorized grouping (2024-10-02 no. 6)

74d4419

fixes

baedf7f

benchmark vectorized grouping (2024-10-02 no. 7)

35dbd36

some ugly stuff

74fffd3

benchmark vectorized grouping (2024-10-02 no. 9)

f8db454

someething

00a9d11

reduce indirections

339f91a

skip null bitmap words

f075589

cleanup

88f325d

crc32

15ab443

license

ff16ec8

benchmark vectorized hash grouping (2024-10-09 no. 10)

4291b17

test deltadelta changes

795ef6b

some speedups and simplehash simplifications

1fabb22

Revert "test deltadelta changes"

717abc4

This reverts commit 795ef6b.

test deltadelta changes

b03bd6b

work with signed types

166d0e8

Revert "work with signed types"

7f578b4

This reverts commit 166d0e8.

bulk stuff specialized to element type

e70cb0b

roll back the delta delta stuff

0040844

use simplehash

694faf6
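One commit in this series is titled "skip null bitmap words" (f075589). The sketch below is not that commit's code. It shows a common form of the optimization, under the assumption that the validity bitmap (nulls combined with the filter result) is stored as 64-bit words with bit `i % 64` of word `i / 64` covering row `i`. The loop examines one word at a time and skips all 64 rows in a single step when the word is zero. `sum_valid_rows` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sum values[row] for every row whose validity bit is set, skipping whole
 * 64-row words in which no bit is set.
 */
static int64_t
sum_valid_rows(const int64_t *values, const uint64_t *validity, int n)
{
	assert(n >= 0);

	int64_t sum = 0;
	int nwords = (n + 63) / 64;

	for (int w = 0; w < nwords; w++)
	{
		uint64_t word = validity[w];
		if (word == 0)
			continue; /* all 64 rows are null or filtered out: skip them at once */

		int start = w * 64;
		int end = start + 64 < n ? start + 64 : n; /* last word may be partial */
		for (int row = start; row < end; row++)
			if (word & (UINT64_C(1) << (row - start)))
				sum += values[row];
	}
	return sum;
}
```

The payoff depends on the data. Selective filters or long runs of nulls produce many zero words, and checking a zero word costs one comparison instead of 64 bit tests.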

akuzm added 6 commits October 14, 2024 13:31

cleanup

3d05674

benchmark vectorized hash grouping (simple) (2024-10-14 no. 11)

d90a90f

add more tests

4a93549

remove modified simplehash

3e06b92

offsets

a7942ed

cleanup

6fb517f

akuzm added 16 commits December 3, 2024 15:23

remove extras

b6cee02

ref

ecb1aec

fixes

f64676f

benchmark single fixed-column hash grouping (2024-12-03 no. 11)

fab11fb

cleanup

dff6dff

planning fixes for pg 17

831cadd

benchmark fixed-size hash grouping (2024-12-04 no. 152)

66403f2

remove some (yet) unused code

99e5b04

Merge remote-tracking branch 'origin/main' into HEAD

de22a22

ref

9fccab9

Merge remote-tracking branch 'akuzm/vector-filter' into HEAD

8e97c2f

add test

f5b648a

Merge remote-tracking branch 'origin/main' into HEAD

ecd9cb2

typo

dc6001d

disable parallel

0ea397a

add order

ea4dab1

erimatnor approved these changes Dec 18, 2024

akuzm and others added 6 commits December 18, 2024 17:51

Update tsl/src/nodes/vector_agg/grouping_policy_hash.h

4b98e46

Co-authored-by: Erik Nordström <[email protected]> Signed-off-by: Alexander Kuzmenkov <[email protected]>

Update tsl/src/nodes/vector_agg/grouping_policy_hash.h

b615dbe

Co-authored-by: Erik Nordström <[email protected]> Signed-off-by: Alexander Kuzmenkov <[email protected]>

determine the grouping type at plan time

045f59a

Merge remote-tracking branch 'origin/main' into HEAD

10e66ad

cleanup

df100f2

Merge remote-tracking branch 'origin/main' into HEAD

2bcef48

akuzm enabled auto-merge (squash) January 2, 2025 19:45

akuzm merged commit 11e866e into timescale:main Jan 2, 2025
49 of 50 checks passed

akuzm deleted the hash-simple branch January 2, 2025 19:46

pallavisontakke mentioned this pull request Jan 16, 2025

Release 2.18.0 #7596

Open

pallavisontakke mentioned this pull request Jan 17, 2025

Release 2.18.0 #7599

Open
