Investigate the performance regression in ordered append for partially compressed chunk #6032
Looking into this now. For the first case, I see that two queries marked as eligible for chunk append do not actually use a chunk append plan. For the second case, I see no difference between the produced plans or the execution times when running the benchmark locally.
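A quick way to check this locally is to look for the `ChunkAppend` node in the plan. A sketch (the table here is one of the benchmark tables from this thread, not necessarily one of the two affected queries):

```
-- Illustrative check: does the planner produce a ChunkAppend node?
EXPLAIN (COSTS OFF)
SELECT * FROM ht_metrics_partially_compressed
ORDER BY time DESC
LIMIT 1;
-- An eligible query should show "Custom Scan (ChunkAppend)" near the top
-- of the plan; the regression instead shows a Merge Append over
-- per-chunk Sorts.
```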
I started benchmarks of the "fix ordered append" commit and the previous one again; maybe it was some simultaneous change in the test tables that looked like a regression.
Looks like there are some regressions that are reproducible. It seems the following queries get slower:

1. `SELECT * FROM space_part ORDER BY a, time DESC LIMIT 1;`
   Plan before: parallel Gather Merge. Plan after: Limit on top of Merge Append.
2. `SELECT * FROM space_part ORDER BY time_bucket('2d', time) DESC, a LIMIT 1;`
3. `SELECT * FROM ht_metrics_partially_compressed ORDER BY time_bucket('1d', time) DESC, device LIMIT 1;`
   The plan is different before and after.

Before commit 373c556 (PR #5812), produced plan: parallel Gather Merge.
```
explain analyze select * FROM ht_metrics_partially_compressed ORDER BY time_bucket('1d', time) DESC, device LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=254871.03..254871.15 rows=1 width=28) (actual time=3562.470..3563.649 rows=1 loops=1)
-> Gather Merge (cost=254871.03..2860589.69 rows=22333172 width=28) (actual time=3562.466..3563.644 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=253871.01..281787.47 rows=11166586 width=28) (actual time=3547.759..3547.765 rows=1 loops=3)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_1_chunk."time")) DESC, _hyper_1_1_chunk.device
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.07..198038.08 rows=11166586 width=28) (actual time=0.045..2121.360 rows=8928003 loops=3)
-> Parallel Append (cost=0.07..58455.75 rows=11166586 width=20) (actual time=0.041..1166.239 rows=8928003 loops=3)
-> Custom Scan (DecompressChunk) on _hyper_1_1_chunk (cost=0.07..40.41 rows=541000 width=20) (actual time=0.049..63.924 rows=916000 loops=1)
-> Parallel Seq Scan on compress_hyper_2_7_chunk (cost=0.00..40.41 rows=541 width=88) (actual time=0.005..0.158 rows=920 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_6_chunk (cost=0.07..70.65 rows=965000 width=20) (actual time=0.058..161.449 rows=1636010 loops=1)
-> Parallel Seq Scan on compress_hyper_2_12_chunk (cost=0.00..70.65 rows=965 width=88) (actual time=0.006..0.522 rows=1640 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_2_chunk (cost=0.07..259.47 rows=3547000 width=20) (actual time=0.028..455.151 rows=6028000 loops=1)
-> Parallel Seq Scan on compress_hyper_2_8_chunk (cost=0.00..259.47 rows=3547 width=88) (actual time=0.002..1.177 rows=6030 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_3_chunk (cost=0.07..259.47 rows=3547000 width=20) (actual time=0.049..472.343 rows=6028000 loops=1)
-> Parallel Seq Scan on compress_hyper_2_9_chunk (cost=0.00..259.47 rows=3547 width=88) (actual time=0.003..1.177 rows=6030 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_4_chunk (cost=0.07..259.47 rows=3547000 width=20) (actual time=0.041..164.879 rows=2009333 loops=3)
-> Parallel Seq Scan on compress_hyper_2_10_chunk (cost=0.00..259.47 rows=3547 width=88) (actual time=0.003..0.388 rows=2010 loops=3)
-> Custom Scan (DecompressChunk) on _hyper_1_5_chunk (cost=0.07..259.47 rows=3547000 width=20) (actual time=0.041..436.828 rows=6028000 loops=1)
-> Parallel Seq Scan on compress_hyper_2_11_chunk (cost=0.00..259.47 rows=3547 width=88) (actual time=0.002..0.899 rows=6030 loops=1)
-> Parallel Seq Scan on _hyper_1_1_chunk (cost=0.00..245.65 rows=11765 width=20) (actual time=0.003..1.370 rows=20000 loops=1)
-> Parallel Seq Scan on _hyper_1_2_chunk (cost=0.00..245.65 rows=11765 width=20) (actual time=0.002..1.379 rows=20000 loops=1)
-> Parallel Seq Scan on _hyper_1_3_chunk (cost=0.00..245.65 rows=11765 width=20) (actual time=0.003..1.363 rows=20000 loops=1)
-> Parallel Seq Scan on _hyper_1_4_chunk (cost=0.00..245.65 rows=11765 width=20) (actual time=0.002..1.437 rows=20000 loops=1)
-> Parallel Seq Scan on _hyper_1_5_chunk (cost=0.00..245.65 rows=11765 width=20) (actual time=0.002..1.370 rows=20000 loops=1)
-> Parallel Seq Scan on _hyper_1_6_chunk (cost=0.00..245.65 rows=11765 width=20) (actual time=0.013..2.224 rows=20000 loops=1)
Planning Time: 2.372 ms
Execution Time: 3563.894 ms
(31 rows)
```
After PR #5812, produced plan: Merge Append.
```
explain analyze select * FROM ht_metrics_partially_compressed ORDER BY time_bucket('1d', time) DESC, device LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4427.02..4427.05 rows=1 width=28) (actual time=8326.498..8326.510 rows=1 loops=1)
-> Merge Append (cost=4427.02..954411.99 rows=26800000 width=28) (actual time=8326.495..8326.506 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_1_chunk."time")) DESC, _hyper_1_1_chunk.device
-> Sort (cost=4644.20..6944.20 rows=920000 width=20) (actual time=296.704..296.706 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_1_chunk."time")) DESC, _hyper_1_1_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.05..44.20 rows=920000 width=20) (actual time=0.052..149.768 rows=916000 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_1_chunk (cost=0.05..44.20 rows=920000 width=20) (actual time=0.050..63.894 rows=916000 loops=1)
-> Seq Scan on compress_hyper_2_7_chunk (cost=0.00..44.20 rows=920 width=88) (actual time=0.012..0.313 rows=920 loops=1)
-> Sort (cost=428.00..478.00 rows=20000 width=20) (actual time=6.908..6.908 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_1_chunk."time")) DESC, _hyper_1_1_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on _hyper_1_1_chunk (cost=0.00..328.00 rows=20000 width=20) (actual time=0.016..3.480 rows=20000 loops=1)
-> Sort (cost=30434.30..45509.30 rows=6030000 width=20) (actual time=1879.827..1879.828 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_2_chunk."time")) DESC, _hyper_1_2_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.045..996.643 rows=6028000 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_2_chunk (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.044..424.280 rows=6028000 loops=1)
-> Seq Scan on compress_hyper_2_8_chunk (cost=0.00..284.30 rows=6030 width=88) (actual time=0.010..1.980 rows=6030 loops=1)
-> Sort (cost=428.00..478.00 rows=20000 width=20) (actual time=6.826..6.827 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_2_chunk."time")) DESC, _hyper_1_2_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on _hyper_1_2_chunk (cost=0.00..328.00 rows=20000 width=20) (actual time=0.018..3.442 rows=20000 loops=1)
-> Sort (cost=30434.30..45509.30 rows=6030000 width=20) (actual time=1863.252..1863.253 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_3_chunk."time")) DESC, _hyper_1_3_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.042..990.892 rows=6028000 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_3_chunk (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.040..419.105 rows=6028000 loops=1)
-> Seq Scan on compress_hyper_2_9_chunk (cost=0.00..284.30 rows=6030 width=88) (actual time=0.011..1.989 rows=6030 loops=1)
-> Sort (cost=428.00..478.00 rows=20000 width=20) (actual time=6.826..6.826 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_3_chunk."time")) DESC, _hyper_1_3_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on _hyper_1_3_chunk (cost=0.00..328.00 rows=20000 width=20) (actual time=0.018..3.422 rows=20000 loops=1)
-> Sort (cost=30434.30..45509.30 rows=6030000 width=20) (actual time=1863.692..1863.693 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_4_chunk."time")) DESC, _hyper_1_4_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.045..991.920 rows=6028000 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_4_chunk (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.043..420.526 rows=6028000 loops=1)
-> Seq Scan on compress_hyper_2_10_chunk (cost=0.00..284.30 rows=6030 width=88) (actual time=0.011..2.544 rows=6030 loops=1)
-> Sort (cost=428.00..478.00 rows=20000 width=20) (actual time=7.041..7.041 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_4_chunk."time")) DESC, _hyper_1_4_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on _hyper_1_4_chunk (cost=0.00..328.00 rows=20000 width=20) (actual time=0.043..3.600 rows=20000 loops=1)
-> Sort (cost=30434.30..45509.30 rows=6030000 width=20) (actual time=1864.865..1864.866 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_5_chunk."time")) DESC, _hyper_1_5_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.042..991.010 rows=6028000 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_5_chunk (cost=0.05..284.30 rows=6030000 width=20) (actual time=0.040..419.622 rows=6028000 loops=1)
-> Seq Scan on compress_hyper_2_11_chunk (cost=0.00..284.30 rows=6030 width=88) (actual time=0.010..2.173 rows=6030 loops=1)
-> Sort (cost=428.00..478.00 rows=20000 width=20) (actual time=6.883..6.883 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_5_chunk."time")) DESC, _hyper_1_5_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on _hyper_1_5_chunk (cost=0.00..328.00 rows=20000 width=20) (actual time=0.020..3.472 rows=20000 loops=1)
-> Sort (cost=8277.40..12377.40 rows=1640000 width=20) (actual time=516.734..516.735 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_6_chunk."time")) DESC, _hyper_1_6_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Result (cost=0.05..77.40 rows=1640000 width=20) (actual time=0.049..268.715 rows=1636010 loops=1)
-> Custom Scan (DecompressChunk) on _hyper_1_6_chunk (cost=0.05..77.40 rows=1640000 width=20) (actual time=0.047..114.312 rows=1636010 loops=1)
-> Seq Scan on compress_hyper_2_12_chunk (cost=0.00..77.40 rows=1640 width=88) (actual time=0.009..0.533 rows=1640 loops=1)
-> Sort (cost=428.00..478.00 rows=20000 width=20) (actual time=6.914..6.914 rows=1 loops=1)
Sort Key: (time_bucket('1 day'::interval, _hyper_1_6_chunk."time")) DESC, _hyper_1_6_chunk.device
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on _hyper_1_6_chunk (cost=0.00..328.00 rows=20000 width=20) (actual time=0.016..3.511 rows=20000 loops=1)
Planning Time: 4.904 ms
Execution Time: 8326.839 ms
(65 rows)
```
4. `SELECT min(device), max(value) FROM ht_metrics_partially_compressed;`
   Plan before: Finalize Aggregate -> Gather -> Partial Aggregate -> Parallel Append. Plan after: Limit on top of Merge Append.
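A minimal reproduction sketch for this query (assuming the same benchmark table as the plans above); run it on both commits and compare the resulting plans:

```
-- Reproduction sketch for query 4 (same table as the plans above).
EXPLAIN (ANALYZE, TIMING OFF)
SELECT min(device), max(value) FROM ht_metrics_partially_compressed;
-- Plan before: Finalize Aggregate -> Gather -> Partial Aggregate -> Parallel Append
-- Plan after:  Limit on top of Merge Append, as described in this issue
```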
To sum up, the problem is that the "limit 1 after sort" plan we have for min/max is chosen over the usual aggregation plan because it has a lower cost. Locally for me the aggregation plan is also slower, so the choice seems correct. I can still force the aggregation plan by adjusting the planner settings.

The upcoming changes in chunkwise aggregation planning don't seem to fix this: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3968&var-run2=3979&var-threshold=0.02&var-use_historical_thresholds=true&var-threshold_expression=2%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now (tested with #7025)

Also, lowering the costs for vectorized aggregation might help. Currently we vectorize only after all the planning is done (in a post-planning hook), so vectorization is not accounted for in the costs.
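For comparison, a hedged sketch of forcing the plan choice by hand; `enable_sort` is a standard PostgreSQL GUC, but whether it alone flips the plan for this query is an assumption (the exact setting used in the experiment above is not recorded in this thread):

```
-- Hedged sketch: penalize the sort-based "limit 1 after sort" paths so
-- the planner falls back to the aggregation plan, then compare costs.
BEGIN;
SET LOCAL enable_sort = off;  -- assumption: sufficient to flip the plan here
EXPLAIN SELECT min(device), max(value) FROM ht_metrics_partially_compressed;
ROLLBACK;
```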
We should investigate this before releasing 2.12. Introduced in #5812
https://grafana.ops.savannah-dev.timescale.com/d/uP2MnQk4z/query-run-times?orgId=1&var-suite=ordered_append_partially_compressed&var-query=09f0fa498afbef68e6b783dd4cfa5083&from=1692389538057&to=1693270942329
https://grafana.ops.savannah-dev.timescale.com/d/uP2MnQk4z/query-run-times?orgId=1&var-suite=planning&var-query=12707f17764cf1858b4a5215c4923452&from=now-30d&to=now&var-pg_major=14&var-pg_major=15&var-machine=m5a.2xlarge&var-orderby=1&var-onlymain=1&var-lastversion=7faad3d18bbe1246ac3a9e58c7197ee45547011c2d5aac3307fde20747cd05f6