perf: improve tpc-h q20 performance (single-topic) #14797
Query:
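The query text from this run is not reproduced above. For reference, this is the shape of standard TPC-H Q20 with the spec's default substitution parameters ('forest', 1994-01-01, 'CANADA'); the modified version used in kube-bench uses a 0.005 factor (as shown later in this thread) and may differ in other details:

```sql
-- Standard TPC-H Q20 shape (for reference only; the benchmarked query is
-- the modified version in the kube-bench manifest linked later in this thread).
select
    s_name,
    s_address
from
    supplier,
    nation
where
    s_suppkey in (
        select ps_suppkey
        from partsupp
        where ps_partkey in (
            select p_partkey
            from part
            where p_name like 'forest%'
        )
        -- the spec uses a 0.5 factor here; the benchmarked variant uses 0.005
        and ps_availqty > (
            select 0.5 * sum(l_quantity)
            from lineitem
            where l_partkey = ps_partkey
              and l_suppkey = ps_suppkey
              and l_shipdate >= date '1994-01-01'
              and l_shipdate < date '1994-01-01' + interval '1' year
        )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CANADA'
order by
    s_name;
```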
Plan:
Dist Plan:
Flink:
Plan:
We notice that while there are only 4 joins ...
What is the source you used? I didn't see more than 4 joins in our planner tests: https://github.com/risingwavelabs/risingwave/blob/dad438783aafa2f942d8882056c437b9e98c233f/src/frontend/planner_test/tests/testdata/output/tpch.yaml
There are 2 issues here:
Now the new RW plan is:
The number of joins is now the same. The one remaining difference, not sure if better or worse, is that RW's plan is bushy while Flink's plan is deep on one side. RW's performance is better than before, but one metric still stands out: data block miss ops seems very high, at ~140 ops/s.
@xxchan could you help take a look?
Linking #14811, as both q20 and q4 appear to have a similar issue.
https://buildkite.com/risingwave-test/tpch-benchmark/builds/991: it seems that L0 looks a lot like tpch q4: #14811 (comment)
Analyzed the information given by Grafana
Simple conclusion: tasks are pending due to IO timeouts. Our default 8-minute timeout had a big impact on this short test.
The other run, on Feb 11, seems to have succeeded without this timeout error. Its throughput is a bit higher, about 3% higher than the run with the timeout error, so I suppose the error does not affect throughput much.
Since Q20 is a pretty complex query, we tried removing some parts of the query to reveal the true bottleneck. Therefore, we introduced three variants of Q20; please check https://github.com/risingwavelabs/kube-bench/blob/main/manifests/tpch/tpch-modified-sinks.template.yaml#L830-L905

Q20-NO-GREATER

Query:
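A sketch of what q20-no-greater presumably looks like, assuming it is simply Q20 with the correlated ps_availqty predicate removed from the partsupp subquery (the exact text is in the kube-bench manifest above):

```sql
-- q20-no-greater (sketch): the partsupp subquery of Q20 with the predicate
--   ps_availqty > ( select 0.005 * sum(l_quantity) from lineitem
--                   where l_partkey = ps_partkey and l_suppkey = ps_suppkey )
-- removed; the rest of Q20 is assumed unchanged.
select ps_suppkey
from partsupp
where ps_partkey in (
    select p_partkey
    from part
    where p_name like 'forest%'
);
```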
Plan:
q20-no-greater is much, much better than q20 on both systems; the improvement is more than 4x for RW. Therefore, we can conclude that the removed part is likely the bottleneck:

ps_availqty > (
select
0.005 * sum(l_quantity)
from
lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
)

Therefore, let's look at Q20-ONLY-GREATER.

Query:
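The exact q20-only-greater text is not shown here; based on the name, a hypothetical reconstruction that keeps only the correlated greater-than predicate (check the kube-bench manifest above for the real definition):

```sql
-- q20-only-greater (hypothetical sketch): keep only the correlated
-- "greater than" predicate on partsupp, dropping Q20's other filters.
select ps_suppkey
from partsupp
where ps_availqty > (
    select 0.005 * sum(l_quantity)
    from lineitem
    where l_partkey = ps_partkey
      and l_suppkey = ps_suppkey
);
```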
Plan:
It confirms that the removed part, i.e. the correlated ps_availqty subquery shown above, is the bottleneck. The barrier interval does not matter a lot for RW, and we are using the default setting.
I suspect that the memory size, i.e. the cache size, is the bottleneck for this query.
https://buildkite.com/risingwave-test/tpch-benchmark/builds/1009 Sadly we hadn't adjusted the setting in kube-bench before, e.g. setting the memory size, and the effect is huge.
Does this confirm that subquery unnesting is the major cause?
q20-only-greater metabase: when the cache starts to evict aggressively, I think it correlates very strongly with the drop in performance.
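For context, unnesting this correlated scalar subquery typically turns it into a join against an aggregate over lineitem grouped by (l_partkey, l_suppkey), so the streaming aggregation state has one entry per (partkey, suppkey) pair, which lines up with the cache-eviction observation above. A rough sketch of the decorrelated shape (not necessarily the exact plan RW produces):

```sql
-- Rough sketch of the decorrelated (unnested) form of the predicate.
-- The aggregate state keyed by (l_partkey, l_suppkey) is large, so heavy
-- cache misses / eviction are expected once it no longer fits in memory.
select ps.ps_suppkey
from partsupp ps
join (
    select
        l_partkey,
        l_suppkey,
        0.005 * sum(l_quantity) as qty_threshold
    from lineitem
    group by l_partkey, l_suppkey
) agg
  on ps.ps_partkey = agg.l_partkey
 and ps.ps_suppkey = agg.l_suppkey
where ps.ps_availqty > agg.qty_threshold;
```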
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
See performance numbers at https://www.notion.so/risingwave-labs/TPCH-Performance-Numbers-Table-e098ef82884546949333409f0513ada7?pvs=4#8de0bf4bda51444c8381f3b0c10ddfe1