perf: improve tpc-h q4 performance (single-topic) #14811
RW Query:
Plan:
Dist Plan:
|
Flink Query:
Plan:
|
Surprisingly, Flink does not have a Since the The other important operator is |
Actually,
I wonder if RW also does such an optimization? Edit: After briefly going through the code, it seems RW does not differentiate between 1 and N in the top-n/group-top-n processing logic. Do you know if it is doable? @xxchan |
We do have a Dedup operator. Let me think about possible optimizations. |
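To make the 1-versus-N question above concrete, here is a minimal sketch. It is not RisingWave's or Flink's actual implementation. The class names `GroupTopN` and `GroupTop1` are hypothetical, and the input is assumed to be append-only with the smallest sort key winning. The point is only that with N = 1 the per-group state collapses to a single value that can be read and written through a plain key-value lookup, while general N keeps an ordered collection per group.

```python
import bisect
from collections import defaultdict


class GroupTopN:
    """Hypothetical general operator: keeps the N smallest keys per group."""

    def __init__(self, n):
        self.n = n
        self.state = defaultdict(list)  # group -> sorted list of sort keys

    def insert(self, group, key):
        rows = self.state[group]
        bisect.insort(rows, key)  # keep the per-group list ordered
        if len(rows) > self.n:
            rows.pop()  # evict the largest key once the group exceeds N rows
        return list(rows)


class GroupTop1:
    """Hypothetical N = 1 special case: one stored value per group."""

    def __init__(self):
        self.state = {}  # group -> current smallest sort key

    def insert(self, group, key):
        current = self.state.get(group)
        if current is None or key < current:
            self.state[group] = key  # single point write, no ordered scan
        return self.state[group]


if __name__ == "__main__":
    top1 = GroupTop1()
    topn = GroupTopN(1)
    for group, key in [("a", 3), ("a", 1), ("b", 5), ("a", 2)]:
        print(group, key, top1.insert(group, key), topn.insert(group, key))
```

Under these assumptions both operators return the same result for N = 1; the difference is only in how much state is touched per input row.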
From the PR description, its motivation is not performance, so I guess the impact won’t be very large. |
The state is the same; it just changes to use the kv interface. |
I see, thanks for the explanation, let me remove the i.e.:
plan:
Flink:
Query Plan:
|
The performance of
at least unable to explain why RW gets outperformed by such a margin. Let's remove the
plan:
Flink:
Plan:
which means it becomes a direct comparison of |
The two modified ones:
have similar performance as the original q4, and all three are outperformed: so it looks like the problem is |
It's a CPU intensive HashJoin and we have not optimized for this case 🤔 |
link #14797 as it has two
If you want to generate a flamegraph, check out: https://github.com/risingwavelabs/risingwave-test/tree/main/benchmarks/tpch#generate-and-upload-flamegraph
An example: https://buildkite.com/risingwave-test/tpch-benchmark/builds/995#018dcb7c-c057-40a7-bcad-8874ef9282df
many L0s
The conjecture is that since the barrier interval is by default 1s,
Let's try |
So after setting RW to:
Buildkite: https://buildkite.com/risingwave-test/tpch-benchmark/builds/1010 (ignore the title of this pipeline; it was a mistake)
The CPU flame graph: https://buildkite.com/organizations/risingwave-test/pipelines/tpch-benchmark/builds/1010/jobs/018df08c-e78f-4aae-89c8-0d7cf40531cd/artifacts/018df0b5-15b7-43e0-8238-f1b5e300a3bb
It looks like

LSM Tree Shape
Is this considered proper? cc: @Li0k @Little-Wallace

Compute Node and Compactor Node CPU usage
When compaction is triggered, the CPU usage of the compute node drops as it yields some CPU to the compactor node. Do you think we can make the claim above based on these two figures? cc: @Li0k @Little-Wallace

Aggressive Cache Eviction
It looks like the same problem as #15305, which also appeared in TPC-H q20 #14797 (comment) |
From @MrCroxx on Slack:
By comparing the time when the CPU flame graph is taken and the time when cache eviction becomes aggressive:
|
|
Comparing RW and Flink, with both barrier/checkpoint intervals set to 10s:
Flink:
RW:
One interesting phenomenon is that both Flink and RW have three stages of throughput:
And based on the network usage (bandwidth from Kafka):
The current gap seems to be caused by the difference at the last stage. We have two interesting observations:
Since Flink behaves as if it were executing a stateless query, we can look at RW's per-fragment metrics to validate it. We notice that at the last stage, only
Then the question is why the data is being filtered away. Note that the tpch data is ingested in the following order:
At the end, it is
Check out the data generator: https://github.com/risingwavelabs/tpch-bench/blob/master/pkg/data/lineitem.go#L126-L127
It turns out that the generator is always generating data whose
However, the data is the same for RW and Flink (we use a fixed random seed in the data generator). I have no idea if this can be explained by #14815, the |
It is slightly better after #15478, so closing it for now. We can re-open if needed. |
See performance numbers at https://www.notion.so/risingwave-labs/TPCH-Performance-Numbers-Table-e098ef82884546949333409f0513ada7?pvs=4#8de0bf4bda51444c8381f3b0c10ddfe1
http://metabase.risingwave-cloud.xyz/question/4834-tpch-q4-bs-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-367?start_date=2023-11-24
http://metabase.risingwave-cloud.xyz/question/5478-flink-tpch-q4-flink-medium-1tm-avg-job-throughput-per-second-records-s-history-thtb-291?start_date=2023-09-07
The experiments are all executed with nightly-20240127.