perf: nexmark q0 #8712
Please append screenshots here if you find something interesting in the flamegraph.
https://github.com/risingwavelabs/risingwave/blob/main/src/common/src/array/mod.rs#L500-L514
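For context, the linked routine removes invisible rows by rebuilding every column. Below is a minimal sketch of that behavior using hypothetical, simplified types (plain `Vec` columns and a `Vec<bool>` bitmap rather than RisingWave's actual array types) to show where the allocation cost comes from:

```rust
// Hypothetical simplified model of a chunk: integer columns plus an optional
// visibility bitmap (None means all rows are visible).
#[derive(Debug, PartialEq)]
struct Chunk {
    columns: Vec<Vec<i64>>,
    visibility: Option<Vec<bool>>,
}

impl Chunk {
    // compact(): copy only the visible rows into fresh columns. This is the
    // per-column allocation and copy the thread is discussing.
    fn compact(self) -> Chunk {
        match self.visibility {
            None => self, // already compact, nothing to do
            Some(vis) => {
                let columns = self
                    .columns
                    .into_iter()
                    .map(|col| {
                        col.into_iter()
                            .zip(vis.iter())
                            .filter(|(_, &v)| v)
                            .map(|(x, _)| x)
                            .collect()
                    })
                    .collect();
                Chunk { columns, visibility: None }
            }
        }
    }
}

fn main() {
    let chunk = Chunk {
        columns: vec![vec![1, 2, 3, 4], vec![10, 20, 30, 40]],
        visibility: Some(vec![true, false, true, false]),
    };
    let compacted = chunk.compact();
    assert_eq!(compacted.columns, vec![vec![1, 3], vec![10, 30]]);
    assert!(compacted.visibility.is_none());
}
```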
let's save the
https://github.com/risingwavelabs/risingwave/blob/main/src/connector/src/parser/json_parser.rs#L122
done
https://github.com/risingwavelabs/risingwave/blob/main/src/connector/src/parser/mod.rs#L200
I feel we need to find a good strategy to decide when to In this case, as the
Not sure I understand why
https://github.com/risingwavelabs/risingwave/blob/main/src/connector/src/sink/remote.rs#L276-L294
It kind of depends on the output format requirement: if it asks for JSON, then we iterate through each row of the chunk, so we can choose not to compact. Since this is a blackhole sink, we can save the compaction, you are right! So does whether to compact become something to determine at the optimization stage?
Hmm, even in risingwave/src/connector/src/sink/remote.rs, lines 290 to 294 (at 96aa23d)
I think so, but this
But typically, will any system consume
I guess not, so we can save the final compact if it is connected to the sink/MV executor.
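The point above can be sketched as follows, again with hypothetical simplified types (`encode_visible_rows` and its row layout are made up for illustration): a sink that serializes rows one at a time can consult the visibility bitmap directly and skip invisible rows, so the compact call just before the sink buys nothing:

```rust
// Hypothetical sketch: serialize only the visible rows of a chunk to JSON
// strings, reading the visibility bitmap instead of compacting first.
fn encode_visible_rows(rows: &[(i64, String)], visibility: &[bool]) -> Vec<String> {
    rows.iter()
        .zip(visibility.iter())
        // Skip rows whose visibility bit is false; no new chunk is built.
        .filter(|(_, vis)| **vis)
        .map(|((id, name), _)| format!(r#"{{"id":{},"name":"{}"}}"#, id, name))
        .collect()
}

fn main() {
    let rows = vec![
        (1, "a".to_string()),
        (2, "b".to_string()),
        (3, "c".to_string()),
    ];
    let vis = vec![true, false, true];
    let encoded = encode_visible_rows(&rows, &vis);
    assert_eq!(encoded.len(), 2);
    assert_eq!(encoded[0], r#"{"id":1,"name":"a"}"#);
    assert_eq!(encoded[1], r#"{"id":3,"name":"c"}"#);
}
```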
For this particular query, we need to be cautious because of these two. The project below actually compacts in this case, so that is a wrong statement here; it would be correct if we fixed this particular bug first.
Makes sense to me.
Linking it: #8577
Which means we actually do not need to have
ok, not much overhead from the flamegraph though
Guess
done in #8758
I am not sure if we can sink the chunk into a system with arrow format 🤔
I think it cannot help the compact performance issue, because it will be compacted in any project. If we have a plan
I think the idea I have is slightly different: it's more about avoiding compact even when there's a filter, as in the case of q0. When sinking, we don't need to build a new chunk; instead we build a protobuf/JSON-encoded chunk. We can delay the top-most compact call until here, saving the cost of building a new chunk and relying on the protobuf/JSON building step to remove invisible rows. @st1page's approach is still needed, as a general optimization for when to q0's compact call can be optimized via this complementary approach. Implementing this as an optimization requires a bit of refactoring to add
strongly +1.
I think is not a plan-level attribute. Currently, we compact the input chunk just to simplify the executors' implementation, but in fact every executor should handle the visibility properly, e.g.
The question here is that we give the chunk's visibility to the SinkExecutor, which means it has the chance to do the optimization, but it does not. So we need to do the optimization in the SinkExecutor.
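As an illustration of an executor handling visibility itself (a hypothetical helper, not RisingWave's actual API): a projection can evaluate its expression over every slot of the column and pass the input visibility through unchanged, with no row copying or compaction:

```rust
// Hypothetical sketch of a projection that respects visibility: compute the
// expression (here, doubling) over all slots and carry the bitmap through.
fn project_double(column: &[i64], visibility: &[bool]) -> (Vec<i64>, Vec<bool>) {
    // Evaluate over every slot; rows that were invisible stay invisible in
    // the output, so nothing is copied out or removed here.
    let projected: Vec<i64> = column.iter().map(|x| x * 2).collect();
    (projected, visibility.to_vec())
}

fn main() {
    let (out, vis) = project_double(&[1, 2, 3], &[true, false, true]);
    assert_eq!(out, vec![2, 4, 6]);
    assert_eq!(vis, vec![true, false, true]);
}
```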
The peak throughput gets to 1M rows/s. I guess the two remaining things are:
link #14815
Background
In a recent benchmark, Flink had an average throughput of 1M rows/s while RW had an average throughput of 850K rows/s, so RW needs a 17% improvement to match Flink. Thanks to @huangjw806 for spotting this. The flamegraph can be found here, under Artifacts.

query

plan
Here are screenshots of the flamegraph, highlighting cost centers.