
feat(stream): merge stream chunks at MergeExecutor #17968

Merged (14 commits) on Oct 28, 2024

@fuyufjh (Member) commented Aug 8, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Resolves #17824 (comment). Please see the background there.

Related work: #17967. Unlike that approach, this PR adds no extra pending time, because BufferChunks builds and returns the ready chunks immediately when the inner stream returns Poll::Pending.
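The buffering behavior described above can be modelled without an async runtime. The following is a minimal synchronous sketch, not the actual RisingWave code: Message, ScriptedUpstream, BufferChunksSketch and the max_rows cap are hypothetical names chosen for illustration, and the real wrapper operates on a futures Stream of StreamChunks. It shows the key property: when upstream returns Poll::Pending, the buffered rows are emitted at once instead of waiting for more input.

```rust
use std::collections::VecDeque;
use std::mem;
use std::task::Poll;

/// Hypothetical stand-in for a stream message: a data chunk (rows modelled
/// as plain integers) or a barrier (modelled by its epoch number).
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Chunk(Vec<i64>),
    Barrier(u64),
}

/// Hypothetical upstream that replays scripted poll results, so that
/// `Poll::Pending` can be simulated without an async runtime.
struct ScriptedUpstream {
    script: VecDeque<Poll<Option<Message>>>,
}

impl ScriptedUpstream {
    fn poll_next(&mut self) -> Poll<Option<Message>> {
        // Once the script is exhausted, behave like a finished (fused) stream.
        self.script.pop_front().unwrap_or(Poll::Ready(None))
    }
}

/// Sketch of the buffering idea: concatenate consecutive chunks until the
/// upstream has nothing ready, a non-chunk message arrives, or the
/// (hypothetical) `max_rows` cap is reached.
struct BufferChunksSketch {
    inner: ScriptedUpstream,
    buffer: Vec<i64>,
    max_rows: usize,
    /// A non-chunk message held back until the buffered rows are emitted.
    stashed: Option<Message>,
}

impl BufferChunksSketch {
    fn new(inner: ScriptedUpstream, max_rows: usize) -> Self {
        Self { inner, buffer: Vec::new(), max_rows, stashed: None }
    }

    /// Emit everything buffered so far as one merged chunk.
    fn flush(&mut self) -> Poll<Option<Message>> {
        Poll::Ready(Some(Message::Chunk(mem::take(&mut self.buffer))))
    }

    fn poll_next(&mut self) -> Poll<Option<Message>> {
        // A message deferred by the previous call goes out first.
        if let Some(msg) = self.stashed.take() {
            return Poll::Ready(Some(msg));
        }
        loop {
            match self.inner.poll_next() {
                Poll::Ready(Some(Message::Chunk(rows))) => {
                    self.buffer.extend(rows);
                    if self.buffer.len() >= self.max_rows {
                        return self.flush();
                    }
                }
                Poll::Ready(Some(other)) => {
                    if self.buffer.is_empty() {
                        return Poll::Ready(Some(other));
                    }
                    // Emit buffered rows first, then the message on the next call.
                    self.stashed = Some(other);
                    return self.flush();
                }
                Poll::Ready(None) => {
                    if self.buffer.is_empty() {
                        return Poll::Ready(None);
                    }
                    return self.flush();
                }
                Poll::Pending => {
                    // No extra waiting: whatever is buffered is returned right away.
                    if self.buffer.is_empty() {
                        return Poll::Pending;
                    }
                    return self.flush();
                }
            }
        }
    }
}
```

In this sketch a barrier never overtakes buffered rows: the rows are flushed first and the barrier is returned on the next poll, so message order is preserved.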

Benchmark result

Overall, performance is slightly better than before.

[Image: benchmark comparison]

I will run a longevity test as well before merging it.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

fuyufjh requested review from stdrc, lmatz, chenzl25 and wenym1 on August 8, 2024 06:33
fuyufjh changed the title from "Eric/try buffering chunks on merge executor" to "feat(stream): merge stream chunks at MergeExecutor" on Aug 8, 2024
fuyufjh requested review from st1page and removed request for wenym1 on August 8, 2024 06:35
} else {
chunk
};
// let chunk = if chunk.selectivity() <= self.materialize_selectivity_threshold {
fuyufjh (Member Author):

This was essentially a workaround. Now that all data is compacted after Exchange, the problem should be mostly resolved.
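The commented-out line above appears to compact a chunk only when its selectivity (the fraction of visible rows) is at or below a threshold. To illustrate what such a check and a compaction step do, here is a minimal sketch assuming a hypothetical VisChunk type that pairs row values with a visibility bitmap; it is not RisingWave's StreamChunk API, and the threshold used in the usage is illustrative.

```rust
/// Hypothetical chunk: row values plus a visibility bitmap (true = row is visible).
#[derive(Debug, Clone, PartialEq)]
struct VisChunk {
    rows: Vec<i64>,
    visibility: Vec<bool>,
}

impl VisChunk {
    /// Fraction of rows that are visible; 1.0 for an empty chunk (nothing to drop).
    fn selectivity(&self) -> f64 {
        if self.rows.is_empty() {
            return 1.0;
        }
        let visible = self.visibility.iter().filter(|&&v| v).count();
        visible as f64 / self.rows.len() as f64
    }

    /// Physically drop invisible rows, yielding a fully visible chunk.
    fn compact(self) -> VisChunk {
        let rows: Vec<i64> = self
            .rows
            .into_iter()
            .zip(self.visibility)
            .filter_map(|(row, vis)| if vis { Some(row) } else { None })
            .collect();
        let visibility = vec![true; rows.len()];
        VisChunk { rows, visibility }
    }
}

/// Mirrors the shape of the old workaround: compact only when selectivity
/// is at or below `threshold`, otherwise pass the chunk through unchanged.
fn maybe_compact(chunk: VisChunk, threshold: f64) -> VisChunk {
    if chunk.selectivity() <= threshold {
        chunk.compact()
    } else {
        chunk
    }
}
```

Compacting every chunk after Exchange makes a per-operator threshold check like this largely redundant, since downstream operators then always receive dense chunks.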

@wenym1 (Contributor) left a comment

Rest LGTM.

How did you export the benchmark comparison results into the table and bar graph? That would be helpful for testing any PR that may significantly affect performance.

src/stream/src/executor/merge.rs (resolved)
@chenzl25 (Contributor) left a comment

LGTM! Should we monitor the chunk size after BufferChunks?

src/stream/src/executor/merge.rs (outdated, resolved)
@stdrc (Member) left a comment

Rest LGTM

src/stream/src/executor/merge.rs (outdated, resolved)
src/stream/src/executor/merge.rs (outdated, resolved)
/// A wrapper that buffers the `StreamChunk`s from upstream until no more ready items are available.
/// Besides, any message other than `StreamChunk` will trigger the buffered `StreamChunk`s
/// to be emitted immediately, as well as the message itself.
struct BufferChunks<S: Stream> {
Member:

Nit: Maybe we could call this BufferedChunkReader, since it is very similar to tokio::io::BufReader and java.io.BufferedReader.

@fuyufjh (Member Author) commented Aug 8, 2024

> Curious how to export the benchmark comparison result to the table and bar graph? It will be helpful to test any PR that can be significant to performance.

I exported the results from Metabase RW Compare to Excel and calculated the geomean there.
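For anyone who wants to reproduce that summary outside Excel: the geometric mean of per-query ratios equals the exponential of the mean log-ratio. A minimal sketch, assuming each input is a ratio such as new/old throughput (the exact metric and spreadsheet formula used are not stated in this thread); the function name is hypothetical.

```rust
/// Geometric mean of per-query ratios (e.g. new_throughput / old_throughput).
/// Computed as exp(mean(ln r)) to avoid overflow when multiplying many ratios.
/// Returns None for an empty input or any non-positive ratio.
fn geomean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|&r| r <= 0.0) {
        return None;
    }
    let mean_ln = ratios.iter().map(|r| r.ln()).sum::<f64>() / ratios.len() as f64;
    Some(mean_ln.exp())
}
```

The geometric mean is the usual choice for averaging ratios because it treats speedups and slowdowns symmetrically: the geomean of the reciprocals is the reciprocal of the geomean, so the summary does not depend on which run is taken as the baseline.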

@fuyufjh (Member Author) commented Aug 8, 2024

> LGTM! Should we monitor the chunk size after BufferChunks?

I'd like to take a look at it for this PR, but in general I don't think we will need it going forward.

Contributor

This PR has been open for 60 days with no activity.

If it's blocked by code review, feel free to ping a reviewer or ask someone else to review it.

If you think it is still relevant today, and have time to work on it in the near future, you can comment to update the status, or just manually remove the no-pr-activity label.

You can also confidently close this PR to keep our backlog clean. (If no further action taken, the PR will be automatically closed after 7 days. Sorry! 🙏)
Don't worry if you think the PR is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

Contributor

Closing this PR because no further action was taken within 7 days of it being marked as stale. Sorry! 🙏

You can reopen it when you have time to continue working on it.

github-actions bot closed this on Oct 21, 2024

@fuyufjh (Member Author) commented Oct 21, 2024

Well... Let me merge this today, as we just released 2.1.

gitguardian bot commented Oct 25, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id: 9425213 · GitGuardian status: Triggered · Secret: Generic Password · Commit: cb84263 · Filename: ci/scripts/e2e-source-test.sh
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

fuyufjh added this pull request to the merge queue on Oct 28, 2024
Merged via the queue into main with commit e99ad67 on Oct 28, 2024
28 of 30 checks passed
fuyufjh deleted the eric/try_buffering_chunks_on_merge_executor branch on October 28, 2024 06:00
Successfully merging this pull request may close these issues:

feat: merge small chunks for sink executor

5 participants