Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(stream): refactor trait Executor to get rid of info() #15167

Merged
merged 18 commits into from
Feb 26, 2024

Conversation

stdrc
Copy link
Member

@stdrc stdrc commented Feb 21, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

As part of the effort to ensure correctness of executor info (#12459), this PR refactors the Executor trait to make executor implementations not aware of ExecutorInfo.

Main changes happen in src/stream/src/executor/mod.rs:

  1. Remove info() schema() pk_indices() identity() methods from Executor trait;
  2. Rename Executor trait to Execute since its only method is execute;
  3. Remove unused info fields from executors (except HashAgg, HashJoin, SimpleAgg, Sink, TemporalJoin, for not making this PR too big);
  4. Introduce Executor struct to combine ExecutorInfo and Box<dyn Execute>, so that (almost) everywhere we used Box<dyn Executor> before, we use Executor now.

Other files are just changed accordingly.

Why?

You may ask why change so. As you may find, most of our Executor implementations contain an info: ExecutorInfo field, but never access it. It's only used when impl Executor, returned in info() method. In most cases, an executor's info is only accessed by its downstream, if the downstream needs that (mostly to get the pk_indices as some input pk). Only a few executors need to access itself's schema to create output array builders.

This situation has two problems:

  1. Many duplicated lines of code are wasted: the info field, the method implementations in impl Executor and the incorrect but insignificant executor info in executor unit tests;
  2. Executors may accidentally construct an info which is inconsistent with what frontend derived (this did happen before bug(stream): pk indices consistency between optimizer and executor #12310 and fix(stream): ensure correct executor schema #13292).

By moving the info field out of Executor trait (now Execute), we can make it easier to directly use ExecutorParams's info as the executor info and make executor implementations clean and neat.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Copy link

gitguardian bot commented Feb 21, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9425213 Triggered Generic Password 9f895fc ci/scripts/regress-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@wangrunji0408
Copy link
Contributor

Can we drop trait Executor completely and use BoxedMessageStream instead? 🤔

@stdrc
Copy link
Member Author

stdrc commented Feb 21, 2024

Can we drop trait Executor completely and use BoxedMessageStream instead? 🤔

Maybe possible, at least we can Box<dyn FnOnce() -> BoxedMessageStream>. But I'm not sure it can simplify or complicate the current situation.

@stdrc stdrc marked this pull request as ready for review February 23, 2024 05:27
Signed-off-by: Richard Chien <[email protected]>

fn identity(&self) -> &str {
&self.identity
}
}

#[tokio::test]
async fn test_cdc_backfill() -> StreamResult<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StrikeW Please help check if this test is as expected after change, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirmed. I will add assert in the test later.

Comment on lines +148 to +156
let mut info = params.info.clone();
info.identity = format!("{} (flow controlled)", info.identity);

let exec = FlowControlExecutor::new(
(info, exec).into(),
params.actor_context,
node.rate_limit.map(|x| x as _),
)
.boxed())
);
Ok((params.info, exec).into())
Copy link
Member Author

@stdrc stdrc Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems if we don't wrap FlowControlExecutor directly in new_boxed_executor but to let it have a dedicated from_proto, we can make new_boxed_executor return Box<dyn Execute> instead of Executor, further reducing the possibility to return an inconsistent info.

let checked = epoch_check(source.info().into(), source.boxed().execute());
let (_, source) = MockSource::channel();
let source = source
.stop_on_finish(false)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why we cannot directly return Executor from MockSource::channel()

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

src/stream/src/executor/mod.rs Show resolved Hide resolved

fn identity(&self) -> &str {
&self.identity
}
}

#[tokio::test]
async fn test_cdc_backfill() -> StreamResult<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirmed. I will add assert in the test later.

Signed-off-by: Richard Chien <[email protected]>
@stdrc stdrc enabled auto-merge February 26, 2024 06:58
@stdrc stdrc added this pull request to the merge queue Feb 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2024
@stdrc stdrc added this pull request to the merge queue Feb 26, 2024
Merged via the queue into main with commit ead3e53 Feb 26, 2024
25 of 27 checks passed
@stdrc stdrc deleted the rc/refactor-executor-trait branch February 26, 2024 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants