feat(streaming): Support in memory backfill executor for mv on mv #6341

Merged (27 commits) on Nov 21, 2022

Contributor

@chenzl25 chenzl25 commented Nov 14, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

PLEASE DO NOT LEAVE THIS EMPTY !!!

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • In this PR, we introduce BackfillExecutor, based on the RFC "Use Backfill To Let Mv On Mv Stream Again" (rfcs#13).
  • Currently, it is a purely in-memory executor without any state persistence logic, but we can add persistence later for recovery.
  • Also we only read committed data from the upstream mv in this PR, but we can support uncommitted read later.
  • Uncommitted read is feasible because BackfillExecutor is already scheduled together with the upstream mv.
  • To keep the code change as small as possible, we slightly extend ChainNode with a chain_type field that decides which executor is used to create mv on mv. So at this moment, BackfillExecutor is one implementation of ChainNode.
  • BTW, BackfillExecutor has a special read pattern: it reads the storage through a stream, but will likely stop reading and drop the stream without consuming all of its data. The storage has never encountered this behavior before, so this also serves as a test for the storage.
  • For more details, please read the comment of the BackfillExecutor.
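
The read pattern described in the list above (open a stream over storage, consume only part of it, drop it) can be illustrated with a minimal sketch. This is not the code in this PR: the `Snapshot` type, `read_one_epoch`, `backfill`, and the per-epoch `budget` are hypothetical names, a synchronous `BTreeMap` range iterator stands in for the async storage stream, and resuming from the last primary key read is an assumption about how a partially consumed read would continue.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical stand-in for the committed snapshot of the upstream mv,
/// keyed by primary key.
type Snapshot = BTreeMap<u64, &'static str>;

/// Opens a fresh iterator positioned after `current_pos`, reads at most
/// `budget` rows, and then drops the iterator even if rows remain.
/// Returns the rows read and the new position.
fn read_one_epoch(
    snapshot: &Snapshot,
    current_pos: Option<u64>,
    budget: usize,
) -> (Vec<(u64, &'static str)>, Option<u64>) {
    let lower: Bound<u64> = match current_pos {
        Some(pk) => Bound::Excluded(pk),
        None => Bound::Unbounded,
    };
    let bounds: (Bound<u64>, Bound<u64>) = (lower, Bound::Unbounded);
    // `take(budget)` models an interruption (for example, a barrier arriving):
    // the underlying range iterator is dropped at the end of this statement
    // without being fully consumed.
    let rows: Vec<(u64, &'static str)> = snapshot
        .range(bounds)
        .take(budget)
        .map(|(pk, row)| (*pk, *row))
        .collect();
    // Keep the old position if nothing was read in this epoch.
    let new_pos = rows.last().map(|(pk, _)| *pk).or(current_pos);
    (rows, new_pos)
}

/// Repeats the partial read until the snapshot is exhausted.
fn backfill(snapshot: &Snapshot, budget: usize) -> Vec<(u64, &'static str)> {
    let mut out = Vec::new();
    let mut pos = None;
    loop {
        let (rows, new_pos) = read_one_epoch(snapshot, pos, budget);
        if rows.is_empty() {
            break;
        }
        out.extend(rows);
        pos = new_pos;
    }
    out
}
```

Each call to `read_one_epoch` creates and discards its own iterator, which is the "read partially, then drop" behavior the storage layer would be exercised with.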

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

#6275

Member

@fuyufjh fuyufjh left a comment

Generally LGTM

src/stream/src/executor/backfill.rs (outdated, resolved)
let executor = RearrangedChainExecutor::new(
)
.boxed(),
ChainType::Rearrange => RearrangedChainExecutor::new(
Member

Just to confirm: will ChainExecutor or RearrangedChainExecutor still be used in any case?

Contributor Author

ChainExecutor is still used in StreamIndexScan as before, while RearrangedChainExecutor will be replaced by BackfillExecutor.
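
A rough sketch of the dispatch on chain_type discussed in this thread, for illustration only. Only `ChainType::Rearrange` appears in the diff context above; the `Chain` and `Backfill` variant names, the `Executor` trait, its `identity` method, and `build_executor` are assumptions and do not reflect the actual definitions in the codebase.

```rust
/// Assumed shape of the chain type carried by ChainNode.
/// Only `Rearrange` is visible in the quoted diff; the other variants are guesses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChainType {
    Chain,
    Rearrange,
    Backfill,
}

/// Hypothetical minimal executor trait, used only to make the sketch compile.
trait Executor {
    fn identity(&self) -> &'static str;
}

struct ChainExecutor;
struct RearrangedChainExecutor;
struct BackfillExecutor;

impl Executor for ChainExecutor {
    fn identity(&self) -> &'static str {
        "ChainExecutor"
    }
}

impl Executor for RearrangedChainExecutor {
    fn identity(&self) -> &'static str {
        "RearrangedChainExecutor"
    }
}

impl Executor for BackfillExecutor {
    fn identity(&self) -> &'static str {
        "BackfillExecutor"
    }
}

/// Picks the executor implementation from the chain type.
fn build_executor(chain_type: ChainType) -> Box<dyn Executor> {
    match chain_type {
        // Still used by StreamIndexScan, per the reply above.
        ChainType::Chain => Box::new(ChainExecutor),
        // Expected to be superseded by the backfill variant.
        ChainType::Rearrange => Box::new(RearrangedChainExecutor),
        // New in this PR: used to create mv on mv.
        ChainType::Backfill => Box::new(BackfillExecutor),
    }
}
```

Keeping the choice in a single `match` means retiring RearrangedChainExecutor later would only touch one arm.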

@BugenZhao
Member

Will take a look later. 🥵

@hzxa21 hzxa21 self-requested a review November 14, 2022 06:15
@codecov
codecov bot commented Nov 14, 2022

Codecov Report

Merging #6341 (bde4afe) into main (709e1be) will decrease coverage by 0.00%.
The diff coverage is 79.08%.

@@            Coverage Diff             @@
##             main    #6341      +/-   ##
==========================================
- Coverage   73.94%   73.93%   -0.01%     
==========================================
  Files         981      982       +1     
  Lines      159159   159928     +769     
==========================================
+ Hits       117690   118245     +555     
- Misses      41469    41683     +214     
Flag Coverage Δ
rust 73.93% <79.08%> (-0.01%) ⬇️

Impacted Files | Coverage Δ
--- | ---
src/batch/src/executor/row_seq_scan.rs | 20.00% <0.00%> (-0.21%) ⬇️
src/batch/src/lib.rs | 100.00% <ø> (ø)
...ntend/src/optimizer/plan_node/stream_index_scan.rs | 41.40% <0.00%> (-0.33%) ⬇️
src/storage/src/hummock/event_handler/mod.rs | 0.00% <ø> (ø)
src/storage/src/hummock/local_version/mod.rs | 100.00% <ø> (ø)
...e/src/hummock/shared_buffer/shared_buffer_batch.rs | 89.51% <0.00%> (+0.96%) ⬆️
src/storage/src/hummock/utils.rs | 74.87% <0.00%> (-0.62%) ⬇️
src/stream/src/executor/backfill.rs | 0.00% <0.00%> (ø)
src/stream/src/executor/mod.rs | 48.18% <ø> (ø)
src/stream/src/from_proto/chain.rs | 0.00% <0.00%> (ø)

... and 24 more

@StrikeW StrikeW self-requested a review November 14, 2022 09:56
Member

@BugenZhao BugenZhao left a comment

Rest LGTM. Great job!

src/stream/src/executor/backfill.rs (7 outdated, resolved threads)
src/stream/src/from_proto/chain.rs (resolved)
value_indices,
);

BackfillExecutor::new(
Member

Previously we expected BatchQueryExecutor to become more general and support some simple pruning, but it seems to be unused by the Backfill. I'm not sure what the future plan is: should we keep it, or remove it entirely and try something like what we did in BatchLookupJoin?

Contributor Author

I think BatchQueryExecutor is currently only used by StreamIndexScan. I haven't figured out why it uses ChainExecutor instead of RearrangedChainExecutor. Once we can replace all of them with BackfillExecutor, I think we won't need BatchQueryExecutor. However, once we remove the BatchQuery from the ChainNode, ChainNode no longer seems like a proper name.

@chenzl25 chenzl25 requested a review from BugenZhao November 21, 2022 03:53
Member

@BugenZhao BugenZhao left a comment

LGTM. Great job!

3 participants