refactor(frontend): separate cdc table scan from stream table scan #13332

Merged
merged 9 commits into from
Nov 10, 2023

Conversation

kwannoel
Contributor

@kwannoel kwannoel commented Nov 9, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Part of #13146

For now, just separate them at the stream plan layer. Don't touch LogicalScan, core, or the proto definitions yet, so we can keep each PR small and separate them step by step.

This PR just copies the whole stream table scan, then:

  1. Removes the cdc backfill parts in stream_table_scan.
  2. Removes the stream_table_scan parts in cdc backfill.
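
The split described above can be pictured with a minimal sketch. All types here are hypothetical and heavily simplified; they are not the actual RisingWave plan nodes, and the assumption that the combined node switched on a kind flag is for illustration only.

```rust
// Hypothetical, simplified types for illustration only; the real plan nodes
// carry far more state (schemas, distribution, catalogs, ...).

/// "Before": a single node that serves both cases, switched by a kind flag.
#[derive(Debug, Clone, PartialEq)]
pub enum ScanKind {
    Backfill,
    CdcBackfill,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CombinedTableScan {
    pub table_name: String,
    pub kind: ScanKind,
}

/// "After": one node per case, so neither carries the other's branches.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamTableScan {
    pub table_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StreamCdcTableScan {
    pub table_name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum StreamScanPlan {
    Table(StreamTableScan),
    Cdc(StreamCdcTableScan),
}

/// Route a combined scan to the dedicated node for its kind.
pub fn split(scan: CombinedTableScan) -> StreamScanPlan {
    match scan.kind {
        ScanKind::Backfill => StreamScanPlan::Table(StreamTableScan {
            table_name: scan.table_name,
        }),
        ScanKind::CdcBackfill => StreamScanPlan::Cdc(StreamCdcTableScan {
            table_name: scan.table_name,
        }),
    }
}
```

Once the two nodes are separate types, code that only concerns cdc backfill no longer needs to branch on the kind inside the regular scan node, and vice versa.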

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@kwannoel
Contributor Author

kwannoel commented Nov 9, 2023

The main conflict with #13276 is that cdc backfill has its own function to construct its state catalog.

So it should be easy to resolve, and I can help with it.

@kwannoel
Contributor Author

kwannoel commented Nov 9, 2023

Any specific tests I should run for cdc backfill?

@kwannoel kwannoel marked this pull request as draft November 9, 2023 06:41
@kwannoel kwannoel marked this pull request as ready for review November 9, 2023 06:45
codecov bot commented Nov 9, 2023

Codecov Report

Merging #13332 (6164b72) into main (0b9cb1f) will decrease coverage by 0.03%.
Report is 1 commits behind head on main.
The diff coverage is 28.96%.

@@            Coverage Diff             @@
##             main   #13332      +/-   ##
==========================================
- Coverage   67.83%   67.80%   -0.03%     
==========================================
  Files        1525     1526       +1     
  Lines      259575   259782     +207     
==========================================
+ Hits       176072   176143      +71     
- Misses      83503    83639     +136     
Flag Coverage Δ
rust 67.80% <28.96%> (-0.03%) ⬇️

Files Coverage Δ
src/frontend/src/handler/create_table.rs 83.79% <100.00%> (+12.40%) ⬆️
...c/frontend/src/optimizer/plan_node/logical_scan.rs 93.56% <100.00%> (+4.83%) ⬆️
...ntend/src/optimizer/plan_node/stream_table_scan.rs 96.98% <100.00%> (+7.38%) ⬆️
src/frontend/src/optimizer/plan_node/mod.rs 91.33% <66.66%> (-0.20%) ⬇️
src/frontend/src/handler/explain.rs 78.01% <70.00%> (-1.19%) ⬇️
...d/src/optimizer/plan_node/stream_cdc_table_scan.rs 17.30% <17.30%> (ø)

... and 20 files with indirect coverage changes

@kwannoel
Contributor Author

Hi all PTAL

@kwannoel
Contributor Author

kwannoel commented Nov 10, 2023

@fuyufjh @StrikeW @BugenZhao @yezizp2012 those who were involved in the related discussion

Contributor

@chenzl25 chenzl25 left a comment

Could you please add a plan test to show what it looks like in explain statement?

@BugenZhao
Member

This PR just copies the whole stream table scan, then:

  1. Removes the cdc backfill parts in stream_table_scan.
  2. Removes the stream_table_scan parts in cdc backfill.

Can we extract some common methods? Otherwise the maintainability is still not that good IMO. 😕
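
One possible shape for such an extraction, as a rough sketch only: a shared trait with a default method that both scan nodes reuse. The trait, its method names, the struct fields, and the operator name strings are all hypothetical and do not reflect the actual RisingWave code.

```rust
/// Hypothetical trait; method names are illustrative, not RisingWave APIs.
pub trait ScanCommon {
    fn operator_name(&self) -> &'static str;
    fn table_name(&self) -> &str;
    fn output_columns(&self) -> &[String];

    /// Shared formatting logic, written once and reused by both scan nodes.
    fn explain_line(&self) -> String {
        format!(
            "{} {{ table: {}, columns: [{}] }}",
            self.operator_name(),
            self.table_name(),
            self.output_columns().join(", ")
        )
    }
}

/// Simplified stand-in for the regular stream table scan node.
pub struct TableScan {
    pub table: String,
    pub columns: Vec<String>,
}

/// Simplified stand-in for the cdc table scan node.
pub struct CdcTableScan {
    pub table: String,
    pub columns: Vec<String>,
}

impl ScanCommon for TableScan {
    fn operator_name(&self) -> &'static str {
        "StreamTableScan"
    }
    fn table_name(&self) -> &str {
        &self.table
    }
    fn output_columns(&self) -> &[String] {
        &self.columns
    }
}

impl ScanCommon for CdcTableScan {
    fn operator_name(&self) -> &'static str {
        "StreamCdcTableScan"
    }
    fn table_name(&self) -> &str {
        &self.table
    }
    fn output_columns(&self) -> &[String] {
        &self.columns
    }
}
```

With this layout each node only supplies its accessors, while anything written as a default method stays in one place instead of being copied into both files.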

Contributor

@StrikeW StrikeW left a comment

Rest lgtm

Any specific tests I should run for cdc backfill?

There is a cdc.share_stream.slt e2e test; you can run it locally with ./risedev slt @kwannoel

@kwannoel
Contributor Author

kwannoel commented Nov 10, 2023

This PR just copies the whole stream table scan, then:

  1. Removes the cdc backfill parts in stream_table_scan.
  2. Removes the stream_table_scan parts in cdc backfill.

Can we extract some common methods? Otherwise the maintainability is still not that good IMO. 😕

I won't do too much refactoring for now, because #13276 is still in progress and I don't want to introduce too many conflicts. I have made a note of it in the parent issue for this PR.

Member

@BugenZhao BugenZhao left a comment

Rest LGTM

.into(),
"Remove the FORMAT and ENCODE specification".into(),
)
.into()),
Contributor Author

The changes to this file are a bug fix for explain of cdc. Previously this logic was only located in handle_create_table.

This logic can be further unified with that in handle_create_table.

Further refactoring should:

  1. Provide an options struct to simplify argument passing.
  2. Unify the match functionality between both, to prevent this bug from happening again.
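
A rough sketch of what those two refactoring steps could look like. Everything here is hypothetical: the struct, its fields, the `classify` helper, the two handler stand-ins, and the rule that a cdc table rejects FORMAT and ENCODE are assumptions for illustration, not the actual handler code. Only the hint text "Remove the FORMAT and ENCODE specification" comes from the diff context above.

```rust
/// Hypothetical options struct bundling the arguments both handlers need.
#[derive(Debug, Clone, Default)]
pub struct CreateTableOptions {
    pub table_name: String,
    /// True if the statement carried a FORMAT ... ENCODE ... clause.
    pub has_format_encode: bool,
    /// Some(external table) when the statement creates a cdc table.
    pub cdc_external_table: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableKind {
    Plain,
    WithConnector,
    Cdc,
}

/// The single match that both code paths share, so they cannot drift apart.
/// The rejection rule below is an assumption made for this sketch.
pub fn classify(opts: &CreateTableOptions) -> Result<TableKind, String> {
    match (opts.has_format_encode, opts.cdc_external_table.is_some()) {
        (true, true) => Err(format!(
            "invalid options for cdc table `{}`: Remove the FORMAT and ENCODE specification",
            opts.table_name
        )),
        (true, false) => Ok(TableKind::WithConnector),
        (false, true) => Ok(TableKind::Cdc),
        (false, false) => Ok(TableKind::Plain),
    }
}

/// Stand-in for the create-table path.
pub fn create_table(opts: &CreateTableOptions) -> Result<String, String> {
    classify(opts).map(|kind| format!("CREATE {:?} {}", kind, opts.table_name))
}

/// Stand-in for the explain path; it reuses the same classification.
pub fn explain_create_table(opts: &CreateTableOptions) -> Result<String, String> {
    classify(opts).map(|kind| format!("EXPLAIN {:?} {}", kind, opts.table_name))
}
```

Because both paths go through `classify`, a new table kind or validation rule only has to be added once, which is the property whose absence caused the explain bug described above.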

Contributor Author

Refactoring will not be done here; it is out of scope for this PR. This is only a fix so that we can add a planner test.

@kwannoel
Contributor Author

Could you please add a plan test to show what it looks like in explain statement?

done

Member

@fuyufjh fuyufjh left a comment

LGTM


let node_body =
// don't need batch plan for cdc source
PbNodeBody::StreamScan(StreamScanNode {
Member

I'd lean toward using a new fragment / proto message for StreamCdcScanNode. Of course this can be done in the future. cc. @StrikeW

Member

Additionally, we don't need to follow the bad design of StreamScanNode, which has 2 inputs: an empty MergeNode as a placeholder and a strange BatchScanNode that is not really a BatchScan.

As I mentioned in #13242 (comment): This part is shitty but changing it (StreamScanNode) will break compatibility. But here we have the chance to change it.
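
To make the contrast concrete, here is a small sketch of the two node shapes. The types are hypothetical stand-ins, not the generated proto types, and the assumption that a dedicated cdc scan node would need no placeholder inputs at all is mine, not something settled in this thread.

```rust
/// Hypothetical, simplified node bodies; not the generated proto types.
#[derive(Debug, Clone, PartialEq)]
pub enum NodeBody {
    /// Empty placeholder standing in for the upstream input.
    Merge,
    /// Described above as "not really a BatchScan".
    BatchScan { table: String },
    StreamScan { table: String },
    /// Hypothetical dedicated message for cdc.
    StreamCdcScan { table: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct PlanNode {
    pub body: NodeBody,
    pub inputs: Vec<PlanNode>,
}

/// Existing shape: a StreamScan with two inputs.
pub fn legacy_stream_scan(table: &str) -> PlanNode {
    PlanNode {
        body: NodeBody::StreamScan { table: table.to_string() },
        inputs: vec![
            PlanNode { body: NodeBody::Merge, inputs: Vec::new() },
            PlanNode {
                body: NodeBody::BatchScan { table: table.to_string() },
                inputs: Vec::new(),
            },
        ],
    }
}

/// Possible new shape: a self-describing node with no placeholder inputs.
pub fn dedicated_cdc_scan(table: &str) -> PlanNode {
    PlanNode {
        body: NodeBody::StreamCdcScan { table: table.to_string() },
        inputs: Vec::new(),
    }
}
```

Since the cdc node would be a new message, its shape can be chosen without the compatibility constraint that pins the existing StreamScanNode layout.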

Contributor Author

@kwannoel kwannoel Nov 10, 2023

I'd lean toward using a new fragment / proto message for StreamCdcScanNode. Of course this can be done in the future. cc. @StrikeW

Agreed, this is my plan, as stated in the PR description:

For now, just separate them at the stream plan layer. Don't touch LogicalScan, core, or the proto definitions yet, so we can keep each PR small and separate them step by step.

Contributor Author

#13146 (comment)

Added a checklist to track the changes.

@kwannoel kwannoel force-pushed the kwannoel/refactor-scan branch from cef3a8a to 6164b72 Compare November 10, 2023 10:17
@kwannoel kwannoel added this pull request to the merge queue Nov 10, 2023
Merged via the queue into main with commit 287604f Nov 10, 2023
7 of 8 checks passed
@kwannoel kwannoel deleted the kwannoel/refactor-scan branch November 10, 2023 11:21
5 participants