Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cdc): introduce with option to configure cdc snapshot #16426

Merged
merged 24 commits into from
Apr 28, 2024

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented Apr 22, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Introduce snapshot.interval and snapshot.batch_size WITH options to create a table from cdc source.

  • snapshot.interval: barrier interval for buffering upstream events
  • snapshot.batch_size: batch size for snapshot read to upstream table

follow up of #16349

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Introduce snapshot.interval and snapshot.batch_size WITH options to create a table from cdc source.

  • snapshot.interval: barrier interval for buffering upstream events, default to 1
  • snapshot.batch_size: batch size for snapshot read to upstream table, default to 1000

@StrikeW StrikeW added the user-facing-changes Contains changes that are visible to users label Apr 22, 2024
@StrikeW StrikeW changed the title refactor(cdc): introduce with option to configure cdc snapshot feat(cdc): introduce with option to configure cdc snapshot Apr 22, 2024
@StrikeW StrikeW force-pushed the siyuan/cdc-snapshot-options branch from 1a11d29 to e13726a Compare April 28, 2024 04:16
@StrikeW StrikeW marked this pull request as ready for review April 28, 2024 04:17
Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest LGTM

proto/stream_plan.proto Outdated Show resolved Hide resolved
@StrikeW StrikeW requested a review from fuyufjh April 28, 2024 08:10
@StrikeW StrikeW added this pull request to the merge queue Apr 28, 2024
Self {
disable_backfill: false,
snapshot_barrier_interval: 1,
snapshot_batch_size: 1000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we follow the batch_size specified in the config file instead?

pub use upstream_table::external::ExternalStorageTable;

#[derive(Debug, Clone)]
pub struct CdcScanOptions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears to be a duplicate of the one defined in the frontend crate with the same name. Can we merge them?

Comment on lines +99 to +102
let options = scan_options.unwrap_or(CdcScanOptions {
disable_backfill,
..Default::default()
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest moving this into from_proto so that we can accept a single field of scan_options: CdcScanOptions in new.

Merged via the queue into main with commit b213243 Apr 28, 2024
33 of 34 checks passed
@StrikeW StrikeW deleted the siyuan/cdc-snapshot-options branch April 28, 2024 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants