Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cdc): only allow ADD and DROP in auto schema change #18245

Merged
merged 9 commits into from
Aug 28, 2024

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented Aug 26, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Since we only support ADD and DROP column, we check whether the new columns and the original columns is a subset of the other in Meta
  • Add more test

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@graphite-app graphite-app bot requested a review from a team August 27, 2024 06:41
@StrikeW StrikeW changed the title fix(cdc): reject RENAME in auto schema change (WIP) fix(cdc): reject RENAME in auto schema change Aug 27, 2024
@StrikeW StrikeW linked an issue Aug 27, 2024 that may be closed by this pull request
6 tasks
@StrikeW StrikeW requested review from hzxa21 and BugenZhao August 27, 2024 08:23
@StrikeW StrikeW changed the title fix(cdc): reject RENAME in auto schema change fix(cdc): only allow ADD and DROP in auto schema change Aug 27, 2024
@@ -1123,6 +1124,7 @@ async fn derive_schema_for_cdc_table(
constraints: &Vec<TableConstraint>,
connect_properties: WithOptionsSecResolved,
need_auto_schema_map: bool,
original_catalog: Option<Arc<TableCatalog>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments for this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 177 to 181
let allowed_ddl = ["ADD COLUMN", "DROP COLUMN"];
let upper_upstream_ddl = upstream_ddl.to_uppercase();
let is_allowed = allowed_ddl
.iter()
.any(|&allowed_ddl| upper_upstream_ddl.contains(allowed_ddl));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fragile. Shall we instead compare the previous and current schema to check whether one is the subset of the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compare the previous and current schema to check whether one is the subset of the other?

Not quite get it. Here the goal is to skip upstream messages generated by unsupported ALTER statements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we only support ADD and DROP, in which cases we must have...

the previous and current schema to check whether one is the subset of the other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds ok to me, based on we only support ADD and DROP.

Comment on lines 177 to 181
let allowed_ddl = ["ADD COLUMN", "DROP COLUMN"];
let upper_upstream_ddl = upstream_ddl.to_uppercase();
let is_allowed = allowed_ddl
.iter()
.any(|&allowed_ddl| upper_upstream_ddl.contains(allowed_ddl));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fragile.

+1.
What if there are both ALTER and ADD/DROP COLUMN statements in upstream_ddl? From debezium's doc: "The ddl field can contain multiple DDL statements."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ddl field can contain multiple DDL statements.

I am aware of this statement, but I cannot construct this case even with transaction. And it seems mysql doesn't support DDL transactions. So actually the parser assumes each message only contains a single DDL.

@StrikeW StrikeW requested review from BugenZhao and hzxa21 August 28, 2024 09:37
Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StrikeW StrikeW enabled auto-merge August 28, 2024 09:52
@StrikeW StrikeW added this pull request to the merge queue Aug 28, 2024
Merged via the queue into main with commit 0239e22 Aug 28, 2024
36 of 37 checks passed
@StrikeW StrikeW deleted the siyuan/auto-schema-test branch August 28, 2024 10:30
github-merge-queue bot pushed a commit that referenced this pull request Aug 28, 2024
Comment on lines +984 to +985
if !(original_column_names.is_subset(&new_column_names)
|| original_column_names.is_superset(&new_column_names))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realize that comparing the name sets is not sufficient, as users may alter the data type of a column, or even set a new default value. Those functionalities are not supported by us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. So there is one more check in below.

                // skip the schema change if there is no change to original columns
                if original_column_names == new_column_names {
                    continue;
                }

Copy link
Member

@BugenZhao BugenZhao Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if

  1. alter column type, we skip since there's no change on the names
  2. then, add column, we do schema change.

Will we also accidentally apply the changes happened in step 1?

Copy link
Contributor Author

@StrikeW StrikeW Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we also accidentally apply the changes happened in step 1?

Yes, that would be a problem. Since we apply the latest version column schema from upstream. Maybe we should also check column type besides the names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for point out the issue, I submit the patch in
f112e6d. (#18322)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking: automatically schema change for cdc table
3 participants