Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cdc): support alter cdc table schema #17334

Merged
merged 19 commits into from
Jul 7, 2024
Merged

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented Jun 19, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support ALTER ADD/DROP COLUMN for cdc table based on shared source

close #16287

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@StrikeW StrikeW changed the title feat(cdc): support alter cdc table schema feat(cdc): support alter cdc table schema (WIP) Jun 19, 2024
Copy link

gitguardian bot commented Jun 25, 2024

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9425213 Triggered Generic Password 44bebe4 ci/scripts/e2e-source-test.sh View secret
9425213 Triggered Generic Password 44bebe4 ci/scripts/e2e-source-test.sh View secret
9425213 Triggered Generic Password 4b55d07 ci/scripts/e2e-source-test.sh View secret
9425213 Triggered Generic Password 4b55d07 ci/scripts/e2e-source-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@StrikeW StrikeW force-pushed the siyuan/cdc-alter-table branch from 6ce9713 to 44bebe4 Compare June 25, 2024 07:20
@StrikeW StrikeW force-pushed the siyuan/cdc-alter-table branch from 9f0fee7 to 4b55d07 Compare June 25, 2024 08:15
@StrikeW StrikeW changed the title feat(cdc): support alter cdc table schema (WIP) feat(cdc): support alter cdc table schema Jun 25, 2024
@StrikeW StrikeW marked this pull request as ready for review June 25, 2024 15:10
@BugenZhao BugenZhao requested a review from xxchan June 26, 2024 05:30
e2e_test/source/cdc_inline/alter/cdc_table_alter.slt Outdated Show resolved Hide resolved
Comment on lines 46 to 51
hostname = '${MYSQL_HOST:localhost}',
port = '${MYSQL_TCP_PORT:8306}',
username = 'root',
password = '123456',
database.name = 'testdb1',
server.id = '5185'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ${RISEDEV_MYSQL_WITH_OPTIONS_COMMON}?


Seems that similar comments have been made in #16986 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it but it doesn't work when I run test locally with risedev slt. It seems we have to launch the mysql via risedev to let the RISEDEV_MYSQL_WITH_OPTIONS_COMMON work, but I use docker compose to launch mysql locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we have to launch the mysql via risedev to let the RISEDEV_MYSQL_WITH_OPTIONS_COMMON work, but I use docker compose to launch mysql locally.

This is true. However, I would still recommend launching MySQL with RiseDev as it makes it easier for other developers to test locally without needing to follow any other conventions.

Copy link
Member

@xxchan xxchan Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I use docker compose to launch mysql locally.

You CAN still use your own setup, by using user-managed.

  - use: mysql
    port: 3306
    user-managed: true
    # - In user-managed mode, these configs are not validated by risedev.
    #   They are passed as-is to risedev-env default user for MySQL operations.
    # - This is not used in RISEDEV_MYSQL_WITH_OPTIONS_COMMON.
    user: root
    password: ""
    database: "risedev"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you don't use the mysql started by risedev? Have you tried it and met any problems?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can stop and start one or more db services with clicks, no need to edit the yaml.

You still uses risedev to start RisingWave. So relying on risedev can also save the click.. Why not?


Anyway, I encourage you to just give it a try. I believe basically no effort is needed to change the workflow and the DX improvement is gained for free.

If it doesn't work well, please share some feedbacks about what is not good enough, so that we can keep improving the DX for all developers.

If the reasons above are still not convincing enough to you, please also tell me why, so that I can understand it better.

Finally, if you just don't want to try it, it's also OK, but please use the suggested style for the benefits of others. For it to work in your machine, I think it's just some additional env vars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to manually set MYSQL_ env vars manually.
When you change the configs, the env vars are also updated automatically.

Thanks for the clarification. I got your idea.
I find that the mysql -e can read the env vars in the risedev-env file, but psql -c cannot. I still need to set those $PGxxx env vars in my shell to make the psql -c work locally. I checked that postgres related env vars has written into the risedev-env file. Do you have any idea? @xxchan
Here is the config I used:

  default:
    steps:
      - use: meta-node
      - use: compute-node
      - use: frontend
      - use: postgres
        port: 8432
        user-managed: true
        application: "connector"
        user: myuser
        password: "123456"
        database: "mydb"
      - use: mysql
        port: 8306
        user-managed: true
        user: root
        password: "123456"
        database: "mydb"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing this comment.

Just to double confirm: Did you run with risedev slt? And what's the errors you found?

Perhaps you can try this first:

system ok
echo $PGPORT
----
placeholder

Then running risedev slt will return errors "result mismatch", and shows what's the output of the echo command, i.e., the value of PGPORT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing this comment.

Wait, why this reply is "2 days ago", but I really just received it just now. 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for missing this comment.

Wait, why this reply is "2 days ago", but I really just received it just now. 😂

Maybe due to I forget to submit the pending comment. 😄

src/frontend/src/handler/alter_table_column.rs Outdated Show resolved Hide resolved
Comment on lines 180 to 181
// Add the new column to the table definition if it is not created by `create table (*)` syntax.
if !columns.is_empty() {
Copy link
Member

@BugenZhao BugenZhao Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look good to me. 😕

  • I remember that we had a discussion long ago (frontend: refactor source schema resolution #9828) that we should purify the definition of a table/source before persisting it into the catalog. In this case, for example, * is completely a syntax-sugar and should be first expanded to column definitions. cc @st1page
  • Considering empty columns as created with * is also a fragile assumption. Users are allowed to create a table without columns like CREATE TABLE t;.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for forgetting this point during my previous review of the #16986.
I think we should expand on CREATE TABLE (*). This is because operations such as CREATE SINK INTO TABLE, ALTER TABLE RENAME, and others rely on reparsing the SQL statement, and the parsed columns should remain consistent with the previous plan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cdc table was the first to meet this problem, because in other scenarios with a schema registry, we simply prevented users from doing things like add columns.
https://github.com/risingwavelabs/risingwave/blob/main/src/frontend/src/handler/alter_table_column.rs#L117
https://github.com/risingwavelabs/risingwave/blob/main/src/frontend/src/handler/alter_source_column.rs#L73

And the only way to change their schema was to refresh the schema registry using alter table, which implied the assumption that their which implies the assumption that their schema needs to always be aligned upstream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before finding a elegent way to support that. We can tentatively assume that the cdc table of create table(*) is similar to the table with connector of the schema registry. Let's temporarily prevent users from using add columns on such a table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this PR we can prevent users from adding/drop columns on a table with wildcard.
And here is the future design #17472

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this PR we can prevent users from adding/drop columns on a table with wildcard.

ok, we can ban ALTER for tables created by create table(*) in this pr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, we can ban ALTER for tables created by create table(*) in this pr.

Does this also mean that we no longer need let mut columns_altered = original_catalog.columns.clone();?

@BugenZhao BugenZhao requested a review from st1page June 26, 2024 06:29
@cyliu0
Copy link
Collaborator

cyliu0 commented Jul 2, 2024

/test

@risingwave-ci
Copy link

risingwave-ci commented Jul 2, 2024

@StrikeW StrikeW requested review from BugenZhao and xxchan July 4, 2024 00:30
Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code impl generally LGTM

@@ -1971,7 +2000,6 @@ impl DdlController {
} = actor_graph_builder
.generate_graph(&self.env, stream_job, expr_context)
.await?;
assert!(dispatchers.is_empty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we still keep this assertion depending on the table job type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

src/meta/src/manager/catalog/mod.rs Show resolved Hide resolved
src/frontend/src/handler/create_table.rs Outdated Show resolved Hide resolved
@StrikeW StrikeW requested a review from BugenZhao July 5, 2024 03:50
Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM!

e2e_test/source/cdc_inline/alter/cdc_table_alter.slt Outdated Show resolved Hide resolved
@StrikeW StrikeW enabled auto-merge July 7, 2024 10:02
@StrikeW StrikeW added this pull request to the merge queue Jul 7, 2024
Merged via the queue into main with commit 739d005 Jul 7, 2024
31 of 33 checks passed
@StrikeW StrikeW deleted the siyuan/cdc-alter-table branch July 7, 2024 11:01
@xxchan xxchan mentioned this pull request Aug 6, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: altering table column failed when table is built upon a shared cdc source
7 participants