-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(source): create source stream with retry during recovery #19805
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may remove the workaround here to get it tested.
risingwave/e2e_test/source_inline/kafka/issue_19563.slt.serial
Lines 49 to 51 in 1e24ca4
# XXX: wait until source reader is ready. then produce data. | |
# This is a temporary workaround for a data loss bug https://github.com/risingwavelabs/risingwave/issues/19681#issuecomment-2532183002 | |
sleep 2s |
BTW, is there any test case covered the CDC sync issue #19681?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM
I am testing #19681 with the test pipeline https://buildkite.com/risingwave-test/ch-benchmark-pg-cdc-shared-source |
Test passed https://buildkite.com/risingwave-test/ch-benchmark-pg-cdc-shared-source/builds/133 |
LOG.info( | ||
"creating cdc source job: publication '{}' doesn't exist, creating...", | ||
pubName); | ||
DbzSourceUtils.createPostgresPublicationInValidate(userProps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to create the publication in meta instead of doing it in the source reader? I thought the CREATE SOURCE logic is unchanged (block the 1st barrier until source reader creation succeeds) before and after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see in the source_executor.rs
, we forward the initial barrier first before we create the source reader. IIRC, after Meta collected the initial barrier the CREATE SOURCE
will return success, and then following CREATE TABLE
command can be issued to Meta. So during validation of CREATE TABLE
, the pg publication may not have been created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this race exist for a long time ago, not the new problem introduced by this PR.
Will the ch-benchmark test the case when upstream is unavailable? If not, I think we still in lack of a test case for that. |
Link to the context: #19681 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Previously we reverted the retry for source executor #19754. This PR we re-introduce the retry mechanism but only during recovery process. Because when source job create in the first time, it should fail if cannot reach the upstream system.
And we merge the patch #19714 into this PR
passed main-cron https://buildkite.com/risingwavelabs/main-cron/builds/4092
close #19681
Checklist
Documentation
Release note