bug: potential loss of data in sink into table #17602

Closed
wenym1 opened this issue Jul 8, 2024 · 1 comment · Fixed by #17651

Comments

@wenym1
Contributor

wenym1 commented Jul 8, 2024

Describe the bug

Currently, when we handle the SINK INTO TABLE statement, we run two commands atomically: a CreateStreamingJob to create a streaming graph that generates the new input, and a ReplaceTable to recreate the downstream streaming graph that writes to the table.

When handling ReplaceTable, we drop all the actors that write to the downstream table and recreate new ones to accept the new input. However, after dropping the original actors, we don't wait for the uncommitted data to be committed. Since the data is uncommitted, it is not visible to the newly created actors. Therefore, when the new actors try to read keys that have uncommitted data, they won't read the latest values, which may cause data inconsistency. On the other hand, the ReplaceTable command that handles the ALTER TABLE statement is treated as a configuration change: it waits for all data to be committed before dropping and recreating the actors, and therefore does not have this bug.
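To make the visibility problem concrete, here is a minimal toy model in Rust. It is not RisingWave code: `Store`, `write_uncommitted`, `checkpoint`, `read_committed`, and `value_seen_by_new_actor` are all hypothetical names, and the model assumes that a freshly created actor can only read committed data.

```rust
use std::collections::HashMap;

/// Hypothetical toy state store: `committed` is visible to every actor,
/// `pending` holds writes that have not been committed yet.
#[derive(Default)]
struct Store {
    committed: HashMap<String, i64>,
    pending: HashMap<String, i64>,
}

impl Store {
    /// A write issued by an (old) actor; not yet visible to other actors.
    fn write_uncommitted(&mut self, key: &str, value: i64) {
        self.pending.insert(key.to_string(), value);
    }

    /// Commit all pending writes, as waiting for a checkpoint would.
    fn checkpoint(&mut self) {
        self.committed.extend(self.pending.drain());
    }

    /// What a newly created actor can see: committed data only.
    fn read_committed(&self, key: &str) -> Option<i64> {
        self.committed.get(key).copied()
    }
}

/// Simulate replacing the actors of the downstream table.
/// Returns the value the new actor reads for key "k".
fn value_seen_by_new_actor(wait_for_commit: bool) -> Option<i64> {
    let mut store = Store::default();
    store.committed.insert("k".to_string(), 1);

    // The old actor updates "k" shortly before it is dropped.
    store.write_uncommitted("k", 2);

    // ALTER TABLE path (assumed): wait for the data to be committed first.
    // SINK INTO TABLE path (the bug): skip the wait.
    if wait_for_commit {
        store.checkpoint();
    }

    // The old actor is gone; the new actor reads "k".
    store.read_committed("k")
}
```

In this model, `value_seen_by_new_actor(false)` returns the stale value `Some(1)`, while `value_seen_by_new_actor(true)` returns `Some(2)`. The stale read is the inconsistency described above.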

Potential solution:

  1. When handling the SINK INTO TABLE statement, treat it as a configuration change and run the pause/resume commands before and after the command.
  2. Do not run ReplaceTable when handling SINK INTO TABLE. Instead, modify the inputs of the downstream streaming graph on the fly. We can leverage the mutation information to ensure that the input configuration is changed at a barrier boundary.
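A rough sketch of what option 2 could look like, again as a toy model and not RisingWave's actual executor code. It assumes that a mutation travels with a barrier and that the downstream only changes its input set when it processes that barrier. `Mutation`, `Message`, `Downstream`, and `handle` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical mutation attached to a barrier.
#[derive(Debug, Clone, PartialEq)]
enum Mutation {
    AddInput(u32),
    RemoveInput(u32),
}

/// Hypothetical stream message: either data from an upstream, or a barrier.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Chunk { from: u32, rows: Vec<i64> },
    Barrier { epoch: u64, mutation: Option<Mutation> },
}

/// Toy downstream that writes to the table. It is never dropped or
/// recreated; only its set of inputs changes.
struct Downstream {
    inputs: BTreeSet<u32>,
    applied: Vec<i64>,
    last_epoch: u64,
}

impl Downstream {
    fn new(inputs: impl IntoIterator<Item = u32>) -> Self {
        Downstream {
            inputs: inputs.into_iter().collect(),
            applied: Vec::new(),
            last_epoch: 0,
        }
    }

    fn handle(&mut self, msg: Message) {
        match msg {
            // Data is accepted only from currently registered inputs.
            Message::Chunk { from, rows } => {
                if self.inputs.contains(&from) {
                    self.applied.extend(rows);
                }
            }
            // The input set changes only here, at the barrier boundary.
            Message::Barrier { epoch, mutation } => {
                match mutation {
                    Some(Mutation::AddInput(id)) => {
                        self.inputs.insert(id);
                    }
                    Some(Mutation::RemoveInput(id)) => {
                        self.inputs.remove(&id);
                    }
                    None => {}
                }
                self.last_epoch = epoch;
            }
        }
    }
}
```

Because the downstream actor stays alive across the change, its uncommitted state remains visible to itself, so the stale-read window from ReplaceTable does not arise in this model.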

Error message/log

Found by reading code.

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

@wenym1 wenym1 added the type/bug Something isn't working label Jul 8, 2024
@github-actions github-actions bot added this to the release-1.10 milestone Jul 8, 2024
@wenym1
Contributor Author

wenym1 commented Jul 8, 2024

cc @shanicky
