test: recovery test for sources #16356
Comments
Agree! Besides e2e unit tests, I would also like to have fuzzing similar to that of the e2e deterministic recovery test.
We can use inline system commands like Test Variables:
I've managed to do this using the …
To do this, …
IIRC, the recovery process happens only within the meta node; connections to the frontend will not be affected.
In addition to the exactly-once semantic guarantee, it is also worthwhile to simply test whether re-creating the subscription to the upstream system after recovery causes problems, such as the issue in #17112.
In our previous tests, we did not thoroughly validate whether each source connector could resume consumption from an arbitrary record position while preserving exactly-once processing. The primary challenges were the need to restart clusters within the CI environment to perform recovery operations, and the inability to control the consumption rate to manage progress.
The absence of such tests led to unintentional breaches of exactly-once semantics when building new sources. This has been a critical issue, especially since we identified (thanks to @stdrc) that the fs source connector could read the current message twice during recovery. By implementing these tests, we aim to strengthen the exactly-once guarantees across all connectors.
We now support triggering recovery via SQL commands (#16259), controlling read speed with rate limits (#15948), and truncating chunks at any position. The new support for bash commands in the slt files (#12451 (comment)) makes it easier to control external components, and the 'key as' syntax allows us to clearly mark each message's offset and check for overlaps (#13707).
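The offset-overlap check enabled by the 'key as' marking can be sketched as follows. This is a minimal illustration, not RisingWave code; `check_exactly_once` is a hypothetical helper that takes the offsets carried by the consumed messages:

```python
def check_exactly_once(offsets, expected):
    """Given the offset carried by each consumed message, report
    duplicate offsets (exactly-once violated, e.g. a message re-read
    during recovery) and missing offsets (data loss)."""
    seen = set()
    # seen.add() returns None, so an offset lands in `duplicates`
    # only when it was already present in `seen`.
    duplicates = [o for o in offsets if o in seen or seen.add(o)]
    missing = sorted(set(expected) - seen)
    return duplicates, missing

# Offset 7 was re-read after a recovery: flagged as a duplicate.
dups, miss = check_exactly_once(list(range(8)) + list(range(7, 20)), range(20))
assert dups == [7] and miss == []
```

In a real slt run, the `offsets` list would come from selecting the offset column of the materialized result after the source has drained.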
To test recovery, we can use a small dataset, for example 20 messages in Kafka, with stream parallelism set to 1 and streaming_rate_limit set to 1. We then trigger recovery at an arbitrary time between 0 and 20 seconds, and verify that all 20 messages are read and that there are no duplicate offsets. The same test applies to the fs source: if a data line is read twice, the recorded offset will be the same.
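The intended behavior of the test above can be modeled deterministically. The sketch below is an assumption-laden simplification of the checkpoint/recovery cycle (barrier interval, variable names, and the rewind logic are all illustrative, not RisingWave internals): the source reads one message per tick, a recovery truncates the uncommitted in-flight chunk and rewinds to the last checkpointed offset, and a correct run emits each of the 20 offsets exactly once regardless of when recovery fires:

```python
TOTAL = 20  # e.g. 20 messages in a Kafka topic, offsets 0..19

def run_with_recovery(recover_at_tick, barrier_every=5):
    """Simulate a source reading one message per tick
    (streaming_rate_limit = 1) that is interrupted by a recovery at
    `recover_at_tick`. On recovery, the uncommitted in-flight chunk
    is truncated and reading resumes from the offset recorded at the
    last barrier, so each offset is emitted downstream exactly once."""
    committed = []   # offsets durably emitted downstream
    inflight = []    # offsets read since the last barrier (uncommitted)
    checkpoint = 0   # next offset to read, as of the last barrier
    offset = 0
    tick = 0
    while len(committed) < TOTAL:
        if tick == recover_at_tick:
            inflight = []        # truncate the in-flight chunk
            offset = checkpoint  # rewind to the last checkpoint
        if offset < TOTAL:
            inflight.append(offset)
            offset += 1
        tick += 1
        if tick % barrier_every == 0:  # barrier: commit and checkpoint
            committed += inflight
            inflight = []
            checkpoint = offset
    return committed

# Trigger recovery at every possible point; each run must yield
# offsets 0..19 with no duplicates and no gaps.
for t in range(TOTAL + 5):
    assert sorted(run_with_recovery(t)) == list(range(TOTAL))
```

A buggy source that kept (rather than truncated) the in-flight chunk before rewinding would fail the assertion with duplicated offsets, which is exactly the class of bug the recovery test is meant to catch.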
This test will help ensure that all our connectors meet the exactly-once requirements, safeguarding the integrity of our data processing systems.