bug: Seems like deterministic test will occasionally panic when broken fifo_channel error happens
#4788
Comments
cc @KveinAxel @wangrunji0408, may I have your confirmation? Thanks a lot!
I think backtrack is disabled.
I found that I cannot reproduce this bug anymore. I will think about constructing another example.
Based on the panic location, this looks like a bug in madsim. It happens when a tonic client drops a response stream while the server is still trying to send a response. I also failed to reproduce this error in risingwave, but I'll try to construct a minimal example in madsim.
Describe the bug
Known issue: our batch engine cannot propagate errors between different executors yet, see #3908. For now we just throw the broken channel error up.
However, in the deterministic e2e test, a node sometimes crashes/panics, which makes the test fail. I guess madsim somehow handles this kind of error incorrectly, leading to the crash?
The broken channel error should originate from here:
https://github.com/singularity-data/risingwave/blob/b64d8294c091413fa517eb70a0659ae3c6bcd3d0/src/batch/src/task/fifo_channel.rs#L47-L59
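To illustrate the failure mode being discussed, here is a minimal sketch of a sender that reports a "broken channel" error when the receiving side has already gone away, instead of panicking. It uses `std::sync::mpsc` only so that it is self-contained; it is not the code in `fifo_channel.rs`, and `ChannelError`, `try_send_chunk`, and `send_after_receiver_dropped` are hypothetical names made up for this example.

```rust
use std::sync::mpsc;

/// Hypothetical error type standing in for the batch engine's
/// "broken channel" error (an assumption, not the real type).
#[derive(Debug, PartialEq)]
enum ChannelError {
    Broken,
}

/// Sends one value. If the receiver has been dropped, `send` fails,
/// and the failure is mapped to an error instead of a panic.
fn try_send_chunk(tx: &mpsc::Sender<u32>, chunk: u32) -> Result<(), ChannelError> {
    tx.send(chunk).map_err(|_| ChannelError::Broken)
}

/// Drops the receiver first, then attempts a send, mimicking a consumer
/// that goes away while the producer is still trying to send.
fn send_after_receiver_dropped() -> Result<(), ChannelError> {
    let (tx, rx) = mpsc::channel::<u32>();
    drop(rx);
    try_send_chunk(&tx, 1)
}
```

Calling `send_after_receiver_dropped()` returns an `Err` rather than aborting the process, which is the behavior expected here: the error is surfaced to the caller and the node keeps running.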
To Reproduce
Create an slt file in e2e_test/batch/ (mine is named b2.slt) with content:
For madsim:
MADSIM_TEST_NUM=100 ./risedev sslt -- "e2e_test/batch/b2.slt"
(I did not provide a constant madsim seed here because I cannot reproduce the panic with that seed...)
If everything goes right, you should get
For playground:
./risedev p
./risedev slt -p 4566 -d dev './e2e_test/batch/b2.slt'
Expected behavior
Expect the deterministic e2e test to succeed without panicking. (Or there is some bug in our current server...)
Additional context