bug: Seems like deterministic test will occasionally panic when broken fifo_channel error happens #4788

Closed
BowenXiao1999 opened this issue Aug 22, 2022 · 4 comments · Fixed by #4901
Assignees
Labels
type/bug Something isn't working

Comments

@BowenXiao1999
Contributor

BowenXiao1999 commented Aug 22, 2022

Describe the bug
Known issue: our batch engine cannot propagate errors between different executors yet (see #3908), so for now we just throw a broken channel error up.

However, in the deterministic e2e test, some node occasionally crashes/panics and causes the test to fail. I guess madsim somehow handles this kind of error incorrectly, which leads to the crash?

The broken channel error should originate here:
https://github.com/singularity-data/risingwave/blob/b64d8294c091413fa517eb70a0659ae3c6bcd3d0/src/batch/src/task/fifo_channel.rs#L47-L59
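
To illustrate the known issue described above, here is a minimal sketch of how the root cause gets lost. It is not the code at the link: it uses a plain `std::sync::mpsc` channel as a stand-in for the FIFO channel, and `TaskError`, `produce`, and `consume` are hypothetical names made up for this example. The upstream side fails with "Division by zero" and drops its sender, while the downstream side can only observe that the channel closed.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical error type for this sketch; not RisingWave's real error enum.
#[derive(Debug, PartialEq)]
enum TaskError {
    BrokenChannel,
}

/// Upstream side: computes 1 / d for each divisor and sends the result.
/// `None` marks a clean end of stream. On failure the function returns early,
/// which drops `tx` without ever sending the end-of-stream marker.
fn produce(tx: mpsc::Sender<Option<i32>>, divisors: Vec<i32>) -> Result<(), String> {
    for d in divisors {
        let q = 1i32
            .checked_div(d)
            .ok_or_else(|| "Division by zero".to_string())?;
        tx.send(Some(q)).map_err(|_| "receiver dropped".to_string())?;
    }
    tx.send(None).map_err(|_| "receiver dropped".to_string())?;
    Ok(())
}

/// Downstream side: it never sees the upstream error value, only a closed channel.
fn consume(rx: mpsc::Receiver<Option<i32>>) -> Result<Vec<i32>, TaskError> {
    let mut rows = Vec::new();
    loop {
        match rx.recv() {
            Ok(Some(v)) => rows.push(v),
            Ok(None) => return Ok(rows),
            // The sender was dropped before the end-of-stream marker arrived.
            Err(_) => return Err(TaskError::BrokenChannel),
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || produce(tx, vec![0]));
    let downstream = consume(rx);
    let upstream = producer.join().unwrap();
    println!("upstream: {:?}, downstream: {:?}", upstream, downstream);
}
```

In this sketch the real cause stays on the upstream side, which matches the log below where the compute node reports the division by zero while the frontend only reports a task failure.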

To Reproduce
Create an slt file in e2e_test/batch/ (mine is named b2.slt) with the following content:

statement ok
create table t(v int);

statement ok
insert into t values(0);

statement ok
flush;

statement error
select (1/v) from t;

statement ok
drop table t;

For madsim:
MADSIM_TEST_NUM=100 ./risedev sslt -- "e2e_test/batch/b2.slt"
(I did not provide a constant madsim seed here because I cannot reproduce the panic with that seed...)
If the reproduction succeeds, you should see output like:

2022-11-27T20:26:07.389264Z ERROR node{id=3 name="frontend-2"}:task{id=27296}: risingwave_frontend::session: failed to handle sql:
select (1/v) from t;:
Scheduler error: Task fail
2022-11-27T20:26:07.389264Z ERROR node{id=3 name="frontend-2"}:task{id=27296}: pgwire::pg_protocol: Error: QueryError: Scheduler error: Task fail
2022-11-27T20:26:07.404476Z ERROR node{id=5 name="compute-2"}:task{id=27677}: risingwave_batch::task::task_execution: Execution failed [TaskId { task_id: 0, stage_id: 0, query_id: "fad59924-fbff-41a4-9e2a-cd3f5dbe5c8c" }]: Invalid Parameter Value: Division by zero
disabled backtrace
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: ConnectionReset, error: "connection reset" }', /Users/bowen/.cargo/registry/src/github.com-1ecc6299db9ec823/madsim-tonic-0.2.1/src/transport/server.rs:249:50
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
note: run with `MADSIM_TEST_SEED=1661163288` environment variable to reproduce this error
      and make sure `MADSIM_CONFIG_HASH=C2F7CAF3EA64B636`
[cargo-make] ERROR - Unable to execute script.
[cargo-make] WARN - Build Failed.

For playground:
./risedev p
./risedev slt -p 4566 -d dev './e2e_test/batch/b2.slt'

Expected behavior
Expect the deterministic e2e test to succeed without panicking. (Or there is some bug in our current server...)


@BowenXiao1999 BowenXiao1999 added the type/bug Something isn't working label Aug 22, 2022
@BowenXiao1999 BowenXiao1999 changed the title Seems like deterministic test will occasionally panic when broken fifo_channel error happens bug: Seems like deterministic test will occasionally panic when broken fifo_channel error happens Aug 22, 2022
@BowenXiao1999
Contributor Author

BowenXiao1999 commented Aug 22, 2022

cc @KveinAxel @wangrunji0408, may I have your confirmation? thanks a lot!

@liurenjie1024
Contributor

I think backtrace is disabled.

@BowenXiao1999
Contributor Author

I found that I cannot reproduce this bug anymore. I will think about constructing another example.

@wangrunji0408
Contributor

Based on the panic location, this looks like a bug in madsim. It happens when a tonic client drops a response stream while the server is still trying to send a response.

I also failed to reproduce this error in risingwave, but I'll try to construct a minimal example in madsim.
Thanks for catching this! 🥰
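
The failure mode described above (a server sending to a client that has already dropped its response stream) can be sketched with a plain `std::sync::mpsc` channel. This is only an analogy under stated assumptions, not the madsim-tonic code: `serve` is a hypothetical function, and the channel stands in for the response stream. Calling `unwrap()` on the send result would panic once the receiver is gone, whereas checking the result lets the server stop quietly.

```rust
use std::sync::mpsc;

/// Sends each response in order and stops quietly once the client has gone away.
/// Returns how many responses were actually delivered.
fn serve(tx: mpsc::Sender<String>, responses: &[&str]) -> usize {
    let mut sent = 0;
    for r in responses {
        // `tx.send(..).unwrap()` here would panic after the receiver is dropped.
        if tx.send(r.to_string()).is_err() {
            break;
        }
        sent += 1;
    }
    sent
}

fn main() {
    let (tx, rx) = mpsc::channel();
    drop(rx); // the "client" drops its response stream early
    let sent = serve(tx, &["row 1", "row 2"]);
    println!("delivered {sent} responses without panicking");
}
```

Treating a dropped receiver as a normal end of the exchange, rather than an unrecoverable error, is one way to avoid this kind of panic; whether that is the right fix for madsim is not established in this thread.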
