This issue has been open for 60 days with no activity.
If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.
You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄
Describe the bug
https://buildkite.com/risingwave-test/longevity-chaos-mesh/builds/376#018ca450-a95e-4c3b-953d-68d154850aef
This experiment is implemented by @xuefengze.
In this experiment, we applied a network partition from the compute node to the meta node. (We say "from ... to" rather than "between" because the direction can be specified, although it seems to have little to do with the following problem.)
The duration is 10 min.
We triggered the fault around 12:17:02, so the partition lasts until 12:27:02.
While the partition exists, we can execute the select query without a problem.
But when we tried to create table t1, the query was stuck for a long time and returned the error message at 12:27:03, which is exactly when the partition experiment finished.
Although it is expected that the SQL fails, is it reasonable to let create table t1 stay stuck for such a long time?
For such a query, I think we always expect it to finish with single-digit latency. Shall we add a timeout here?
After the partition experiment finished, we retried create table t1 and it succeeded.
However, after that, with the network working normally again, executing a select query fails.
Is it due to a similar issue as found in #14030?
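To make the timeout question concrete, here is a minimal client-side sketch of bounding a blocking call with a deadline. It is not RisingWave code: `run_with_deadline`, `DdlTimeout`, and `fake_create_table_during_partition` are hypothetical names, and the sleep only stands in for a DDL request waiting on an unreachable meta node. A server-side timeout would presumably look different.

```python
import threading
import time


class DdlTimeout(Exception):
    """Raised when the wrapped call does not finish within the deadline."""


def run_with_deadline(fn, timeout_s):
    """Run fn() in a daemon thread and give up after timeout_s seconds.

    The worker is abandoned on timeout, not cancelled, so this only bounds
    how long the caller waits. It does not undo any work already started.
    """
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except Exception as exc:  # pass the worker's failure to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise DdlTimeout(f"no response within {timeout_s}s")
    if "error" in result:
        raise result["error"]
    return result["value"]


def fake_create_table_during_partition():
    # Hypothetical stand-in for `create table t1` blocked by the partition.
    time.sleep(0.5)
    return "CREATE_TABLE"
```

With a deadline like this, the caller would see a timeout error after a bounded wait instead of hanging until the partition heals. Because the abandoned request may still complete later, a retry has to tolerate the table already existing.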
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
No response
Additional context
No response