Daily Chaos Mesh with longevity pipeline failed #15677
Comments
It seems the panic originates from storage: https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?from=1710344178000&orgId=1&to=1710352236000&var-data_source=PE59595AED52CF917&var-namespace=longcmkf-20240313-153221&var-pod=benchmark-risingwave-compute-c-1&var-search=
This may be related to the race fixed in #15738. Let's see whether the same panic still occurs in recent runs.
Also, I would like to understand more about the test case. When the issue happened, we were running
I think this means we randomly create a network partition between one CN and meta. When a network partition happens, do we expect DDL like
Yes, I think MV creation happened after the network partition was resolved.
If DDL happens during a network partition, I believe it should not succeed, since the meta node and the CN cannot communicate.
Yes, the partition began at 17:36:03 UTC and lasted 600s, so it had already ended at 17:46:03 UTC.
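The timing argument above can be checked with a few lines of Python. This is only a sketch: the helper names `partition_window` and `overlaps_partition` are hypothetical, the date 2024-03-13 is an assumption taken from the namespace name in the Grafana links, and the event time used in the last line is a made-up placeholder, not a timestamp from the logs.

```python
from datetime import datetime, timedelta, timezone


def partition_window(start, duration_s):
    """Return (begin, end) of a fault window that starts at `start` and lasts `duration_s` seconds."""
    return start, start + timedelta(seconds=duration_s)


def overlaps_partition(event, start, duration_s):
    """True if `event` falls inside the half-open window [begin, end)."""
    begin, end = partition_window(start, duration_s)
    return begin <= event < end


# Assumed date (from the namespace name); start time and duration are from the thread.
start = datetime(2024, 3, 13, 17, 36, 3, tzinfo=timezone.utc)
begin, end = partition_window(start, 600)
print(end.time())  # end of the partition window

# Hypothetical event time, used only to illustrate the check.
ddl_time = datetime(2024, 3, 13, 17, 50, 0, tzinfo=timezone.utc)
print(overlaps_partition(ddl_time, start, 600))
```

Any DDL issued at or after the computed end time falls outside the partition window, which matches the conclusion in this comment.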
Thanks for the explanation. That makes sense to me. The storage panic happens when more than one storage table instance is associated with the same vnode id, which indicates there is a race somewhere. Let's see whether #15738 fixes it. Just FYI, I noticed that while the network partition is ongoing, the partitioned CN exits by itself (and is restarted by k8s, I guess) because meta has expired the worker node. Relevant code: risingwave/src/rpc_client/src/meta_client.rs Line 294 in 13e2d9a
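To make the invariant behind the storage panic concrete, here is a minimal illustrative sketch in Python. It is not RisingWave's implementation: the function `find_vnode_conflicts` and the shape of its input (a mapping from a hypothetical instance id to the set of vnode ids that instance covers) are assumptions made purely for illustration.

```python
from collections import defaultdict


def find_vnode_conflicts(instances):
    """Return {vnode_id: [instance ids]} for every vnode claimed by more than one instance.

    `instances` maps a (hypothetical) storage table instance id to the set of
    vnode ids it covers. An empty result means the invariant holds.
    """
    owners = defaultdict(list)
    for instance_id, vnodes in instances.items():
        for vnode in vnodes:
            owners[vnode].append(instance_id)
    return {v: sorted(ids) for v, ids in owners.items() if len(ids) > 1}


# Disjoint vnode sets: no conflict.
print(find_vnode_conflicts({1: {0, 1}, 2: {2, 3}}))

# Two instances both cover vnode 1, e.g. if a stale instance was not dropped in time.
print(find_vnode_conflicts({1: {0, 1}, 2: {1, 2}}))
```

A race that leaves an old instance registered while a new one is created for the same vnodes would produce a non-empty result here, which is the condition described as triggering the panic.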
@xuefengze Is the issue resolved in recent runs?
Yes, all recent runs were successful.
Thanks for the confirmation. I will close the issue.
The faults ended (fault window: 17:36:01 - 17:46:01) before the SQL command was run.
CREATE MV nexmark_q10_1_chaos_mesh failed because compute-1 restarted: https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?orgId=1&var-data_source=PE59595AED52CF917&var-namespace=longcmkf-20240313-153221&from=1710344178000&to=1710352236000&var-pod=benchmark-risingwave-compute-c-1&var-search=