Compute pod terminated 133 in longevity test #17115

Closed
huangjw806 opened this issue Jun 5, 2024 · 5 comments

@huangjw806
Contributor

================================================================================
longevity-test Result
================================================================================
Result               FAIL                
Pipeline Message     run all nexmark (8 sets of nexmark queries) with 10k throughput daily
Namespace            reglngvty-20240604-150219
TestBed              medium-arm-3cn-all-affinity
RW Version           nightly-20240604    
Test Start time      2024-06-04 15:12:20 
Test End time        2024-06-05 03:14:36 
Test Queries         nexmark_q0,nexmark_q1,nexmark_q2,nexmark_q3,nexmark_q4,nexmark_q5,nexmark_q6_group_top1,nexmark_q7,nexmark_q8,nexmark_q9,nexmark_q10,nexmark_q12,nexmark_q14,nexmark_q15,nexmark_q16,nexmark_q17,nexmark_q18,nexmark_q19,nexmark_q20,nexmark_q21,nexmark_q22,nexmark_q101,nexmark_q102,nexmark_q103,nexmark_q104,nexmark_q105
Grafana Metric       https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=reglngvty-20240604-150219&from=1717513940000&to=1717557276000
Grafana Logs         https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?orgId=1&var-data_source=Logging:%20test-useast1-eks-a&var-namespace=reglngvty-20240604-150219&from=1717513940000&to=1717557276000
Memory Dumps         https://s3.console.aws.amazon.com/s3/buckets/test-useast1-mgmt-bucket-archiver?region=us-east-1&bucketType=general&prefix=k8s/reglngvty-20240604-150219/&showversions=false
Buildkite Job        https://buildkite.com/risingwave-test/longevity-test/builds/1445

================================================================================
Restarted/Crashed Pods Details
================================================================================
Pod crashed/Restarted: benchmark-risingwave-compute-c-1 restart_count:1  phase:Running status:True
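
For triage: the `133` in the title is presumably the container exit code of the restarted compute pod. That is an assumption, since the report above only shows `restart_count:1`. Under the common shell convention, an exit code above 128 means the process was terminated by signal `code - 128`. A minimal Python sketch of that decoding, where `decode_exit_code` is a hypothetical helper name and not part of any RisingWave tooling:

```python
import signal


def decode_exit_code(code: int):
    """Return (signal_number, signal_name) if `code` follows the
    128 + N convention for signal-terminated processes, else None."""
    if 128 < code < 256:
        signum = code - 128
        try:
            # Signal names are platform dependent; look them up locally.
            name = signal.Signals(signum).name
        except ValueError:
            name = None
        return signum, name
    return None


result = decode_exit_code(133)
print(result)  # signal number 5; named SIGTRAP on Linux
```

If the assumption holds, the pod was killed by signal 5 rather than exiting on its own, which would point at an abort or trap inside the process rather than a graceful shutdown.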
@huangjw806
Contributor Author

Compute node (cn) log:

2024-06-04T17:30:31.892459844Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=527 error=Executor error: exchange channel to downstream actor 520 closed unexpectedly
2024-06-04T17:30:31.893456125Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=524 error=Executor error: exchange channel to downstream actor 520 closed unexpectedly
2024-06-04T17:30:31.953815933Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=503 error=Executor error: exchange channel to downstream actor 496 closed unexpectedly
2024-06-04T17:30:32.078816588Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1355 error=Executor error: exchange channel to downstream actor 1353 closed unexpectedly
2024-06-04T17:30:32.11588205Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1352 error=Executor error: exchange channel from local upstream actor 1355 closed unexpectedly
2024-06-04T17:30:32.179353862Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1445 error=Executor error: exchange channel to downstream actor 1443 closed unexpectedly
2024-06-04T17:30:32.179885705Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1442 error=Executor error: exchange channel from local upstream actor 1445 closed unexpectedly
2024-06-04T17:30:32.180378778Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1451 error=Executor error: exchange channel to downstream actor 1445 closed unexpectedly
2024-06-04T17:30:32.413403419Z  INFO aws_smithy_runtime_api::client::connection: smithy connection was poisoned
2024-06-04T17:30:32.466255337Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=2012 error=Executor error: exchange channel to downstream actor 2010 closed unexpectedly
2024-06-04T17:30:32.634497599Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1745 error=Executor error: exchange channel to downstream actor 1737 closed unexpectedly
2024-06-04T17:30:32.796143891Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=908 error=Executor error: exchange channel to downstream actor 906 closed unexpectedly
2024-06-04T17:30:32.796764549Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=905 error=Executor error: exchange channel from local upstream actor 908 closed unexpectedly
2024-06-04T17:30:32.818215072Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=914 error=Executor error: exchange channel to downstream actor 907 closed unexpectedly
2024-06-04T17:30:32.940149312Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=959 error=actor exited unexpectedly

Backtrace:
   0: std::backtrace_rs::backtrace::libunwind::trace
      at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
      at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
      at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/backtrace.rs:331:13
   3: anyhow::error::<impl anyhow::Error>::msg
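
To see which actors failed and in which direction the exchange channels closed, the log excerpt above can be summarized with a short script. The following is a rough Python sketch. The regular expressions are inferred only from the lines quoted in this comment and are not an official RisingWave log schema, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns inferred from the excerpt above (an assumption, not a documented format).
EXIT_RE = re.compile(r"actor exit with error actor_id=(?P<actor>\d+) error=(?P<error>.*)")
PEER_RE = re.compile(
    r"exchange channel (?P<direction>to downstream|from local upstream) "
    r"actor (?P<peer>\d+) closed unexpectedly"
)


def summarize(lines):
    """Count actor-exit errors by kind and collect (actor, direction, peer) tuples."""
    counts = Counter()
    edges = []
    for line in lines:
        m = EXIT_RE.search(line)
        if not m:
            continue  # not an actor-exit line (e.g. INFO lines, backtrace frames)
        p = PEER_RE.search(m["error"])
        if p:
            counts[p["direction"]] += 1
            edges.append((int(m["actor"]), p["direction"], int(p["peer"])))
        else:
            counts["other"] += 1
    return counts, edges


# Three lines copied from the excerpt above.
sample = [
    "2024-06-04T17:30:31.892459844Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=527 error=Executor error: exchange channel to downstream actor 520 closed unexpectedly",
    "2024-06-04T17:30:32.11588205Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=1352 error=Executor error: exchange channel from local upstream actor 1355 closed unexpectedly",
    "2024-06-04T17:30:32.940149312Z ERROR risingwave_stream::task::stream_manager: actor exit with error actor_id=959 error=actor exited unexpectedly",
]
counts, edges = summarize(sample)
print(counts)
print(edges)
```

Lines that land in the `other` bucket, such as actor 959 here, are the ones that do not merely report a closed channel and may be worth inspecting first.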


@huangjw806
Contributor Author

This looks like the same problem seen in the chaos-mesh test.

================================================================================
chaos-mesh Result
================================================================================
Result               FAIL                
Pipeline Message     Nightly nexmark     
Namespace            longcmkf-20240604-153100
TestBed              medium-arm-3cn-all-affinity
RW Version           nightly-20240604    
Test Start time      2024-06-04 15:34:29 
Test End time        2024-06-04 17:27:58 
Test Queries         q0,q1,q2,q3,q4,q5,q7,q8,q9,q10,q14,q15,q16,q17,q18,q20,q21,q22,q101,q102,q103,q104,q105
Grafana Metric       https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=longcmkf-20240604-153100&from=1717515269000&to=1717522078000
Grafana Logs         https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?orgId=1&var-data_source=Logging:%20test-useast1-eks-a&var-namespace=longcmkf-20240604-153100&from=1717515269000&to=1717522078000
Memory Dumps         https://s3.console.aws.amazon.com/s3/buckets/test-useast1-mgmt-bucket-archiver?region=us-east-1&bucketType=general&prefix=k8s/longcmkf-20240604-153100/&showversions=false
Buildkite Job        https://buildkite.com/risingwave-test/chaos-mesh/builds/865

@lmatz added the type/bug label Jun 5, 2024
@hzxa21
Collaborator

hzxa21 commented Jun 5, 2024

#17111

@fuyufjh
Member

fuyufjh commented Jul 10, 2024

Is it fixed by #17111?

@hzxa21
Collaborator

hzxa21 commented Jul 10, 2024

Yes, I confirm that this is fixed by #17111.

@hzxa21 closed this as completed Jul 10, 2024