
compute node dies when running regress boolean test #6340

Closed
curiosityyy opened this issue Nov 13, 2022 · 2 comments
Labels
type/bug Something isn't working

@curiosityyy
Contributor

Describe the bug

I ran the regress boolean test and the compute node died every time. The logs are pasted below.

===
=== Dump log file /Users/xxx/project/risingwave/.risingwave/log/compute-node-5688.log ===
===

2022-11-13T15:06:51.565424Z  INFO risingwave_rt: parking lot deadlock detection enabled
2022-11-13T15:06:51.571070Z  INFO risingwave_compute: Compute node options: ComputeNodeOpts { host: "127.0.0.1:5688", client_address: Some("127.0.0.1:5688"), state_store: "hummock+memory", prometheus_listener_addr: "127.0.0.1:1222", metrics_level: 1, meta_address: "http://127.0.0.1:5690", config_path: "/Users/xxx/project/risingwave/.risingwave/config/risingwave.toml", enable_jaeger_tracing: false, async_stack_trace: On, file_cache_dir: "" }
2022-11-13T15:06:51.571333Z  INFO risingwave_compute: Server Listening at 127.0.0.1:5688
2022-11-13T15:06:51.572888Z  INFO risingwave_compute: Client address is 127.0.0.1:5688
2022-11-13T15:06:51.577433Z  INFO risingwave_compute::server: Starting compute node with config ComputeNodeConfig { server: ServerConfig { heartbeat_interval_ms: 1000, max_heartbeat_interval_secs: 600, connection_pool_size: 16 }, batch: BatchConfig { worker_threads_num: None, developer: DeveloperConfig { batch_output_channel_size: 64, batch_chunk_size: 1024, stream_enable_executor_row_count: false, stream_enable_managed_cache: true, stream_connector_message_buffer_size: 16, unsafe_stream_hash_agg_cache_size: 65536, unsafe_stream_join_cache_size: 65536, unsafe_stream_extreme_cache_size: 1024, stream_chunk_size: 1024 } }, streaming: StreamingConfig { barrier_interval_ms: 250, in_flight_barrier_nums: 40, checkpoint_frequency: 10, minimal_scheduling: false, worker_node_parallelism: 4, actor_runtime_worker_threads_num: None, total_memory_available_bytes: 137438953472, developer: DeveloperConfig { batch_output_channel_size: 64, batch_chunk_size: 1024, stream_enable_executor_row_count: false, stream_enable_managed_cache: true, stream_connector_message_buffer_size: 16, unsafe_stream_hash_agg_cache_size: 65536, unsafe_stream_join_cache_size: 65536, unsafe_stream_extreme_cache_size: 1024, stream_chunk_size: 1024 } }, storage: StorageConfig { sstable_size_mb: 256, block_size_kb: 1024, bloom_false_positive: 0.01, share_buffers_sync_parallelism: 1, share_buffer_compaction_worker_threads_number: 4, shared_buffer_capacity_mb: 4096, data_directory: "hummock_001", write_conflict_detection_enabled: true, block_cache_capacity_mb: 4096, meta_cache_capacity_mb: 1024, disable_remote_compactor: false, enable_local_spill: true, local_object_store: "tempdisk", share_buffer_upload_concurrency: 8, compactor_memory_limit_mb: 5120, sstable_id_remote_fetch_number: 10, file_cache: FileCacheConfig { capacity_mb: 1024, total_buffer_capacity_mb: 128, cache_file_fallocate_unit_mb: 512, cache_meta_fallocate_unit_mb: 16, cache_file_max_write_size_mb: 4 }, min_sst_size_for_streaming_upload: 33554432, max_sub_compaction: 4, object_store_use_batch_delete: true, enable_state_store_v1: false } } with debug assertions on
2022-11-13T15:06:51.595376Z  INFO risingwave_compute::server: Assigned worker node id 1
2022-11-13T15:06:51.603914Z  WARN risingwave_object_store::object: You're using Hummock in-memory remote object store. This should never be used in benchmarks and production environment.
2022-11-13T15:06:51.624396Z  INFO risingwave_tracing: tracing service started with slow_request_threshold_ms=100
2022-11-13T15:06:51.625786Z  INFO risingwave_compute::server: start embedded compactor
2022-11-13T15:06:51.632440Z DEBUG risingwave_storage::hummock::compactor: Succeeded subscribe_compact_tasks.
2022-11-13T15:06:51.634930Z  INFO risingwave_common_service::metrics_manager: Prometheus listener for Prometheus is set up on http://127.0.0.1:1222
2022-11-13T15:08:33.734185Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=5 state=None
2022-11-13T15:08:33.734185Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=6 state=None
2022-11-13T15:08:33.734185Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=7 state=None
2022-11-13T15:08:33.734185Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=8 state=None
2022-11-13T15:08:33.735977Z  WARN risingwave_storage::hummock::state_store: sealing invalid epoch
2022-11-13T15:08:33.736427Z  WARN risingwave_storage::hummock::state_store: syncing invalid epoch
2022-11-13T15:09:49.332914Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=16 state=None
2022-11-13T15:09:49.332946Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=15 state=None
2022-11-13T15:09:49.332946Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=14 state=None
2022-11-13T15:09:49.332948Z  INFO risingwave_stream::executor::source::source_executor: start with state actor_id=13 state=None
Sun Nov 13 15:10:02 UTC 2022 [risedev]: Program exited with 139

===
=== Dump log file /Users/xxx/project/risingwave/.risingwave/log/frontend-4566.log ===
===

2022-11-13T15:06:52.893179Z  INFO risingwave_rt: parking lot deadlock detection enabled
2022-11-13T15:06:52.903871Z  INFO risingwave_frontend::session: Starting frontend node with
frontend config FrontendConfig { server: ServerConfig { heartbeat_interval_ms: 1000, max_heartbeat_interval_secs: 600, connection_pool_size: 16 } }
batch config BatchConfig { worker_threads_num: None, developer: DeveloperConfig { batch_output_channel_size: 64, batch_chunk_size: 1024, stream_enable_executor_row_count: false, stream_enable_managed_cache: true, stream_connector_message_buffer_size: 16, unsafe_stream_hash_agg_cache_size: 65536, unsafe_stream_join_cache_size: 65536, unsafe_stream_extreme_cache_size: 1024, stream_chunk_size: 1024 } }
2022-11-13T15:06:52.904834Z  INFO risingwave_frontend::session: Client address is 127.0.0.1:4566
2022-11-13T15:06:52.934018Z  INFO risingwave_common_service::metrics_manager: Prometheus listener for Prometheus is set up on http://127.0.0.1:2222
2022-11-13T15:08:38.504493Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3349998217003008 with RPC
2022-11-13T15:08:52.922750Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3349999298412544 with RPC
2022-11-13T15:09:12.459521Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3350000559980544 with RPC
2022-11-13T15:09:22.922804Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3350001100587008 with RPC
2022-11-13T15:09:32.923235Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3350001821483008 with RPC
2022-11-13T15:09:49.344871Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3350003001196544 with RPC
2022-11-13T15:10:02.415913Z  INFO risingwave_frontend::scheduler::hummock_snapshot_manager: Unpin epoch 3350003705708544 with RPC
2022-11-13T15:10:02.518909Z ERROR risingwave_frontend::scheduler::distributed::stage: Stage QueryId { id: "ba79804d-e2b9-4770-90f0-86a1eb6cac9d" }-0 failed to schedule tasks, error: TaskExecutionError("internal error: error reading a body from connection: broken pipe")
2022-11-13T15:10:02.519021Z ERROR risingwave_frontend::scheduler::distributed::query: Query stage QueryId { id: "ba79804d-e2b9-4770-90f0-86a1eb6cac9d" }-0 failed: TaskExecutionError("internal error: error reading a body from connection: broken pipe").
2022-11-13T15:10:02.518956Z ERROR risingwave_frontend::scheduler::distributed::stage: Stage QueryId { id: "ba79804d-e2b9-4770-90f0-86a1eb6cac9d" }-1 failed to schedule tasks, error: RpcError(GrpcStatus(Status { code: Unknown, message: "error reading a body from connection: broken pipe", source: Some(hyper::Error(Body, Error { kind: Io(Kind(BrokenPipe)) })) }))
2022-11-13T15:10:02.519345Z ERROR risingwave_frontend::session: failed to handle sql:
INSERT INTO BOOLTBL2 (f1)
   VALUES (bool 'XXX');:
internal error: internal error: error reading a body from connection: broken pipe
2022-11-13T15:10:02.792123Z ERROR risingwave_common_service::observer_manager: Receives meta's notification err Status { code: Unknown, message: "error reading a body from connection: broken pipe", source: Some(hyper::Error(Body, Error { kind: Io(Kind(BrokenPipe)) })) }
2022-11-13T15:10:02.922667Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
2022-11-13T15:10:03.924491Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
2022-11-13T15:10:04.924231Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
2022-11-13T15:10:05.924030Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
2022-11-13T15:10:06.924137Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
2022-11-13T15:10:07.923686Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
2022-11-13T15:10:08.923939Z  WARN risingwave_rpc_client::meta_client: Failed to send_heartbeat: error gRPC error (The service is currently unavailable): error trying to connect: tcp connect error: Connection refused (os error 61)
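The compute node log ends with "Program exited with 139". Assuming risedev reports the status the way a POSIX shell does (128 + N when a process is terminated by signal N), 139 means the compute node was killed by signal 11 (SIGSEGV): it crashed rather than exiting normally. The helper below is a hypothetical illustration of that shell convention only; decode_shell_status and signal_name are not part of risedev.

```rust
/// Result of interpreting a shell-style exit status.
#[derive(Debug, PartialEq)]
enum ExitReason {
    /// The process exited on its own with this code.
    Code(i32),
    /// The process was terminated by this signal number.
    Signal(i32),
}

/// Assumes the POSIX shell convention: statuses 129..=255 mean 128 + signal.
fn decode_shell_status(status: i32) -> ExitReason {
    if status > 128 && status < 256 {
        ExitReason::Signal(status - 128)
    } else {
        ExitReason::Code(status)
    }
}

/// Names for a few signals whose numbers are the same on Linux and macOS.
fn signal_name(sig: i32) -> &'static str {
    match sig {
        2 => "SIGINT",
        6 => "SIGABRT",
        9 => "SIGKILL",
        11 => "SIGSEGV",
        15 => "SIGTERM",
        _ => "other signal",
    }
}

fn main() {
    // 139 is the status printed by risedev in the log above.
    match decode_shell_status(139) {
        ExitReason::Signal(sig) => println!("killed by signal {} ({})", sig, signal_name(sig)),
        ExitReason::Code(code) => println!("exited with code {}", code),
    }
}
```

Running it prints "killed by signal 11 (SIGSEGV)".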

To Reproduce

Assuming everything is installed properly, start the cluster with ./risedev d and then run the following command:

RUST_BACKTRACE=1 target/debug/risingwave_regress_test -h 127.0.0.1 \
  -p 4566 \
  -u root \
  --input `pwd`/src/tests/regress/data \
  --output `pwd`/src/tests/regress/output \
  --schedule `pwd`/src/tests/regress/data/schedule \
  --mode risingwave

Expected behavior

The compute node should not crash, and the test should run to completion.
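For reference, PostgreSQL documents that boolean input accepts true, yes, on, 1 and false, no, off, 0, case-insensitively and ignoring surrounding whitespace, plus unique prefixes such as t or n; any other string, such as 'XXX', is rejected with an "invalid input syntax for type boolean" error. The sketch below models those documented rules in Rust to illustrate the intended outcome: the statement fails with an error and the process keeps running. It is not RisingWave's implementation, and parse_bool is a hypothetical name.

```rust
/// Parses boolean text following PostgreSQL's documented input rules.
/// Returns an error for unrecognized input instead of crashing.
fn parse_bool(input: &str) -> Result<bool, String> {
    let s = input.trim().to_ascii_lowercase();
    // True if `s` is a non-empty prefix of `word` (e.g. "tr" for "true").
    let is_prefix_of = |word: &str| !s.is_empty() && word.starts_with(s.as_str());
    let value = match s.as_str() {
        "1" => Some(true),
        "0" => Some(false),
        // A lone "o" is ambiguous between "on" and "off", so require two characters.
        _ if s.len() >= 2 && is_prefix_of("on") => Some(true),
        _ if s.len() >= 2 && is_prefix_of("off") => Some(false),
        _ if is_prefix_of("true") || is_prefix_of("yes") => Some(true),
        _ if is_prefix_of("false") || is_prefix_of("no") => Some(false),
        _ => None,
    };
    value.ok_or_else(|| format!("invalid input syntax for type boolean: \"{}\"", input))
}

fn main() {
    for input in ["t", "Off", " yes ", "XXX"] {
        match parse_bool(input) {
            Ok(v) => println!("{:?} -> {}", input, v),
            Err(e) => println!("{:?} -> error: {}", input, e),
        }
    }
}
```

Returning a Result keeps invalid literals on the ordinary error path, so a bad INSERT like the one above would surface as a query error to the client.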

Additional context

No response

curiosityyy added the type/bug label Nov 13, 2022
github-actions bot added this to the release-0.1.14 milestone Nov 13, 2022
@curiosityyy
Contributor Author

It seems to die every time it runs this SQL:

INSERT INTO BOOLTBL2 (f1)
   VALUES (bool 'XXX');

@xiangjinwu
Contributor

Thanks for reporting. Merging this into an ongoing thread #6205.

xiangjinwu closed this as not planned Nov 14, 2022