source parser/sink writer errors may lead to standalone cluster being entirely unavailable during recovery #17946

kwannoel · 2024-08-06T15:31:16Z

Source/sink parser errors should only take down compute nodes. They shouldn't affect meta nodes or frontend (fe) nodes if there's no panic.

Needs investigation to reproduce it.

xxchan · 2024-08-06T15:35:31Z

#16813 Here's a bug for you to try 🤪

st1page · 2024-08-06T15:36:24Z

Sink a null value into a PG table with a NOT NULL constraint.

kwannoel · 2024-08-06T15:39:17Z

Before this, we can also try running the recover command and then running cluster commands that depend only on meta and fe, not on CN, to see whether it's a recovery problem in single-node mode or a source/sink problem.

fuyufjh · 2024-08-07T03:17:42Z

Yesterday I was trying to reproduce the case but failed. I created a PG sink without sink decouple and ingested some bad data (nulls). The Meta node kept recovering forever, which is expected, but nothing was down. 🙁

fuyufjh · 2024-08-07T03:18:55Z

Also note that it's standalone mode, not single-node mode. Here is the command I used to start the process.

    CONNECTOR_LIBS_PATH=/Users/eric/Workspace/risingwave/.risingwave/bin/connector-node/libs \
    cargo run --bin risingwave -- standalone \
      --meta-opts="--listen-addr 127.0.0.1:5690 --advertise-addr 127.0.0.1:5690 --dashboard-host 127.0.0.1:5691 --prometheus-host 127.0.0.1:1250 --backend etcd --etcd-endpoints 127.0.0.1:2388 --state-store hummock+minio://hummockadmin:[email protected]:9301/hummock001 --data-directory hummock_001" \
      --compute-opts="--listen-addr 127.0.0.1:5688 --prometheus-listener-addr 127.0.0.1:1222 --advertise-addr 127.0.0.1:5688 --async-stack-trace verbose --parallelism 4 --total-memory-bytes 8589934592 --role both --meta-address http://127.0.0.1:5690" \
      --frontend-opts="--listen-addr 127.0.0.1:4566 --advertise-addr 127.0.0.1:4566 --prometheus-listener-addr 127.0.0.1:2222 --health-check-listener-addr 127.0.0.1:6786 --meta-addr http://127.0.0.1:5690" \
      --compactor-opts="--listen-addr 127.0.0.1:6660 --prometheus-listener-addr 127.0.0.1:1260 --advertise-addr 127.0.0.1:6660 --meta-address http://127.0.0.1:5690"
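
To narrow down which component becomes unreachable while the cluster is recovering, one option is to poll the listen addresses from the command above. The following is only a rough sketch: the port numbers are copied from that command, the helper names `is_listening` and `probe` are made up for illustration, and a successful TCP connect only shows that the port accepts connections, not that the service behind it is healthy.

```python
import socket

# Listen addresses copied from the standalone command above.
# Adjust them if your setup uses different ports.
COMPONENTS = {
    "meta": ("127.0.0.1", 5690),
    "compute": ("127.0.0.1", 5688),
    "frontend": ("127.0.0.1", 4566),
    "compactor": ("127.0.0.1", 6660),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe(components=COMPONENTS, timeout=1.0):
    """Map each component name to whether its port accepts connections."""
    return {
        name: is_listening(host, port, timeout)
        for name, (host, port) in components.items()
    }


if __name__ == "__main__":
    for name, up in probe().items():
        print(f"{name:10s} {'up' if up else 'DOWN'}")
```

Running this in a loop during a recovery cycle would show whether only the compute port drops or whether the meta and frontend ports go away as well.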

kwannoel · 2024-08-07T06:32:14Z

Another user encountered:

    "message": "librdkafka: Global error: Authentication (Local: Authentication failure): <REDACTED> SASL authentication error: invalid username or password for <REDACTED>

This error also causes the standalone cluster to crash-loop.

kwannoel added the needs-investigation label Aug 6, 2024

kwannoel self-assigned this Aug 6, 2024

github-actions bot added this to the release-2.0 milestone Aug 6, 2024

kwannoel modified the milestones: release-2.0, release-2.1 Aug 19, 2024

kwannoel modified the milestones: release-2.1, future-release-2.2 Oct 8, 2024
