source parser/sink writer errors may lead to standalone cluster being entirely unavailable during recovery #17946

Open
kwannoel opened this issue Aug 6, 2024 · 6 comments

@kwannoel
Contributor

kwannoel commented Aug 6, 2024

Source parser / sink writer errors should only take down compute nodes. They shouldn't affect meta nodes or frontend (fe) nodes as long as there's no panic.

Needs investigation to reproduce it.

@kwannoel kwannoel self-assigned this Aug 6, 2024
@github-actions github-actions bot added this to the release-2.0 milestone Aug 6, 2024
@xxchan
Member

xxchan commented Aug 6, 2024

#16813 Here's a bug for you to try 🤪

@st1page
Contributor

st1page commented Aug 6, 2024

Sink a null value into a PG table that has a NOT NULL constraint.

@kwannoel
Contributor Author

kwannoel commented Aug 6, 2024

Before this, we can also try running the recover command, along with cluster commands that depend only on meta and fe (not on CN), to see whether this is a problem with recovery in single-node mode or a source/sink problem.

@fuyufjh
Member

fuyufjh commented Aug 7, 2024

Yesterday I was trying to reproduce the case but failed. I created a PG sink without sink decouple and ingested some bad data (nulls). The Meta node kept recovering forever, which is expected, but nothing was down. 🙁

@fuyufjh
Member

fuyufjh commented Aug 7, 2024

Also notice that it's standalone mode instead of single-node mode. Here is the command I used to start the process.

    CONNECTOR_LIBS_PATH=/Users/eric/Workspace/risingwave/.risingwave/bin/connector-node/libs \
      cargo run --bin risingwave -- standalone \
      --meta-opts="--listen-addr 127.0.0.1:5690 --advertise-addr 127.0.0.1:5690 --dashboard-host 127.0.0.1:5691 --prometheus-host 127.0.0.1:1250 --backend etcd --etcd-endpoints 127.0.0.1:2388 --state-store hummock+minio://hummockadmin:[email protected]:9301/hummock001 --data-directory hummock_001" \
      --compute-opts="--listen-addr 127.0.0.1:5688 --prometheus-listener-addr 127.0.0.1:1222 --advertise-addr 127.0.0.1:5688 --async-stack-trace verbose --parallelism 4 --total-memory-bytes 8589934592 --role both --meta-address http://127.0.0.1:5690" \
      --frontend-opts="--listen-addr 127.0.0.1:4566 --advertise-addr 127.0.0.1:4566 --prometheus-listener-addr 127.0.0.1:2222 --health-check-listener-addr 127.0.0.1:6786 --meta-addr http://127.0.0.1:5690" \
      --compactor-opts="--listen-addr 127.0.0.1:6660 --prometheus-listener-addr 127.0.0.1:1260 --advertise-addr 127.0.0.1:6660 --meta-address http://127.0.0.1:5690"
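
While trying to reproduce, it may help to record which components still accept connections during the recovery loop, so that "the whole standalone process is gone" can be told apart from "only compute is restarting". Below is a minimal sketch of such a probe, not part of RisingWave. It assumes the listen addresses from the command above, and the names `ENDPOINTS`, `is_listening` and `probe` are made up for illustration. An open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Listen addresses copied from the standalone command above (assumption:
# adjust these if your --listen-addr values differ).
ENDPOINTS = {
    "meta": ("127.0.0.1", 5690),
    "compute": ("127.0.0.1", 5688),
    "frontend": ("127.0.0.1", 4566),
    "compactor": ("127.0.0.1", 6660),
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def probe(endpoints, timeout=1.0):
    """Map each component name to whether its port accepts connections."""
    return {
        name: is_listening(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    for name, up in probe(ENDPOINTS).items():
        print(f"{name:<10} {'listening' if up else 'NOT listening'}")
```

Running this in a loop while the bad data is being ingested would show whether the meta and frontend ports ever drop, which is the behaviour reported in this issue.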

@kwannoel
Contributor Author

kwannoel commented Aug 7, 2024

Another user encountered:

    "message": "librdkafka: Global error: Authentication (Local: Authentication failure): <REDACTED> SASL authentication error: invalid username or password for <REDACTED>

This error also causes the standalone cluster to crash-loop.

@kwannoel kwannoel modified the milestones: release-2.0, release-2.1 Aug 19, 2024