Meta node hang DDL processing when connection setup timeout of SASL connection #15235

Closed
StrikeW opened this issue Feb 24, 2024 · 3 comments · Fixed by #15313
Labels: priority/critical, type/bug

Comments

StrikeW commented Feb 24, 2024

Describe the bug

The meta node hung again, which blocked all DDL statements. The log contains many WARN lines from librdkafka reporting connection setup timeouts:

{"timestamp":"2024-02-24T06:09:17.35828774Z","level":"WARN","fields":{"message":"librdkafka: FAIL [thrd:sasl_ssl://b0-xxx.aws.confluent.cloud:9092/boot]: sasl_ssl://b0-xxx.aws.confluent.cloud:9092/0: Connection setup timed out in state CONNECT (after 30034ms in state CONNECT, 1 identical error(s) suppressed)","log.target":"librdkafka","log.module_path":"madsim_rdkafka::std_::client","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/madsim-rdkafka-0.3.0+0.34.0/src/std/client.rs","log.line":78},"target":"librdkafka"}
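To gauge how often these timeouts occur, the WARN lines can be filtered out of the JSON log. The following is a minimal sketch, not part of RisingWave: the names `parse_timeout`, `TIMEOUT_RE`, and `SAMPLE` are hypothetical, and the regular expression is inferred only from the single log line shown above, so other librdkafka versions may format the message differently.

```python
import json
import re

# Message pattern inferred from the log line above (an assumption, not a
# documented librdkafka format).
TIMEOUT_RE = re.compile(
    r"librdkafka: FAIL \[thrd:[^\]]+\]: (?P<broker>\S+): "
    r"Connection setup timed out in state (?P<state>\w+) \(after (?P<ms>\d+)ms"
)


def parse_timeout(line):
    """Return (broker, state, elapsed_ms) for a connection setup timeout, else None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # Only WARN records emitted under the librdkafka log target are of interest.
    if record.get("level") != "WARN" or record.get("target") != "librdkafka":
        return None
    fields = record.get("fields")
    message = fields.get("message", "") if isinstance(fields, dict) else ""
    match = TIMEOUT_RE.search(message)
    if match is None:
        return None
    return match["broker"], match["state"], int(match["ms"])


# Trimmed copy of the log line above (the log.* fields are omitted for brevity).
SAMPLE = (
    '{"timestamp":"2024-02-24T06:09:17.35828774Z","level":"WARN",'
    '"fields":{"message":"librdkafka: FAIL [thrd:sasl_ssl://b0-xxx.aws.confluent.cloud:9092/boot]: '
    'sasl_ssl://b0-xxx.aws.confluent.cloud:9092/0: Connection setup timed out in state CONNECT '
    '(after 30034ms in state CONNECT, 1 identical error(s) suppressed)"},'
    '"target":"librdkafka"}'
)

print(parse_timeout(SAMPLE))
```

Lines that are not JSON, or that are not librdkafka WARN records matching the pattern, yield `None`, so the function can be mapped over a whole log file and the results counted per broker.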

The meta node cannot process DDL commands, apparently because of the librdkafka connection timeouts.
[image: output of show processlist]

Error message/log

"message":"librdkafka: FAIL [thrd:sasl_ssl://b0-xxx.aws.confluent.cloud:9092/boot]: sasl_ssl://b0-xxx.aws.confluent.cloud:9092/0: Connection setup timed out in state CONNECT (after 30032ms in state CONNECT, 1 identical error(s) suppressed

https://grafana.prod.risingwave.cloud/explore?panes=%7B%22rF-%22:%7B%22datasource%22:%22P5EC303186A5DB006%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bapp%3D%5C%22risingwave-meta-default-0%5C%22,%20namespace%3D%5C%22rwc-g1hmdvc3u9f88otor7j1kbpin2-thumbtack-prod-poc%5C%22%7D%20%7C~%20%60%28WARN%7CERROR%29%60%20%7C%20json%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22P5EC303186A5DB006%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221708750800000%22,%22to%22:%221708756259000%22%7D%7D%7D&schemaVersion=1&orgId=1

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

tenant: https://grafana.prod.risingwave.cloud/d/AdminDashboard_Tenant/tenant?var-datasource=PE662C12516FAE815&var-id=3&orgId=1

The version of RisingWave

PostgreSQL 9.5-RisingWave-1.6.1 (02ee186)

Additional context

No response

StrikeW added the type/bug label on Feb 24, 2024
github-actions bot added this to the release-1.7 milestone on Feb 24, 2024

StrikeW commented Feb 24, 2024

cc @tabVersion @yezizp2012

StrikeW commented Feb 28, 2024

Caused by confluentinc/librdkafka#4460
@wangrunji0408 please update madsim-rdkafka and backport the patch to the v1.6 and v1.7 release branches, thanks.

wangrunji0408 commented

#15313 for main
#15314 for release-1.7
#15315 for release-1.6
