Frequent consumer disconnection with broker #1351

Open
7 tasks
techgeekengineer007 opened this issue Dec 3, 2024 · 0 comments

Description

I am experiencing persistent Kafka broker connectivity issues when using the Confluent Kafka Go client. The consumer frequently disconnects from brokers, as indicated by multiple disconnection and reconnection attempts in the log.

I run multiple consumer pods, but they do not stay up: lag keeps increasing while the consumers are down.
Sometimes 2-3 consumers stay up for 4-5 hours under the event load and then stop. I am using about 10 consumer pods to process these events. The frequent disconnections across multiple pods make it hard to keep the system stable.
Your help is greatly appreciated.

How to reproduce

max.poll.interval.ms: 600000
session.timeout.ms: 60000

Error log (broker):
identical error(s) suppressed) (_TRANSPORT): identical to last error
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Disconnected (after 2507976ms in state UP, 1 identical error(s) suppressed)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state UP -> DOWN
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: Requesting metadata for 1/1 topics: broker down
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Request metadata for 1 topic(s): broker down
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state DOWN -> INIT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state TRY_CONNECT -> CONNECT
%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP) (_TRANSPORT)
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state UP -> DOWN
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: Requesting metadata for 1/1 topics: broker down
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Request metadata for 1 topic(s): broker down
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state DOWN -> INIT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state TRY_CONNECT -> CONNECT
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Disconnected (after 2507976ms in state UP, 1 identical error(s) suppressed)\n"}
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: GroupCoordinator: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP)\n"}
%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed) (_TRANSPORT): identical to last error
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state UP -> DOWN
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed)\n"}
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Selected for cluster connection: broker down (broker has 1 connection attempt(s))
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: All broker connections are down: 4/4 brokers are down\n"}
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: Skipping metadata refresh of 1 topic(s): broker down: no usable brokers
{"level":"info","caller":"/kafka.go:341","time":"2024-12-02T11:07:52Z","message":"Closing consumer"}
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state DOWN -> INIT
%7|1733137672.496|SUBSCRIPTION|my-kafka-app-system#consumer-4| [thrd:main]: Group "group-my-activity": effective subscription list changed from 1 to 0 topic(s):
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Received CONNECT op
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state TRY_CONNECT -> CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Broker changed state TRY_CONNECT -> CONNECT
{"level":"info","caller":"/kafka.go:360","time":"2024-12-02T11:07:52Z","message":"% EAGER rebalance: 2 partition(s) revoked: [my-activity[8]@unset my-activity[9]@unset]"}
{"level":"info","caller":"/kafka.go:373","time":"2024-12-02T11:07:52Z","message":"% Committed offsets to Kafka: []"}
%7|1733137672.497|NODENAME|my-kafka-app-system#consumer-4| [thrd:main]: GroupCoordinator/2: Broker nodename changed from "b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094" to ""
%7|1733137672.497|NODEID|my-kafka-app-system#consumer-4| [thrd:main]: GroupCoordinator/2: Broker nodeid changed from 2 to -1
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 15
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:app]: Terminating instance (destroy flags none (0x0))
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Handle is terminating in state CONNECT: 11 refcnts (0x7f28580eb040), 4 toppar(s), 1 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 28
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Connected to ipv4#xx.xxx.xx.xxx:9094
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state CONNECT -> SSL_HANDSHAKE
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Destroy internal
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Removing all topics
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to GroupCoordinator
%7|1733137672.497|TERM|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Received TERMINATE op in state INIT: 3 refcnts, 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 14
%7|1733137672.497|FAIL|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Client is terminating (after 9064894ms in state INIT) (_DESTROY)
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Broker changed state INIT -> DOWN
%7|1733137672.497|BRKTERM|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: terminating: broker still has 3 refcnt(s), 0 buffer(s), 0 partition(s)
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Handle is terminating in state DOWN: 2 refcnts (0x7f28580e84d0), 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Broker changed state DOWN -> INIT
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Handle is terminating in state CONNECT: 2 refcnts (0x7f28580e7850), 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Connected to ipv4#xx.xxx.xx.xxx:9094
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state CONNECT -> SSL_HANDSHAKE

Checklist

Please provide the following information:

  • confluent-kafka-go and librdkafka version (LibraryVersion()): v2.3.0
  • Apache Kafka broker version: 3.5.1
  • Client configuration: ConfigMap{...}
  • Operating system:
  • Provide client logs (with "debug": ".." as necessary)
  • Provide broker log excerpts
  • Critical issue