Description
I am experiencing persistent Kafka broker connectivity issues with the Confluent Kafka Go client. The consumer frequently disconnects from the brokers, as the repeated disconnection and reconnection attempts in the log show.
I run multiple consumer pods, but they do not stay up. Lag keeps increasing while the consumers are down.
Sometimes 2-3 consumers stay up for 4-5 hours under the event load and then stop. I use about 10 consumer pods to process these events, and the frequent disconnections across them make it hard to keep the deployment stable.
Your help is greatly appreciated.
How to reproduce
max.poll.interval.ms: 600000
session.timeout.ms: 60000
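For readers less familiar with these two settings: max.poll.interval.ms is the longest gap allowed between poll calls before the consumer is considered failed and leaves the group, and session.timeout.ms is the heartbeat-based session timeout. The sketch below only converts the two values quoted above into readable durations. The `Timeouts` map and `AsDuration` helper are illustrative names, not part of confluent-kafka-go.

```go
package main

import (
	"fmt"
	"time"
)

// Timeouts holds the two consumer settings quoted in this report,
// in milliseconds. The map is illustrative; it is not a kafka.ConfigMap.
var Timeouts = map[string]int64{
	"max.poll.interval.ms": 600000,
	"session.timeout.ms":   60000,
}

// AsDuration converts a millisecond count into a time.Duration.
func AsDuration(ms int64) time.Duration {
	return time.Duration(ms) * time.Millisecond
}

func main() {
	// Iterate over a fixed key order so the output is stable.
	for _, key := range []string{"max.poll.interval.ms", "session.timeout.ms"} {
		fmt.Printf("%-22s = %v\n", key, AsDuration(Timeouts[key]))
	}
}
```

In other words, the consumer may go up to 10 minutes between polls, while the session timeout is 1 minute.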
Error log (broker):
identical error(s) suppressed) (_TRANSPORT): identical to last error
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Disconnected (after 2507976ms in state UP, 1 identical error(s) suppressed)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state UP -> DOWN
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: Requesting metadata for 1/1 topics: broker down
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Request metadata for 1 topic(s): broker down
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state DOWN -> INIT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state TRY_CONNECT -> CONNECT
%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP) (_TRANSPORT)
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state UP -> DOWN
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: Requesting metadata for 1/1 topics: broker down
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Request metadata for 1 topic(s): broker down
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state DOWN -> INIT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state TRY_CONNECT -> CONNECT
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Disconnected (after 2507976ms in state UP, 1 identical error(s) suppressed)\n"}
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: GroupCoordinator: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP)\n"}
%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed) (_TRANSPORT): identical to last error
%6|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed)
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state UP -> DOWN
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: Broker transport failure: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Disconnected (after 2507977ms in state UP, 1 identical error(s) suppressed)\n"}
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Selected for cluster connection: broker down (broker has 1 connection attempt(s))
{"level":"error","caller":"/kafka.go:331","time":"2024-12-02T11:07:52Z","message":"% Error: Local: All broker connections are down: 4/4 brokers are down\n"}
%7|1733137672.496|METADATA|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: Skipping metadata refresh of 1 topic(s): broker down: no usable brokers
{"level":"info","caller":"/kafka.go:341","time":"2024-12-02T11:07:52Z","message":"Closing consumer"}
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state DOWN -> INIT
%7|1733137672.496|SUBSCRIPTION|my-kafka-app-system#consumer-4| [thrd:main]: Group "group-my-activity": effective subscription list changed from 1 to 0 topic(s):
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Received CONNECT op
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Broker changed state INIT -> TRY_CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Broker changed state TRY_CONNECT -> CONNECT
%7|1733137672.496|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: broker in state TRY_CONNECT connecting
%7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3: Broker changed state TRY_CONNECT -> CONNECT
{"level":"info","caller":"/kafka.go:360","time":"2024-12-02T11:07:52Z","message":"% EAGER rebalance: 2 partition(s) revoked: [my-activity[8]@unset my-activity[9]@unset]"}
{"level":"info","caller":"/kafka.go:373","time":"2024-12-02T11:07:52Z","message":"% Committed offsets to Kafka: []"}
%7|1733137672.497|NODENAME|my-kafka-app-system#consumer-4| [thrd:main]: GroupCoordinator/2: Broker nodename changed from "b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094" to ""
%7|1733137672.497|NODEID|my-kafka-app-system#consumer-4| [thrd:main]: GroupCoordinator/2: Broker nodeid changed from 2 to -1
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 15
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:app]: Terminating instance (destroy flags none (0x0))
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Handle is terminating in state CONNECT: 11 refcnts (0x7f28580eb040), 4 toppar(s), 1 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 28
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Connected to ipv4#xx.xxx.xx.xxx:9094
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd:ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz]: ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2: Broker changed state CONNECT -> SSL_HANDSHAKE
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Destroy internal
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Removing all topics
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to ssl://b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/2
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to ssl://b-1.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094/1
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to b-3.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amaz/3
%7|1733137672.497|DESTROY|my-kafka-app-system#consumer-4| [thrd:main]: Sending TERMINATE to GroupCoordinator
%7|1733137672.497|TERM|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Received TERMINATE op in state INIT: 3 refcnts, 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Connecting to ipv4#xx.xxx.xx.xxx:9094 (ssl) with socket 14
%7|1733137672.497|FAIL|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Client is terminating (after 9064894ms in state INIT) (_DESTROY)
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Broker changed state INIT -> DOWN
%7|1733137672.497|BRKTERM|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: terminating: broker still has 3 refcnt(s), 0 buffer(s), 0 partition(s)
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Handle is terminating in state DOWN: 2 refcnts (0x7f28580e84d0), 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd::0/internal]: :0/internal: Broker changed state DOWN -> INIT
%7|1733137672.497|TERMINATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Handle is terminating in state CONNECT: 2 refcnts (0x7f28580e7850), 0 toppar(s), 0 active toppar(s), 0 outbufs, 0 waitresps, 0 retrybufs: failed 0 request(s) in retry+outbuf
%7|1733137672.497|CONNECT|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Connected to ipv4#xx.xxx.xx.xxx:9094
%7|1733137672.497|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state CONNECT -> SSL_HANDSHAKE
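Since the log above is hard to read as one run of text, here is a rough Go sketch (standard library only) that splits such text into individual librdkafka entries and extracts the "after Nms in state X" uptime from disconnect messages. The names `SplitEntries`, `Uptime` and `Entry` are made up for illustration, the header pattern is inferred from the entries in this report rather than from a documented format, and cutting a message at the first ` {"` is an assumption about how the application's JSON lines are interleaved.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"time"
)

// sample holds two entries copied from the log in this report.
const sample = `%7|1733137672.496|FAIL|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator/2: b-2.kafka2-portal-msk-def.xxxx.c2.kafka.us-east-2.amazonaws.com:9094: Disconnected (after 9064857ms in state UP) (_TRANSPORT) %7|1733137672.496|STATE|my-kafka-app-system#consumer-4| [thrd:GroupCoordinator]: GroupCoordinator: Broker changed state UP -> DOWN`

// entryRe matches an entry header as seen in this log:
// %<level>|<unix seconds>.<millis>|<facility>|<client name>|
var entryRe = regexp.MustCompile(`%(\d)\|(\d+)\.(\d{3})\|([A-Z]+)\|([^|]*)\|`)

// uptimeRe matches the "after <N>ms in state <STATE>" fragment.
var uptimeRe = regexp.MustCompile(`after (\d+)ms in state ([A-Z_]+)`)

// Entry is one parsed log entry.
type Entry struct {
	Level    int
	Time     time.Time
	Facility string
	Client   string
	Message  string
}

// SplitEntries cuts raw text at every entry header and parses each header.
func SplitEntries(raw string) []Entry {
	locs := entryRe.FindAllStringSubmatchIndex(raw, -1)
	entries := make([]Entry, 0, len(locs))
	for i, loc := range locs {
		end := len(raw)
		if i+1 < len(locs) {
			end = locs[i+1][0] // message runs until the next header
		}
		level, _ := strconv.Atoi(raw[loc[2]:loc[3]])
		sec, _ := strconv.ParseInt(raw[loc[4]:loc[5]], 10, 64)
		ms, _ := strconv.ParseInt(raw[loc[6]:loc[7]], 10, 64)
		msg := raw[loc[1]:end]
		// Assumption: an interleaved application JSON line starts with ` {"`.
		if j := strings.Index(msg, ` {"`); j >= 0 {
			msg = msg[:j]
		}
		entries = append(entries, Entry{
			Level:    level,
			Time:     time.Unix(sec, ms*int64(time.Millisecond)).UTC(),
			Facility: raw[loc[8]:loc[9]],
			Client:   raw[loc[10]:loc[11]],
			Message:  strings.TrimSpace(msg),
		})
	}
	return entries
}

// Uptime returns how long a connection was in the reported state, if the
// message contains an "after <N>ms in state <STATE>" fragment.
func Uptime(msg string) (time.Duration, string, bool) {
	m := uptimeRe.FindStringSubmatch(msg)
	if m == nil {
		return 0, "", false
	}
	ms, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return 0, "", false
	}
	return time.Duration(ms) * time.Millisecond, m[2], true
}

func main() {
	for _, e := range SplitEntries(sample) {
		fmt.Printf("%s level=%d %-8s %s\n",
			e.Time.Format(time.RFC3339Nano), e.Level, e.Facility, e.Message)
		if d, state, ok := Uptime(e.Message); ok {
			fmt.Printf("  connection was in state %s for %v\n", state, d)
		}
	}
}
```

Run against the entries above, this shows that the GroupCoordinator connection had been up for roughly two and a half hours (9064857 ms) and the broker connections for about 42 minutes (2507976 ms) before they were dropped.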
Checklist
Please provide the following information:
confluent-kafka-go and librdkafka version (LibraryVersion()): v2.3.0
Client configuration: ConfigMap{...}
Client logs (with "debug": ".." as necessary)