Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction do_abort_tx request: not found, assuming already aborted. #24468

Open
vsarunas opened this issue Dec 6, 2024 · 15 comments
Open

Transaction do_abort_tx request: not found, assuming already aborted. #24468

vsarunas opened this issue Dec 6, 2024 · 15 comments
Assignees
Labels
kind/bug Something isn't working

Comments

@vsarunas
Copy link

vsarunas commented Dec 6, 2024

Version & Environment

Redpanda version: (use rpk version):

rpk version
Version:   v24.2.10
Git ref:   74404e718c
Build date: 2024-11-08T13:25:02Z
OS/Arch:   darwin/arm64
Go version: go1.23.1
Redpanda Cluster
 node-0 v24.2.8 - a18efa7206b0266473842dab463e7a55298dad13

macOS, Linux VM running a Docker container with rpk 24.2.8.

What went wrong?

rdkafka client was getting rejects "Local: This instance has been fenced by a newer instance" when trying to abort a transaction; (2024-12-06T06:56:02.818413+01:00)

This section of logs look suspicious from the same time as the client reject:

INFO  2024-12-06 05:56:02,893 [shard 0:main] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:27] group.cc:3181 - attempting expiration of producer: {producer_identity: id=12, epoch=786} transaction
INFO  2024-12-06 05:56:02,918 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+loadBalancerNext+transactional] trying to abort transaction. pid: {producer_identity: id=12, epoch=786} sequence: 0
INFO  2024-12-06 05:56:02,918 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+loadBalancerNext+transactional] found transaction {id: dc-1@keres@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=12, epoch=786}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
INFO  2024-12-06 05:56:02,944 [shard 0:kafk] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:27] group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=12, epoch=786} not found, sequence: 0, assuming already aborted.

Transaction in CompleteAbort state:

rpk cluster txn describe
COORDINATOR TRANSACTIONAL-ID                          PRODUCER-ID PRODUCER-EPOCH STATE      START-TIMESTAMP      TIMEOUT TOPICS
0      dc-1@keres@core@default+loadBalancerNext+transactional       12      787       CompleteAbort  2024-12-06T06:56:02.943Z 1m0s

Restart of rpk cleared up the state and allowed the application to work further.

Full logs
INFO  2024-12-06 05:43:46,407 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,407 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,407 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,407 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,407 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,408 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,408 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:43:46,408 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=61} sequence: 0
INFO  2024-12-06 05:44:47,516 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:23] group.cc:3181 - attempting expiration of producer: {producer_identity: id=4, epoch=63} transaction
INFO  2024-12-06 05:44:47,517 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,517 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] found transaction {id: dc-1@keres@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=4, epoch=63}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
INFO  2024-12-06 05:44:47,519 [shard 0:kafk] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:23] group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=4, epoch=63} not found, sequence: 0, assuming already aborted.
INFO  2024-12-06 05:44:47,713 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] producer {producer_identity: id=4, epoch=63} is fenced of by {producer_identity: id=4, epoch=64}
INFO  2024-12-06 05:44:47,720 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,724 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/7}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3434, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60212, evicted: false, transaction_state: {first: 142, last: 142, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3436, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60212, evicted: false, transaction_state: {first: 81, last: 81, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3440, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60211, evicted: false, transaction_state: {first: 177, last: 177, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3430, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60212, evicted: false, transaction_state: {first: 143, last: 143, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3429, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60212, evicted: false, transaction_state: {first: 189, last: 189, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/11}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3438, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60212, evicted: false, transaction_state: {first: 172, last: 172, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:44:47,725 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/8}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=63}, group: 3435, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60212, evicted: false, transaction_state: {first: 77, last: 77, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
INFO  2024-12-06 05:44:47,725 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:47,730 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,725 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,726 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,726 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,726 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,726 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,726 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:44:57,726 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=63} sequence: 0
INFO  2024-12-06 05:53:21,900 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54663 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,900 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54522 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,900 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54519 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,900 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54489 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,900 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54463 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,900 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54439 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,904 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54426 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:21,904 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:54368 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:30,557 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:53632 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:30,557 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:53615 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:30,557 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:53610 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:30,557 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:53609 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:53:30,557 [shard 0:main] kafka - server.cc:167 - Disconnected 192.168.64.1:53605 (applying protocol, Connection reset by peer)
INFO  2024-12-06 05:56:02,783 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3181 - attempting expiration of producer: {producer_identity: id=4, epoch=65} transaction
INFO  2024-12-06 05:56:02,785 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=65} sequence: 1
INFO  2024-12-06 05:56:02,786 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] found transaction {id: dc-1@keres@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=4, epoch=65}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 1, partitions: } to abort
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3573, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 296, last: 296, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/22}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3593, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 228, last: 228, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3574, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 224, last: 224, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/11}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3582, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 268, last: 268, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/8}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3579, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 122, last: 122, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/7}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3578, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 218, last: 218, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3584, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 264, last: 264, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3580, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60009, evicted: false, transaction_state: {first: 138, last: 138, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/22}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3449, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 146, last: 146, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/7}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3434, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 144, last: 144, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/8}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3435, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60009, evicted: false, transaction_state: {first: 79, last: 79, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/11}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3438, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 174, last: 174, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3429, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 191, last: 191, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3430, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 145, last: 145, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3440, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 179, last: 179, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3436, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60008, evicted: false, transaction_state: {first: 83, last: 83, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3190 - producer {producer_identity: id=4, epoch=65} transaction expiration result: tx::errc::unknown_server_error
WARN  2024-12-06 05:56:02,789 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/23}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=65}, group: 3594, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60009, evicted: false, transaction_state: {first: 164, last: 164, sequence: 1, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
INFO  2024-12-06 05:56:02,893 [shard 0:main] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:27] group.cc:3181 - attempting expiration of producer: {producer_identity: id=12, epoch=786} transaction
INFO  2024-12-06 05:56:02,918 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+loadBalancerNext+transactional] trying to abort transaction. pid: {producer_identity: id=12, epoch=786} sequence: 0
INFO  2024-12-06 05:56:02,918 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+loadBalancerNext+transactional] found transaction {id: dc-1@keres@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=12, epoch=786}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
INFO  2024-12-06 05:56:02,944 [shard 0:kafk] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:27] group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=12, epoch=786} not found, sequence: 0, assuming already aborted.
INFO  2024-12-06 05:56:02,950 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+loadBalancerNext+transactional] producer {producer_identity: id=12, epoch=786} is fenced of by {producer_identity: id=12, epoch=787}
INFO  2024-12-06 05:56:02,971 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] producer {producer_identity: id=4, epoch=65} is fenced of by {producer_identity: id=4, epoch=66}
INFO  2024-12-06 05:56:52,199 [shard 0:main] storage - segment.cc:811 - Creating new segment /var/lib/redpanda/data/kafka/instrument-reference-data-pending-dc-1/0_329/2458979-4-v1.log
INFO  2024-12-06 05:57:04,170 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3181 - attempting expiration of producer: {producer_identity: id=4, epoch=67} transaction
INFO  2024-12-06 05:57:04,172 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=67} sequence: 0
INFO  2024-12-06 05:57:04,172 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] found transaction {id: dc-1@keres@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=4, epoch=67}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
INFO  2024-12-06 05:57:04,182 [shard 0:kafk] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=4, epoch=67} not found, sequence: 0, assuming already aborted.
INFO  2024-12-06 05:57:04,376 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=67} sequence: 0
INFO  2024-12-06 05:57:04,390 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=67} sequence: 0
WARN  2024-12-06 05:57:04,390 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3574, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 226, last: 226, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,390 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/11}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3582, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 270, last: 270, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,390 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/8}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3579, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 124, last: 124, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,390 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/7}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3578, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 220, last: 220, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3584, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 266, last: 266, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3580, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 140, last: 140, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/8}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3435, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 81, last: 81, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/11}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3438, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 176, last: 176, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3430, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 147, last: 147, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3440, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 181, last: 181, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3436, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 85, last: 85, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/23}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3594, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 166, last: 166, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3573, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 298, last: 298, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/22}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3449, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60220, evicted: false, transaction_state: {first: 148, last: 148, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/7}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3434, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60220, evicted: false, transaction_state: {first: 146, last: 146, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:57:04,391 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=67}, group: 3429, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60221, evicted: false, transaction_state: {first: 193, last: 193, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
INFO  2024-12-06 05:57:04,426 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] producer {producer_identity: id=4, epoch=67} is fenced of by {producer_identity: id=4, epoch=68}
INFO  2024-12-06 05:58:05,508 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3181 - attempting expiration of producer: {producer_identity: id=4, epoch=69} transaction
INFO  2024-12-06 05:58:05,512 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:05,512 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] found transaction {id: dc-1@keres@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=4, epoch=69}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
WARN  2024-12-06 05:58:05,514 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/22}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3593, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60006, evicted: false, transaction_state: {first: 232, last: 232, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,514 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3190 - producer {producer_identity: id=4, epoch=69} transaction expiration result: tx::errc::unknown_server_error
INFO  2024-12-06 05:58:05,716 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:05,717 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:05,717 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:05,735 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3584, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 268, last: 268, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/8}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3435, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 83, last: 83, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3429, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 195, last: 195, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/7}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3434, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 148, last: 148, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/22}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3449, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 150, last: 150, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/2}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3573, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 300, last: 300, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/23}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3594, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 168, last: 168, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/13}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3440, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 183, last: 183, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3430, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 149, last: 149, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/11}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3438, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 178, last: 178, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3580, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 142, last: 142, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 05:58:05,742 [shard 0:main] tx - [{kafka/metadata-entry-snapshots-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=69}, group: 3436, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60233, evicted: false, transaction_state: {first: 87, last: 87, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
INFO  2024-12-06 05:58:05,796 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] producer {producer_identity: id=4, epoch=69} is fenced of by {producer_identity: id=4, epoch=70}
INFO  2024-12-06 05:58:15,759 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:58:15,769 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=69} sequence: 0
INFO  2024-12-06 05:59:06,990 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3181 - attempting expiration of producer: {producer_identity: id=4, epoch=71} transaction
INFO  2024-12-06 05:59:06,991 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:06,991 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] found transaction {id: dc-1@keres@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=4, epoch=71}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
INFO  2024-12-06 05:59:06,994 [shard 0:kafk] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=4, epoch=71} not found, sequence: 0, assuming already aborted.
INFO  2024-12-06 05:59:07,149 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] producer {producer_identity: id=4, epoch=71} is fenced of by {producer_identity: id=4, epoch=72}
INFO  2024-12-06 05:59:07,178 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,179 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,179 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,179 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,182 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,183 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,183 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,183 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 05:59:07,183 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=71} sequence: 0
INFO  2024-12-06 06:00:08,305 [shard 0:main] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:3181 - attempting expiration of producer: {producer_identity: id=4, epoch=73} transaction
INFO  2024-12-06 06:00:08,306 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,306 [shard 0:main] tx - tx_gateway_frontend.cc:578 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] found transaction {id: dc-1@keres@core@default+metadata-entry+Augmentation, status: ongoing, pid: {producer_identity: id=4, epoch=73}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 10, seq: 0, partitions: } to abort
WARN  2024-12-06 06:00:08,306 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/3}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=73}, group: 3574, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60001, evicted: false, transaction_state: {first: 232, last: 232, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
WARN  2024-12-06 06:00:08,307 [shard 0:main] tx - [{kafka/metadata-entry-updates-dc-1/9}] - rm_stm.cc:1461 - state of transaction { id: {producer_identity: id=4, epoch=73}, group: 3580, requests: { inflight: 0, finished: 1 }, ms_since_last_update: 60001, evicted: false, transaction_state: {first: 146, last: 146, sequence: 0, timeout: {60000}, coordinator partition: 32, status: initialized }} is unknown:tx::errc::unknown_server_error
INFO  2024-12-06 06:00:08,308 [shard 0:kafk] tx - [N:dc-1+metadata-entryAugmentation S:Stable G:25] group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=4, epoch=73} not found, sequence: 0, assuming already aborted.
INFO  2024-12-06 06:00:08,468 [shard 0:main] tx - tx_gateway_frontend.cc:2497 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] producer {producer_identity: id=4, epoch=73} is fenced of by {producer_identity: id=4, epoch=74}
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:08,506 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:18,310 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
INFO  2024-12-06 06:00:18,314 [shard 0:main] tx - tx_gateway_frontend.cc:513 - [tx_id=dc-1@keres@core@default+metadata-entry+Augmentation] trying to abort transaction. pid: {producer_identity: id=4, epoch=73} sequence: 0
@vsarunas vsarunas added the kind/bug Something isn't working label Dec 6, 2024
@dotnwat
Copy link
Member

dotnwat commented Dec 6, 2024

@bharathv replication team?

@bharathv
Copy link
Contributor

bharathv commented Dec 6, 2024

@vsarunas Based on my reading of the logs, everything is working as expected. There was an expired group transaction which triggered an internal abort of the transaction. Internal aborts (initiated by Redpanda, not client) are treated specially compared to client initiated aborts by bumping the epoch. This is done to fence all further client requests with previous epoch and the client has to init again to make progress.

So I believe that resulted in a "FENCED" error when the client attempted to abort again.

Restart of rpk

What exactly did you restart? Client or the broker? I'd expect the client to init_transactions() again in this case and it should unblock itself.

@vsarunas
Copy link
Author

vsarunas commented Dec 9, 2024

@bharathv, the application was restarted but could not complete a transaction. After restart of the broker, application started to work correctly.

The issue reproduced again on another system for the same user; but same version v24.2.8. Upgraded now to v24.3.1.

Same issue group.cc:2850 - do_abort_tx request:- producer/transaction {producer_identity: id=1007, epoch=25} not found, sequence: 0, assuming already aborted., did you see the full log when expanding Full logs ?

@bharathv
Copy link
Contributor

bharathv commented Dec 9, 2024

did you see the full log when expanding Full logs ?

Yes, i did, this log line is not a problem.

the application was restarted but could not complete a transaction. After restart of the broker, application started to work correctly.

Thanks, do you have client logs (timestamps and error codes, pre and post RP restart) and a rough pseudo code of what it does? (exception handling)

@blindspotbounty
Copy link

Seems I found the reason of such a strange error, i.e. "This instance has been fenced by a newer instance" but still can't understand the root cause.

In short, we are reading consumer topic and perform transaction for every received message.
At some point, something happens to response for AddPartitionsToTxnRequest and it timeouts.

From librdkafka logs:

2025-01-08T17:01:16.166060+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transactional API commit_transaction (intermediary, calling) result set at rd_kafka_txn_endtxn_complete:2112: Success (no previous result)
2025-01-08T17:01:16.166062+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Committed transaction now acked by application
2025-01-08T17:01:16.166064+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transaction successfully committed
2025-01-08T17:01:16.166068+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transaction state change CommitNotAcked -> Ready
2025-01-08T17:01:16.166069+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transactional API commit_transaction (intermediary, calling) result set at rd_kafka_txn_op_commit_transaction_ack:2504: Success (no previous result)
2025-01-08T17:01:16.166071+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Transactional API commit_transaction return at rd_kafka_commit_transaction:2596: Success
2025-01-08T17:01:16.166074+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Transactional API called: begin_transaction (in txn state Ready, idemp state Assigned, API timeout 0)
2025-01-08T17:01:16.166075+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transaction state change Ready -> InTransaction
2025-01-08T17:01:16.166078+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: 192.168.64.2:9092/0: Wake-up: begin transaction
2025-01-08T17:01:16.166080+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: TxnCoordinator/0: Wake-up: begin transaction
2025-01-08T17:01:16.166082+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: linux-dev:9092/bootstrap: Wake-up: begin transaction
2025-01-08T17:01:16.166085+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Wake-up sent to 3 broker threads in state >= INIT: begin transaction
2025-01-08T17:01:16.166087+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transactional API begin_transaction (intermediary, calling) result set at rd_kafka_txn_op_begin_transaction:1446: Success (no previous result)
2025-01-08T17:01:16.166089+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Transactional API begin_transaction return at rd_kafka_begin_transaction:1463: Success
2025-01-08T17:01:16.166090+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Marked cache-assignment-snapshots-dc-1 [18] as part of transaction: scheduling registration
2025-01-08T17:01:16.166092+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Transactional API called: send_offsets_to_transaction (in txn state InTransaction, idemp state Assigned, API timeout -1)
2025-01-08T17:01:16.166095+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Sent AddOffsetsToTxnRequest (v0, 147 bytes @ 0, CorrId 79)
2025-01-08T17:01:16.166097+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: TxnCoordinator/0: Registering partitions with transaction
2025-01-08T17:01:16.166098+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Sent AddPartitionsToTxnRequest (v0, 160 bytes @ 0, CorrId 80)
2025-01-08T17:01:16.166100+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Received AddOffsetsToTxnResponse (v0, 6 bytes, CorrId 79, rtt 3.22ms)
2025-01-08T17:01:16.166101+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: AddOffsetsToTxn response from TxnCoordinator/0: NO_ERROR ()
2025-01-08T17:01:16.166103+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: 192.168.64.2:9092/0: Sent TxnOffsetCommitRequest (v3, 497 bytes @ 0, CorrId 52)
2025-01-08T17:01:16.166105+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: 192.168.64.2:9092/0: Received TxnOffsetCommitResponse (v3, 46 bytes, CorrId 52, rtt 1.78ms)
2025-01-08T17:01:16.166108+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transactional API send_offsets_to_transaction (intermediary, calling) result set at rd_kafka_txn_handle_TxnOffsetCommit:1632: Success (no previous result)
2025-01-08T17:01:16.166110+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Transactional API send_offsets_to_transaction return at rd_kafka_send_offsets_to_transaction:2040: Success
2025-01-08T17:01:16.166111+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Transactional API called: commit_transaction (in txn state InTransaction, idemp state Assigned, API timeout -1)
2025-01-08T17:01:16.166113+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transaction state change InTransaction -> BeginCommit
2025-01-08T17:01:16.166115+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: Transactional API commit_transaction (intermediary, calling) result set at rd_kafka_txn_op_begin_commit:2472: Success (no previous result)
2025-01-08T17:01:16.166118+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Flushing 35 outstanding message(s) prior to commit
2025-01-08T17:01:16.166119+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: 192.168.64.2:9092/0: Wake-up: flushing
2025-01-08T17:01:16.166121+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: TxnCoordinator/0: Wake-up: flushing
2025-01-08T17:01:16.166122+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: linux-dev:9092/bootstrap: Wake-up: flushing
2025-01-08T17:01:16.166126+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: Wake-up sent to 3 broker threads in state >= UP: flushing

then after 60 seconds (transaction timeout), librdkafka timeout the request and perform re-connect:

2025-01-08T17:02:16.220768+02:00 NOTICE storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Timed out AddPartitionsToTxnRequest in flight (after 60058ms, timeout #0)
2025-01-08T17:02:16.220774+02:00 WARNING storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
2025-01-08T17:02:16.220799+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: 192.168.64.2:9092: 1 request(s) timed out: disconnect (after 60604ms in state UP) (_TIMED_OUT)
2025-01-08T17:02:16.220805+02:00 ERROR storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator: 192.168.64.2:9092: 1 request(s) timed out: disconnect (after 60604ms in state UP)
2025-01-08T17:02:16.220807+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator: Broker changed state UP -> DOWN
2025-01-08T17:02:16.220813+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: Broadcasting state change
2025-01-08T17:02:16.220814+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Purging bufq with 0 buffers
2025-01-08T17:02:16.220817+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Purging bufq with 0 buffers
2025-01-08T17:02:16.220818+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Updating 0 buffers on connection reset
2025-01-08T17:02:16.220820+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: Requesting metadata for 1/1 topics: broker down
2025-01-08T17:02:16.220821+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:main]: TxnCoordinator/0: Transaction coordinator is now down
2025-01-08T17:02:16.220824+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: 192.168.64.2:9092/0: Request metadata for 1 topic(s): broker down
2025-01-08T17:02:16.220826+02:00 ERROR storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:app]: dc-1@macbook-pro@core@default#producer-47: TxnCoordinator: 192.168.64.2:9092: 1 request(s) timed out: disconnect (after 60604ms in state UP)
2025-01-08T17:02:16.220829+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator: Broker changed state DOWN -> INIT
2025-01-08T17:02:16.220830+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: Broadcasting state change
2025-01-08T17:02:16.220831+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator: Broker changed state INIT -> TRY_CONNECT
2025-01-08T17:02:16.220834+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: Broadcasting state change
2025-01-08T17:02:16.220834+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: broker in state TRY_CONNECT connecting
2025-01-08T17:02:16.220837+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator: Broker changed state TRY_CONNECT -> CONNECT
2025-01-08T17:02:16.220839+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: Broadcasting state change
2025-01-08T17:02:16.220840+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: 192.168.64.2:9092/0: cache-assignment-snapshots-dc-1 [18]: timed out 0+1 message(s) (MsgId 1..1): message.timeout.ms exceeded
2025-01-08T17:02:16.220846+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: Need transaction abort before beginning partition drain in state Assigned for PID{Id:9,Epoch:6831} epoch bump for 0 partition(s) with in-flight requests: 1 message(s) timed out on cache-assignment-snapshots-dc-1 [18]
2025-01-08T17:02:16.220885+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: Idempotent producer state change Assigned -> WaitTxnAbort
2025-01-08T17:02:16.220889+02:00 ERROR storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: Current transaction failed in state BeginCommit: 1 message(s) timed out on cache-assignment-snapshots-dc-1 [18] (_TIMED_OUT, requires epoch bump)
2025-01-08T17:02:16.220890+02:00 DEBUG storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:192.168.64.2:9092/0]: Transaction state change BeginCommit -> AbortableError

That leads to "Local: This instance has been fenced by a newer instance".

However, it seems that the main problem is around AddPartitionsToTxnRequest that was timeout.
I see the beginning of the transaction:

TRACE 2025-01-08 15:01:16,236 [shard 0:main] tx - tx_gateway_frontend.cc:1464 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] begin_tx request for pid: {producer_identity: id=9, epoch=6831} at ntp: {kafka/cache-assignment-snapshots-dc-1/13} result: tx::errc::none
TRACE 2025-01-08 15:01:16,236 [shard 0:main] tx - tm_stm.h:83 - released_lock name:add_partition_to_tx, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,236 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling produce v7 request {transactional_id={dc-1@macbook-pro@core@default+loadBalancerNext+transactional} acks=-1 timeout_ms=30000 topics=[{name=cache-assignment-snapshots-dc-1 partitions=[{partition_index=13 records={batch {records: 1, size: 400
} v2_format true valid_crc true}},]},]}
TRACE 2025-01-08 15:01:16,238 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling end_txn v1 request {transactional_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional producer_id=9 producer_epoch=6831 committed=true}
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - tx_gateway_frontend.cc:1620 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] end transaction. producer id: 9, producer epoch: 6831, committed: true
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - tx_gateway_frontend.cc:400 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] getting coordinator ntp
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - tm_stm.h:60 - got_lock name:end_txn, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - tx_gateway_frontend.cc:2492 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] Getting latest tx for pid: {producer_identity: id=9, epoch=6831} in term: 31
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:297 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] updating transaction status with: preparing_commit in term: 31
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:232 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] updating transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: preparing_commit, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch
=-1}, etag: 31, seq: 24, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}} in term: 31
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:718 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] applying transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: preparing_commit, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch
=-1}, etag: 31, seq: 24, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}}
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - tx_gateway_frontend.cc:2143 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] committing transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 31, seq
: 24, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}} data
TRACE 2025-01-08 15:01:16,238 [shard 0:main] tx - rm_group_frontend.cc:372 - processing name:commit_group_tx, group_id:dc-1+loadBalancerNext+assigner, pid:{producer_identity: id=9, epoch=6831}, tx_seq:24
TRACE 2025-01-08 15:01:16,238 [shard 0:kafk] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1806 - processing commit_tx request: {ntp {kafka/__consumer_offsets/2} pid {producer_identity: id=9, epoch=6831} tx_seq 24 group_id dc-1+loadBalancerNext+assigner timeout 2000}
TRACE 2025-01-08 15:01:16,238 [shard 0:kafk] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:2997 - processing do_commit_tx request: pid: dc-1+loadBalancerNext+assigner
TRACE 2025-01-08 15:01:16,240 [shard 0:main] tx - rm_group_frontend.cc:380 - sending name:commit_group_tx, group_id:dc-1+loadBalancerNext+assigner, pid:{producer_identity: id=9, epoch=6831}, tx_seq:24, ec:tx::errc::none
TRACE 2025-01-08 15:01:16,240 [shard 0:main] tx - tx_gateway_frontend.cc:2057 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] marking transaction {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 31, seq: 24
, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}} as commit completed in term: 31
TRACE 2025-01-08 15:01:16,240 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:335 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] finishing transaction with status of: completed_commit
TRACE 2025-01-08 15:01:16,240 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:232 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] updating transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: completed_commit, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch
=-1}, etag: 31, seq: 24, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}} in term: 31
TRACE 2025-01-08 15:01:16,240 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling add_offsets_to_txn v0 request {transactional_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional producer_id=9 producer_epoch=6831 group_id=dc-1+loadBalancerNext+assigner}
TRACE 2025-01-08 15:01:16,240 [shard 0:main] tx - tx_gateway_frontend.cc:1480 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] adding offsets to tx, group_id: dc-1+loadBalancerNext+assigner, producer id: 9, producer epoch: 6831
TRACE 2025-01-08 15:01:16,240 [shard 0:main] tx - tx_gateway_frontend.cc:400 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] getting coordinator ntp
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:718 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] applying transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: completed_commit, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch
=-1}, etag: 31, seq: 24, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}}
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - tm_stm.h:83 - released_lock name:end_txn, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - tm_stm.h:60 - got_lock name:add_offsets_to_tx, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - tx_gateway_frontend.cc:2492 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] Getting latest tx for pid: {producer_identity: id=9, epoch=6831} in term: 31
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - tx_gateway_frontend.cc:1258 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] initializing transaction {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: completed_commit, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, eta
g: 31, seq: 24, partitions: {ntp: {kafka/cache-assignment-snapshots-dc-1/13}, etag: 2, revision: 5258}} for add resources
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:425 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] reset transaction state to: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: empty, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1},
 etag: 31, seq: 25, partitions: }
TRACE 2025-01-08 15:01:16,241 [shard 0:main] tx - rm_group_frontend.cc:257 - processing name:begin_group_tx, group_id:dc-1+loadBalancerNext+assigner, pid:{producer_identity: id=9, epoch=6831}, tx_seq:25, timeout: 60000
TRACE 2025-01-08 15:01:16,241 [shard 0:kafk] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1830 - processing begin tx request: {ntp {kafka/__consumer_offsets/2} group_id dc-1+loadBalancerNext+assigner pid {producer_identity: id=9, epoch=6831} tx_seq 25 timeout 60000 tm_partition: 6}
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - rm_group_frontend.cc:267 - sending name:begin_group_tx, group_id:dc-1+loadBalancerNext+assigner, pid:{producer_identity: id=9, epoch=6831}, tx_seq:25, ec:tx::errc::none, etag:31
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:232 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] updating transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, eta
g: 31, seq: 25, partitions: } in term: 31
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:718 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] applying transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, eta
g: 31, seq: 25, partitions: }
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - [{kafka_internal/tx/6}] - tm_stm.cc:579 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] transaction: {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 31, se
q: 25, partitions: } added with etag: 31
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tm_stm.h:83 - released_lock name:add_offsets_to_tx, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,242 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling add_partitions_to_txn v0 request {transactional_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional producer_id=9 producer_epoch=6831 topics=[{name=cache-assignment-snapshots-dc-1 partitions={18}},]}
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tx_gateway_frontend.cc:400 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] getting coordinator ntp
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tx_gateway_frontend.cc:1185 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] adding partition to tx. pid: 9, epoch: 6831, topics: {{topic: cache-assignment-snapshots-dc-1, partitions: {18}}}
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tm_stm.h:60 - got_lock name:add_partition_to_tx, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tx_gateway_frontend.cc:2492 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] Getting latest tx for pid: {producer_identity: id=9, epoch=6831} in term: 31
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tx_gateway_frontend.cc:1258 - [tx_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional] initializing transaction {id: dc-1@macbook-pro@core@default+loadBalancerNext+transactional, status: ongoing, pid: {producer_identity: id=9, epoch=6831}, last_pid: {producer_identity: id=-1, epoch=-1}, etag: 31, se
q: 25, partitions: } for add resources
TRACE 2025-01-08 15:01:16,242 [shard 0:main] tx - tm_stm.h:83 - released_lock name:add_partition_to_tx, tx_id:dc-1@macbook-pro@core@default+loadBalancerNext+transactional
TRACE 2025-01-08 15:01:16,243 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling txn_offset_commit v3 request {transactional_id=dc-1@macbook-pro@core@default+loadBalancerNext+transactional group_id=dc-1+loadBalancerNext+assigner producer_id=9 producer_epoch=6831 generation_id=45 member_id=dc-1@macbook-pro@core@default+C
onsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338 group_instance_id={dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }} topics={{name=cache-assignment-pending-dc-1 partitions={{partition_index=9 commi
tted_offset=20 committed_leader_epoch=-1 committed_metadata={}}}}}}
TRACE 2025-01-08 15:01:18,826 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling heartbeat v3 request {group_id=dc-1+loadBalancerNext+assigner generation_id=45 member_id=dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93
-df2ca4d70338 group_instance_id={dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }}}

and then just lots of heartbeats and attempt of expiration:

TRACE 2025-01-08 15:02:10,145 [shard 0:kafk] kafka - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1354 - Scheduling heartbeat expiration 45000 ms for dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338
TRACE 2025-01-08 15:02:13,160 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling heartbeat v3 request {group_id=dc-1+loadBalancerNext+assigner generation_id=45 member_id=dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338 group_instance_id={dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }}}
TRACE 2025-01-08 15:02:13,160 [shard 0:kafk] kafka - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1638 - Handling heartbeat request {group_id=dc-1+loadBalancerNext+assigner generation_id=45 member_id=dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338 group_instance_id={dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }}}
TRACE 2025-01-08 15:02:13,160 [shard 0:kafk] kafka - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1354 - Scheduling heartbeat expiration 45000 ms for dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338
TRACE 2025-01-08 15:02:16,176 [shard 0:main] kafka - handler.h:68 - [client_id: {dc-1@macbook-pro@core@default}] handling heartbeat v3 request {group_id=dc-1+loadBalancerNext+assigner generation_id=45 member_id=dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338 group_instance_id={dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }}}
TRACE 2025-01-08 15:02:16,176 [shard 0:kafk] kafka - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1638 - Handling heartbeat request {group_id=dc-1+loadBalancerNext+assigner generation_id=45 member_id=dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338 group_instance_id={dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }}}
TRACE 2025-01-08 15:02:16,176 [shard 0:kafk] kafka - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:1354 - Scheduling heartbeat expiration 45000 ms for dc-1@macbook-pro@core@default+ConsumerGroup { name='dc-1+loadBalancerNext+assigner', topics='[cache-assignment-pending-dc-1]' }-d5510208-b478-4aac-8f93-df2ca4d70338
INFO  2025-01-08 15:02:16,244 [shard 0:main] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:3237 - attempting expiration of producer: {producer_identity: id=9, epoch=6831} transaction
TRACE 2025-01-08 15:02:16,244 [shard 0:main] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:3252 - aborting producer {producer_identity: id=9, epoch=6831} transaction
TRACE 2025-01-08 15:02:16,244 [shard 0:main] tx - [N:dc-1+loadBalancerNext+assigner S:Stable G:45] group.cc:3266 - sending abort tx request for producer {producer_identity: id=9, epoch=6831} with tx_seq: 25 to coordinator partition: 6

@bharathv could you suggest if any other information is required, please?

@bharathv
Copy link
Contributor

bharathv commented Jan 8, 2025

qq:

2025-01-08T17:02:16.220768+02:00 NOTICE storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Timed out AddPartitionsToTxnRequest in flight (after 60058ms, timeout #0)
2025-01-08T17:02:16.220774+02:00 WARNING storage.transactional-producer.kafka.dc-1@macbook-pro@core@default+loadBalancerNext+transactional : [Kafka] [thrd:TxnCoordinator]: TxnCoordinator/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests

were you able to track this request down in the broker logs? I don't think the snippet pasted above covers it, mind sharing a larger timespan of the broker logs? (starting few minutes before ^^ request reached the broker).

As for the "Fenced error", you cannot continue using the same producer instance once the client hits the error, instead it should be something like..

if (err == err_fenced) {
   p.close();
   p = new Producer(...)
   p.initTransactions()
   ....
} 

That wouldn't still explain what is causing the timeout in the first place but just FYI.

@blindspotbounty
Copy link

I think that re-connection is performed automatically by librdkafka on this timeout and just a cause of request that reached timeout.
Though it is also seems a little bit strange that after reconnection it receives fenced state but would be nice to dig into the reason why the request AddPartitionsToTxnRequest reached timeout.

Btw, after restart of redpanda this issue stopped reproducing again.

@bharathv would it be possible to upload the full logs somewhere?

@bharathv
Copy link
Contributor

bharathv commented Jan 9, 2025

@blindspotbounty please DM me the logs.

@bharathv bharathv self-assigned this Jan 9, 2025
@bharathv
Copy link
Contributor

bharathv commented Jan 10, 2025

I looked at the logs (thanks for sharing), there was an (unexpected) exception when processing the request that kept the connection hanging, which resulted in a client timeout eventually. It is still unclear to me what threw it based on the available logging, I still need to stare at the code a bit more.

How reproducible is this issue? I know you mentioned its rare but wondering if we can improve logging in some suspect places and you could try to reproduce again?

@blindspotbounty
Copy link

Thanks for looking, I am sorry but unfrotunately I don't have any particular scenario that would reproduce this issue.
I wonder if it would help if I enable exception layer when the issue reproduces?

@bharathv
Copy link
Contributor

enable exception layer when the issue reproduces?

you mean seastar exception logger, right? That'd really help, I noticed it was on for the initial part of the log, its too chatty but really helpful in this case.

@blindspotbounty
Copy link

Okay, if/when reproduces I will try to enable this layer as well.

@bharathv
Copy link
Contributor

just to be clear it has to be turned on before (or by the time) the issue happens so seastar can catch the exception and log it when it happens.

@blindspotbounty
Copy link

Yeah, for sure. When this starts reproducing, that is reproducing all the time until restart. So, I can enable all logs and collect it during the precise start of our application.
Unless we have to catch it before redpanda reaches this state.

@bharathv
Copy link
Contributor

bharathv commented Jan 10, 2025

When this starts reproducing, that is reproducing all the time until restart.

Ah ok.. if it is stuck in that state it makes sense, just set exception logger to trace (also helps to have all loggers in TRACE)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants