I took a look too and realized it’s a regression introduced by my slot migration PR #445. This change started allowing a replica to report its primary’s slot states and trigger clusterUpdateSlotsConfigWith.
Here's what I think happens in these test failures involving a 3-node shard:
[T1] - Node A, B, and C are in the same shard with A as the primary.
[T2] - Node A loses its primaryship to B via a graceful/manual failover.
[T3] - After winning the election, B broadcasts the news to every node in the cluster, including C.
[T4] - C receives B's latest PING message and correctly registers B as its new primary.
[T5] - C then sends a new PING message to A, claiming B is its primary with all the slots.
[T6] - A still hasn't received B's broadcast message from [T3], and C's PING message from [T4] arrives at A.
And this is where things go wrong—a replicaof cycle is created.
At this point, A still thinks it’s the primary of the shard, and B -> replicaof == A. Since C is still a replica (as it was before), the role-change handling logic doesn’t apply. So A enters clusterUpdateSlotsConfigWith using C’s slot information (which is up to date with B’s). More importantly, B is the sender, and A still assumes B -> replicaof == A. The slot-ownership update logic correctly hands ownership of the slots to B. Because A loses all of its slots to another node in the same shard with a higher config epoch, A is demoted to a replica of the winner, B. We then set A -> replicaof = B while, in A’s view, B -> replicaof == A, completing the replicaof cycle.
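To make the cycle concrete, here is a minimal, self-contained C sketch of the demotion path described above. This is not the actual Valkey source: the `cluster_node` struct, `update_slots_config_with()`, and the field names are simplified stand-ins for illustration only.

```c
/* Simplified, hypothetical sketch (not the actual Valkey code) of how the
 * slot-ownership update described above demotes A and closes the cycle. */
#include <stdint.h>
#include <stdio.h>

typedef struct cluster_node {
    const char *name;
    struct cluster_node *replicaof; /* NULL when the node sees itself as primary */
    uint64_t config_epoch;
    int num_slots;
} cluster_node;

/* Called on `myself` (A) when a gossip message credits `claimed_primary` (B)
 * with slots that `myself` currently serves. */
static void update_slots_config_with(cluster_node *myself,
                                     cluster_node *claimed_primary,
                                     uint64_t sender_config_epoch,
                                     int claimed_slots) {
    if (sender_config_epoch <= myself->config_epoch) return;

    /* Higher config epoch wins: hand the slots over to the claimed primary. */
    myself->num_slots -= claimed_slots;
    claimed_primary->num_slots += claimed_slots;

    /* Lost every slot to a node in our own shard: demote ourselves to it. */
    if (myself->num_slots == 0) {
        myself->replicaof = claimed_primary; /* A -> replicaof = B */
    }
}

int main(void) {
    cluster_node A = {"A", NULL, 5, 16384};
    cluster_node B = {"B", &A, 6, 0};   /* A has not yet seen B's failover PING,
                                           so in A's view B->replicaof == A. */

    /* [T6] C's PING makes A process B's slot claim with B's higher epoch. */
    update_slots_config_with(&A, &B, B.config_epoch, 16384);

    /* Cycle: A->replicaof == B while B->replicaof == A (A's stale view). */
    printf("A->replicaof = %s, B->replicaof = %s\n",
           A.replicaof->name, B.replicaof->name);
    return 0;
}
```

Running this prints `A->replicaof = B, B->replicaof = A`, which is exactly the cycle described above.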
Before diving into the code fix: is it problematic at all to handle a topology update from a replica that carries a higher config epoch? My initial thought was to solve this problem by fixing the primary-replica relationship correctly for all the nodes in a given shard.
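For what it’s worth, a rough sketch of that initial thought, reusing the hypothetical `cluster_node` type from the sketch above (again, illustrative only, not a proposed patch): before honoring a slot claim relayed by a replica, first reconcile the local view of who the shard’s primary is, so a later demotion of `myself` cannot close a cycle.

```c
/* Illustrative only, not actual Valkey code: reconcile shard topology before
 * applying a relayed slot claim, using the cluster_node type from above. */
static void fix_shard_topology(cluster_node *myself, cluster_node *claimed_primary) {
    /* If the claimed primary carries a higher config epoch and we still list
     * it as our replica, promote it in our local view first, so demoting
     * `myself` later cannot produce A -> B -> A. */
    if (claimed_primary->config_epoch > myself->config_epoch &&
        claimed_primary->replicaof == myself) {
        claimed_primary->replicaof = NULL; /* B becomes a primary in A's view */
    }
}
```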
Originally posted by @PingXie in #573 (comment)