I have a theory about how this could happen.
PONG message issue, which was fixed in commit 28976a9
valkey/src/cluster_legacy.c, line 3271 in 2b76c8f
valkey/src/cluster_legacy.c, line 3311 in 2b76c8f
sender's replicaof based on the stale message at:
valkey/src/cluster_legacy.c, line 3317 in 2b76c8f
Now, imagine the following scenario:
[T0] Three nodes: primary A with replica B, and an observer node N
[T1] B's PONG message to N claiming A is its primary gets stuck somewhere on the way to N
[T2] B becomes primary after a manual failover and then notifies A (and N, but that message will get stuck behind the PONG message at [T1])
[T3] A becomes a replica of B
[T4] A, now a replica of B, sends PING to N, which goes through the following steps that end up indirectly promoting B to a primary
valkey/src/cluster_legacy.c, line 3257 in 2b76c8f
valkey/src/cluster_legacy.c, line 3267 in 2b76c8f
valkey/src/cluster_legacy.c, line 3269 in 2b76c8f
valkey/src/cluster_legacy.c, line 3281 in 2b76c8f
and sets A's replicaof to B
valkey/src/cluster_legacy.c, line 3311 in 2b76c8f
valkey/src/cluster_legacy.c, line 3317 in 2b76c8f
[T5] Finally, B's PONG message to N from [T1] arrives on N and goes through the following steps
valkey/src/cluster_legacy.c, line 3257 in 2b76c8f
valkey/src/cluster_legacy.c, line 3264 in 2b76c8f
Due to step 4, B got promoted to primary, implicitly
valkey/src/cluster_legacy.c, line 3267 in 2b76c8f
However, the epoch is stale, which is correctly handled
valkey/src/cluster_legacy.c, line 3271 in 2b76c8f
valkey/src/cluster_legacy.c, line 3273 in 2b76c8f
We don't bail but instead continue to
valkey/src/cluster_legacy.c, line 3311 in 2b76c8f
and finally update B->replicaof to A, completing the loop
valkey/src/cluster_legacy.c, line 3317 in 2b76c8f
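To make the [T4] and [T5] steps easier to follow, here is a rough standalone model of N's bookkeeping. It is only a sketch, not the code in cluster_legacy.c: the Node struct, apply_role_claim, in_replication_loop and replay_without_bail are hypothetical names, and the staleness rule (message epoch lower than the epoch N last accepted for the sender) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* N's view of another node (hypothetical, heavily simplified). */
typedef struct Node {
    const char *name;
    bool is_primary;
    struct Node *replicaof; /* NULL when N sees the node as a primary */
    unsigned long epoch;    /* last config epoch N accepted for this node */
} Node;

/* Apply the role claim carried by one PING/PONG to N's view.
 * claimed_primary == NULL means "the sender says it is a primary". */
static void apply_role_claim(Node *sender, Node *claimed_primary, unsigned long msg_epoch) {
    if (claimed_primary == NULL) {
        sender->is_primary = true;
        sender->replicaof = NULL;
        sender->epoch = msg_epoch;
        return;
    }
    if (sender->is_primary) {
        /* N thought the sender was a primary, but it claims to be a replica. */
        if (msg_epoch < sender->epoch) {
            /* Stale message detected (assumed rule), yet processing continues. */
        } else {
            /* Failover in the shard: the claimed primary is promoted. */
            claimed_primary->is_primary = true;
            claimed_primary->replicaof = NULL;
            claimed_primary->epoch = msg_epoch;
        }
        sender->is_primary = false;
    }
    /* "Primary changed for this replica": reached even for a stale message. */
    if (sender->replicaof != claimed_primary) sender->replicaof = claimed_primary;
}

/* True when following replicaof from start leads back to start. */
static bool in_replication_loop(const Node *start) {
    const Node *n = start->replicaof;
    for (int hops = 0; n != NULL && hops < 8; hops++) {
        if (n == start) return true;
        n = n->replicaof;
    }
    return false;
}

/* Replay [T4] and [T5] as seen by N; returns true if N ends up with a loop. */
static bool replay_without_bail(void) {
    Node a = {"A", true, NULL, 1};
    Node b = {"B", false, &a, 1}; /* [T0] as seen by N */
    apply_role_claim(&a, &b, 2);  /* [T4] A's PING: "I am a replica of B" */
    apply_role_claim(&b, &a, 1);  /* [T5] B's delayed PONG from [T1] */
    return in_replication_loop(&a) && in_replication_loop(&b);
}
```

In this model, replay_without_bail() returns true: after [T5], N sees A as a replica of B and B as a replica of A.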
I have seen stale messages in the past, and I also notice that the latest failure was in the codecov run, which could alter the timing quite a bit, so I think this theory is very plausible.
The fix would be to bail immediately after detecting the stale message
valkey/src/cluster_legacy.c, line 3273 in 2b76c8f
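A minimal sketch of what "bail immediately" could look like, again on a toy model rather than the real clusterProcessPacket path. Peer, apply_replica_claim and stale_pong_is_ignored are hypothetical names, and the epoch comparison used to detect staleness is an assumption; the actual patch may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* N's view of a peer (hypothetical). */
typedef struct Peer {
    bool is_primary;
    struct Peer *replicaof; /* NULL when N sees the peer as a primary */
    unsigned long epoch;    /* last config epoch N accepted for this peer */
} Peer;

/* Handle "sender claims to be a replica of claimed_primary".
 * Returns false when the message is dropped as stale. */
static bool apply_replica_claim(Peer *sender, Peer *claimed_primary, unsigned long msg_epoch) {
    if (sender->is_primary) {
        /* Proposed fix: return as soon as the message is known to be stale,
         * so the replicaof update below is never reached. */
        if (msg_epoch < sender->epoch) return false;
        claimed_primary->is_primary = true;
        claimed_primary->replicaof = NULL;
        claimed_primary->epoch = msg_epoch;
        sender->is_primary = false;
    }
    sender->replicaof = claimed_primary;
    return true;
}

/* Start from N's view after [T4], then deliver the delayed PONG of [T5]. */
static bool stale_pong_is_ignored(void) {
    Peer a = {false, NULL, 1};
    Peer b = {true, NULL, 2};
    a.replicaof = &b;                              /* A is a replica of B */
    bool applied = apply_replica_claim(&b, &a, 1); /* stale: epoch 1 < 2 */
    return !applied && b.is_primary && b.replicaof == NULL && a.replicaof == &b;
}
```

With the early return, the stale PONG leaves N's view untouched: B stays primary and A stays its replica.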
BTW, we have another undetected stale message issue (#798)
Originally posted by @PingXie in #573 (comment)