Skip to content

Commit

Permalink
Fix potential infinite loop in clusterNodeGetPrimary
Browse files Browse the repository at this point in the history
The original function could enter an infinite loop if there
was a cycle in the replica chain. This change adds a check to
break the loop if the node's replicaof eventually points back
to itself. Additionally, a debugAssert has been added to ensure
the replica/primary chain forms a tree structure and does not
have any cycles.

Signed-off-by: Ping Xie <[email protected]>
  • Loading branch information
PingXie committed Jun 14, 2024
1 parent d211078 commit 61aa3a3
Showing 1 changed file with 9 additions and 35 deletions.
44 changes: 9 additions & 35 deletions src/cluster_legacy.c
Original file line number Diff line number Diff line change
Expand Up @@ -3128,39 +3128,7 @@ int clusterProcessPacket(clusterLink *link) {
clusterUpdateSlotsConfigWith(sender_primary, senderConfigEpoch, hdr->myslots);

/* Explicitly check for a replication loop before attempting the replication
* chain folding logic.
*
* In some rare case, slot config updates (via either PING/PONG or UPDATE)
* can be delivered out of order as illustrated below.
*
* 1. To keep the discussion simple, let's assume we have 2 shards, shard a
* and shard b. Let's also assume there are two slots in total with shard
* a owning slot 1 and shard b owning slot 2.
* 2. Shard a has two nodes: primary A and replica A*; shard b has primary
* B and replica B*.
* 3. A manual failover was initiated on A* and A* just wins the election.
* 4. A* announces to the world that it now owns slot 1 using PING messages.
* These PING messages are queued in the outgoing buffer to every other
* node in the cluster, namely, A, B, and B*.
* 5. Keep in mind that there is no ordering in the delivery of these PING
* messages. For the stale PING message to appear, we need the following
* events in the exact order as they are laid out.
* a. An old PING message before A* becomes the new primary is still queued
* in A*'s outgoing buffer to A. This later becomes the stale message,
* which says A* is a replica of A. It is followed by A*'s election
* winning announcement PING message.
* b. B or B* processes A's election winning announcement PING message
* and sets slots[1]=A*.
* c. A sends a PING message to B (or B*). Since A hasn't learnt that A*
* wins the election, it claims that it owns slot 1 but with a lower
* epoch than B has on slot 1. This leads to B sending an UPDATE to
* A directly saying A* is the new owner of slot 1 with a higher epoch.
* d. A receives the UPDATE from B and executes clusterUpdateSlotsConfigWith.
* A now realizes that it is a replica of A* hence setting myself->replicaof
* to A*.
* e. Finally, the pre-failover PING message queued up in A*'s outgoing
* buffer to A is delivered and processed, out of order though, to A.
* f. This stale PING message creates the replication loop */
* chain folding logic. */
if (myself->replicaof && myself->replicaof->replicaof && myself->replicaof->replicaof != myself) {
/* Safeguard against sub-replicas. A replica's primary can turn itself
* into a replica if its last slot is removed. If no other node takes
Expand Down Expand Up @@ -5853,8 +5821,14 @@ int clusterNodeIsReplica(clusterNode *node) {
}

clusterNode *clusterNodeGetPrimary(clusterNode *node) {
while (node->replicaof != NULL) node = node->replicaof;
return node;
clusterNode *primary = node;
while (primary->replicaof != NULL) {
primary = primary->replicaof;
if (primary == node) break;
}
/* Assert that a node's replicaof/primary chain does not form a cycle. */
debugServerAssert(primary->replicaof == NULL);
return primary;
}

char *clusterNodeGetName(clusterNode *node) {
Expand Down

0 comments on commit 61aa3a3

Please sign in to comment.