Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop the MEET packet if the link node is in handshake state #1436

Merged
merged 2 commits into from
Dec 16, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 26 additions & 4 deletions src/cluster_legacy.c
Original file line number Diff line number Diff line change
Expand Up @@ -3004,7 +3004,8 @@ int clusterIsValidPacket(clusterLink *link) {
}

if (type == server.cluster_drop_packet_filter || server.cluster_drop_packet_filter == -2) {
serverLog(LL_WARNING, "Dropping packet that matches debug drop filter");
serverLog(LL_WARNING, "Dropping packet of type %s that matches debug drop filter",
clusterGetMessageTypeString(type));
return 0;
}

Expand Down Expand Up @@ -3095,7 +3096,7 @@ int clusterProcessPacket(clusterLink *link) {
if (server.debug_cluster_close_link_on_packet_drop &&
(type == server.cluster_drop_packet_filter || server.cluster_drop_packet_filter == -2)) {
freeClusterLink(link);
serverLog(LL_WARNING, "Closing link for matching packet type %hu", type);
serverLog(LL_WARNING, "Closing link for matching packet type %s", clusterGetMessageTypeString(type));
return 0;
}
return 1;
Expand All @@ -3111,8 +3112,8 @@ int clusterProcessPacket(clusterLink *link) {
freeClusterLink(link);
serverLog(
LL_NOTICE,
"Closing link for node that sent a lightweight message of type %hu as its first message on the link",
type);
"Closing link for node that sent a lightweight message of type %s as its first message on the link",
clusterGetMessageTypeString(type));
return 0;
}
clusterNode *sender = link->node;
Expand All @@ -3121,6 +3122,27 @@ int clusterProcessPacket(clusterLink *link) {
return 1;
}

if (type == CLUSTERMSG_TYPE_MEET && link->node && nodeInHandshake(link->node)) {
enjoy-binbin marked this conversation as resolved.
Show resolved Hide resolved
/* If the link is bound to a node and the node is in the handshake state, and we receive
* a MEET packet, it may be that the sender sent multiple MEET packets so in here we are
* dropping the MEET to avoid the assert in setClusterNodeToInboundClusterLink. The assert
* will happen if the other sends a MEET packet because it detects that there is no inbound
* link, this node creates a new node in HANDSHAKE state (with a random node name), and
* respond with a PONG. The other node receives the PONG and removes the CLUSTER_NODE_MEET
* flag. This node is supposed to open an outbound connection to the other node in the next
* cron cycle, but before this happens, the other node re-sends a MEET on the same link
* because it still detects no inbound connection. We improved the re-send logic of MEET in
* #1441, now we will only re-send MEET packet once every handshake timeout period.
*
* Note that in getNodeFromLinkAndMsg, the node in the handshake state has a random name
* and not truly "known", so we don't know the sender. Dropping the MEET packet can prevent
* us from creating a random node, avoid incorrect link binding, and avoid duplicate MEET
* packet eliminate the handshake state. */
serverLog(LL_NOTICE, "Dropping MEET packet from node %.40s because the node is already in handshake state",
link->node->name);
return 1;
}

uint16_t flags = ntohs(hdr->flags);
uint64_t sender_claimed_current_epoch = 0, sender_claimed_config_epoch = 0;
clusterNode *sender = getNodeFromLinkAndMsg(link, hdr);
Expand Down
Loading