-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
To avoid bouncing -REDIRECT during FAILOVER #871
Conversation
During the FAILOVER process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the FAILOVER_IN_PROGRESS state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the PSYNC FAILOVER command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the FAILOVER_IN_PROGRESS state, there may be a period of time where both nodes are replicas. In this scenario, if a -REDIRECT is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the FAILOVER_IN_PROGRESS state, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. Signed-off-by: zhaozhao.zz <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #871 +/- ##
============================================
- Coverage 70.35% 70.18% -0.18%
============================================
Files 112 112
Lines 61310 61473 +163
============================================
+ Hits 43136 43146 +10
- Misses 18174 18327 +153
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi, a potential test case for this bug is part of my PR #848 , see FIXME at https://github.com/valkey-io/valkey/pull/848/files#diff-fdef28e02ac8048541ebebbb829193ffd25622dc438bd8d921b167e51564bdd0R177. Just saying 😉
* FAILOVER_IN_PROGRESS state, we temporarily suspend the clients | ||
* that need to be redirected until the replica truly becomes the primary, | ||
* and then resume the execution. */ | ||
blockPostponeClient(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may block reading clients as well, which is a behavioral change since we are only in CLIENT PAUSE WRITE
during this phase.
The simplest solution I came up with was to add && server.failover_state == NO_FAILOVER
to the if
for the redirect case (in line 3903) and to the if
for the read only replica case in https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4012.
The rationale for this is that during a failover, we should prefer to block clients (which will happen here https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4082) instead of redirecting them or giving them answers that may not be valid anymore after the failover (keep in mind that failover may still fail at this point in time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may block reading clients as well, which is a behavioral change since we are only in CLIENT PAUSE WRITE during this phase.
When the client does not execute the readonly
command, read operations will also be redirected, so read operations also need to be suspended. This is a special state of FAILOVER_IN_PROGRESS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale for this is that during a failover, we should prefer to block clients (which will happen here https://github.com/valkey-io/valkey/pull/871/files#diff-1abc5651133d108c0c420d9411925373c711133e7748d9e4f4c97d5fb543fdd9R4082) instead of redirecting them or giving them answers that may not be valid anymore after the failover (keep in mind that failover may still fail at this point in time).
This is exactly what this PR does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the client does not execute the
readonly
command, read operations will also be redirected, so read operations also need to be suspended. This is a special state ofFAILOVER_IN_PROGRESS
.
If I understand you correctly, you are saying that reading (which is possible in both failover states up to now) is a problem during FAILOVER_IN_PROGRESS
. Sorry, but I don't understand the reason why this is the case.
But if so, why are we only blocking clients that understand REDIRECT? Shouldn't we block all clients in this phase, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both read and write commands may receive a -REDIRECT
. The issue we are currently addressing is when the primary is demoted to a replica and is in the FAILOVER_IN_PROGRESS
state, the replica may not have become the primary yet. The solution is to pause both read and write commands during FAILOVER_IN_PROGRESS
, there is no need to pause non-read-write commands any time.
why are we only blocking clients that understand REDIRECT?
We don't care about clients without redirect capa, they would never receive -REDIRECT
, instead they can receive -READONLY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't seem to get my message across, but maybe I understand now why you consider FAILOVER_IN_PROGRESS
to be special:
During the entire failover procedure, no change can happen on the primary since all writing commands are blocked (by postponing them). Therefore, I thought that continuing to answer reads is fine in FAILOVER_WAIT_FOR_SYNC
as well as in FAILOVER_IN_PROGRESS
.
But there is a time delay between the new primary becoming primary and the old primary realizing that the switch happened. If a new client happens to connect to the new primary and writes during this time, we may return stale data to a reading "READWRITE" client on the old primary if we allow reading instead of blocking it.
Is this why we need to block reads as well?
Do we need to document this change? (the documentation of the FAILOVER command does say that only writing clients are blocked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to document this change? (the documentation of the FAILOVER command does say that only writing clients are blocked)
Yes, see valkey-io/valkey-doc#162
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to me. It would be nice to add a test though.
Signed-off-by: zhaozhao.zz <[email protected]>
To valkey-io/valkey#871 --------- Signed-off-by: zhaozhao.zz <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <[email protected]> Signed-off-by: mwish <[email protected]>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <[email protected]> Signed-off-by: mwish <[email protected]>
Fix #821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <[email protected]>
Fix #821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <[email protected]>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <[email protected]> Signed-off-by: Ping Xie <[email protected]>
Fix valkey-io#821 During the `FAILOVER` process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to the `FAILOVER_IN_PROGRESS` state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send the `PSYNC FAILOVER` command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during the `FAILOVER_IN_PROGRESS` state, there may be a period of time where both nodes are replicas. In this scenario, if a `-REDIRECT` is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during the `FAILOVER_IN_PROGRESS state`, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution. --------- Signed-off-by: zhaozhao.zz <[email protected]> Signed-off-by: Ping Xie <[email protected]>
Fix #821
During the
FAILOVER
process, when conditions are met (such as when the force time is reached or the primary and replica offsets are consistent), the primary actively becomes the replica and transitions to theFAILOVER_IN_PROGRESS
state. After the primary becomes the replica, and after handshaking and other operations, it will eventually send thePSYNC FAILOVER
command to the replica, after which the replica will become the primary. This means that the upgrade of the replica to the primary is an asynchronous operation, which implies that during theFAILOVER_IN_PROGRESS
state, there may be a period of time where both nodes are replicas. In this scenario, if a-REDIRECT
is returned, the request will be redirected to the replica and then redirected back, causing back and forth redirection. To avoid this situation, during theFAILOVER_IN_PROGRESS state
, we temporarily suspend the clients that need to be redirected until the replica truly becomes the primary, and then resume the execution.