Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replica redirect read&write to primary in standalone mode #325

Merged
merged 12 commits into from
Jun 27, 2024

Conversation

soloestoy
Copy link
Member

@soloestoy soloestoy commented Apr 16, 2024

To implement #319

  1. replica is able to redirect read and write commands to it's primary in standalone mode
    • reply with "-REDIRECT primary-ip:port"
  2. add a subcommand CLIENT CAPA redirect, a client can announce the capability to handle redirection
    • if a client can handle redirection, the data access commands (read and write) will be redirected
  3. allow readonly and readwrite command in standalone mode, may be a breaking change
    • a client with redirect capability cannot process read commands on a replica by default
    • use READONLY command can allow read commands on a replica

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level LGTM.

(I didn't review the documentation of the config yet.)

src/server.c Outdated Show resolved Hide resolved
@enjoy-binbin enjoy-binbin linked an issue Apr 17, 2024 that may be closed by this pull request
Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally LGTM. I think the slot number -1 is acceptable, and i suppose that many clients will find that it is easy to add the support. (but we seem to be lacking the most support from clients right now.)

The other one is announced-ip and announced-port, i'm not sure about this.

src/server.c Outdated Show resolved Hide resolved
Signed-off-by: zhaozhao.zz <[email protected]>
Copy link

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.99%. Comparing base (5a51bf5) to head (6aa197f).
Report is 17 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #325      +/-   ##
============================================
- Coverage     70.17%   69.99%   -0.19%     
============================================
  Files           110      110              
  Lines         60077    60108      +31     
============================================
- Hits          42160    42070      -90     
- Misses        17917    18038     +121     
Files Coverage Δ
src/cluster.c 87.95% <ø> (+0.18%) ⬆️
src/commands.def 100.00% <ø> (ø)
src/networking.c 85.46% <100.00%> (+0.09%) ⬆️
src/server.c 88.55% <100.00%> (+0.01%) ⬆️
src/server.h 100.00% <ø> (ø)

... and 19 files with indirect coverage changes

valkey.conf Outdated Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
@hwware
Copy link
Member

hwware commented May 1, 2024

@soloestoy Now it looks like we agree to add "-REDIRECT ip port", I think you could update the top description to match the latest decision. Thanks

@hwware hwware added the major-decision-pending Major decision pending by TSC team label May 1, 2024
@madolson
Copy link
Member

madolson commented May 13, 2024

Highlighting my two concerns:

  • How much complexity are we signing up standalone clients to implement by having them understand the concept of "redirects", that was already built for cluster aware clients. Clients need to understand that when getting this new error message, they should update their internal endpoint to point to this instead, which is not what clients are expected to do today. Since this logic already basically exists for Valkey cluster, my preference is we still wait to understand if that is what we want longterm for standalone Valkey, but could be convinced otherwise. @PingXie Since we discussed this.
  • I don't like that this is adding a server configuration that causes backwards incompatible state, which is basically adding a type of configuration that clients have to know and understand. Specifically, in the happy path, when a client was previously able to directly talk to a replica client and send read commands, it will now get the -REDIRECT error unless it also sends the READONLY command. If an operator is unsure of this behavior, it's a very painful end user experience to start seeing unexpected errors and their only recourse is to flip a server configuration back. I think we should bias towards having clients negotiate this new capability, by sending either a new command (like CLIENT REDIRECT ON) or using the READONLY READWRITE command as flags. This should allow us to remove the config.

@soloestoy
Copy link
Member Author

soloestoy commented May 14, 2024

After the discussion I think we have an agreement about the redirections value, and need a bit change that this feature should not be a config in server. To make it as a flag of per client would be better (CLIENT REDIRECT ON seems not a good choice, since it can be easily confused with CLIENT TRACKING REDIRECT), I'm thinking which command or a new command is better.

@madolson
Copy link
Member

CLIENT TRACKING REDIRECT I did think about that but was hoping nobody really uses is. I think basically everyone that uses server assisted client side caching uses RESP3 which doesn't require that command. We could maybe do CLIENT NODE REDIRECT.

@PingXie
Copy link
Member

PingXie commented May 14, 2024

Summarizing @valkey-io/core-team's discussion and thought process:

  1. This is a breaking change. Standalone replicas can serve reads just fine today. This change when enabled would break the read-replica scenario for unenlightened clients
  2. Breaking changes should be not introduced via server-side knobs, such as configs/etc
  3. Breaking changes should be opt'ed in by the client explicitly
  4. We don't have a systematic way to support client opt-ins today but there have been precedents of client opt-in via special commands like READONLY

@PingXie
Copy link
Member

PingXie commented May 14, 2024

Even if we allow the client to opt-in via a new command like CLIENT NODE REDIRECT, we should not enable this particular behavior server-wide. This is because a Valkey server can be shared by multiple applications. It would be really confusing to application B if application A changes a global setting unilaterally. This brings in another problem however. I can imagine some other behavior changes in the future would only make sense at the server level. It feels like we might need to think through what a "capability" model would look like first or I am worried about getting into another discussion like the half done function support #58.

@madolson
Copy link
Member

I can imagine some other behavior changes in the future would only make sense at the server level.

Can you elaborate more, because I don't think there are that many real cases like this.

@soloestoy
Copy link
Member Author

soloestoy commented May 16, 2024

I removed the server config, and make it as a per client opt-in feature via CLIENT CAPA REDIRECT (we can append more capabilities in future), WDYT?

@soloestoy soloestoy changed the title replica redirect read&write to master in standalone mode replica redirect read&write to primary in standalone mode Jun 18, 2024
@soloestoy soloestoy added major-decision-approved Major decision approved by TSC team and removed major-decision-pending Major decision pending by TSC team labels Jun 18, 2024
Signed-off-by: zhaozhao.zz <[email protected]>
@soloestoy
Copy link
Member Author

it's ready to be merged, @valkey-io/core-team please approve

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. Some comments about CLIENT CAPA.

src/networking.c Outdated Show resolved Hide resolved
src/server.h Outdated Show resolved Hide resolved
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Jun 19, 2024
Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess we also need to update the doc (READONLY) and add a new command doc

tests/integration/replica-redirect.tcl Outdated Show resolved Hide resolved
@enjoy-binbin enjoy-binbin added the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label Jun 19, 2024
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Other reviewers: Please take a look at the new subcommand CLIENT CAPA capa [capa...], returns OK, ignores unknown capabilities from the future. (They can be sent by future clients supporting future Valkey versions.)

We need docs for -REDIECT and for CLIENT CAPA.

Signed-off-by: zhaozhao.zz <[email protected]>
@soloestoy soloestoy merged commit 28c5a17 into valkey-io:unstable Jun 27, 2024
20 checks passed
soloestoy added a commit to valkey-io/valkey-doc that referenced this pull request Jun 27, 2024
gmbnomis added a commit to gmbnomis/valkey that referenced this pull request Jul 30, 2024
In case of an instance role switch in standalone mode, clients with
blocking commands get disconnected sending an UNBLOCKED error to the
client.

Now that the REDIRECT error is available as an alternative (valkey-io#325),
issue a REDIRECT (redirecting the client to the primary) for clients
that are supporting it. There is no need to disconnect the client in
this case. This is in-line with the semantics of the MOVED error in
cluster mode.

Signed-off-by: Simon Baatz <[email protected]>
gmbnomis added a commit to gmbnomis/valkey that referenced this pull request Jul 31, 2024
In case of an instance role switch in standalone mode, clients with
blocking commands get disconnected sending an UNBLOCKED error to the
client.

Now that the REDIRECT error is available as an alternative (valkey-io#325),
issue a REDIRECT (redirecting the client to the primary) for clients
that are supporting it. There is no need to disconnect the client in
this case. This is in-line with the semantics of the MOVED error in
cluster mode.

Signed-off-by: Simon Baatz <[email protected]>
gmbnomis added a commit to gmbnomis/valkey that referenced this pull request Aug 1, 2024
In case of an instance role switch in standalone mode, clients with
blocking commands get disconnected sending an UNBLOCKED error to the
client.

Now that the REDIRECT error is available as an alternative (valkey-io#325),
issue a REDIRECT (redirecting the client to the primary) for clients
that are supporting it. If a client is read-only, it may remain
blocked in some situations. There is no need to disconnect the client
in either case. This is in-line with the semantics of the MOVED error
in cluster mode.

Signed-off-by: Simon Baatz <[email protected]>
gmbnomis added a commit to gmbnomis/valkey that referenced this pull request Aug 15, 2024
In case of an instance role switch in standalone mode, clients with
blocking commands get disconnected sending an UNBLOCKED error to the
client.

Now that the REDIRECT error is available as an alternative (valkey-io#325),
issue a REDIRECT (redirecting the client to the primary) for clients
that are supporting it. If a client is read-only, it may remain
blocked in some situations. There is no need to disconnect the client
in either case. This is in-line with the semantics of the MOVED error
in cluster mode.

Signed-off-by: Simon Baatz <[email protected]>
gmbnomis added a commit to gmbnomis/valkey that referenced this pull request Aug 18, 2024
In case of an instance role switch in standalone mode, clients with
blocking commands get disconnected sending an UNBLOCKED error to the
client.

Now that the REDIRECT error is available as an alternative (valkey-io#325),
issue a REDIRECT (redirecting the client to the primary) for clients
that are supporting it. If a client is read-only, it may remain
blocked in some situations. There is no need to disconnect the client
in either case. This is in-line with the semantics of the MOVED error
in cluster mode.

Signed-off-by: Simon Baatz <[email protected]>
@zuiderkwast
Copy link
Contributor

READONLY not returning an error in standalone mode broke some test case for Valkey-py.

Reported by @mkmkme:

Yeah, the test started failing because the test was as simple as

async def test_readonly_invalid_cluster_state(self, r: valkey.Valkey):
    with pytest.raises(exceptions.ValkeyError):
        await r.readonly()

So it assumed readonly command always returns an error

I guess we should have added a check for CLIENT CAPA REDIRECT in the READONLY and READWRITE commands, to keep the old behaviour for old clients.

@zuiderkwast zuiderkwast removed the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label Nov 22, 2024
mkmkme added a commit to valkey-io/valkey-py that referenced this pull request Nov 25, 2024
Behaviour of READONLY was changed in
valkey-io/valkey#325 which became a part of
8.0.0. This caused test_readonly_invalid_cluster_state to fail.

This commit takes into account this change.

Signed-off-by: Mikhail Koviazin <[email protected]>
Rafiot pushed a commit to Rafiot/valkey-py that referenced this pull request Dec 3, 2024
Behaviour of READONLY was changed in
valkey-io/valkey#325 which became a part of
8.0.0. This caused test_readonly_invalid_cluster_state to fail.

This commit takes into account this change.

Signed-off-by: Mikhail Koviazin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-approved Major decision approved by TSC team release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Support smooth swithover in standalone mode by using REDIRECT
8 participants