Improve dual channel replication stability and fix compatibility issues #804

naglera · 2024-07-18T12:00:10Z

Introduce several changes that improve the stability of dual-channel replication and fix compatibility issues.

Make dual-channel-replication tests more reliable: use pause instead of forced sleep.
Fix race conditions when freeing RDB client.
Check if sync was stopped during local buffer streaming.
Fix $ENDOFFSET reply format to work on 32-bit machines too.
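
To illustrate the last item: a 64-bit offset printed with a conversion such as `%ld` is only correct where `long` is 64 bits wide, which is not the case on typical 32-bit builds. The sketch below is a minimal, hedged illustration of formatting and parsing such an offset with `long long` and `%lld`. The helper names and the simplified `$ENDOFFSET:<offset>` layout are assumptions made for this example, not the server's actual reply format or code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the server source): write a simplified
 * "$ENDOFFSET:<offset>" string. long long is at least 64 bits on every
 * conforming platform, so "%lld" is safe on 32-bit and 64-bit builds. */
static int format_end_offset(char *buf, size_t len, long long offset) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "$ENDOFFSET:%lld", offset);
}

/* Hypothetical helper: parse the string produced above.
 * Returns 0 on success and -1 if the prefix or the number is invalid. */
static int parse_end_offset(const char *reply, long long *offset) {
    static const char prefix[] = "$ENDOFFSET:";
    const size_t plen = sizeof(prefix) - 1;
    char *end = NULL;
    long long value;

    if (strncmp(reply, prefix, plen) != 0) return -1;
    errno = 0;
    value = strtoll(reply + plen, &end, 10); /* 64-bit safe parse */
    if (errno != 0 || end == reply + plen) return -1;
    *offset = value;
    return 0;
}
```

Calling `format_end_offset` with an offset above 2^32 and feeding the result to `parse_end_offset` should round-trip the value on both 32-bit and 64-bit machines.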

codecov · 2024-07-18T15:29:22Z

Codecov Report

Attention: Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.36%. Comparing base (59aa008) to head (468c969).
Report is 4 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #804      +/-   ##
============================================
+ Coverage     70.18%   70.36%   +0.17%     
============================================
  Files           112      112              
  Lines         61311    61318       +7     
============================================
+ Hits          43031    43146     +115     
+ Misses        18280    18172     -108

| Files | Coverage Δ | |
|---|---|---|
| src/debug.c | `53.98% <100.00%> (+0.13%)` | ⬆️ |
| src/networking.c | `88.78% <100.00%> (+0.12%)` | ⬆️ |
| src/server.h | `100.00% <ø> (ø)` | |
| src/replication.c | `87.15% <66.66%> (-0.25%)` | ⬇️ |

... and 13 files with indirect coverage changes

madolson · 2024-07-18T20:16:56Z

https://github.com/valkey-io/valkey/actions/runs/9998100042

madolson · 2024-07-18T20:49:14Z

Hmm, test is still failing: https://github.com/valkey-io/valkey/actions/runs/9998100042/job/27636220483

naglera · 2024-07-21T15:23:35Z

Hi @madolson, I replaced all uses of the DEBUG SLEEP and DEBUG SLEEP-AFTER-FORK commands with SIGSTOP and SIGCONT. The dual-channel Tcl tests should now be stable.
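
For readers unfamiliar with the technique, pausing a process with SIGSTOP and resuming it with SIGCONT freezes it for exactly as long as the test needs, instead of guessing a sleep duration. The following is a minimal POSIX sketch in C, not the Tcl helpers used by the test suite. The function names are hypothetical, and the sketch assumes the target process is a child of the caller so that `waitpid` can observe the state change.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: stop a child process and wait until the kernel
 * reports it as stopped. Returns 0 on success, -1 on failure. */
static int pause_process(pid_t pid) {
    int status = 0;
    assert(pid > 0);
    if (kill(pid, SIGSTOP) != 0) return -1;
    if (waitpid(pid, &status, WUNTRACED) != pid) return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Hypothetical helper: resume a stopped child and wait until the kernel
 * reports it as continued. Returns 0 on success, -1 on failure. */
static int resume_process(pid_t pid) {
    int status = 0;
    assert(pid > 0);
    if (kill(pid, SIGCONT) != 0) return -1;
    if (waitpid(pid, &status, WCONTINUED) != pid) return -1;
    return WIFCONTINUED(status) ? 0 : -1;
}

/* Demo: fork an idle child, pause and resume it, then clean it up. */
static int demo_pause_resume(void) {
    int ok;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        for (;;) pause(); /* child: idle until it is killed */
    }
    ok = pause_process(pid) == 0 && resume_process(pid) == 0;
    kill(pid, SIGKILL);    /* terminate the child */
    waitpid(pid, NULL, 0); /* reap it to avoid a zombie */
    return ok ? 0 : -1;
}
```

Unlike a fixed sleep, the paused process stays frozen until the caller explicitly resumes it, so the test does not depend on how busy the machine is.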

madolson · 2024-07-22T03:34:26Z

https://github.com/valkey-io/valkey/actions/runs/10034245063

enjoy-binbin · 2024-07-22T08:55:30Z

new ci: https://github.com/valkey-io/valkey/actions/runs/10037736850

madolson

Most of these changes seem conceptually OK to me, and the tests do look green now.

madolson · 2024-07-23T21:58:23Z

src/replication.c

+        replicationAbortDualChannelSyncTransfer();
+        replicationUnsetPrimary();


This crashed in the sanitizer test: https://github.com/valkey-io/valkey/actions/runs/10066315835/job/27827425020?pr=804

naglera · 2024-07-24T16:50:17Z

Summary of the remaining failures. I think that only number 6 is related to dual-channel.

In 6 the replica reached OOM and failed to sync. I don't think it's a bug, just a busy machine.

hwware · 2024-07-25T14:48:03Z

Summary of the remaining failures. I think that only number 6 is related to dual-channel.

1. test "Fuzzer corrupt restore payloads - sanitize_dump: $sanitize_dump"
2. Verify the nodes configured with prefer hostname only show hostname for new nodes in tests/unit/cluster/hostnames.tcl
3. Empty-shard migration target is auto-updated after failover in target shard
4. Empty-shard migration source is auto-updated after failover in source shard
5. Verify health as fail for killed node
6. Diskless load swapdb (different replid): replica enter loading dual-channel-replication-enabled
7. Cluster is writable
8. Slot migration tests

In 6 the replica reached OOM and failed to sync. I don't think it's a bug, just a busy machine.

Try checking out commit ff6b780 (the commit that merged the dual channel replication feature), cherry-pick your update onto it, and run the Daily CI once.

If all test cases are green, I think we can merge your PR. (From what I have observed, some other PRs are causing our current Daily CI failures.)

*** [err]: Psync established after RDB load - beyond grace period in tests/integration/dual-channel-replication.tcl
log message of '"*Replica main channel failed to establish PSYNC within the grace period*"' not found in ./tests/tmp/server.7063.182/stdout after line: 0 till line: 196

Signed-off-by: naglera <[email protected]>

Deflaked the following tests:

1. dual-channel-replication with multiple replicas
2. Test diverse replica sync: dual-channel on/off
3. Test replica's buffer limit reached
4. dual-channel-replication fails when primary diskless disabled

First check that the replica is online, and then wait for the value to propagate.

Test failed in https://github.com/valkey-io/valkey/actions/runs/9986538309/job/27599242506

Signed-off-by: naglera <[email protected]>

This PR is based on:

#12109
valkey-io/valkey#60

Closes: #11678

**Motivation**

During a full sync, when master is delivering RDB to the replica, incoming write commands are kept in a replication buffer in order to be sent to the replica once RDB delivery is completed. If RDB delivery takes a long time, it might create memory pressure on master. Also, once a replica connection accumulates replication data which is larger than output buffer limits, master will kill replica connection. This may cause a replication failure.

The main benefit of the rdb channel replication is streaming incoming commands in parallel to the RDB delivery. This approach shifts replication stream buffering to the replica and reduces load on master. We do this by opening another connection for RDB delivery. The main channel on replica will be receiving replication stream while rdb channel is receiving the RDB.

This feature also helps to reduce master's main process CPU load. By opening a dedicated connection for the RDB transfer, the bgsave process has access to the new connection and it will stream RDB directly to the replicas. Before this change, due to TLS connection restriction, the bgsave process was writing RDB bytes to a pipe and the main process was forwarding it to the replica. This is no longer necessary, the main process can avoid these expensive socket read/write syscalls. It also means RDB delivery to replica will be faster as it avoids this step.

In summary, replication will be faster and master's performance during full syncs will improve.

**Implementation steps**

1. When replica connects to the master, it sends 'rdb-channel-repl' as part of capability exchange to let master know replica supports rdb channel.
2. When replica lacks sufficient data for PSYNC, master sends +RDBCHANNELSYNC reply with replica's client id. As the next step, the replica opens a new connection (rdb-channel) and configures it against the master with the appropriate capabilities and requirements. It also sends given client id back to master over rdbchannel, so that master can associate these channels. (The initial replica connection will be referred to as main-channel.) Then, replica requests fullsync using the RDB channel.
3. Prior to forking, master attaches the replica's main channel to the replication backlog to deliver replication stream starting at the snapshot end offset.
4. The master main process sends replication stream via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. Replica accumulates replication stream in a local buffer, while the RDB is being loaded into the memory.
5. Once the replica completes loading the rdb, it drops the rdb channel and streams the accumulated replication stream into the db. Sync is completed.

**Some details**

- Currently, rdbchannel replication is supported only if `repl-diskless-sync` is enabled on master. Otherwise, replication will happen over a single connection as before.
- On replica, there is a limit to replication stream buffering. Replica uses a new config `replica-full-sync-buffer-limit` to limit number of bytes to accumulate. If it is not set, replica inherits `client-output-buffer-limit <replica>` hard limit config. If we reach this limit, replica stops accumulating. This is not a failure scenario though. Further accumulation will happen on master side. Depending on the configured limits on master, master may kill the replica connection.

**API changes in INFO output:**

1. New replica state: `send_bulk_and_stream`. Indicates full sync is still in progress for this replica. It is receiving replication stream and rdb in parallel.

```
slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0
```

Replica state changes in steps:

- First, replica sends psync and receives +RDBCHANNELSYNC: `state=wait_bgsave`
- After replica connects with rdbchannel and delivery starts: `state=send_bulk_and_stream`
- After full sync: `state=online`

2. On replica side, replication stream buffering metrics:

- replica_full_sync_buffer_size: Currently accumulated replication stream data in bytes.
- replica_full_sync_buffer_peak: Peak number of bytes that this instance accumulated in the lifetime of the process.

```
replica_full_sync_buffer_size:20485
replica_full_sync_buffer_peak:1048560
```

**API changes in CLIENT LIST**

In `client list` output, rdbchannel clients will have 'C' flag in addition to 'S' replica flag:

```
id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0
```

**Config changes:**

- `replica-full-sync-buffer-limit`: Controls how much replication data replica can accumulate during rdbchannel replication. If it is not set (a value of 0), replica will inherit `client-output-buffer-limit <replica>` hard limit config to limit accumulated data.
- `repl-rdb-channel` config is added as a hidden config. This is mostly for testing as we need to support both rdbchannel replication and the older single connection replication (to keep compatibility with older versions, and because rdbchannel replication will not be enabled if repl-diskless-sync is not enabled). It affects both the master (not to respond to rdb channel requests) and the replica (not to declare capability).

**Internal API changes:**

Changes that were introduced to Redis replication:

- New replication capability is added to replconf command: `capa rdb-channel-repl`. Indicates replica is capable of rdb channel replication. Replica sends it when it connects to master along with other capabilities.
- If replica needs fullsync, master replies `+RDBCHANNELSYNC <client-id>` to the replica's PSYNC request.
- When replica opens rdbchannel connection, as part of replconf command, it sends `rdb-channel 1` to let master know this is rdb channel. Also, it sends `main-ch-client-id <client-id>` as part of replconf command so master can associate channels.

**Testing:**

As rdbchannel replication is enabled by default, we run whole test suite with it. Though, as we need to support both rdbchannel and single connection replication, we'll be running some tests twice with `repl-rdb-channel yes/no` config.

**Replica state diagram**

```
 * * Replica state machine *
 *
 * Main channel state
 * ┌───────────────────┐
 * │RECEIVE_PING_REPLY │
 * └────────┬──────────┘
 *          │ +PONG
 * ┌────────▼──────────┐
 * │SEND_HANDSHAKE     │                    RDB channel state
 * └────────┬──────────┘            ┌───────────────────────────────┐
 *          │+OK                ┌───► RDB_CH_SEND_HANDSHAKE         │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_AUTH_REPLY │        │    REPLCONF main-ch-client-id <clientid>
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │ RDB_CH_RECEIVE_AUTH_REPLY     │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_PORT_REPLY │        │                  │ +OK
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │  RDB_CH_RECEIVE_REPLCONF_REPLY│
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_IP_REPLY   │        │                  │ +OK
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │ RDB_CH_RECEIVE_FULLRESYNC     │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_CAPA_REPLY │        │                  │+FULLRESYNC
 * └────────┬──────────┘        │                  │Rdb delivery
 *          │                   │   ┌──────────────▼────────────────┐
 * ┌────────▼──────────┐        │   │ RDB_CH_RDB_LOADING            │
 * │SEND_PSYNC         │        │   └──────────────┬────────────────┘
 * └─┬─────────────────┘        │                  │ Done loading
 *   │PSYNC (use cached-master) │                  │
 * ┌─▼─────────────────┐        │                  │
 * │RECEIVE_PSYNC_REPLY│        │    ┌────────────►│ Replica streams replication
 * └─┬─────────────────┘        │    │             │ buffer into memory
 *   │                          │    │             │
 *   │+RDBCHANNELSYNC client-id │    │             │
 *   ├──────┬───────────────────┘    │             │
 *   │      │ Main channel           │             │
 *   │      │ accumulates repl data  │             │
 *   │   ┌──▼────────────────┐       │     ┌───────▼───────────┐
 *   │   │ REPL_TRANSFER     ├───────┘     │    CONNECTED      │
 *   │   └───────────────────┘             └────▲───▲──────────┘
 *   │                                          │   │
 *   │                                          │   │
 *   │  +FULLRESYNC    ┌───────────────────┐    │   │
 *   ├─────────────────► REPL_TRANSFER     ├────┘   │
 *   │                 └───────────────────┘        │
 *   │  +CONTINUE                                   │
 *   └──────────────────────────────────────────────┘
 */
```

-----

This PR also contains changes and ideas from:

valkey-io/valkey#837
valkey-io/valkey#1173
valkey-io/valkey#804
valkey-io/valkey#945
valkey-io/valkey#989

---------

Co-authored-by: Yuan Wang <[email protected]>
Co-authored-by: debing.sun <[email protected]>
Co-authored-by: Moti Cohen <[email protected]>
Co-authored-by: naglera <[email protected]>
Co-authored-by: Amit Nagler <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Binbin <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Ping Xie <[email protected]>
Co-authored-by: Ran Shidlansik <[email protected]>
Co-authored-by: ranshid <[email protected]>
Co-authored-by: xbasel <[email protected]>
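
The buffer-limit rule described in the commit message above (`replica-full-sync-buffer-limit` falls back to the `client-output-buffer-limit <replica>` hard limit when it is left at 0, and the replica stops accumulating once the limit is reached) can be sketched as follows. This is a minimal illustration under stated assumptions: both function names are hypothetical, the limits are plain byte counts, and the real server code may resolve and enforce the limit differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the limit the replica applies to its local
 * replication buffer. A configured value of 0 means "not set", in which
 * case the replica client output buffer hard limit is used instead. */
static size_t effective_full_sync_buffer_limit(size_t configured_limit,
                                               size_t replica_cob_hard_limit) {
    return configured_limit != 0 ? configured_limit : replica_cob_hard_limit;
}

/* Hypothetical helper: decide whether another chunk of replication stream
 * may be accumulated locally. Returns 1 if it fits under the limit and 0
 * if the replica should stop accumulating. This sketch assumes a non-zero
 * limit. Stopping is not a failure: buffering continues on the master. */
static int can_accumulate(size_t buffered, size_t chunk, size_t limit) {
    assert(limit > 0);
    if (buffered >= limit) return 0;
    return chunk <= limit - buffered; /* written to avoid size_t overflow */
}
```

With a configured limit of 0 and a hard limit of 256 bytes, for example, `effective_full_sync_buffer_limit` yields 256, and `can_accumulate` refuses a chunk that would push the buffer past that value.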

naglera mentioned this pull request Jul 18, 2024

Dual channel replication #60

Merged

enjoy-binbin approved these changes Jul 18, 2024

madolson reviewed Jul 18, 2024

tests/helpers/bg_server_sleep.tcl (outdated, resolved)

naglera force-pushed the dual-channel-replication-fixes branch from 95a275e to abea73c on July 21, 2024 15:19

madolson added the run-extra-tests label (Run extra tests on this PR; runs all tests from daily except valgrind and RESP) Jul 23, 2024

madolson reviewed Jul 23, 2024

src/server.h (outdated, resolved)

zuiderkwast mentioned this pull request Jul 23, 2024

Fix extra reply in debug sleep-after-fork-seconds error path #810

Merged

naglera force-pushed the dual-channel-replication-fixes branch from d458aa8 to 7cd8c5e on July 23, 2024 10:52

madolson reviewed Jul 23, 2024

src/debug.c (outdated, resolved)

madolson approved these changes Jul 23, 2024

madolson reviewed Jul 23, 2024

naglera force-pushed the dual-channel-replication-fixes branch from a22b59c to fc2465d on July 24, 2024 08:23

madolson removed the run-extra-tests label (Run extra tests on this PR; runs all tests from daily except valgrind and RESP) Jul 25, 2024

naglera and others added 9 commits July 25, 2024 16:13

Use pause process instead of brute force sleep

309a83e

Signed-off-by: naglera <[email protected]>

test documentation fix

4d3001a

Signed-off-by: naglera <[email protected]>

Deflake- Test dual-channel-replication primary reject set-rdb-client …

81355d0

…after client killed we should wait before asserting replconf will fail Signed-off-by: naglera <[email protected]>

Use smaller RDB for "Test replica's buffer limit reached"

773ad9c

Signed-off-by: naglera <[email protected]>

Update src/server.h

e0696c5

Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: naglera <[email protected]>

Fix race condition in rdb client free

021975d

Signed-off-by: Ubuntu <[email protected]> Signed-off-by: naglera <[email protected]>

Fix race condition dualChannelSync abort after local replication buff…

67dd350

…er stream Signed-off-by: Ubuntu <[email protected]> Signed-off-by: naglera <[email protected]>

naglera and others added 23 commits July 25, 2024 16:13

Complete dualChannelSync abort after local buffer stream fix: Handle …

2931ecc

…cleanup Signed-off-by: Ubuntu <[email protected]> Signed-off-by: naglera <[email protected]>

Update src/debug.c

caa0c1a

Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: naglera <[email protected]>

Move repl_rdb_channel_state check to abortDualChannelSync

c4f2ecb

Signed-off-by: naglera <[email protected]>

clang format

605ee5f

Signed-off-by: naglera <[email protected]>

Revert "Move repl_rdb_channel_state check to abortDualChannelSync"

92855ac

This reverts commit fc2465d. Signed-off-by: naglera <[email protected]>

Fix uint64 on 32bit machine bug

a682319

Signed-off-by: naglera <[email protected]>

Verify sync is still in progress when sync aborted during local

ab211b4

replicaition buffer streaming Signed-off-by: naglera <[email protected]>

NOT FOR MERGE - debug logs

a8ef15d

Signed-off-by: naglera <[email protected]>

NOT FOR MERGE - debug logs

b32505f

Signed-off-by: naglera <[email protected]>

Fix freeClientAsync double free

5d3baf9

Signed-off-by: naglera <[email protected]>

NOT FOR MERGE - debug logs

dbac48b

Signed-off-by: naglera <[email protected]>

NOT FOR MERGE - debug logs

ff5bd68

Signed-off-by: naglera <[email protected]>

NOT FOR MERGE - debug logs

2ad0e0d

Signed-off-by: naglera <[email protected]>

NOT FOR MERGE - debug logs

5b0e447

Signed-off-by: naglera <[email protected]>

fix endoffset format

5b9cb0e

Signed-off-by: naglera <[email protected]>

fix endoffset format 2

4a5ec3c

Signed-off-by: naglera <[email protected]>

Revert "NOT FOR MERGE - debug logs"

1bf7dbe

This reverts commit a967cb4. Signed-off-by: naglera <[email protected]>

Revert "NOT FOR MERGE - debug logs"

0d59c7f

This reverts commit 00e62bd. Signed-off-by: naglera <[email protected]>

Revert "NOT FOR MERGE - debug logs"

d3eac4b

This reverts commit 4b47959. Signed-off-by: naglera <[email protected]>

Revert "NOT FOR MERGE - debug logs"

eebf226

This reverts commit a88ef2d. Signed-off-by: naglera <[email protected]>

Revert "NOT FOR MERGE - debug logs"

08d0b3e

This reverts commit bb53f85. Signed-off-by: naglera <[email protected]>

Revert "NOT FOR MERGE - debug logs"

b2f739f

This reverts commit 9e3ad13. Signed-off-by: naglera <[email protected]>

clang format

468c969

Signed-off-by: naglera <[email protected]>

naglera force-pushed the dual-channel-replication-fixes branch from 487a8a9 to 468c969 on July 25, 2024 16:15

naglera changed the title ~~Deflake test "Psync established after RDB load - beyond grace period"~~ Improve dual channel replication stability and fix compatibility issues Jul 25, 2024

madolson approved these changes Jul 25, 2024

madolson merged commit 48ca2c9 into valkey-io:unstable Jul 25, 2024
47 checks passed

tezc mentioned this pull request Jan 8, 2025

Rdb channel replication redis/redis#13732

Merged
