Fix bugs related to synchronous TLS connections blocking #837

naglera · 2024-07-29T08:12:35Z

Fix a TLS bug where connections were shut down by the primary's main process while the child process was still writing, which caused the main process to block.
TLS connection fix: file descriptors were set to blocking mode in the main thread, followed by a blocking write. With TLS, that write switches the file descriptors back to non-blocking (see connTLSSyncWrite()), so the blocking mode is now set after the write (@xbasel).
Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on the log.
Ensure that server.repl_offset and server.replid are updated correctly when dual channel synchronization completes successfully. Leaving them stale led to failures in replication tests that validate replication IDs or compare replication offsets.

Signed-off-by: naglera <[email protected]>

ranshid · 2024-07-29T08:17:41Z

tests/integration/dual-channel-replication.tcl

-    wait_for_log_messages $idx {"*Process is about to stop.*"} 0 2000 1
+    wait_for_condition 50 1000 {
+        [exec ps -o state= -p $pid] eq "T" ||
+        [exec ps -o state= -p $pid] eq "Z"


Did you plan to catch that? I mean, this is basically the state in which we are waiting to collect the zombie; it is not the expected flow we are trying to catch...
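To make the distinction in this comment concrete, here is a minimal C sketch of the two states: a stopped process (`T` in `ps`) versus one that has exited and is waiting to be collected (`Z`). It is not the test's mechanism. The Tcl test polls `ps` because the harness is not the parent of the process it watches, while this sketch swaps in `waitpid()` from the parent's side. Both helper names are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that stops itself, like a paused process. */
static pid_t spawn_self_stopping_child(void) {
    pid_t pid = fork();
    if (pid == 0) {
        raise(SIGSTOP); /* child parks itself until it receives SIGCONT */
        _exit(0);       /* once resumed, exit and wait to be collected */
    }
    return pid; /* parent: the child's pid, or -1 if fork() failed */
}

/* Hypothetical helper: block until the child changes state, then classify it.
 * 'T' = stopped (the flow the test wants to catch),
 * 'Z' = terminated; it was a zombie until this call collected it,
 * '?' = anything else, including waitpid() failure. */
static char wait_child_state(pid_t pid) {
    int status = 0;
    assert(pid > 0);
    if (waitpid(pid, &status, WUNTRACED) != pid) return '?';
    if (WIFSTOPPED(status)) return 'T';
    if (WIFEXITED(status) || WIFSIGNALED(status)) return 'Z';
    return '?';
}
```

A caller would expect `'T'` right after spawning, and `'Z'` only after sending `SIGCONT`, which is why accepting `Z` in the wait condition matches a different flow than the pause itself.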

codecov · 2024-07-29T08:26:07Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.23%. Comparing base (b4d96ca) to head (575d549).
Report is 76 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #837      +/-   ##
============================================
- Coverage     70.38%   70.23%   -0.16%     
============================================
  Files           112      112              
  Lines         61462    61470       +8     
============================================
- Hits          43261    43173      -88     
- Misses        18201    18297      +96

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/networking.c | `88.72% <100.00%> (-0.08%)` | ⬇️ |
| src/rdb.c | `76.16% <100.00%> (-0.11%)` | ⬇️ |
| src/replication.c | `87.14% <100.00%> (-0.04%)` | ⬇️ |

... and 13 files with indirect coverage changes


ranshid · 2024-07-29T13:06:42Z

@naglera please fix the issue headline, as it now includes 2 fixes. Please also add an explanation of the fixes in the top comment.


madolson · 2024-07-29T21:16:39Z

There are some new errors I haven't seen before in the extended tests: https://github.com/valkey-io/valkey/actions/runs/10146979329/job/28056324528?pr=837


1. Replica recover rdb-connection killed
2. Replica recover main-connection killed
We can only catch bgsave while it is in progress or expect the replica to complete the sync, but not both. Using rdb-key-save-delay is not predictable enough when the machine is under high load.
Signed-off-by: naglera <[email protected]>


…rite.
Dual-channel file descriptors are set to blocking mode in the main thread, followed by a blocking write. With TLS, that write sets the file descriptors to non-blocking (see `connTLSSyncWrite()`). The child process runs in blocking mode. If a write operation fails to write the entire buffer, SSL returns `SSL_ERROR_WANT_WRITE`. This error is ignored, causing the replica to fail to load the RDB. This change sets the file descriptors to blocking mode after the blocking write.
Signed-off-by: xbasel <[email protected]>
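The failure mode described here (a short write on a non-blocking descriptor that goes unhandled) can be illustrated without TLS. The sketch below is an assumption-laden stand-in, not code from this PR: plain `write()` plays the role of `SSL_write()`, `EAGAIN` plays the role of `SSL_ERROR_WANT_WRITE`, and `write_all` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer to fd, even if fd is
 * non-blocking. Returns 0 once every byte is written, -1 on error or if
 * poll() does not report the fd writable within timeout_ms. */
static int write_all(int fd, const char *buf, size_t len, int timeout_ms) {
    size_t off = 0;
    assert(buf != NULL);
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n > 0) {
            off += (size_t)n; /* short write: keep going from the new offset */
            continue;
        }
        if (n < 0 && errno == EINTR) continue; /* interrupted, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* The "want write" case: wait for writability instead of
             * ignoring the condition and dropping the rest of the buffer. */
            struct pollfd p = {.fd = fd, .events = POLLOUT};
            if (poll(&p, 1, timeout_ms) <= 0) return -1; /* timeout or poll error */
            continue;
        }
        return -1; /* any other failure */
    }
    return 0;
}
```

The point of the sketch is the branch that retries: a writer that assumes blocking semantics and skips that branch silently truncates the stream, which is consistent with the replica failing to load the RDB.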


Fix "Test dual-channel-replication primary reject set-rdb-client after client killed". While the replica is paused, the rdb child process cannot detect that the connection was closed. The replica must be resumed in order for the sync to fail.
Signed-off-by: naglera <[email protected]>

When using a blocking connection, we can't close the connection normally in the main process context, since that may block the main process if the replica is not responding. We also can't skip closing it, since otherwise the child process will continue the save.
Signed-off-by: naglera <[email protected]>


…nsfer error
Currently lastbgsave_status is used in bgsave or disk-replication, where the target is the disk. In valkey-io#60, we update it on a transfer error. I think it is mainly used in tests, so we can use the log to replace it. It changes lastbgsave_status to err in this case, but it is strange that it does not set ok or err in the preceding if and the following else. Also note that this will affect stop-writes-on-bgsave-error.
Signed-off-by: Binbin <[email protected]>


@xbasel

- Fix a TLS bug where connections were shut down by the primary's main process while the child process was still writing, which caused the main process to block.
- TLS connection fix: file descriptors were set to blocking mode in the main thread, followed by a blocking write. With TLS, that write switches the file descriptors back to non-blocking (see `connTLSSyncWrite()`) (@xbasel).
- Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on the log.
- Ensure that `server.repl_offset` and `server.replid` are updated correctly when dual channel synchronization completes successfully. Leaving them stale led to failures in replication tests that validate replication IDs or compare replication offsets.

---------

Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Signed-off-by: xbasel <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Signed-off-by: Binbin <[email protected]>
Co-authored-by: ranshid <[email protected]>
Co-authored-by: xbasel <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Binbin <[email protected]>
Signed-off-by: mwish <[email protected]>


This change prevents unintended side effects on connection state and improves consistency with non-TLS sync operations.

For example, when invoking `connTLSSyncRead` with a blocking file descriptor, the mode is switched to non-blocking upon `connTLSSyncRead` exit. If the code assumes the file descriptor remains blocking and calls the normal `read` expecting it to block, it may result in a short read.

This caused a crash in dual-channel, which was fixed in this PR by relocating `connBlock()`: #837

Signed-off-by: xbasel <[email protected]>
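As a rough illustration of the behaviour described in this commit message, the C sketch below contrasts a synchronous operation that always leaves the descriptor non-blocking with one that restores whatever mode the caller had. It uses only `fcntl()`. All four function names are hypothetical stand-ins and none of this is the Valkey implementation.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: report whether O_NONBLOCK is set on fd. */
static int fd_is_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    assert(flags != -1);
    return (flags & O_NONBLOCK) != 0;
}

/* Hypothetical helper: set (on != 0) or clear (on == 0) O_NONBLOCK.
 * Returns 0 on success, -1 on failure. */
static int fd_set_nonblocking(int fd, int on) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Stand-in for the behaviour described above: the operation switches to
 * blocking for its own I/O and unconditionally leaves the fd non-blocking. */
static void sync_op_clobbering(int fd) {
    fd_set_nonblocking(fd, 0);
    /* ... blocking I/O would happen here ... */
    fd_set_nonblocking(fd, 1);
}

/* Stand-in for the preserving variant: remember the caller's mode on entry
 * and put it back on exit. */
static void sync_op_preserving(int fd) {
    int was_nonblocking = fd_is_nonblocking(fd);
    fd_set_nonblocking(fd, 0);
    /* ... blocking I/O would happen here ... */
    fd_set_nonblocking(fd, was_nonblocking);
}
```

With the clobbering variant, a caller that started with a blocking descriptor and later calls plain `read()` gets non-blocking semantics it did not ask for, which matches the short-read scenario above.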

This PR is based on:

#12109
valkey-io/valkey#60

Closes: #11678

**Motivation**

During a full sync, when master is delivering RDB to the replica, incoming write commands are kept in a replication buffer in order to be sent to the replica once RDB delivery is completed. If RDB delivery takes a long time, it might create memory pressure on master. Also, once a replica connection accumulates replication data which is larger than output buffer limits, master will kill replica connection. This may cause a replication failure.

The main benefit of the rdb channel replication is streaming incoming commands in parallel to the RDB delivery. This approach shifts replication stream buffering to the replica and reduces load on master. We do this by opening another connection for RDB delivery. The main channel on replica will be receiving replication stream while rdb channel is receiving the RDB.

This feature also helps to reduce master's main process CPU load. By opening a dedicated connection for the RDB transfer, the bgsave process has access to the new connection and it will stream RDB directly to the replicas. Before this change, due to TLS connection restriction, the bgsave process was writing RDB bytes to a pipe and the main process was forwarding it to the replica. This is no longer necessary, the main process can avoid these expensive socket read/write syscalls. It also means RDB delivery to replica will be faster as it avoids this step.

In summary, replication will be faster and master's performance during full syncs will improve.

**Implementation steps**

1. When replica connects to the master, it sends 'rdb-channel-repl' as part of capability exchange to let master to know replica supports rdb channel.
2. When replica lacks sufficient data for PSYNC, master sends +RDBCHANNELSYNC reply with replica's client id. As the next step, the replica opens a new connection (rdb-channel) and configures it against the master with the appropriate capabilities and requirements. It also sends given client id back to master over rdbchannel, so that master can associate these channels. (initial replica connection will be referred as main-channel) Then, replica requests fullsync using the RDB channel.
3. Prior to forking, master attaches the replica's main channel to the replication backlog to deliver replication stream starting at the snapshot end offset.
4. The master main process sends replication stream via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. Replica accumulates replication stream in a local buffer, while the RDB is being loaded into the memory.
5. Once the replica completes loading the rdb, it drops the rdb channel and streams the accumulated replication stream into the db. Sync is completed.

**Some details**

- Currently, rdbchannel replication is supported only if `repl-diskless-sync` is enabled on master. Otherwise, replication will happen over a single connection as in before.
- On replica, there is a limit to replication stream buffering. Replica uses a new config `replica-full-sync-buffer-limit` to limit number of bytes to accumulate. If it is not set, replica inherits `client-output-buffer-limit <replica>` hard limit config. If we reach this limit, replica stops accumulating. This is not a failure scenario though. Further accumulation will happen on master side. Depending on the configured limits on master, master may kill the replica connection.

**API changes in INFO output:**

1. New replica state: `send_bulk_and_stream`. Indicates full sync is still in progress for this replica. It is receiving replication stream and rdb in parallel.

```
slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0
```

Replica state changes in steps:

- First, replica sends psync and receives +RDBCHANNELSYNC :`state=wait_bgsave`
- After replica connects with rdbchannel and delivery starts: `state=send_bulk_and_stream`
- After full sync: `state=online`

2. On replica side, replication stream buffering metrics:

- replica_full_sync_buffer_size: Currently accumulated replication stream data in bytes.
- replica_full_sync_buffer_peak: Peak number of bytes that this instance accumulated in the lifetime of the process.

```
replica_full_sync_buffer_size:20485
replica_full_sync_buffer_peak:1048560
```

**API changes in CLIENT LIST**

In `client list` output, rdbchannel clients will have 'C' flag in addition to 'S' replica flag:

```
id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0
```

**Config changes:**

- `replica-full-sync-buffer-limit`: Controls how much replication data replica can accumulate during rdbchannel replication. If it is not set, a value of 0 means replica will inherit `client-output-buffer-limit <replica>` hard limit config to limit accumulated data.
- `repl-rdb-channel` config is added as a hidden config. This is mostly for testing as we need to support both rdbchannel replication and the older single connection replication (to keep compatibility with older versions and rdbchannel replication will not be enabled if repl-diskless-sync is not enabled). it affects both the master (not to respond to rdb channel requests), and the replica (not to declare capability)

**Internal API changes:**

Changes that were introduced to Redis replication:

- New replication capability is added to replconf command: `capa rdb-channel-repl`. Indicates replica is capable of rdb channel replication. Replica sends it when it connects to master along with other capabilities.
- If replica needs fullsync, master replies `+RDBCHANNELSYNC <client-id>` to the replica's PSYNC request.
- When replica opens rdbchannel connection, as part of replconf command, it sends `rdb-channel 1` to let master know this is rdb channel. Also, it sends `main-ch-client-id <client-id>` as part of replconf command so master can associate channels.

**Testing:**

As rdbchannel replication is enabled by default, we run whole test suite with it. Though, as we need to support both rdbchannel and single connection replication, we'll be running some tests twice with `repl-rdb-channel yes/no` config.

**Replica state diagram**

```
 * * Replica state machine *
 *
 * Main channel state
 * ┌───────────────────┐
 * │RECEIVE_PING_REPLY │
 * └────────┬──────────┘
 *          │ +PONG
 * ┌────────▼──────────┐
 * │SEND_HANDSHAKE     │                     RDB channel state
 * └────────┬──────────┘            ┌───────────────────────────────┐
 *          │+OK                ┌───► RDB_CH_SEND_HANDSHAKE         │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_AUTH_REPLY │        │    REPLCONF main-ch-client-id <clientid>
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │ RDB_CH_RECEIVE_AUTH_REPLY     │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_PORT_REPLY │        │                  │ +OK
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │  RDB_CH_RECEIVE_REPLCONF_REPLY│
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_IP_REPLY   │        │                  │ +OK
 * └────────┬──────────┘        │   ┌──────────────▼────────────────┐
 *          │+OK                │   │ RDB_CH_RECEIVE_FULLRESYNC     │
 * ┌────────▼──────────┐        │   └──────────────┬────────────────┘
 * │RECEIVE_CAPA_REPLY │        │                  │+FULLRESYNC
 * └────────┬──────────┘        │                  │Rdb delivery
 *          │                   │   ┌──────────────▼────────────────┐
 * ┌────────▼──────────┐        │   │ RDB_CH_RDB_LOADING            │
 * │SEND_PSYNC         │        │   └──────────────┬────────────────┘
 * └─┬─────────────────┘        │                  │ Done loading
 *   │PSYNC (use cached-master) │                  │
 * ┌─▼─────────────────┐        │                  │
 * │RECEIVE_PSYNC_REPLY│        │    ┌────────────►│ Replica streams replication
 * └─┬─────────────────┘        │    │             │ buffer into memory
 *   │                          │    │             │
 *   │+RDBCHANNELSYNC client-id │    │             │
 *   ├──────┬───────────────────┘    │             │
 *   │      │ Main channel           │             │
 *   │      │ accumulates repl data  │             │
 *   │   ┌──▼────────────────┐       │     ┌───────▼───────────┐
 *   │   │ REPL_TRANSFER     ├───────┘     │    CONNECTED      │
 *   │   └───────────────────┘             └────▲───▲──────────┘
 *   │                                          │   │
 *   │                                          │   │
 *   │  +FULLRESYNC    ┌───────────────────┐    │   │
 *   ├─────────────────► REPL_TRANSFER     ├────┘   │
 *   │                 └───────────────────┘        │
 *   │  +CONTINUE                                   │
 *   └──────────────────────────────────────────────┘
 */
```

-----

This PR also contains changes and ideas from:

valkey-io/valkey#837
valkey-io/valkey#1173
valkey-io/valkey#804
valkey-io/valkey#945
valkey-io/valkey#989

---------

Co-authored-by: Yuan Wang <[email protected]>
Co-authored-by: debing.sun <[email protected]>
Co-authored-by: Moti Cohen <[email protected]>
Co-authored-by: naglera <[email protected]>
Co-authored-by: Amit Nagler <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Binbin <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Ping Xie <[email protected]>
Co-authored-by: Ran Shidlansik <[email protected]>
Co-authored-by: ranshid <[email protected]>
Co-authored-by: xbasel <[email protected]>
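The replica-side buffering rule from the commit message above (a `replica-full-sync-buffer-limit` of 0 inherits the `client-output-buffer-limit <replica>` hard limit, and the replica stops accumulating once the limit is reached) can be sketched in C as follows. This is an illustration only: the struct, the function names, and the treatment of an effective limit of 0 as "unlimited" are assumptions, not the Redis or Valkey implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: pick the limit the replica enforces. A configured value of 0
 * means "inherit the replica output-buffer hard limit". */
static size_t effective_buffer_limit(size_t full_sync_buffer_limit,
                                     size_t cob_replica_hard_limit) {
    return full_sync_buffer_limit != 0 ? full_sync_buffer_limit
                                       : cob_replica_hard_limit;
}

/* Hypothetical bookkeeping that mirrors the two INFO metrics. */
typedef struct {
    size_t used;  /* cf. replica_full_sync_buffer_size */
    size_t peak;  /* cf. replica_full_sync_buffer_peak */
    size_t limit; /* effective limit; 0 is treated as unlimited (assumption) */
} repl_buffer;

/* Accept up to len bytes of replication stream. Returns how many bytes were
 * accepted; a value smaller than len means the replica stops accumulating,
 * so further buffering falls to the master side. */
static size_t repl_buffer_accumulate(repl_buffer *b, size_t len) {
    size_t room, take;
    assert(b != NULL);
    if (b->limit == 0) {
        room = len; /* unlimited under this sketch's assumption */
    } else {
        room = b->limit > b->used ? b->limit - b->used : 0;
    }
    take = len < room ? len : room;
    b->used += take;
    if (b->used > b->peak) b->peak = b->used;
    return take;
}
```

Returning the accepted byte count rather than an error keeps the "this is not a failure scenario" property from the description: hitting the limit only changes where the remaining data waits.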

Wait for pause should check process status instead of logs

0544954

Signed-off-by: naglera <[email protected]>

ranshid reviewed Jul 29, 2024


naglera and others added 2 commits July 29, 2024 11:55

Update tests/integration/dual-channel-replication.tcl

f9a6539

Co-authored-by: ranshid <[email protected]> Signed-off-by: naglera <[email protected]>

update primary_repl_offset when dual-channel sync is done

a28b77a

Signed-off-by: naglera <[email protected]>

update server.replid on dual-channel sync success

7b87ab0

Signed-off-by: naglera <[email protected]>

naglera changed the title ~~Wait for pause should check process status instead of logs~~ Fix dual-channel-replication related issues Jul 29, 2024

madolson added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jul 29, 2024

test documentation

5acc8a1

Signed-off-by: naglera <[email protected]>

naglera and others added 18 commits July 30, 2024 09:59

Try to deflake dual-channel-tests

5f1cd17

Signed-off-by: naglera <[email protected]>

Avoid using rdb-key-save-delay when expecting sync to eventually succeed

10c73e5

On busy hosts, rdb-key-save-delay may pause the process for longer than expected due to long, recurring context switches.
Signed-off-by: naglera <[email protected]>

Revert "Avoid using rdb-key-save-delay when expecting sync to eventua…

539c3cd

…lly succeed" This reverts commit 10c73e5. Signed-off-by: naglera <[email protected]>

Revert "Try to deflake dual-channel-tests"

dec13f2

This reverts commit 5f1cd17. Signed-off-by: naglera <[email protected]>

Fix logs

411bb7a

Signed-off-by: naglera <[email protected]>

Fix wait_and_resume_process for macos tests

e365f7f

Signed-off-by: naglera <[email protected]>

Stabilize dual-channel-replication tests

b598f65

1. Test replica's buffer limit reached
2. dual-channel-replication fails when primary diskless disabled
In both cases we should not count on rdb-key-save-delay to be precise on machines with high load.
Signed-off-by: naglera <[email protected]>

Stabilize dual-channel-replication test

1ccff18

Test: replica unable to join dual channel replication sync after started. We should not count on rdb-key-save-delay to be precise on machines with high load.
Signed-off-by: naglera <[email protected]>

Update src/rdb.c

53788df

Signed-off-by: Madelyn Olson <[email protected]>

Fix log in freeReplicaReferencedReplBuffer

026c90f

Signed-off-by: naglera <[email protected]>

Fix dual channel tests issues

75ff15d

- dual-channel-replication fails when primary diskless disabled
- Replica recover rdb-connection killed
In both tests we should make the child process sleep for shorter intervals so the save is terminated on time.
Signed-off-by: naglera <[email protected]>

Revert "Fix dual channel tests issues"

278ad0d

This reverts commit 75ff15d. Signed-off-by: naglera <[email protected]>

Fix dual-channel test

575d549

dual-channel-replication fails when primary diskless disabled - we should wait for bgproc to exit
Signed-off-by: naglera <[email protected]>

enjoy-binbin mentioned this pull request Aug 12, 2024

Dual channel replication should not update lastbgsave_status when transfer error #811

Merged

madolson reviewed Aug 12, 2024


madolson mentioned this pull request Aug 12, 2024

Remove direct reference to conn->fd and use the connection abstraction #898

Open

madolson approved these changes Aug 12, 2024


madolson merged commit 27fce29 into valkey-io:unstable Aug 12, 2024
55 of 56 checks passed

This was referenced Aug 27, 2024

Dual-channel-replication: set connection to blocking after blocking-w… #863

Closed

Dual-channel-replication: set connection to blocking in child process #862

Closed

madolson added the release-notes This issue should get a line item in the release notes label Sep 2, 2024

madolson changed the title ~~Fix dual-channel-replication related issues~~ Fix bugs related to synchronous TLS connections blocking Sep 2, 2024

xbasel mentioned this pull request Nov 21, 2024

Preserve original fd blocking state in TLS I/O operations #1298

Merged

tezc mentioned this pull request Jan 8, 2025

Rdb channel replication redis/redis#13732

Merged
