Followup items from #758 Async IO threads #761

madolson · 2024-07-09T03:03:11Z

uriyage · 2024-08-06T21:27:12Z

Avoid recalculating slots when the I/O thread has already done so

zuiderkwast · 2024-08-27T21:31:41Z

~~Offload PING command~~

Edit: Let's not do it. See discussion below.

rjd15372 · 2024-09-26T11:41:41Z

Hi all, I'll start working on the "Offload PING command".

rjd15372 · 2024-10-08T11:23:56Z

I've been working on the offload of the PING command and I came up with two approaches that I would like your opinion on.

Approach A)

The IO thread that is chosen to handle the read of the PING command reads, parses, and executes the command, and then also writes the reply before processing a new IO job.

Approach B)

After the PING command has been read and parsed, the main thread schedules the execution of the PING command (enqueues a new job in the IO queue). The IO thread that picks up this "execution" job executes it and enqueues the write job in the IO queue.

Approach A is simpler, but it might block the IO thread if the socket is not writable, while approach B is more complex but more asynchronous.

I'm in favor of approach B, even though it will require generalizing the IO threads component to support a different kind of job (a non-IO job) and to support concurrent insertion of jobs into the shared queue.

@zuiderkwast @madolson thoughts?
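For readers weighing Approach B, the generalization it needs (a job kind besides read/write, plus a queue that several threads can push into) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Valkey's `IOJobQueue`: the `job_kind` values, `job_queue_*` functions, and `count_job` handler are all hypothetical, and a fixed-capacity ring buffer guarded by a C11 `atomic_flag` spinlock stands in for whatever multi-producer structure a real patch would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job kinds: the existing queue only carries IO jobs;
 * Approach B would add a non-IO "execute" kind. */
typedef enum { JOB_READ, JOB_EXECUTE, JOB_WRITE } job_kind;

typedef struct {
    job_kind kind;
    void (*handler)(void *data);
    void *data;
} job;

#define QUEUE_CAP 64

/* Fixed-size ring buffer guarded by a spinlock so that both the main
 * thread and IO threads may push (multi-producer). */
typedef struct {
    job ring[QUEUE_CAP];
    size_t head; /* index of the oldest job */
    size_t len;  /* number of queued jobs */
    atomic_flag lock;
} job_queue;

static void queue_lock(job_queue *q) {
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire)) {
        /* spin: the critical sections below are only a few instructions */
    }
}

static void queue_unlock(job_queue *q) {
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

void job_queue_init(job_queue *q) {
    q->head = 0;
    q->len = 0;
    atomic_flag_clear(&q->lock);
}

/* Returns false when the queue is full; the caller must then fall back
 * (for example, execute the job on the main thread). */
bool job_queue_push(job_queue *q, job_kind kind, void (*handler)(void *), void *data) {
    assert(handler != NULL);
    queue_lock(q);
    bool ok = q->len < QUEUE_CAP;
    if (ok) {
        size_t tail = (q->head + q->len) % QUEUE_CAP;
        q->ring[tail] = (job){kind, handler, data};
        q->len++;
    }
    queue_unlock(q);
    return ok;
}

/* Pops the oldest job into *out; returns false when the queue is empty. */
bool job_queue_pop(job_queue *q, job *out) {
    queue_lock(q);
    bool ok = q->len > 0;
    if (ok) {
        *out = q->ring[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->len--;
    }
    queue_unlock(q);
    return ok;
}

/* What an IO thread loop might do: run every queued job, whatever its kind.
 * Returns how many jobs ran. */
size_t job_queue_drain(job_queue *q) {
    job j;
    size_t ran = 0;
    while (job_queue_pop(q, &j)) {
        j.handler(j.data);
        ran++;
    }
    return ran;
}

/* Illustrative handler: increments the int that data points to. */
void count_job(void *data) { (*(int *)data)++; }
```

In Approach B an IO thread running a `JOB_EXECUTE` job would itself call `job_queue_push(..., JOB_WRITE, ...)`, which is why the push side must tolerate concurrent producers. A production version would more likely use a mutex or a lock-free MPMC queue than a spinlock.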

zuiderkwast · 2024-10-08T15:55:42Z

@uriyage, you know IO threading better than anyone. Can you help?

@rjd15372 Approach B sounds like it will not be worth the overhead. Executing PING is very fast, possibly faster to do in the main thread than to delegate to another thread.

I like approach A, which should add zero work to the main thread if the socket is writable, which is probably the normal case. If it's not writable, we need to register a write handler in the event loop and add the client to the global list server.clients_pending_io_write, which is not thread safe, so this probably needs to be done by the main thread (unless we refactor this in some way).

When the main thread delegates a read-and-parse-command job to an IO thread, it does, among other things:

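    c->io_read_state = CLIENT_PENDING_IO;   /* the client is now handed to an IO thread */
    connSetPostponeUpdateState(c->conn, 1); /* postpone connection state updates while the IO thread owns it */
    IOJobQueue_push(jq, ioThreadReadQueryFromClient, c); /* enqueue the read-and-parse job */
    c->flag.pending_read = 1;
    listLinkNodeTail(server.clients_pending_io_read, &c->pending_read_list_node); /* checked later by processIOThreadsReadDone */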

Later, the main thread checks whether the clients are ready by calling processIOThreadsReadDone, which loops over all clients in this list. Here, can we indicate that instead of parsing a command, the thread wants to write something to the client? We have the c->io_read_state and c->io_write_state variables to indicate these things back to the main thread. Or we can add a new flag in c->read_flags for this special case. I'm not quite sure about this. @uriyage WDYT?
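The "new flag in c->read_flags" option could look roughly like the sketch below on the main-thread side. It is a simplified stand-in for the loop over `clients_pending_io_read`, not Valkey's `processIOThreadsReadDone`; the flag names `READ_FLAGS_PARSING_COMPLETED` and `READ_FLAGS_CMD_EXECUTED`, the `client` and `stats` structs, and the `reply_bytes` field are hypothetical. It shows the main thread skipping execution for clients whose command the IO thread already ran, and doing only the global bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read_flags bits; names are illustrative, not Valkey's. */
#define READ_FLAGS_PARSING_COMPLETED (1u << 0)
#define READ_FLAGS_CMD_EXECUTED (1u << 1) /* IO thread ran the command and wrote the reply */

typedef enum { PENDING_IO, IO_DONE } io_state;

typedef struct client {
    io_state io_read_state;
    unsigned read_flags;
    long long reply_bytes; /* bytes the IO thread wrote for this client */
} client;

typedef struct {
    long long commands_processed;
    long long net_output_bytes;
    long long commands_executed_by_io;
} stats;

/* Returns how many clients still need the main thread to execute a parsed
 * command. Clients whose command was executed on an IO thread only get
 * their global statistics updated here. */
size_t process_read_done(client *clients, size_t n, stats *st) {
    assert(clients != NULL || n == 0);
    size_t to_execute = 0;
    for (size_t i = 0; i < n; i++) {
        client *c = &clients[i];
        if (c->io_read_state == PENDING_IO) continue; /* IO thread not finished yet */
        if (c->read_flags & READ_FLAGS_CMD_EXECUTED) {
            /* Stats are global state, so the main thread updates them. */
            st->commands_processed++;
            st->commands_executed_by_io++;
            st->net_output_bytes += c->reply_bytes;
        } else if (c->read_flags & READ_FLAGS_PARSING_COMPLETED) {
            to_execute++; /* normal path: the main thread executes the command */
        }
    }
    return to_execute;
}
```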

Now I'm just thinking: is it possible that the main thread has delegated read-and-parse for a client to one IO thread and also delegated a write to that same client to another IO thread? That may be a problem... Have we already prevented this due to TLS?

uriyage · 2024-10-09T14:48:08Z

@zuiderkwast @rjd15372
Approach A could lead to potential issues if the client already has pending data in the reply buffer or if the main thread simultaneously writes to the client after delegating the read (e.g., if the client is subscribed to a channel or MONITOR).
As you mentioned, we have read flags that the main thread can use to indicate whether the I/O thread can write to the client (when the client has no pending replies and is not subscribed or in a similar state). We currently use the same flags to indicate if the I/O thread can parse a command (if the client is not blocked) or if it should just read.
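The eligibility rules described above (the IO thread may parse only if the client is not blocked, and may write only if there are no pending replies and nothing else, such as Pub/Sub or MONITOR, can push output concurrently) can be expressed as a small predicate. This is a sketch under assumptions: `client_state`, `READ_FLAG_CAN_PARSE`, `READ_FLAG_CAN_WRITE`, and `compute_read_flags` are hypothetical names, and the real client struct tracks these conditions differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client state the main thread inspects before delegating. */
typedef struct {
    size_t pending_reply_bytes; /* unsent data already in the reply buffer */
    bool pubsub;                /* subscribed: main thread may push messages */
    bool monitor;               /* MONITOR client: main thread may push output */
    bool blocked;               /* a blocking command is in progress */
} client_state;

/* Hypothetical read_flags bits set by the main thread for the IO thread. */
#define READ_FLAG_CAN_PARSE (1u << 0)
#define READ_FLAG_CAN_WRITE (1u << 1)

unsigned compute_read_flags(const client_state *c) {
    assert(c != NULL);
    unsigned flags = 0;
    /* A blocked client should only be read from, not parsed. */
    if (!c->blocked) flags |= READ_FLAG_CAN_PARSE;
    /* Writing from the IO thread is safe only if no earlier reply is queued
     * and the main thread cannot produce output for this client meanwhile. */
    if (!c->blocked && c->pending_reply_bytes == 0 && !c->pubsub && !c->monitor)
        flags |= READ_FLAG_CAN_WRITE;
    return flags;
}
```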
When the read job returns, we will need to:

- Use flags to indicate that a command was executed
- Update the cmd_stat and other relevant statistics
- Handle the out-bytes stats as well

If the socket is not writable, we will have to store the command output (or the remaining output, if we succeed in writing part of it) in the client's reply buffer to be sent later.
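The IO-thread side of Approach A, including the partial-write case just described, might look like the following sketch. It uses POSIX `write()` on a non-blocking descriptor; the `io_client` struct, its fields, and `io_write_reply` are hypothetical, the fixed-size buffer replaces Valkey's real reply buffer and list, and installing the write handler is left to the main thread because the event loop and `server.clients_pending_io_write` are not thread safe.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define REPLY_BUF_SIZE 1024

/* Hypothetical client fields, not Valkey's actual layout. */
typedef struct {
    int fd;                   /* non-blocking socket */
    char buf[REPLY_BUF_SIZE]; /* reply bytes left for the main thread to send */
    size_t bufpos;            /* number of valid bytes in buf */
    size_t out_bytes;         /* bytes written by the IO thread, for stats */
    int needs_write_handler;  /* ask the main thread to install a write handler */
} io_client;

/* Called on the IO thread after executing a command. Writes as much of the
 * reply as the socket accepts and keeps the rest in the reply buffer.
 * Returns 0 on success (fully or partially sent), -1 on a hard error. */
int io_write_reply(io_client *c, const char *reply, size_t len) {
    assert(c->bufpos == 0); /* only used when no earlier reply is pending */
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(c->fd, reply + sent, len - sent);
        if (n > 0) {
            sent += (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue; /* interrupted by a signal: retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break; /* socket not writable right now */
        } else {
            return -1;
        }
    }
    c->out_bytes += sent;
    size_t rest = len - sent;
    if (rest > 0) {
        if (rest > REPLY_BUF_SIZE) return -1; /* sketch: no overflow list */
        memcpy(c->buf, reply + sent, rest);
        c->bufpos = rest;
        /* Registering the write event touches global state, so only flag it. */
        c->needs_write_handler = 1;
    }
    return 0;
}
```

The `out_bytes` counter is what the main thread would fold into the global out-bytes statistic when the job returns.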

Overall, I believe the performance gain from both approaches is marginal and may not justify the added complexity. Perhaps we should consider this approach only for more time-consuming commands that can be delegated to the I/O threads.

Regarding your question, we already ensure that the same thread that reads is the one that writes to the client.
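One way to enforce "the same thread that reads is the one that writes" is to bind a client to an IO thread while it has any job in flight, as in this sketch. The `aff_client` struct, its `cur_tid` and `pending_jobs` fields, `pick_io_thread`, and `io_job_done` are hypothetical; this illustrates the invariant, not how Valkey actually tracks it.

```c
#include <assert.h>

#define NUM_IO_THREADS 4

/* Hypothetical affinity bookkeeping for one client. */
typedef struct {
    int cur_tid;      /* -1: not bound to any IO thread */
    int pending_jobs; /* read/write jobs in flight on cur_tid */
} aff_client;

/* Chooses the IO thread for a new read or write job. While the client has a
 * job in flight, every further job goes to the same thread, so a read and a
 * write for one client are never handled by two threads at once. */
int pick_io_thread(aff_client *c, int least_loaded_tid) {
    assert(least_loaded_tid >= 0 && least_loaded_tid < NUM_IO_THREADS);
    if (c->pending_jobs == 0) c->cur_tid = least_loaded_tid;
    c->pending_jobs++;
    return c->cur_tid;
}

/* Called by the main thread when a job for this client has completed. */
void io_job_done(aff_client *c) {
    assert(c->pending_jobs > 0);
    if (--c->pending_jobs == 0) c->cur_tid = -1;
}
```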

rjd15372 · 2024-10-09T14:58:34Z

@uriyage @zuiderkwast
I also agree that for the specific case of PING, we will not gain a lot by doing this, but the changes that we make for PING will be the same for any other commands that we will want to offload in the future.

My idea is to make it generic, and add a new command flag that states if the command can be offloaded.
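The generic command flag idea could be sketched as follows: a per-command bit marks commands that are safe to execute on an IO thread, and offloading additionally requires that the main thread allowed this client to be written from the IO thread. `CMD_OFFLOADABLE`, `command_entry`, the three-entry table, and both functions are hypothetical; as noted later in the thread, it is unclear whether any command besides PING avoids global data, so only PING is flagged here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command flag: the command touches no global data and may
 * run on an IO thread. */
#define CMD_OFFLOADABLE (1ULL << 0)

typedef struct {
    const char *name; /* lowercase command name */
    uint64_t flags;
} command_entry;

/* Tiny illustrative command table; only PING is marked offloadable. */
static const command_entry command_table[] = {
    {"ping", CMD_OFFLOADABLE},
    {"get", 0}, /* reads the keyspace: main thread only */
    {"set", 0}, /* writes the keyspace: main thread only */
};

const command_entry *lookup_command(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (strcmp(command_table[i].name, name) == 0) return &command_table[i];
    }
    return NULL;
}

/* Offload only if the command is flagged and the client may be written
 * from the IO thread (no pending replies, not subscribed, and so on). */
bool should_execute_on_io_thread(const char *name, bool client_can_write) {
    const command_entry *cmd = lookup_command(name);
    return cmd != NULL && (cmd->flags & CMD_OFFLOADABLE) != 0 && client_can_write;
}
```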

zuiderkwast · 2024-10-09T15:13:58Z

Thanks Uri! Good points. I guess offloading PING may not be worth it then.

I don't see the point of doing it for PING just as a preparation for other commands. It's better to do it for another command where it actually matters. I don't know any command that doesn't need to access any global data though. Do you?

Perhaps it's better to focus on another of the follow-up items?

uriyage · 2024-11-07T18:55:53Z

Remove err_clean in tls

#761

uriyage · 2024-11-21T19:38:54Z

Offload TLS negotiation.
Offload TLS negotiation to I/O threads #1338

hwware · 2024-11-21T22:00:24Z

> I've been working on the offload of the PING command and I came up with two approaches that I would like your opinion on.
>
> Approach A)
>
> The IO thread that is chosen to handle the read of the PING command reads, parses, and executes the command, and then also writes the reply before processing a new IO job.
>
> Approach B)
>
> After the PING command has been read and parsed, the main thread schedules the execution of the PING command (enqueues a new job in the IO queue). The IO thread that picks up this "execution" job executes it and enqueues the write job in the IO queue.
>
> Approach A is simpler, but it might block the IO thread if the socket is not writable, while approach B is more complex but more asynchronous.
>
> I'm in favor of approach B, even though it will require generalizing the IO threads component to support a different kind of job (a non-IO job) and to support concurrent insertion of jobs into the shared queue.
>
> @zuiderkwast @madolson thoughts?

Approach B is what we want. PING, and even HELLO, are health-status commands of the server; they should be executed by a separate thread so that no data command can block them.

madolson · 2024-11-21T22:03:44Z

I don't want to offload the PING command. I feel like part of the job of PING is to determine the health of the server; if it were offloaded, it could still be answered even if the server is stuck in some kind of dead loop.

arukiidou · 2024-12-02T14:55:14Z

Optimize IO thread offload for modified argv #1360

uriyage · 2024-12-09T09:52:12Z

Optimize client struct memory footprint.
client struct: lazy init components and optimize struct layout #1405

uriyage · 2024-12-17T16:22:25Z

Offload replica and master traffic to IO threads. Offload reading the replication stream to IO threads #1449

## TLS Negotiation Offloading to I/O Threads

### Overview

This PR introduces the ability to offload TLS handshake negotiations to I/O threads, significantly improving performance under high TLS connection loads.

### Key Changes

- Added infrastructure to offload TLS negotiations to I/O threads
- Refactored SSL event handling to allow I/O threads to modify conn flags
- Introduced a new connection flag to identify client connections

### Performance Impact

Testing with 650 clients with SET commands and 160 new TLS connections per second in the background:

#### Throughput Impact of new TLS connections

- **With Offloading**: Minimal impact (1050K → 990K ops/sec)
- **Without Offloading**: Significant drop (1050K → 670K ops/sec)

#### New Connection Rate

- **With Offloading**: 1,757 conn/sec
- **Without Offloading**: 477 conn/sec

### Implementation Details

1. **Main Thread**:
   - Initiates negotiation-offload jobs to I/O threads
   - Adds connections to pending-read clients list (using existing read offload mechanism)
   - Post-negotiation handling:
     - Creates read/write events if needed for incomplete negotiations
     - Calls accept handler for completed negotiations
2. **I/O Thread**:
   - Performs TLS negotiation
   - Updates connection flags based on negotiation result

Related issue: #761

---------

Signed-off-by: Uri Yagelnik
Signed-off-by: ranshid
Co-authored-by: ranshid
Co-authored-by: Madelyn Olson
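The division of labour in the TLS offload (the I/O thread negotiates and records the result in connection flags; the main thread then creates read/write events or calls the accept handler) can be sketched as a small state hand-off. This is an illustration only: `hs_result`, the `CONN_FLAG_HS_*` bits, `tls_conn`, and both functions are hypothetical, the handshake step is simulated rather than calling a TLS library, and closing the connection on error is an assumption not stated in the PR description.

```c
#include <assert.h>

/* Hypothetical outcome of one handshake step, mirroring the
 * "done / want read / want write / error" results TLS libraries report. */
typedef enum { HS_DONE, HS_WANT_READ, HS_WANT_WRITE, HS_ERROR } hs_result;

/* Hypothetical connection flags written by the I/O thread. */
#define CONN_FLAG_HS_DONE (1u << 0)
#define CONN_FLAG_HS_WANT_READ (1u << 1)
#define CONN_FLAG_HS_WANT_WRITE (1u << 2)
#define CONN_FLAG_HS_ERROR (1u << 3)
#define CONN_FLAG_HS_MASK \
    (CONN_FLAG_HS_DONE | CONN_FLAG_HS_WANT_READ | CONN_FLAG_HS_WANT_WRITE | CONN_FLAG_HS_ERROR)

typedef struct {
    unsigned flags;
    int read_event;  /* 1 if a read event is registered in the event loop */
    int write_event; /* 1 if a write event is registered in the event loop */
    int accepted;    /* 1 once the accept handler has run */
    int closed;
} tls_conn;

/* I/O thread: record the negotiation result in the flags only; it never
 * touches the event loop. */
void io_thread_negotiate(tls_conn *c, hs_result r) {
    c->flags &= ~CONN_FLAG_HS_MASK;
    switch (r) {
    case HS_DONE: c->flags |= CONN_FLAG_HS_DONE; break;
    case HS_WANT_READ: c->flags |= CONN_FLAG_HS_WANT_READ; break;
    case HS_WANT_WRITE: c->flags |= CONN_FLAG_HS_WANT_WRITE; break;
    case HS_ERROR: c->flags |= CONN_FLAG_HS_ERROR; break;
    }
}

/* Main thread: post-negotiation handling once the job has completed. */
void main_thread_after_negotiation(tls_conn *c) {
    if (c->flags & CONN_FLAG_HS_DONE) {
        c->accepted = 1;    /* stand-in for calling the accept handler */
    } else if (c->flags & CONN_FLAG_HS_WANT_READ) {
        c->read_event = 1;  /* wait for more handshake bytes */
    } else if (c->flags & CONN_FLAG_HS_WANT_WRITE) {
        c->write_event = 1; /* retry once the socket is writable */
    } else {
        assert(c->flags & CONN_FLAG_HS_ERROR);
        c->closed = 1;      /* assumption: drop failed handshakes */
    }
}
```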

Support Primary client IO offload.

Related issue: #761

---------

Signed-off-by: Uri Yagelnik

uriyage mentioned this issue Aug 6, 2024

Improve multithreaded performance with memory prefetching #861

Merged

zuiderkwast changed the title ~~Followup items from https://github.com/valkey-io/valkey/pull/758~~ Followup items from #758 Async IO threads Aug 29, 2024

arukiidou mentioned this issue Sep 11, 2024

[NEW] Reaching 1 million requests per second on a single Valkey instance #22

Closed

madolson mentioned this issue Oct 25, 2024

Increase the IO_THREADS_MAX_NUM. #1220

Merged

uriyage mentioned this issue Nov 21, 2024

Offload TLS negotiation to I/O threads #1338

Merged

uriyage mentioned this issue Dec 9, 2024

client struct: lazy init components and optimize struct layout #1405

Open

uriyage mentioned this issue Dec 17, 2024

Offload reading the replication stream to IO threads #1449

Merged

zuiderkwast pushed a commit that referenced this issue Jan 2, 2025

Offload reading the replication stream to IO threads (#1449)

35abb68
