move clientCron onto a separate timer #1387

JimB123 · 2024-12-03T18:17:58Z

The serverCron() function contains a variety of maintenance functions and is set up as a timer job, configured to run at a certain rate (hz). The default rate is 10hz (every 100ms).

One of the things that serverCron() does is to perform maintenance functions on connected clients. Since the number of clients is variable, and can be very large, this could cause latency spikes when the 100ms serverCron() task gets invoked. To combat those latency spikes, a feature called "dynamic-hz" was introduced. This feature will run serverCron() more often, if there are more clients. The clients get processed up to 200 at a time. The delay for serverCron() is shortened with the goal of processing all of the clients every second.

The result of this is that some of the other (non-client) maintenance functions also get (unnecessarily) run more often. Like cronUpdateMemoryStats() and databasesCron(). Logically, it doesn't make sense to run these functions more often, just because we happen to have more clients attached.

This PR separates client activities onto a separate, variable, timer. The "dynamic-hz" feature is eliminated. Now, serverCron will run at a standard configured rate. The separate clients cron will automatically adjust based on the number of clients. This has the added benefit that often, the 2 crons will fire during separate event loop invocations and will usually avoid the combined latency impact of doing both maintenance activities together.

The new timer follows the same rules which were established with the dynamic HZ feature.

The goal is to process all of the clients once per second
We never want to process more than 200 clients in a single invocation (MAX_CLIENTS_PER_CLOCK_TICK)
We always process at least 5 clients at a time (CLIENTS_CRON_MIN_ITERATIONS)
The minimum rate is determined by HZ

The delay (ms) for the new timer is also more precise, computing the number of milliseconds needed to achieve the goal of reaching all of the clients every second. The old dynamic-hz feature just performs a doubling of the HZ until the clients processing rate is achieved (i.e. delays of 100ms, 50ms, 25ms, 12ms...)

hpatro

Seems right to me. Are there any possible unintended behavior change due to decoupling this from serverCron? Would be good to callout.

src/server.c

enjoy-binbin

Make sense to me. The code looks correct to me

src/server.c

codecov · 2024-12-05T18:36:39Z

Codecov Report

Attention: Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.87%. Comparing base (9f8b174) to head (94e6c58).
Report is 119 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/server.c	89.65%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1387      +/-   ##
============================================
+ Coverage     70.68%   70.87%   +0.19%     
============================================
  Files           118      121       +3     
  Lines         63550    65180    +1630     
============================================
+ Hits          44919    46197    +1278     
- Misses        18631    18983     +352

Files with missing lines	Coverage Δ
src/config.c	`78.38% <100.00%> (+0.74%)`	⬆️
src/server.h	`100.00% <ø> (ø)`
src/server.c	`87.66% <89.65%> (+0.27%)`	⬆️

... and 61 files with indirect coverage changes

src/server.c

soloestoy · 2025-01-23T07:43:12Z

src/server.c

+    }
+    /* A separate timer for client maintenance.  Runs at a variable speed depending
+     * on the client count. */
+    if (aeCreateTimeEvent(server.el, 1, clientsTimerProc, NULL, NULL) == AE_ERR) {


i'd love to reuse the clientsCron, just like serverCron.

I intentionally didn't use "cron" because the name is ambiguous. This function IS a time proc as defined by AE.
Naming is consistent with the defrag time proc (activeDefragTimeProc)

We have other functions called "cron" that are simply functions called by a timer proc. Like replicationCron(), databasesCron().

I'm not a native English speaker : ), what is the specific difference between cron and timeproc? I'm more accustomed to cron, and the statistics related to its timing are also el_cron_duration.

If you look at ae.h, the callback is defined as aeTimeProc here: https://github.com/valkey-io/valkey/blob/unstable/src/ae.h#L70

"cron" seems to be used for anything that's invoked at some interval. This function is specifically an aeTimeProc.

I know ae.h, but the underlying logic doesn't necessarily need to be passed through to the upper layers. Logically speaking, cron is also suitable and easy to understand.

Maybe we'll have to agree to disagree on this one.

This function carries the specific signature defined in ae.h. It has the parameters of an aeTimeProc. I want to highlight that THIS is the function directly invoked by AE as a timer callback. This function performs the management part of the timer (how often it should fire, some metrics, etc.).

If you scroll down about 20 lines, you'll see that this function actually calls another function called clientsCron() which is largely the same as the original clientsCron() which existed before this function was created. This is the actual business logic (separate from timer management) that needs to get invoked.

So, consider it this way. clientsCron() really hasn't changed. It used to be called by serverCron() and now it's called by clientsTimeProc().

We could, for consistency, create a serverTimeProc to call serverCron. If so, like I've done here, that would separate the management of the timer from the work that's done when the timer is fired.

src/server.c

src/server.h

src/server.c

Signed-off-by: Jim Brunner <[email protected]>

Co-authored-by: Binbin <[email protected]> Signed-off-by: Jim Brunner <[email protected]>

Signed-off-by: Jim Brunner <[email protected]>

JimB123 force-pushed the dynamic-hz branch from f18e63c to 51db017 Compare December 3, 2024 18:19

zuiderkwast requested a review from enjoy-binbin December 3, 2024 22:00

hpatro reviewed Dec 4, 2024

View reviewed changes