
Skip tracking clients OOM test when I/O threads are enabled #764

Merged · 4 commits · Aug 22, 2024

Conversation

@uriyage (Contributor) commented Jul 9, 2024

Fix feedback loop in key eviction with tracking clients when using I/O threads.

This addresses the "client tracking don't cause eviction feedback loop" test failure.

Current issue:
Evicting keys while tracking clients or keyspace notifications exist creates a feedback loop when using I/O threads:

While evicting keys, we send tracking async writes to the I/O threads, preventing immediate release of the tracking clients' COB memory.

Before the I/O thread finishes its write, we recheck used_memory, which still includes the tracking clients' COB, and thus we continue to evict more keys.

Fix:
We will skip the test for now while I/O threads are active. We may consider avoiding sending writes in `processPendingWrites` to I/O threads for tracking clients when we are out of memory.

codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.52%. Comparing base (002d052) to head (9e963ee).
Report is 1 commit behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #764      +/-   ##
============================================
- Coverage     70.56%   70.52%   -0.04%     
============================================
  Files           112      112              
  Lines         61509    61509              
============================================
- Hits          43403    43382      -21     
- Misses        18106    18127      +21     

see 16 files with indirect coverage changes

Review thread on src/networking.c (outdated, resolved)
@ranshid (Member) commented Jul 10, 2024

I would also change the top comment to indicate this is not only fixing the test; the issue is probably also related to keyspace notifications and pub/sub.

@uriyage force-pushed the fix_tracking_clients_oom_test branch from 97bb6c4 to 0986964 on July 11, 2024
@uriyage (Contributor, Author) commented Jul 11, 2024

> I would also change the top comment to indicate this is not only fixing the test. the issue is probably also related to keyspace notifications and pubsub.

I've updated the comment to include keyspace notifications. I'm unsure about pub/sub clients, as deleting keys won't create a feedback loop for them.

@zuiderkwast added the "bug" label Jul 11, 2024
@ranshid (Member) commented Jul 14, 2024

@uriyage the fix as written now LGTM. However, I still have concerns that we are mostly aiming to solve an issue which is related to the tests and not a very realistic scenario. I understand that when we write to I/O threads the eviction cycles MIGHT be longer, but I also think that avoiding doing so might cause issues, for example when large pipelines are used, which will keep us above maxmemory longer. I think that maybe we can keep an open issue to investigate this further and, for the time being, just avoid async I/O in this specific test?
@zuiderkwast WDYT?

@zuiderkwast (Contributor) left a comment

I think it makes sense. Eviction feedback loop sounds like something that should be avoided. :)

> @uriyage the fix as written now LGTM. However I still have concerns that we are mostly aiming to solve an issue which is related to the tests and not very realistic scenario. I understand that when we write to io-threads the eviction cycles MIGHT be longer but I also think that avoid doing so might also cause issues for example when large pipelines are used which will keep us above the maxmemory longer. I think that maybe we can keep an open issue to investigate this farther and for the time being just avoid async-io in this specific test?
> @zuiderkwast WDYT?

I'm not completely aware of the impact and the affected scenarios. What are the worst cases and what are the most common scenario where this can happen?

Does it happen in a normal cache workload where LRU eviction is used?

Do we free just a few more keys than we should, or does the feedback loop make us free almost the whole database in the worst case?

Review thread on src/networking.c (outdated, resolved)
@madolson self-requested a review July 15, 2024
@ranshid (Member) commented Jul 16, 2024

> I think it makes sense. Eviction feedback loop sounds like something that should be avoided. :)
>
> I'm not completely aware of the impact and the affected scenarios. What are the worst cases and what are the most common scenario where this can happen?
>
> Does it happen in a normal cache workload where LRU eviction is used?
>
> Do we free just a few more keys than we should, or does the feedback loop make us free almost the whole database in the worst case?

I was not saying that an eviction feedback loop is not something we would like to avoid. I was referring to the condition under check here. We are avoiding sending writes to I/O threads when maxmemory is hit. Is this always the best decision? For example, imagine a case where clients are heavily pipelining. If we take longer to get to reading from the sockets, we might read much more data from clients each time, which will also cause higher memory consumption on the CIB (and potentially also on the COB), right? I see pipelining as a much more common scenario than keyspace notifications and client-side caching (maybe I am wrong here), so I raised a concern. Maybe we can do this only in case we have many eviction event listeners?
Maybe we can also either fix or NOT fix this now and consider it later?

@zuiderkwast (Contributor) commented

> I was referring to the condition under check here. We are avoiding sending writes to io-threads when maxmemory is hit. Is this always the best decision? for example imagine case were clients are heavily pipelining. in case we take longer to get to reading from the sockets we might read much more data from clients each time which will also cause higher memory consumption on CIB (and potentially also on COB) right? I see pipelining much more common scenario than keyspace-notifications and client side caching (maybe I am wrong here) so I raised a concern. Maybe we can do this only in case we have many eviction events listeners?
> Maybe we can also either fix or NOT fix this now and consider it later?

@ranshid Yeah, as I said, I'm not familiar enough with the impact of this. But yeah, pipelining is for sure more common than client-side caching and keyspace notifications.

Avoiding the I/O threads for all clients whenever we reach maxmemory sounds like we're pulling the brakes too hard. Can't we skip the I/O threads just for clients with tracking, `c->flag.tracking`? (and maybe some more condition, pubsub?)

@madolson thanks for stepping in! I believe you can make a better decision than I can. :)

@madolson (Member) commented

For now I'm inclined to say let's wait and see how it gets used. There will likely be large swings in memory usage if you are using tracking, but they should stabilize once the I/O threads offload the work. I'm guessing the specific example in the test likely has too few keys to actually let it stabilize. We haven't seen wide adoption of client-side tracking, and generally there aren't a lot of clients that are connected listening for keyspace notifications on "all" keys.

I would almost rather investigate ways to dampen the effects of the evictions, either by excluding some of the memory in the I/O threads' output buffers or having some generic buffer that can be used, instead of stopping the usage of the I/O threads. We don't want performance to suddenly drop either.

@uriyage mentioned this pull request Jul 18, 2024
@ranshid (Member) commented Jul 24, 2024

> For now I'm inclined to say let's wait and see how it get's used. There will likely be large swings in memory usage if you are using tracking, but they should stabilize once the IO threads offload the work. I'm guessing the specific example in the test likely has too few keys to actually let it stabilize. We haven't seen wide adoption of client side tracking and generally there aren't a lot of clients that are connected listening for keyspace notifications on "all" keys.
>
> I would almost rather investigate ways to dampen the effects of the evictions, either by excluding some of the memory in the io threads output buffers or having some generic buffer that can be used, instead of stoping the usage of the IO threads. We don't want performance to suddenly drop either.

Couldn't agree more :)

@ranshid (Member) commented Jul 24, 2024

@uriyage I suggest for now let's disable/skip the I/O-thread offload for this specific test.

@uriyage force-pushed the fix_tracking_clients_oom_test branch from 0986964 to d21ecc2 on July 24, 2024
@uriyage (Contributor, Author) commented Jul 24, 2024

Skipped the test with I/O threads as suggested. Also corrected an incorrect I/O-threads condition in two other tests.

@ranshid (Member) commented Jul 28, 2024

Thanks @uriyage. In general I would like it if we introduced a tag "io-threads::skip" and validated it in tags_acceptable, but that can be introduced later.
Can you please just print some indication that the test was skipped?

@uriyage force-pushed the fix_tracking_clients_oom_test branch from 9abce83 to 56da0b2 on July 28, 2024
@uriyage (Contributor, Author) commented Jul 28, 2024

> Thanks @uriyage in general I would like it if we introduce a tag "io-threads::skip" and validate in tags_acceptable but that can be introduced later. Can you please just print some indication that the test was skipped?

Thank you, @ranshid. I've changed it to use the tag as you suggested. It looks better now, and we get the prints. For example:
[ignore]: stats: eventloop metrics: Not supported in io-threads mode

@ranshid (Member) left a comment

LGTM

@uriyage changed the title from "Do not send pending writes to I/O threads while OOM" to "Skip tracking clients OOM test when I/O threads are enabled" Jul 28, 2024
@ranshid added the "pending-missing-dco" label Aug 19, 2024
@uriyage force-pushed the fix_tracking_clients_oom_test branch from 89d29f2 to 96c732f on August 20, 2024
@uriyage force-pushed the fix_tracking_clients_oom_test branch from 96c732f to cf4e80f on August 20, 2024
@madolson merged commit 39f8bcb into valkey-io:unstable Aug 22, 2024
46 checks passed
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Aug 22, 2024
In valkey-io#764, we added a --io-threads mode in the tests, but forgot
to handle runtest-cluster; they are different frameworks.

Currently runtest-cluster does not support tags, and we don't
have a plan to support it. Cluster tests currently do not have any
io-threads tests, so this PR just aligns the --io-threads option
with valkey-io#764.

Signed-off-by: Binbin <[email protected]>
hwware pushed a commit that referenced this pull request Aug 22, 2024
madolson added a commit that referenced this pull request Sep 2, 2024
madolson pushed a commit that referenced this pull request Sep 2, 2024
madolson added a commit that referenced this pull request Sep 3, 2024
madolson pushed a commit that referenced this pull request Sep 3, 2024
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
PingXie pushed a commit to PingXie/valkey that referenced this pull request Sep 14, 2024
Labels: bug (Something isn't working), pending-missing-dco (For PRs that are blocked because they are missing a DCO)
5 participants