
Dynamically resize the WSTP if availableProcessors() changes #3909

Open
armanbilge opened this issue Dec 1, 2023 · 3 comments
Comments

@armanbilge
Member

armanbilge commented Dec 1, 2023

We currently rely on Runtime#availableProcessors to size the work-stealing threadpool. The JavaDoc warns that

Applications that are sensitive to the number of available processors should therefore occasionally poll this property and adjust their resource usage appropriately.
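
For illustration, a minimal sketch of what "occasionally poll" could look like (entirely hypothetical: nothing like this exists in cats-effect today, and the names and interval are made up):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical sketch: periodically re-read availableProcessors() and fire a
// callback when the value changes. The one-minute interval is arbitrary.
object ProcessorWatcher {
  @volatile private[this] var last: Int =
    Runtime.getRuntime.availableProcessors()

  private[this] val scheduler =
    Executors.newSingleThreadScheduledExecutor()

  def start(onChange: Int => Unit): Unit =
    scheduler.scheduleAtFixedRate(
      () => {
        val now = Runtime.getRuntime.availableProcessors()
        if (now != last) {
          last = now
          onChange(now) // e.g. ask the pool to resize
        }
      },
      1L, 1L, TimeUnit.MINUTES
    )
}
```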

A concrete example of an environment where the number of available processors may change is Kubernetes with the Vertical Pod Autoscaler.

The Kubernetes Vertical Pod Autoscaler automatically adjusts the CPU and memory reservations for your Pods to help "right size" your applications.

Marking as "experiment" since this would be a non-trivial enhancement :) and maybe not worth the complexity.

@durban
Contributor

durban commented Dec 6, 2023

I'm nitpicking, but I don't think the VPA actually dynamically changes container resources. Based on the documentation, it seems to recreate pods with the changed resources.

However, it is possible to dynamically change CPU requests/limits (see here), so this issue is definitely relevant. This k8s feature is somewhat new, and behind a feature gate, but it works. (I did try it, and observed the return value of availableProcessors() changing in a single JVM.)

(I assume the VPA will also use this feature in the future, see here.)

Interestingly, the ForkJoinPool doesn't seem to follow the recommendation in the availableProcessors() javadoc (it only calls it once).
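
A quick illustration of that one-shot behavior (JDK 19+ did add setParallelism for changing it manually afterwards, but nothing re-polls automatically):

```scala
import java.util.concurrent.ForkJoinPool

// The default constructor reads availableProcessors() exactly once, here at
// construction time, to pick the target parallelism.
val pool = new ForkJoinPool()

// This stays fixed even if availableProcessors() later changes.
println(pool.getParallelism)
```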

@djspiewak
Member

Interestingly, the ForkJoinPool doesn't seem to follow the recommendation in the availableProcessors() javadoc (it only calls it once).

This is fascinating. I've never seen this note before. For posterity:

This value may change during a particular invocation of the virtual machine. Applications that are sensitive to the number of available processors should therefore occasionally poll this property and adjust their resource usage appropriately.

So the problem is that actually using this information is somewhat difficult. Resizing the number of worker threads is kinda possible in theory, but I can see a whole host of objections very quickly:

  • There are a number of places where we are almost certainly getting higher performance because the worker count is fixed for the lifetime of the application
  • The number of race conditions we would have to think through on this bit of state is dizzying. We'd also have to think carefully about publication of writes, since right now we just write it once before the threads start and then go to town, meaning it's always visible (see the sketch after this list)
  • Resizing upward is comparatively easy, since we're just adding workers which can then steal and take from the external queue and whatnot. Tasks would rebalance relatively quickly and life would be good. Resizing down is much harder, since we would need to first take workers out of rotation, then spill all their work, then wait for each worker to quiesce (while attempting to aggressively spill any excess work on each iteration), with the understanding that quiescence isn't guaranteed.
    • We could also put the worker into a state where, at the next suspension, it simply refuses to continue working and forcibly spills whatever is left. That might be better, though it would cause noticeable latency spikes
  • How will this interact with blocking?
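
To make the publication point concrete, a hypothetical sketch (these types do not mirror the actual WSTP internals; they're purely illustrative):

```scala
import java.util.concurrent.atomic.AtomicReference

final class Worker extends Thread // stand-in for the real worker type

// Today (conceptually): the worker array is written once, before any worker
// starts, so safe publication is free and every read is a plain final-field read.
final class FixedPool(n: Int) {
  private[this] val workers: Array[Worker] = Array.fill(n)(new Worker)
}

// With resizing, the array becomes mutable shared state: every reader
// (stealing, external submission, shutdown) now races with the resizer.
final class ResizablePool(initial: Int) {
  private[this] val workers =
    new AtomicReference[Array[Worker]](Array.fill(initial)(new Worker))

  def resize(n: Int): Unit = {
    // Publishing a new array is the easy part; quiescing removed workers and
    // spilling their local queues (per the bullet above) is the hard part.
    val old = workers.get()
    workers.set(
      if (n <= old.length) old.take(n)
      else old ++ Array.fill(n - old.length)(new Worker)
    )
  }
}
```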

@djspiewak
Member

Random idea: if we want to scale down, we can probably do it just by abusing the blocking shunt and actively spilling to the external queue. More specifically, mark the thread as a blocker and do the appropriate bookkeeping, then take the entire local queue and spill it to the external queue (this would replace the normal step of creating a replacement thread to take over the local state), then simply park normally. The standard timeout mechanism would eventually come along and clean up the old worker thread, and it could be used as a handily pre-allocated blocker in the interim.

This doesn't resolve any of the problems around varying the worker pool size or publishing any of the associated state, but it does suggest a functional strategy for scaling down which doesn't require us to reinvent a very complicated wheel.
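
Roughly, in pseudocode (all of these names are hypothetical stand-ins; the real bookkeeping lives inside the WSTP and is not named like this):

```scala
// Hypothetical stand-ins for the WSTP's internal machinery.
trait Fiber
trait LocalQueue    { def drainAll(): List[Fiber] }
trait ExternalQueue { def offerAll(fibers: List[Fiber]): Unit }
trait Worker {
  def markAsBlocker(): Unit
  def localQueue: LocalQueue
  def park(): Unit
}

def retireWorker(worker: Worker, external: ExternalQueue): Unit = {
  // 1. reuse the blocking-shunt bookkeeping: mark the thread as a blocker
  worker.markAsBlocker()
  // 2. but instead of spawning a replacement thread to take over the local
  //    state (the normal blocking path), spill the entire local queue to the
  //    external queue
  external.offerAll(worker.localQueue.drainAll())
  // 3. park normally; the standard timeout mechanism eventually reaps the
  //    thread, which serves as a handily pre-allocated blocker in the interim
  worker.park()
}
```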
