[ENH] Smarter portable defaults for CPU<>GPU dask interop #540
@lmeyerov thanks for moving the discussion here. If you have time, I think devs would also appreciate some motivating use cases or your thoughts on where mixing GPU/CPU workers is applicable. Pseudocode would also be good.
Some examples we see:
Maybe also helpful are some code patterns:
I don't have recommendations yet (this is my first attempt at this). Naively, I set up a cluster in the following manner:

Scheduler:
    dask-scheduler

CPU Workers:
    dask-worker tcp://...:8786 --nprocs 1 --resources CPU=1.0 --name cpu-worker

GPU Workers:
    dask-cuda-worker tcp://...:8786 --resources GPU=1 --name gpu-worker

Client:
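A minimal, hypothetical sketch of what the client side could look like for the setup above, using Dask's resource restrictions to route tasks to the tagged workers; the scheduler address, task functions, and resource amounts are illustrative assumptions, not from the original comment.

```python
from dask.distributed import Client


def gpu_task(x):
    import cudf  # only importable on the GPU workers
    return cudf.Series([x]).sum()


def cpu_task(x):
    return x * 2


# Same (elided) scheduler address the workers were pointed at
client = Client("tcp://...:8786")

# Runs only on workers launched with --resources GPU=1
gpu_result = client.submit(gpu_task, 21, resources={"GPU": 1}).result()

# Runs only on workers launched with --resources CPU=1.0
cpu_result = client.submit(cpu_task, 21, resources={"CPU": 1}).result()
```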
The above worked, but I can see this being brittle in that a missing annotation can lead to problems. I agree that automatic resource tagging would help here; is that what you're asking for? I'm also curious what you mean by:
Do you mean sparse data frames or sparse arrays? If the latter, CuPy's sparse support has grown somewhat recently to meet these needs.
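For context, a tiny illustration (assuming CuPy is installed): CuPy's sparse support lives under cupyx.scipy.sparse and mirrors scipy.sparse.

```python
import cupy as cp
from cupyx.scipy import sparse as cusparse

dense = cp.eye(4)                 # small dense GPU array
csr = cusparse.csr_matrix(dense)  # GPU CSR matrix with a scipy.sparse-like API
print(csr.nnz)                    # 4 nonzeros on the diagonal
```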
Thanks. I was starting to reach the conclusion that that's the current happy path, based on the Slack conversation. Doing some reworking here to get closer to it; I'm guessing it'll take a few weeks to see how this phase ends up. There have been a lot of surprises so far (e.g., different exceptions when mixing different clients against the same worker), and in finding a sane Dask resource model that's concurrency-friendly. Re: sparse, I mean sparse Series that are typically dense when indexed by another column (small illustration below). Will share feedback as it gets clearer.
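A tiny, hypothetical illustration of that pattern (column names and values are made up): a Series that is mostly missing overall but dense within one value of another column.

```python
import numpy as np
import pandas as pd

# 'score' is sparse overall, but dense when restricted to type == "b"
df = pd.DataFrame({
    "type": ["a"] * 8 + ["b"] * 2,
    "score": [np.nan] * 8 + [0.5, 0.7],
})
print(df["score"].isna().mean())                         # 0.8 -> mostly missing
print(df.loc[df["type"] == "b", "score"].isna().mean())  # 0.0 -> dense within "b"
```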
Interim update: BlazingSQL context creation is getting wedged when combining the above.
I am also curious how this could interact with cluster scaling (and things like dask-gateway). Would each type of worker need to be tracked by some key and scaled independently? (In my case, I have a workflow with "big disk", "gpu", and "cpu" workers.)
Hi folks, you might be interested in @madsbk's recent work in dask/distributed#4869, which allows workers to have multiple executors.
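A sketch of how that feature is meant to be used, per the linked PR; treat it as illustrative only. The scheduler address and the executor name "gpu" are assumptions, and annotation propagation can depend on the dask/distributed version and graph optimization settings.

```python
import dask
from dask.distributed import Client


def bump(x):
    return x + 1


# Assumes workers were started with a named "gpu" executor alongside the default one
client = Client("tcp://...:8786")

# Route the task to the worker-side executor named "gpu"
with dask.annotate(executor="gpu"):
    result = dask.delayed(bump)(41).compute()

print(result)
```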
This got complicated enough that I wanted to move it from Slack --
Configuring and using Dask CPU workers alongside Dask GPU workers has been confusing, and mixing them is a reality both in hardware (GPU hosts generally come with sizable CPU core counts) and in software (a lot of code isn't GPU, despite our efforts ;-) ). It'd be good to get clarity in docs + defaults here:
Automating nthreads for CPU vs. GPU
We want a simple and portable config for our typical GPU case of launching on a single node with 1-8 GPUs and 4-128 cores, and to write size-agnostic dask + dask-cuda code over it. That means never hard-coding those numbers in the config or the app while still getting reasonable resource utilization. I suspect this is true of most users. It'd also be good to be able to predictably override the defaults. I can't speak to multi-node clusters or heterogeneous ones.
- Ideally, pseudo-coding, `dask-scheduler 123 & dask-cuda-worker dask-scheduler:123 & dask-worker dask-scheduler:123` should "just work"
- Code should just work (~balanced) for things like `dgdf = ddf.map_partitions(cudf.to_pandas)` and the reverse (see the sketch after this list)
- Users can optionally override the defaults at the level of worker config or tasks
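A hedged sketch of that CPU<->GPU round trip (not the author's exact code): it assumes cudf is installed and uses `cudf.from_pandas` for the CPU-to-GPU direction.

```python
import pandas as pd
import cudf
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)

# CPU -> GPU: convert each pandas partition into a cudf.DataFrame
dgdf = ddf.map_partitions(cudf.from_pandas)

# GPU -> CPU: convert each cudf partition back to pandas
ddf_back = dgdf.map_partitions(lambda part: part.to_pandas())
```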
"Just working" means:
- number of GPU workers (threads or processes) should default to ~= # visible CUDA devices <- current `dask-cuda-worker` behavior
- number of CPU workers (threads) should default to ~= # cores <- `dask-worker`'s default, but not `dask-cuda-worker`'s
- Behind the scenes, if this turns into an equivalent of `dask-cuda-worker-existing && dask-worker`, or that's the recommended invocation, so be it. However, I am a bit worried there may be unforeseen buggy / awkward behavior here, like having to use Dataset Publish instead of map_partitions for the interop. (A back-of-the-envelope sketch of these defaults follows this list.)
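A back-of-the-envelope sketch of those defaults; the function name is hypothetical and this is not an existing dask or dask-cuda API (dask-cuda inspects devices more carefully in practice).

```python
import os


def proposed_worker_defaults():
    # GPU workers ~= number of visible CUDA devices (0 if none are exposed)
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    n_gpu_workers = len([d for d in visible.split(",") if d])
    # CPU worker threads ~= number of cores
    n_cpu_threads = os.cpu_count() or 1
    return {"gpu_workers": n_gpu_workers, "cpu_threads": n_cpu_threads}


print(proposed_worker_defaults())
```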
Automating resource tagging

Separately, here's experimental discussion of using logical resources & annotations, though PyTorch experiences may not carry over to Dask-CPU <> Dask-CUDF. One idea is to auto-tag GPU/CPU workers with # physical vs. # logical units, and let app devs target those.

Ex: (see the hedged sketch below)

From there, code can use annotations based either on hard-coded physical units or on a more scale-free / agnostic logical style.
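A hypothetical sketch of both ideas together: workers advertise physical and logical unit counts as Dask resources, and application code targets either style via resource restrictions. The resource names (CPU_PHYSICAL, CPU_LOGICAL) and amounts are illustrative assumptions, not an existing convention.

```python
import psutil  # distributed already depends on psutil; used here to count cores
from dask.distributed import Client, LocalCluster

n_physical = psutil.cpu_count(logical=False) or 1
n_logical = psutil.cpu_count(logical=True) or n_physical

# Tag the worker with physical vs. logical capacity as abstract resources
cluster = LocalCluster(
    n_workers=1,
    threads_per_worker=n_physical,
    processes=False,
    resources={"CPU_PHYSICAL": n_physical, "CPU_LOGICAL": n_logical},
)
client = Client(cluster)


def work(x):
    return x + 1


# Hard-coded "physical" style: reserve one physical core's worth of capacity
hard = client.submit(work, 1, resources={"CPU_PHYSICAL": 1})

# Scale-free "logical" style: reserve a fraction of the logical units instead
soft = client.submit(work, 2, resources={"CPU_LOGICAL": n_logical / 4})

print(hard.result(), soft.result())
```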