Extend memory spilling to multiple storage media #37

Open
pentschev opened this issue Apr 21, 2019 · 11 comments

@pentschev
Member

With the work in progress in #35, we will have the capability of spilling CUDA device memory to host, and host memory to disk. However, as pointed out by @kkraus14 here, it would be beneficial to allow spilling host memory to multiple user-defined storage media.

I think we could follow the same configuration structure as Alluxio, as suggested by @kkraus14. Based on the current structure suggested in #35 (still subject to change), it would look something like the following:

cuda.worker.dirs.path=/mnt/nvme,/mnt/ssd,/mnt/nfs
cuda.worker.dirs.quota=16GB,100GB,1000GB
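
For illustration only: neither these keys nor a tiered-spilling option exist in dask-cuda today, but if the proposal above were adopted, the same per-tier paths and quotas could presumably be set programmatically through dask.config, e.g.:

import dask

# Hypothetical keys taken from the proposal above -- not an existing dask-cuda option.
dask.config.set({
    "cuda.worker.dirs.path": ["/mnt/nvme", "/mnt/ssd", "/mnt/nfs"],
    "cuda.worker.dirs.quota": ["16GB", "100GB", "1000GB"],
})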

@mrocklin FYI

@jakirkham
Member

One related note for tracking: it would be useful to leverage GPUDirect Storage to allow spilling directly from GPU memory to disk.

pentschev added the feature request (New feature or request) label on Jan 8, 2021
@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@jangorecki

@pentschev could you link documentation which explains how to set up spilling to disk? I found https://github.com/rapidsai/dask-cuda/pull/51/files but there doesn't seem to be any documentation on the new feature.
I want to use dask_cudf to spill from vmem to main memory, and then from main memory to disk when main memory is not enough. Searching https://docs.rapids.ai/ doesn't provide any answer.

@quasiben
Member

@jangorecki

This doc doesn't seem to answer my use case.

@pentschev
Member Author

Currently, --device-memory-limit/device_memory_limit (dask-cuda-worker/LocalCUDACluster) will spill from device to host; similarly, --memory-limit/memory_limit spills from host to disk, just like in mainline Dask, and the spilled data is stored in --local-directory/local_directory. Spilling to disk is today only supported by the default mechanism; JIT spilling still doesn't support it.
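
As a minimal sketch of those options (the sizes and path below are placeholders, to be tuned to your hardware):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    device_memory_limit="10GB",           # spill device memory to host above this threshold
    memory_limit="32GB",                  # spill host memory to disk above this threshold (per worker)
    local_directory="/tmp/dask-scratch",  # where spilled data is written
)
client = Client(cluster)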

@jangorecki

jangorecki commented May 27, 2021

@pentschev thank you for the reply, although it doesn't correspond to my current approach (cu.set_allocator("managed")).
AFAIU, to use it with Dask I should have:

client = Client(cluster)
client.run(cu.set_allocator, "managed")  # switch cudf's allocator to CUDA managed memory on every worker

Is this going to handle spilling vmem -> mem -> disk?
I don't want to change the default memory limits, only enable spilling.

@pentschev
Member Author

No, managed memory is handled by the CUDA driver; we have no control over how it handles spilling, and it doesn't support spilling to disk whatsoever. Within Dask, you can enable spilling as I mentioned above. That path doesn't make use of managed memory and thus is not as performant, but it will allow Dask to spill Python memory (i.e., Dask array/DataFrame chunks). However, it also has no control over memory that's handled internally by libraries such as cuDF.
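
If you do want the workers themselves to allocate through managed memory anyway, recent dask-cuda versions expose an rmm_managed_memory option; a sketch is below, but note that, per the above, the CUDA driver then handles device<->host paging and there is still no spilling to disk:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Workers allocate GPU memory through RMM's managed (unified) memory;
# the driver pages device<->host as needed, and disk is not involved.
cluster = LocalCUDACluster(rmm_managed_memory=True)
client = Client(cluster)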

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
