Currently, we use Dask to manage MNMG processing throughout RAPIDS/cuGraph, but this causes some issues when integrating with PyTorch DDP workflows.
The biggest issue is that RMM pools can't be shared across processes, which means that when pools are enabled, each Dask worker ends up with a pool separate from that of the corresponding DDP worker on the same GPU. This wastes a significant amount of memory, so in most cases we don't use pools.
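For context, a minimal sketch of what single-pool-per-GPU setup could look like if sampling and training lived in the same DDP worker process (pool size and the `LOCAL_RANK` convention are assumptions, not an existing cuGraph API):

```python
import os

import rmm
import torch
from rmm.allocators.torch import rmm_torch_allocator

# Assumed to be launched one process per GPU (e.g. via torchrun, which sets LOCAL_RANK).
local_rank = int(os.environ.get("LOCAL_RANK", 0))

# One pool per GPU, owned by the single worker process on that device.
rmm.reinitialize(
    devices=[local_rank],
    pool_allocator=True,
    initial_pool_size=2**34,  # example size; tune per workload
)

# Route PyTorch allocations through the same RMM pool, so cuGraph/RMM and
# PyTorch are not each reserving their own slice of the GPU.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
torch.cuda.set_device(local_rank)
```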
Another issue is that the semantics of Dask make managing loaders awkward. Originally this wasn't supposed to be the case, but due to #4089, we can't have multiple loaders calling uniform_neighbor_sample. Furthermore, even if that did work, it would be a poor use of GPU resources when we should be combining all of those loader calls.
Finally, managing Dask and DDP in the same workflow is complicated. Other teams have complained that this makes the examples difficult to understand for new users, or even unreadable to someone unfamiliar with Dask. For MNMG workflows, we have to start Dask in a separate process, which clashes with the PyTorch way of doing things.
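By "the PyTorch way" I mean the usual launcher-driven pattern, where there is no separate scheduler/cluster process to stand up, roughly:

```python
import os

import torch
import torch.distributed as dist


def main():
    # Launched with `torchrun --nproc_per_node=<num_gpus> train.py`; torchrun
    # sets RANK / LOCAL_RANK / WORLD_SIZE for each worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # ... graph construction, loaders, and training all happen here,
    # in the single process that owns this GPU ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```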
There are also a number of new planned features (e.g. overlapping sampling and loading) that would be simpler to implement without Dask.
This path would not be unique in RAPIDS; WholeGraph already does something similar, relying on DDP as the process manager, and using its own RAFT/NCCL comms. cuGraph should be able to leverage RAFT and PyLibcuGraph to do the same.
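A rough sketch of how the bootstrap could look, reusing the DDP ranks that already exist. The `create_unique_id` / `init_comms` helpers below are hypothetical stand-ins for whatever RAFT/pylibcugraph-backed entry points cuGraph would expose, not an existing API:

```python
import torch.distributed as dist


def bootstrap_cugraph_comms():
    # DDP owns the processes; cuGraph joins the same ranks over NCCL.
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Rank 0 generates an NCCL-style unique id and shares it with all ranks
    # through the already-initialized torch.distributed process group.
    uid = create_unique_id() if rank == 0 else None  # hypothetical helper
    obj = [uid]
    dist.broadcast_object_list(obj, src=0)
    uid = obj[0]

    # Every DDP worker joins the same RAFT/NCCL communicator using its DDP rank.
    init_comms(rank=rank, world_size=world_size, unique_id=uid)  # hypothetical helper
```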