
Build proof of concept of multi-node join computation on Kubernetes #7

mrocklin opened this issue Feb 8, 2019 · 5 comments

mrocklin commented Feb 8, 2019

It would be useful for the RAPIDS effort to have a multi-node join computation deployed on Kubernetes. Until UCX arrives, this will likely be slow, but we can probably work on deployment and configuration issues in the meantime.

I suspect that this involves the following steps:

  1. Obtain access to a Kubernetes cluster with GPUs
  2. Use either dask-kubernetes or the Dask helm chart to deploy Dask workers onto that cluster, doing whatever is necessary to specify GPUs in the pod specification (see the first sketch after this list)
  3. Run a computation similar to https://blog.dask.org/2019/01/29/cudf-joins, but presumably larger in scale
  4. Quantify the computational costs, possibly using the profile and task_stream diagnostic utilities from the client to capture information (see the second sketch below)
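
For step 2, here's a minimal sketch using the classic dask-kubernetes KubeCluster API. The image tag, GPU limits, and worker count are placeholders rather than tested values, and the exact pod spec fields will depend on how your cluster exposes GPUs:

```python
from dask.distributed import Client
from dask_kubernetes import KubeCluster

# Worker pod spec requesting one GPU per worker. The image tag is a
# placeholder; any CUDA-enabled image with dask and cudf installed should do.
worker_spec = {
    "kind": "Pod",
    "spec": {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": "dask-worker",
                "image": "rapidsai/rapidsai:latest",  # placeholder image
                "args": ["dask-worker", "--nthreads", "1", "--death-timeout", "60"],
                "resources": {
                    "limits": {"nvidia.com/gpu": 1},
                    "requests": {"nvidia.com/gpu": 1},
                },
            }
        ],
    },
}

cluster = KubeCluster.from_dict(worker_spec)
cluster.scale(8)  # e.g. eight single-GPU workers
client = Client(cluster)
```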

I suspect that in going through this effort manually we will expose a number of small issues that we'll then have to fix.
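
And a rough sketch of steps 3 and 4, following the pattern from the blog post linked above: random dask-cudf frames, a join, and the task stream / profile captured from the client (reusing the `client` from the sketch above; the row count and partition count are placeholders):

```python
import cudf
import dask_cudf
import numpy as np
from dask.distributed import get_task_stream, wait

n = 10_000_000  # placeholder row count; scale up to stress inter-node transfer

def random_frame(value_column):
    # Built on the client's GPU for brevity; a real benchmark would
    # construct partitions on the workers instead.
    return dask_cudf.from_cudf(
        cudf.DataFrame({
            "key": np.random.randint(0, n, size=n),
            value_column: np.random.random(n),
        }),
        npartitions=50,
    )

left, right = random_frame("x"), random_frame("y")

# Record the task stream while the join runs and save it as a Bokeh plot
with get_task_stream(plot="save", filename="join-task-stream.html"):
    joined = left.merge(right, on=["key"]).persist()
    wait(joined)

# Dump the client-side profile for later inspection
client.profile(filename="join-profile.html")
```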

@mrocklin

@beberg do you have any interest in doing this? You could adapt this notebook, which handles all of the dask-cudf things: https://gist.github.com/mrocklin/ab10c61a17391e8dbc7577f83fc4d25d

You would have to swap out LocalCUDACluster for some other solution, either Helm or Dask-Kubernetes, and then increase the size of the dataframes.
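
If you go the Helm route instead, the only change to the notebook's cluster setup would be something like the line below, assuming the chart's scheduler is exposed as a service; the address is a placeholder for whatever your release creates:

```python
from dask.distributed import Client

# Placeholder address: the service name and namespace depend on your helm release
client = Client("tcp://dask-scheduler.default.svc.cluster.local:8786")
```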

@pentschev

@jacobtomlinson I know you've been doing a lot of deployment-related work. I believe this is already covered, at least partially. Could you check whether anything here is still worth covering, or whether it's already planned?

@jacobtomlinson

This should all work today. It would be worthwhile to run through it, though.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
