Migrate dask-cudf README improvements to dask-cudf sphinx docs #16765

Merged · 17 commits · Sep 16, 2024

Changes from 1 commit
address more code review
rjzamora committed Sep 11, 2024
commit 28c841ee47280c6094cd067d6230431c5874f1a9
12 changes: 6 additions & 6 deletions docs/dask_cudf/source/index.rst
@@ -16,7 +16,7 @@ as the ``"cudf"`` dataframe backend for
Neither Dask cuDF nor Dask DataFrame provide support for multi-GPU
or multi-node execution on their own. You must also deploy a
`dask.distributed <https://distributed.dask.org/en/stable/>` cluster
-to leverage multiple GPUs. We strongly recommend using `Dask CUDA
+to leverage multiple GPUs. We strongly recommend using `Dask-CUDA
<https://docs.rapids.ai/api/dask-cuda/stable/>`__ to simplify the
setup of the cluster, taking advantage of all features of the GPU
and networking hardware.
@@ -63,7 +63,7 @@ For example::

import dask.dataframe as dd

-# By default, we obtain a Pandas-backed dataframe
+# By default, we obtain a pandas-backed dataframe
df = dd.read_parquet("data.parquet", ...)

import dask
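A minimal sketch of the full backend-switching pattern, using Dask's
``dataframe.backend`` configuration option (the ``"data.parquet"`` path above
is just illustrative)::

    import dask
    import dask.dataframe as dd

    # Ask Dask DataFrame creation functions for cuDF-backed collections
    dask.config.set({"dataframe.backend": "cudf"})

    # This now returns a cuDF-backed Dask DataFrame
    df = dd.read_parquet("data.parquet")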
@@ -80,7 +80,7 @@ depend on the inputs to those functions. For example::
import pandas as pd
import cudf

-# This gives us a Pandas-backed dataframe
+# This gives us a pandas-backed dataframe
dd.from_pandas(pd.DataFrame({"a": range(10)}))

# This gives us a cuDF-backed dataframe
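A minimal sketch of the input-driven dispatching described here (both calls go
through the same ``dd.from_pandas`` entry point; only the input type differs)::

    import dask.dataframe as dd
    import pandas as pd
    import cudf

    # A pandas input produces a pandas-backed Dask DataFrame
    dd.from_pandas(pd.DataFrame({"a": range(10)}), npartitions=2)

    # A cuDF input produces a cuDF-backed Dask DataFrame
    dd.from_pandas(cudf.DataFrame({"a": range(10)}), npartitions=2)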
@@ -92,7 +92,7 @@ using the :func:`dask.dataframe.DataFrame.to_backend` API::
# This ensures that we have a cuDF-backed dataframe
df = df.to_backend("cudf")

-# This ensures that we have a Pandas-backed dataframe
+# This ensures that we have a pandas-backed dataframe
df = df.to_backend("pandas")

The explicit Dask cuDF API
@@ -156,7 +156,7 @@ out-of-core computing. This also means that the compute tasks can be
executed in parallel over a multi-GPU cluster.

In order to execute your Dask workflow on multiple GPUs, you will
-typically need to use `Dask CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
+typically need to use `Dask-CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
to deploy a distributed Dask cluster, and
`Distributed <https://distributed.dask.org/en/stable/client.html>`__
to define a client object. For example::
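The example referenced here presumably takes this general shape (a minimal
sketch of a single-node, multi-GPU setup with ``dask_cuda.LocalCUDACluster``)::

    from dask_cuda import LocalCUDACluster
    from distributed import Client

    if __name__ == "__main__":

        # One worker per visible GPU; the client makes this cluster the
        # default executor for Dask collections
        client = Client(LocalCUDACluster())

        # ... Dask cuDF work goes here ...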
@@ -187,7 +187,7 @@ to define a client object. For example::
<https://distributed.dask.org/en/stable/manage-computation.html>`__
for more details.

-Please see the `Dask CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
+Please see the `Dask-CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
documentation for more information about deploying GPU-aware clusters
(including `best practices
<https://docs.rapids.ai/api/dask-cuda/stable/examples/best-practices/>`__).
10 changes: 5 additions & 5 deletions python/dask_cudf/README.md
@@ -3,7 +3,7 @@
Dask cuDF (a.k.a. dask-cudf or `dask_cudf`) is an extension library for [Dask DataFrame](https://docs.dask.org/en/stable/dataframe.html) that provides a Pandas-like API for parallel and larger-than-memory DataFrame computing on GPUs. When installed, Dask cuDF is automatically registered as the `"cudf"` [dataframe backend](https://docs.dask.org/en/stable/how-to/selecting-the-collection-backend.html) for Dask DataFrame.

> [!IMPORTANT]
-> Dask cuDF does not provide support for multi-GPU or multi-node execution on its own. You must also deploy a distributed cluster (ideally with [Dask CUDA](https://docs.rapids.ai/api/dask-cuda/stable/)) to leverage multiple GPUs efficiently.
+> Dask cuDF does not provide support for multi-GPU or multi-node execution on its own. You must also deploy a distributed cluster (ideally with [Dask-CUDA](https://docs.rapids.ai/api/dask-cuda/stable/)) to leverage multiple GPUs efficiently.

## Using Dask cuDF

@@ -18,7 +18,7 @@ See the [RAPIDS install page](https://docs.rapids.ai/install) for the most up-to
- [Dask cuDF documentation](https://docs.rapids.ai/api/dask-cudf/stable/)
- [cuDF documentation](https://docs.rapids.ai/api/cudf/stable/)
- [10 Minutes to cuDF and Dask cuDF](https://docs.rapids.ai/api/cudf/stable/user_guide/10min/)
-- [Dask CUDA documentation](https://docs.rapids.ai/api/dask-cuda/stable/)
+- [Dask-CUDA documentation](https://docs.rapids.ai/api/dask-cuda/stable/)
- [Deployment](https://docs.rapids.ai/deployment/stable/)
- [RAPIDS Community](https://rapids.ai/learn-more/#get-involved): Get help, contribute, and collaborate.

@@ -55,9 +55,9 @@ if __name__ == "__main__":
query = df.groupby('item')['price'].mean()

# Compute, persist, or write out the result
-query.compute()
+query.head()
```
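As a rough illustration of the "compute, persist, or write out" options mentioned in the comment above (the `"results/"` output path is a placeholder):

```python
# Bring the aggregated result back to the client (typically a cudf.Series here)
result = query.compute()

# Or keep the collection lazy but materialized in worker memory
query = query.persist()

# Or write it out to disk without collecting it on the client
query.to_frame().to_parquet("results/")
```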

-If you do not have multiple GPUs available, using `LocalCUDACluster` is optional. However, it is still a good idea to [enable cudf spilling](https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory).
+If you do not have multiple GPUs available, using `LocalCUDACluster` is optional. However, it is still a good idea to [enable cuDF spilling](https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory).

-If you wish to scale across multiple nodes, you will need to use a different mechanism to deploy your Dask CUDA workers. Please see [the RAPIDS deployment documentation](https://docs.rapids.ai/deployment/stable/) for more instructions.
+If you wish to scale across multiple nodes, you will need to use a different mechanism to deploy your Dask-CUDA workers. Please see [the RAPIDS deployment documentation](https://docs.rapids.ai/deployment/stable/) for more instructions.
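One way to enable the cuDF spilling mentioned above (a minimal sketch; spilling can also be turned on with the `CUDF_SPILL=on` environment variable before starting Python):

```python
import cudf

# Enable cuDF's automatic spilling of device memory to host memory
cudf.set_option("spill", True)
```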