address more code review
rjzamora committed Sep 11, 2024
1 parent 4c41a51 commit 28c841e
Showing 2 changed files with 11 additions and 11 deletions.
12 changes: 6 additions & 6 deletions docs/dask_cudf/source/index.rst
@@ -16,7 +16,7 @@ as the ``"cudf"`` dataframe backend for
Neither Dask cuDF nor Dask DataFrame provide support for multi-GPU
or multi-node execution on their own. You must also deploy a
`dask.distributed <https://distributed.dask.org/en/stable/>` cluster
-to leverage multiple GPUs. We strongly recommend using `Dask CUDA
+to leverage multiple GPUs. We strongly recommend using `Dask-CUDA
<https://docs.rapids.ai/api/dask-cuda/stable/>`__ to simplify the
setup of the cluster, taking advantage of all features of the GPU
and networking hardware.
@@ -63,7 +63,7 @@ For example::

import dask.dataframe as dd

-# By default, we obtain a Pandas-backed dataframe
+# By default, we obtain a pandas-backed dataframe
df = dd.read_parquet("data.parquet", ...)

import dask
@@ -80,7 +80,7 @@ depend on the inputs to those functions. For example::
import pandas as pd
import cudf

-# This gives us a Pandas-backed dataframe
+# This gives us a pandas-backed dataframe
dd.from_pandas(pd.DataFrame({"a": range(10)}))

# This gives us a cuDF-backed dataframe
@@ -92,7 +92,7 @@ using the :func:`dask.dataframe.DataFrame.to_backend` API::
# This ensures that we have a cuDF-backed dataframe
df = df.to_backend("cudf")

-# This ensures that we have a Pandas-backed dataframe
+# This ensures that we have a pandas-backed dataframe
df = df.to_backend("pandas")

The explicit Dask cuDF API
@@ -156,7 +156,7 @@ out-of-core computing. This also means that the compute tasks can be
executed in parallel over a multi-GPU cluster.

In order to execute your Dask workflow on multiple GPUs, you will
-typically need to use `Dask CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
+typically need to use `Dask-CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
to deploy a distributed Dask cluster, and
`Distributed <https://distributed.dask.org/en/stable/client.html>`__
to define a client object. For example::
@@ -187,7 +187,7 @@ to define a client object. For example::
<https://distributed.dask.org/en/stable/manage-computation.html>`__
for more details.
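The general deployment pattern can be sketched with Distributed's CPU-only ``LocalCluster`` standing in for ``dask_cuda.LocalCUDACluster`` (an assumption for illustration; the construct-a-cluster-then-attach-a-``Client`` shape is the same):

```python
# Sketch: deploy a local cluster and attach a client to it.
# LocalCluster is a CPU stand-in; on GPU machines you would use
# dask_cuda.LocalCUDACluster instead.
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=2, threads_per_worker=1)
    client = Client(cluster)

    # ... build and compute Dask collections here ...

    client.close()
    cluster.close()
```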

-Please see the `Dask CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
+Please see the `Dask-CUDA <https://docs.rapids.ai/api/dask-cuda/stable/>`__
documentation for more information about deploying GPU-aware clusters
(including `best practices
<https://docs.rapids.ai/api/dask-cuda/stable/examples/best-practices/>`__).
10 changes: 5 additions & 5 deletions python/dask_cudf/README.md
@@ -3,7 +3,7 @@
Dask cuDF (a.k.a. dask-cudf or `dask_cudf`) is an extension library for [Dask DataFrame](https://docs.dask.org/en/stable/dataframe.html) that provides a Pandas-like API for parallel and larger-than-memory DataFrame computing on GPUs. When installed, Dask cuDF is automatically registered as the `"cudf"` [dataframe backend](https://docs.dask.org/en/stable/how-to/selecting-the-collection-backend.html) for Dask DataFrame.

> [!IMPORTANT]
-> Dask cuDF does not provide support for multi-GPU or multi-node execution on its own. You must also deploy a distributed cluster (ideally with [Dask CUDA](https://docs.rapids.ai/api/dask-cuda/stable/)) to leverage multiple GPUs efficiently.
+> Dask cuDF does not provide support for multi-GPU or multi-node execution on its own. You must also deploy a distributed cluster (ideally with [Dask-CUDA](https://docs.rapids.ai/api/dask-cuda/stable/)) to leverage multiple GPUs efficiently.
## Using Dask cuDF

@@ -18,7 +18,7 @@ See the [RAPIDS install page](https://docs.rapids.ai/install) for the most up-to
- [Dask cuDF documentation](https://docs.rapids.ai/api/dask-cudf/stable/)
- [cuDF documentation](https://docs.rapids.ai/api/cudf/stable/)
- [10 Minutes to cuDF and Dask cuDF](https://docs.rapids.ai/api/cudf/stable/user_guide/10min/)
-- [Dask CUDA documentation](https://docs.rapids.ai/api/dask-cuda/stable/)
+- [Dask-CUDA documentation](https://docs.rapids.ai/api/dask-cuda/stable/)
- [Deployment](https://docs.rapids.ai/deployment/stable/)
- [RAPIDS Community](https://rapids.ai/learn-more/#get-involved): Get help, contribute, and collaborate.

@@ -55,9 +55,9 @@ if __name__ == "__main__":
query = df.groupby('item')['price'].mean()

# Compute, persist, or write out the result
-query.compute()
+query.head()
```

-If you do not have multiple GPUs available, using `LocalCUDACluster` is optional. However, it is still a good idea to [enable cudf spilling](https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory).
+If you do not have multiple GPUs available, using `LocalCUDACluster` is optional. However, it is still a good idea to [enable cuDF spilling](https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory).

-If you wish to scale across multiple nodes, you will need to use a different mechanism to deploy your Dask CUDA workers. Please see [the RAPIDS deployment documentation](https://docs.rapids.ai/deployment/stable/) for more instructions.
+If you wish to scale across multiple nodes, you will need to use a different mechanism to deploy your Dask-CUDA workers. Please see [the RAPIDS deployment documentation](https://docs.rapids.ai/deployment/stable/) for more instructions.
