From 28c841ee47280c6094cd067d6230431c5874f1a9 Mon Sep 17 00:00:00 2001
From: rjzamora
Date: Wed, 11 Sep 2024 13:12:55 -0700
Subject: [PATCH] address more code review

---
 docs/dask_cudf/source/index.rst | 12 ++++++------
 python/dask_cudf/README.md      | 10 +++++-----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/dask_cudf/source/index.rst b/docs/dask_cudf/source/index.rst
index f4442862aef..2a14cafb4eb 100644
--- a/docs/dask_cudf/source/index.rst
+++ b/docs/dask_cudf/source/index.rst
@@ -16,7 +16,7 @@ as the ``"cudf"`` dataframe backend for
 Neither Dask cuDF nor Dask DataFrame provide support for multi-GPU
 or multi-node execution on their own. You must also deploy a
 `dask.distributed ` cluster
-to leverage multiple GPUs. We strongly recommend using `Dask CUDA
+to leverage multiple GPUs. We strongly recommend using `Dask-CUDA
 `__ to simplify the setup of the cluster,
 taking advantage of all features of the GPU
 and networking hardware.
@@ -63,7 +63,7 @@ For example::
 
     import dask.dataframe as dd
 
-    # By default, we obtain a Pandas-backed dataframe
+    # By default, we obtain a pandas-backed dataframe
     df = dd.read_parquet("data.parquet", ...)
 
     import dask
@@ -80,7 +80,7 @@ depend on the inputs to those functions. For example::
 
     import pandas as pd
     import cudf
-    # This gives us a Pandas-backed dataframe
+    # This gives us a pandas-backed dataframe
     dd.from_pandas(pd.DataFrame({"a": range(10)}))
 
    # This gives us a cuDF-backed dataframe
@@ -92,7 +92,7 @@ using the :func:`dask.dataframe.DataFrame.to_backend` API::
     # This ensures that we have a cuDF-backed dataframe
     df = df.to_backend("cudf")
 
-    # This ensures that we have a Pandas-backed dataframe
+    # This ensures that we have a pandas-backed dataframe
     df = df.to_backend("pandas")
 
 The explicit Dask cuDF API
@@ -156,7 +156,7 @@ out-of-core computing. This also means that the
 compute tasks can be executed in parallel over a multi-GPU cluster.
 
 In order to execute your Dask workflow on multiple GPUs, you will
-typically need to use `Dask CUDA `__
+typically need to use `Dask-CUDA `__
 to deploy a distributed Dask cluster, and `Distributed
 `__ to define a client object. For example::
 
@@ -187,7 +187,7 @@ to define a client object. For example::
 
 `__ for more details.
 
-Please see the `Dask CUDA `__
+Please see the `Dask-CUDA `__
 documentation for more information about deploying GPU-aware clusters
 (including `best practices `__).
 
diff --git a/python/dask_cudf/README.md b/python/dask_cudf/README.md
index c3a1662729f..4655d2165f0 100644
--- a/python/dask_cudf/README.md
+++ b/python/dask_cudf/README.md
@@ -3,7 +3,7 @@
 Dask cuDF (a.k.a. dask-cudf or `dask_cudf`) is an extension library for [Dask DataFrame](https://docs.dask.org/en/stable/dataframe.html) that provides a Pandas-like API for parallel and larger-than-memory DataFrame computing on GPUs. When installed, Dask cuDF is automatically registered as the `"cudf"` [dataframe backend](https://docs.dask.org/en/stable/how-to/selecting-the-collection-backend.html) for Dask DataFrame.
 
 > [!IMPORTANT]
-> Dask cuDF does not provide support for multi-GPU or multi-node execution on its own. You must also deploy a distributed cluster (ideally with [Dask CUDA](https://docs.rapids.ai/api/dask-cuda/stable/)) to leverage multiple GPUs efficiently.
+> Dask cuDF does not provide support for multi-GPU or multi-node execution on its own. You must also deploy a distributed cluster (ideally with [Dask-CUDA](https://docs.rapids.ai/api/dask-cuda/stable/)) to leverage multiple GPUs efficiently.
 
 ## Using Dask cuDF
 
@@ -18,7 +18,7 @@ See the [RAPIDS install page](https://docs.rapids.ai/install) for the most up-to
 
 - [Dask cuDF documentation](https://docs.rapids.ai/api/dask-cudf/stable/)
 - [cuDF documentation](https://docs.rapids.ai/api/cudf/stable/)
 - [10 Minutes to cuDF and Dask cuDF](https://docs.rapids.ai/api/cudf/stable/user_guide/10min/)
-- [Dask CUDA documentation](https://docs.rapids.ai/api/dask-cuda/stable/)
+- [Dask-CUDA documentation](https://docs.rapids.ai/api/dask-cuda/stable/)
 - [Deployment](https://docs.rapids.ai/deployment/stable/)
 - [RAPIDS Community](https://rapids.ai/learn-more/#get-involved): Get help, contribute, and collaborate.
@@ -55,9 +55,9 @@ if __name__ == "__main__":
     query = df.groupby('item')['price'].mean()
 
     # Compute, persist, or write out the result
-    query.compute()
+    query.head()
 ```
 
-If you do not have multiple GPUs available, using `LocalCUDACluster` is optional. However, it is still a good idea to [enable cudf spilling](https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory).
+If you do not have multiple GPUs available, using `LocalCUDACluster` is optional. However, it is still a good idea to [enable cuDF spilling](https://docs.rapids.ai/api/cudf/stable/developer_guide/library_design/#spilling-to-host-memory).
 
-If you wish to scale across multiple nodes, you will need to use a different mechanism to deploy your Dask CUDA workers. Please see [the RAPIDS deployment documentation](https://docs.rapids.ai/deployment/stable/) for more instructions.
+If you wish to scale across multiple nodes, you will need to use a different mechanism to deploy your Dask-CUDA workers. Please see [the RAPIDS deployment documentation](https://docs.rapids.ai/deployment/stable/) for more instructions.