Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gh pages branch 21.10 #3902

Merged
merged 9 commits into from
Oct 29, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 52 additions & 13 deletions docs/FAQ.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
layout: page
title: Frequently Asked Questions
nav_order: 11
nav_order: 12
---
# Frequently Asked Questions

Expand All @@ -10,9 +10,9 @@ nav_order: 11

### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?

The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.1.1 or 3.1.2 of Apache
Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers to be
internal the code for those plans can change even between bug fix releases. As a part of our
The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2 or 3.2.0 of
Apache Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers to
be internal the code for those plans can change even between bug fix releases. As a part of our
process, we try to stay on top of these changes and release updates as quickly as possible.

### Which distributions are supported?
Expand All @@ -30,15 +30,15 @@ to set up testing and validation on their distributions.

### What CUDA versions are supported?

CUDA 11.0 and 11.2 are currently supported. Please look [here](download.md) for download links for
the latest release.
CUDA 11.x is currently supported. Please look [here](download.md) for download links for the latest
release.

### What hardware is supported?

The plugin is tested and supported on V100, T4, A10, A30 and A100 datacenter GPUs. It is possible
to run the plugin on GeForce desktop hardware with Volta or better architectures. GeForce hardware
does not support [CUDA enhanced
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#enhanced-compat-minor-releases),
does not support [CUDA forward
compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#forward-compatibility-title),
and will need CUDA 11.2 installed. If not, the following error will be displayed:

```
Expand All @@ -47,6 +47,9 @@ ai.rapids.cudf.CudaException: forward compatibility was attempted on non support
at com.nvidia.spark.rapids.GpuDeviceManager$.findGpuAndAcquire(GpuDeviceManager.scala:78)
```

More information about cards that support forward compatibility can be found
[here](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#faq).

### How can I check if the RAPIDS Accelerator is installed and which version is running?

On startup the RAPIDS Accelerator will log a warning message on the Spark driver showing the
Expand Down Expand Up @@ -120,6 +123,17 @@ starts. If you are only going to run a single query that only takes a few second
be problematic. In general if you are going to do 30 seconds or more of processing within a single
session the overhead can be amortized.

### How long does it take to translate a query to run on the GPU?

The time it takes to translate the Apache Spark physical plan to one that can run on the GPU
is proportional to the size of the plan. But, it also depends on the CPU you are
running on and if the JVM has optimized that code path yet. The first queries run in a client will
be worse than later queries. Small queries can typically be translated in a millisecond or two while
larger queries can take tens of milliseconds. In all cases tested the translation time is orders of
magnitude smaller than the total runtime of the query.

See the entry on [explain](#explain) for details on how to measure this for your queries.

### How can I tell what will run on the GPU and what will not run on it?
<a name="explain"></a>

Expand Down Expand Up @@ -166,10 +180,27 @@ In this `indicator` is one of the following
* will not run on the GPU with an explanation why
* will be removed from the plan with a reason why

Generally if an operator is not compatible with Spark for some reason and is off the explanation
Generally if an operator is not compatible with Spark for some reason and is off, the explanation
will include information about how it is incompatible and what configs to set to enable the
operator if you can accept the incompatibility.

These messages are logged at the WARN level so even in `spark-shell` which by default only logs
at WARN or above you should see these messages.

This translation takes place in two steps. The first step looks at the plan, figures out what
can be translated to the GPU, and then does the translation. The second step optimizes the
transitions between the CPU and the GPU.
Explain will also log how long these translations took at the INFO level with lines like.

```
INFO GpuOverrides: Plan conversion to the GPU took 3.13 ms
INFO GpuOverrides: GPU plan transition optimization took 1.66 ms
```

Because it is at the INFO level, the default logging level for `spark-shell` is not going to display
this information. If you want to monitor this number for your queries you might need to adjust your
logging configuration.

### Why does the plan for the GPU query look different from the CPU query?

Typically, there is a one to one mapping between CPU stages in a plan and GPU stages. There are a
Expand Down Expand Up @@ -228,12 +259,18 @@ efficient to stay on the CPU instead of going back and forth.

Yes, DPP still works. It might not be as efficient as it could be, and we are working to improve it.

DPP is not supported on Databricks with the plugin.
Queries on Databricks will not fail but it can not benefit from DPP.

### Is Adaptive Query Execution (AQE) Supported?

In the 0.2 release, AQE is supported but all exchanges will default to the CPU. As of the 0.3
release, running on Spark 3.0.1 and higher any operation that is supported on GPU will now stay on
the GPU when AQE is enabled.

AQE is not supported on Databricks with the plugin.
If AQE is enabled on Databricks, queries may fail with `StackOverflowError` error.

#### Why does my query show as not on the GPU when Adaptive Query Execution is enabled?

When running an `explain()` on a query where AQE is on, it is possible that AQE has not finalized
Expand All @@ -250,13 +287,15 @@ AdaptiveSparkPlan isFinalPlan=false

### Are cache and persist supported?

Yes cache and persist are supported, but they are not GPU accelerated yet. We are working with
the Spark community on changes that would allow us to accelerate compression when caching data.
Yes cache and persist are supported, the cache is GPU accelerated
but still stored on the host memory.
Please refer to [RAPIDS Cache Serializer](./additional-functionality/cache-serializer.md)
for more details.

### Can I cache data into GPU memory?

No, that is not currently supported. It would require much larger changes to Apache Spark to be able
to support this.
No, that is not currently supported.
It would require much larger changes to Apache Spark to be able to support this.

### Is PySpark supported?

Expand Down
2 changes: 1 addition & 1 deletion docs/additional-functionality/README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
layout: page
title: Additional Functionality
nav_order: 9
nav_order: 10
has_children: true
permalink: /additional-functionality/
---
12 changes: 4 additions & 8 deletions docs/additional-functionality/cache-serializer.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,21 +29,17 @@ nav_order: 2
`spark.sql.inMemoryColumnarStorage.enableVectorizedReader` will not be honored as the GPU
data is always read in as columnar. If `spark.rapids.sql.enabled` is set to false
the cached objects will still be compressed on the CPU as a part of the caching process.

Please note that ParquetCachedBatchSerializer doesn't support negative decimal scale, so if
`spark.sql.legacy.allowNegativeScaleOfDecimal` is set to true ParquetCachedBatchSerializer
should not be used. Using the serializer with negative decimal scales will generate
an error at runtime.

To use this serializer please run Spark with the following conf.
```
spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.rapids.shims.spark311.ParquetCachedBatchSerializer"
spark-shell --conf spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
viadea marked this conversation as resolved.
Show resolved Hide resolved
```


## Supported Types

All types are supported on the CPU, on the GPU, ArrayType, MapType and BinaryType are not
supported. If an unsupported type is encountered the Rapids Accelerator for Apache Spark will fall
All types are supported on the CPU.
On the GPU, MapType and BinaryType are not supported.
If an unsupported type is encountered the Rapids Accelerator for Apache Spark will fall
back to using the CPU for caching.

51 changes: 30 additions & 21 deletions docs/additional-functionality/rapids-shuffle.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,13 @@ in these scenarios:
- GPU-to-GPU: Shuffle blocks that were able to fit in GPU memory.
- Host-to-GPU and Disk-to-GPU: Shuffle blocks that spilled to host (or disk) but will be manifested
in the GPU in the downstream Spark task.

The RAPIDS Shuffle Manager uses the `spark.shuffle.manager` plugin interface in Spark and it relies
on fast connections between executors, where shuffle data is kept in a cache backed by GPU, host, or disk.
As such, it doesn't implement functionality to interact with the External Shuffle Service (ESS).
To enable the RAPIDS Shuffle Manager, users need to disable ESS using `spark.shuffle.service.enabled=false`.
Note that Spark's Dynamic Allocation feature requires ESS to be configured, and must also be
disabled with `spark.dynamicAllocation.enabled=false`.

### System Setup

Expand All @@ -36,7 +43,7 @@ be installed on the host and inside Docker containers (if not baremetal). A host
requirements, like the MLNX_OFED driver and `nv_peer_mem` kernel module.

The minimum UCX requirement for the RAPIDS Shuffle Manager is
[UCX 1.11.0](https://github.com/openucx/ucx/releases/tag/v1.11.0).
[UCX 1.11.2](https://github.com/openucx/ucx/releases/tag/v1.11.2).

#### Baremetal

Expand Down Expand Up @@ -66,9 +73,9 @@ The minimum UCX requirement for the RAPIDS Shuffle Manager is
further.

2. Fetch and install the UCX package for your OS from:
[UCX 1.11.0](https://github.com/openucx/ucx/releases/tag/v1.11.0).
[UCX 1.11.2](https://github.com/openucx/ucx/releases/tag/v1.11.2).

NOTE: Please install the artifact with the newest CUDA 11.x version (for UCX 1.11.0 please
NOTE: Please install the artifact with the newest CUDA 11.x version (for UCX 1.11.2 please
pick CUDA 11.2) as CUDA 11 introduced [CUDA Enhanced Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#enhanced-compat-minor-releases).
Starting with UCX 1.12, UCX will stop publishing individual artifacts for each minor version of CUDA.

Expand All @@ -78,35 +85,35 @@ The minimum UCX requirement for the RAPIDS Shuffle Manager is
RDMA packages have extra requirements that should be satisfied by MLNX_OFED.

##### CentOS UCX RPM
The UCX packages for CentOS 7 and 8 are divided into different RPMs. For example, UCX 1.11.0
The UCX packages for CentOS 7 and 8 are divided into different RPMs. For example, UCX 1.11.2
available at
https://github.com/openucx/ucx/releases/download/v1.11.0/ucx-v1.11.0-centos7-mofed5.x-cuda11.2.tar.bz2
https://github.com/openucx/ucx/releases/download/v1.11.2/ucx-v1.11.2-centos7-mofed5.x-cuda11.2.tar.bz2
contains:

```
ucx-devel-1.11.0-1.el7.x86_64.rpm
ucx-debuginfo-1.11.0-1.el7.x86_64.rpm
ucx-1.11.0-1.el7.x86_64.rpm
ucx-cuda-1.11.0-1.el7.x86_64.rpm
ucx-rdmacm-1.11.0-1.el7.x86_64.rpm
ucx-cma-1.11.0-1.el7.x86_64.rpm
ucx-ib-1.11.0-1.el7.x86_64.rpm
ucx-devel-1.11.2-1.el7.x86_64.rpm
ucx-debuginfo-1.11.2-1.el7.x86_64.rpm
ucx-1.11.2-1.el7.x86_64.rpm
ucx-cuda-1.11.2-1.el7.x86_64.rpm
ucx-rdmacm-1.11.2-1.el7.x86_64.rpm
ucx-cma-1.11.2-1.el7.x86_64.rpm
ucx-ib-1.11.2-1.el7.x86_64.rpm
```

For a setup without RoCE or Infiniband networking, the only packages required are:

```
ucx-1.11.0-1.el7.x86_64.rpm
ucx-cuda-1.11.0-1.el7.x86_64.rpm
ucx-1.11.2-1.el7.x86_64.rpm
ucx-cuda-1.11.2-1.el7.x86_64.rpm
```

If accelerated networking is available, the package list is:

```
ucx-1.11.0-1.el7.x86_64.rpm
ucx-cuda-1.11.0-1.el7.x86_64.rpm
ucx-rdmacm-1.11.0-1.el7.x86_64.rpm
ucx-ib-1.11.0-1.el7.x86_64.rpm
ucx-1.11.2-1.el7.x86_64.rpm
ucx-cuda-1.11.2-1.el7.x86_64.rpm
ucx-rdmacm-1.11.2-1.el7.x86_64.rpm
ucx-ib-1.11.2-1.el7.x86_64.rpm
```

---
Expand Down Expand Up @@ -145,7 +152,7 @@ system if you have RDMA capable hardware.
Within the Docker container we need to install UCX and its requirements. These are Dockerfile
examples for Ubuntu 18.04:

The following are examples of Docker containers with UCX 1.11.0 and cuda-11.2 support.
The following are examples of Docker containers with UCX 1.11.2 and cuda-11.2 support.

| OS Type | RDMA | Dockerfile |
| ------- | ---- | ---------- |
Expand Down Expand Up @@ -281,7 +288,6 @@ In this section, we are using a docker container built using the sample dockerfi
| Spark Shim | spark.shuffle.manager value |
| -----------| -------------------------------------------------------- |
| 3.0.1 | com.nvidia.spark.rapids.spark301.RapidsShuffleManager |
| 3.0.1 EMR | com.nvidia.spark.rapids.spark301emr.RapidsShuffleManager |
| 3.0.2 | com.nvidia.spark.rapids.spark302.RapidsShuffleManager |
| 3.0.3 | com.nvidia.spark.rapids.spark303.RapidsShuffleManager |
| 3.0.4 | com.nvidia.spark.rapids.spark304.RapidsShuffleManager |
Expand All @@ -290,7 +296,7 @@ In this section, we are using a docker container built using the sample dockerfi
| 3.1.2 | com.nvidia.spark.rapids.spark312.RapidsShuffleManager |
| 3.1.3 | com.nvidia.spark.rapids.spark313.RapidsShuffleManager |

2. Settings for UCX 1.11.0+:
2. Settings for UCX 1.11.2+:

Minimum configuration:

Expand Down Expand Up @@ -326,6 +332,9 @@ Apache Spark 3.1.3 is: `com.nvidia.spark.rapids.spark313.RapidsShuffleManager`.
Please note `LD_LIBRARY_PATH` should optionally be set if the UCX library is installed in a
non-standard location.

With the RAPIDS Shuffle Manager configured, the setting `spark.rapids.shuffle.enabled` (default on)
can be used to enable or disable the usage of RAPIDS Shuffle Manager during your application.

#### UCX Environment Variables
- `UCX_TLS`:
- `cuda_copy`, and `cuda_ipc`: enables handling of CUDA memory in UCX, both for copy-based transport
Expand Down
20 changes: 10 additions & 10 deletions docs/additional-functionality/rapids-udfs.md
Original file line number Diff line number Diff line change
Expand Up @@ -139,38 +139,38 @@ in the [udf-examples](../../udf-examples) project.

### Spark Scala UDF Examples

- [URLDecode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
- [URLDecode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
decodes URL-encoded strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
- [URLEncode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
- [URLEncode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
URL-encodes strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)

### Spark Java UDF Examples

- [URLDecode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
decodes URL-encoded strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
- [URLEncode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
URL-encodes strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
- [CosineSimilarity](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
- [CosineSimilarity](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
between two float vectors using [native code](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/cpp/src)
between two float vectors using [native code](../../udf-examples/src/main/cpp/src)

### Hive UDF Examples

- [URLDecode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
implements a Hive simple UDF using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
to decode URL-encoded strings
- [URLEncode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
implements a Hive generic UDF using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
to URL-encode strings
- [StringWordCount](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
- [StringWordCount](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
implements a Hive simple UDF using
[native code](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/cpp/src) to count words in strings
[native code](../../udf-examples/src/main/cpp/src) to count words in strings


## GPU Support for Pandas UDF
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,16 @@
# See: https://github.com/openucx/ucx/releases/

ARG CUDA_VER=11.2.2
ARG UCX_VER=v1.11.0
ARG UCX_VER=1.11.2
ARG UCX_CUDA_VER=11.2

FROM nvidia/cuda:${CUDA_VER}-runtime-centos7
ARG UCX_VER
ARG UCX_CUDA_VER

RUN yum update -y && yum install -y wget bzip2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/$UCX_VER/ucx-$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && tar -xvf *.bz2 && \
yum install -y ucx-1.11.0-1.el7.x86_64.rpm && \
yum install -y ucx-cuda-1.11.0-1.el7.x86_64.rpm && \
yum install -y ucx-$UCX_VER-1.el7.x86_64.rpm && \
yum install -y ucx-cuda-$UCX_VER-1.el7.x86_64.rpm && \
rm -rf /tmp/*.rpm
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@

ARG RDMA_CORE_VERSION=32.1
ARG CUDA_VER=11.2.2
ARG UCX_VER=v1.11.0
ARG UCX_VER=1.11.2
ARG UCX_CUDA_VER=11.2

# Throw away image to build rdma_core
Expand Down Expand Up @@ -59,12 +59,12 @@ COPY --from=rdma_core /tmp/*.rpm /tmp/

RUN yum update -y
RUN yum install -y wget bzip2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/$UCX_VER/ucx-$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2
RUN cd /tmp && \
yum install -y *.rpm && \
tar -xvf *.bz2 && \
yum install -y ucx-1.11.0-1.el7.x86_64.rpm && \
yum install -y ucx-cuda-1.11.0-1.el7.x86_64.rpm && \
yum install -y ucx-ib-1.11.0-1.el7.x86_64.rpm && \
yum install -y ucx-rdmacm-1.11.0-1.el7.x86_64.rpm
yum install -y ucx-$UCX_VER-1.el7.x86_64.rpm && \
yum install -y ucx-cuda-$UCX_VER-1.el7.x86_64.rpm && \
yum install -y ucx-ib-$UCX_VER-1.el7.x86_64.rpm && \
yum install -y ucx-rdmacm-$UCX_VER-1.el7.x86_64.rpm
RUN rm -rf /tmp/*.rpm && rm /tmp/*.bz2
Loading