Doc updated for v2110[skip ci] #3906

Merged 5 commits on Oct 26, 2021
4 changes: 2 additions & 2 deletions docs/FAQ.md
```diff
@@ -19,7 +19,7 @@ process, we try to stay on top of these changes and release updates as quickly a
 
 The RAPIDS Accelerator for Apache Spark officially supports:
 - [Apache Spark](get-started/getting-started-on-prem.md)
-- [AWS EMR 6.2.0, 6.3.0](get-started/getting-started-aws-emr.md)
+- [AWS EMR 6.2+](get-started/getting-started-aws-emr.md)
 - [Databricks Runtime 7.3, 8.2](get-started/getting-started-databricks.md)
 - [Google Cloud Dataproc 2.0](get-started/getting-started-gcp.md)
 
@@ -35,7 +35,7 @@ release.
 
 ### What hardware is supported?
 
-The plugin is tested and supported on V100, T4, A10, A30 and A100 datacenter GPUs. It is possible
+The plugin is tested and supported on V100, T4, A2, A10, A30 and A100 datacenter GPUs. It is possible
 to run the plugin on GeForce desktop hardware with Volta or better architectures. GeForce hardware
 does not support [CUDA forward
 compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#forward-compatibility-title),
```
32 changes: 16 additions & 16 deletions docs/download.md
```diff
@@ -23,13 +23,13 @@ Hardware Requirements:
 
 The plugin is tested on the following architectures:
 
-GPU Architecture: NVIDIA V100, T4 and A10/A30/A100 GPUs
+GPU Architecture: NVIDIA V100, T4 and A2/A10/A30/A100 GPUs
 
 Software Requirements:
 
 OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8
 
-CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+
+CUDA & NVIDIA Drivers*: 11.0-11.4 & v450.80.02+
 
 Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.2.0, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0
 
@@ -47,7 +47,7 @@ for your hardware's minimum driver version.
 
 This package is built against CUDA 11.2 and has [CUDA forward
 compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested
-on V100, T4, A30 and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which
+on V100, T4, A2, A10, A30 and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which
 do not have CUDA forward compatibility (for example, GeForce), CUDA 11.2 is required. Users will
 need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node.
 
@@ -87,7 +87,7 @@ Software Requirements:
 
 OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8
 
-CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+
+CUDA & NVIDIA Drivers*: 11.0-11.4 & v450.80.02+
 
 Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0
 
@@ -142,7 +142,7 @@ Software Requirements:
 
 OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8
 
-CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+
+CUDA & NVIDIA Drivers*: 11.0 or 11.2 & v450.80.02+
 
 Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0
 
@@ -184,7 +184,7 @@ Software Requirements:
 
 OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8
 
-CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+
+CUDA & NVIDIA Drivers*: 11.0 or 11.2 & v450.80.02+
 
 Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, and GCP Dataproc 2.0
 
@@ -230,7 +230,7 @@ Software Requirements:
 
 OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8
 
-CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+
+CUDA & NVIDIA Drivers*: 11.0 or 11.2 & v450.80.02+
 
 Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, Databricks 8.2 ML Runtime, and GCP Dataproc 2.0
 
@@ -296,7 +296,7 @@ Software Requirements:
 
 OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS8
 
-CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
+CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
 
 Apache Spark 3.0.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0
 
@@ -344,7 +344,7 @@ Software Requirements:
 
 OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7
 
-CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
+CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
 
 Apache Spark 3.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0
 
@@ -364,7 +364,7 @@ The list of all supported operations is provided [here](supported_ops.md).
 For a detailed list of changes, please refer to the
 [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
 
-**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
+**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
 CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy
 compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer
 driver. This issue is resolved in the 0.5.0 and higher releases.
@@ -386,7 +386,7 @@ Software Requirements:
 
 OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7
 
-CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
+CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
 
 Apache Spark 3.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0
 
@@ -418,7 +418,7 @@ The list of all supported operations is provided [here](supported_ops.md).
 For a detailed list of changes, please refer to the
 [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
 
-**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
+**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
 CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy
 compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer
 driver. This issue is resolved in the 0.5.0 and higher releases.
@@ -440,7 +440,7 @@ Software Requirements:
 
 OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7
 
-CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
+CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
 
 Apache Spark 3.0, 3.0.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0
 
@@ -469,7 +469,7 @@ The list of all supported operations is provided [here](supported_ops.md).
 For a detailed list of changes, please refer to the
 [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
 
-**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
+**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
 CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy
 compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer
 driver. This issue is resolved in the 0.5.0 and higher releases.
@@ -491,7 +491,7 @@ Software Requirements:
 
 OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7
 
-CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
+CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+
 
 Apache Spark 3.0, 3.0.1
 
@@ -523,7 +523,7 @@ The list of all supported operations is provided
 For a detailed list of changes, please refer to the
 [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
 
-**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
+**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the
 CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy
 compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer
 driver. This issue is resolved in the 0.5.0 and higher releases.
```
9 changes: 5 additions & 4 deletions docs/get-started/getting-started-aws-emr.md
````diff
@@ -14,6 +14,7 @@ Different versions of EMR ship with different versions of Spark, RAPIDS Accelera
 
 | EMR | Spark | RAPIDS Accelerator jar | cuDF jar | xgboost4j-spark jar
 | --- | --- | --- | ---| --- |
+| 6.4 | 3.1.2 | rapids-4-spark_2.12-0.4.1.jar | cudf-0.18.1-cuda10-1.jar | xgboost4j-spark_3.0-1.2.0-0.1.0.jar |
 | 6.3 | 3.1.1 | rapids-4-spark_2.12-0.4.1.jar | cudf-0.18.1-cuda10-1.jar | xgboost4j-spark_3.0-1.2.0-0.1.0.jar |
 | 6.2 | 3.0.1 | rapids-4-spark_2.12-0.2.0.jar | cudf-0.15-cuda10-1.jar | xgboost4j-spark_3.0-1.0.0-0.2.0.jar |
 
@@ -25,7 +26,7 @@ documentation](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-i
 
 ## Configure and Launch AWS EMR with GPU Nodes
 
-The following steps are based on the AWS EMR document ["Using the Nvidia Spark-RAPIDS Accelerator
+The following steps are based on the AWS EMR document ["Using the NVIDIA Spark-RAPIDS Accelerator
 for Spark"](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html)
 
 ### Launch an EMR Cluster using AWS CLI
 
@@ -35,7 +36,7 @@ g4dn.2xlarge nodes:
 
 ```
 aws emr create-cluster \
---release-label emr-6.3.0 \
+--release-label emr-6.4.0 \
 --applications Name=Hadoop Name=Spark Name=Livy Name=JupyterEnterpriseGateway \
 --service-role EMR_DefaultRole \
 --ec2-attributes KeyName=my-key-pair,InstanceProfile=EMR_EC2_DefaultRole \
@@ -75,8 +76,8 @@ detailed cluster configuration page.
 
 #### Step 1: Software Configuration and Steps
 
-Select **emr-6.3.0** for the release, uncheck all the software options, and then check **Hadoop
-3.2.1**, **Spark 3.1.1**, **Livy 0.7.0** and **JupyterEnterpriseGateway 2.1.0**.
+Select **emr-6.4.0** for the release, uncheck all the software options, and then check **Hadoop
+3.2.1**, **Spark 3.1.2**, **Livy 0.7.0** and **JupyterEnterpriseGateway 2.1.0**.
 
 In the "Edit software settings" field, copy and paste the configuration from the [EMR
 document](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html). You can also
````
2 changes: 1 addition & 1 deletion docs/get-started/getting-started-on-prem.md
```diff
@@ -52,7 +52,7 @@ Download the RAPIDS Accelerator for Apache Spark plugin jar. Then download the v
 jar that your version of the accelerator depends on. Each cudf jar is for a specific version of
 CUDA and will not run on other versions. The jars use a maven classifier to keep them separate.
 
-- CUDA 11.0/11.1/11.2 => classifier cuda11
+- CUDA 11.x => classifier cuda11
 
 For example, here is a sample version of the jars and cudf with CUDA 11.0 support:
 - cudf-21.10.0-cuda11.jar
```
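To make the classifier mechanics concrete, here is a minimal sbt sketch of pulling the CUDA-specific cudf jar. The coordinates are assumptions inferred from the jar names in the hunk above, not text from the docs themselves:

```scala
// build.sbt: a minimal sketch, assuming the coordinates implied by the jar
// names above (rapids-4-spark_2.12-21.10.0.jar, cudf-21.10.0-cuda11.jar).
libraryDependencies ++= Seq(
  "com.nvidia" % "rapids-4-spark_2.12" % "21.10.0",
  // The "cuda11" classifier selects the CUDA 11.x build of cudf; a cudf jar
  // built for a different CUDA major version will not run on these drivers.
  "ai.rapids" % "cudf" % "21.10.0" classifier "cuda11"
)
```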
Binary file modified docs/img/AWS-EMR/RAPIDS_EMR_GUI_1.png
Binary file modified docs/img/AWS-EMR/RAPIDS_EMR_GUI_5.png
2 changes: 1 addition & 1 deletion shims/README.md
````diff
@@ -26,7 +26,7 @@ In the following we provide recipes for typical scenarios addressed by the Shim
 It's among the easiest issues to resolve. We define a method in SparkShims
 trait covering a superset of parameters from all versions and call it
 ```
-ShimLoader.gerSparkShims.methodWithDiscrepancies(p_1, ..., p_n)
+ShimLoader.getSparkShims.methodWithDiscrepancies(p_1, ..., p_n)
 ```
 instead of referencing it directly. Shim implementations are in charge of dispatching it further
 to correct version-dependent methods. Moreover, unlike in the below sections
````
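To illustrate the dispatch pattern the README describes, here is a hedged Scala sketch. The member names and the `Spark301Shims` class are hypothetical stand-ins, not the plugin's actual shim API:

```scala
// A schematic sketch of the shim dispatch pattern, with hypothetical names.
trait SparkShims {
  // Declared with the superset of parameters used across Spark versions.
  def methodWithDiscrepancies(p1: String, p2: Int, p3: Option[Long]): Unit
}

// Each version-specific shim adapts the superset signature to whatever API
// actually exists in its Spark version.
class Spark301Shims extends SparkShims {
  override def methodWithDiscrepancies(p1: String, p2: Int, p3: Option[Long]): Unit = {
    // In this sketch, the 3.0.1 API has no counterpart for p3, so it is dropped.
    println(s"dispatching with $p1 and $p2")
  }
}

object ShimLoader {
  // The real loader selects the shim matching the runtime Spark version;
  // this sketch hard-wires one implementation for brevity.
  def getSparkShims: SparkShims = new Spark301Shims
}

// Callers go through the shim rather than the version-specific method:
// ShimLoader.getSparkShims.methodWithDiscrepancies("a", 1, Some(2L))
```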
```diff
@@ -25,7 +25,7 @@ import org.apache.spark.api.resource.ResourceDiscoveryPlugin
 import org.apache.spark.resource.{ResourceInformation, ResourceRequest}
 
 /**
- * A Spark Resource Discovery Plugin that relies on the Nvidia GPUs being in PROCESS_EXCLUSIVE
+ * A Spark Resource Discovery Plugin that relies on the NVIDIA GPUs being in PROCESS_EXCLUSIVE
  * mode so that it can discover free GPUs.
  * This plugin iterates through all the GPUs on the node and tries to initialize a CUDA context
  * on each one. When the GPUs are in process exclusive mode this
```
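The doc comment above summarizes the discovery strategy. The following Scala sketch shows that idea in schematic form only; `tryInitCudaContext` is a hypothetical placeholder for the plugin's real CUDA call, and the loop shape is an assumption based solely on the comment:

```scala
// A schematic sketch of GPU discovery via PROCESS_EXCLUSIVE mode.
object ExclusiveModeDiscoverySketch {
  // Hypothetical placeholder: real code would attempt to create a CUDA
  // context on gpuId and return whether that succeeded.
  def tryInitCudaContext(gpuId: Int): Boolean = false

  // In PROCESS_EXCLUSIVE mode, context creation fails on a GPU that another
  // process has already claimed, so success marks the GPU as free.
  def discoverFreeGpus(totalGpus: Int, needed: Int): Seq[Int] =
    (0 until totalGpus).filter(tryInitCudaContext).take(needed)
}
```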