diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 390fa148737..09a92885770 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -113,15 +113,15 @@ mvn -pl dist -PnoSnapshots package -DskipTests
 Verify that shim-specific classes are hidden from a conventional classloader.
 
 ```bash
-$ javap -cp dist/target/rapids-4-spark_2.12-23.08.2-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl
+$ javap -cp dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl
 Error: class not found: com.nvidia.spark.rapids.shims.SparkShimImpl
 ```
 
 However, its bytecode can be loaded if prefixed with `spark3XY` not contained in the package name
 
 ```bash
-$ javap -cp dist/target/rapids-4-spark_2.12-23.08.2-SNAPSHOT-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
-Warning: File dist/target/rapids-4-spark_2.12-23.08.2-SNAPSHOT-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl
+$ javap -cp dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
+Warning: File dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl
 Compiled from "SparkShims.scala"
 public final class com.nvidia.spark.rapids.shims.SparkShimImpl {
 ```
@@ -164,7 +164,7 @@ mvn package -pl dist -am -Dbuildver=340 -DallowConventionalDistJar=true
 Verify `com.nvidia.spark.rapids.shims.SparkShimImpl` is conventionally loadable:
 
 ```bash
-$ javap -cp dist/target/rapids-4-spark_2.12-23.08.2-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
+$ javap -cp dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2
 Compiled from "SparkShims.scala"
 public final class com.nvidia.spark.rapids.shims.SparkShimImpl {
 ```
diff --git a/docs/archive.md b/docs/archive.md
index 14c9e9a209f..90f0aa3bb16 100644
--- a/docs/archive.md
+++ b/docs/archive.md
@@ -5,6 +5,85 @@ nav_order: 15
 ---
 Below are archived releases for RAPIDS Accelerator for Apache Spark.
 
+## Release v23.08.2
+### Hardware Requirements:
+
+The plugin is tested on the following architectures:
+
+    GPU Models: NVIDIA P100, V100, T4, A10/A100, L4 and H100 GPUs
+
+### Software Requirements:
+
+    OS: Ubuntu 20.04, Ubuntu 22.04, CentOS 7, or Rocky Linux 8
+
+    NVIDIA Driver*: R470+
+
+    Runtime:
+        Scala 2.12
+        Python, Java Virtual Machine (JVM) compatible with your spark-version.
+
+        * Check the Spark documentation for Python and Java version compatibility with your specific
+        Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1.
+        Please be aware that we do not currently support Spark builds with Scala 2.13.
+
+    Supported Spark versions:
+        Apache Spark 3.1.1, 3.1.2, 3.1.3
+        Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
+        Apache Spark 3.3.0, 3.3.1, 3.3.2
+        Apache Spark 3.4.0, 3.4.1
+        Apache Spark 3.5.0
+
+    Supported Databricks runtime versions for Azure and AWS:
+        Databricks 10.4 ML LTS (GPU, Scala 2.12, Spark 3.2.1)
+        Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0)
+        Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2)
+
+    Supported Dataproc versions:
+        GCP Dataproc 2.0
+        GCP Dataproc 2.1
+
+*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet
+for your hardware's minimum driver version.
+
+*For Cloudera and EMR support, please refer to the
+[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ.
+
+### Download v23.08.2
+* Download the [RAPIDS
+  Accelerator for Apache Spark 23.08.2 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.2/rapids-4-spark_2.12-23.08.2.jar)
+
+This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
+CUDA 11.8 through CUDA 12.0.
+
+Note that v23.08.1 is deprecated.
+
+### Verify signature
+* Download the [RAPIDS Accelerator for Apache Spark 23.08.2 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.2/rapids-4-spark_2.12-23.08.2.jar)
+  and [RAPIDS Accelerator for Apache Spark 23.08.2 jars.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.2/rapids-4-spark_2.12-23.08.2.jar.asc)
+* Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com).
+* Import the public key: `gpg --import PUB_KEY`
+* Verify the signature: `gpg --verify rapids-4-spark_2.12-23.08.2.jar.asc rapids-4-spark_2.12-23.08.2.jar`
+
+The output of signature verify:
+
+    gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) "
+
+### Release Notes
+New functionality and performance improvements for this release include:
+* Compatibility with Databricks AWS & Azure 12.2 ML LTS.
+* Enhanced stability and support for ORC and Parquet.
+* Reduction of out-of-memory (OOM) occurrences.
+* Corner case evaluation for data formats, operators and expressions
+* Qualification and Profiling tool:
+  * Profiling tool now supports Azure Databricks and AWS Databricks.
+  * Qualification tool can provide advice on unaccelerated operations.
+  * Improve user experience through CLI design.
+  * Qualification tool provides configuration and migration recommendations for Dataproc and EMR.
+* Fixes Databricks build issues from the previous 23.08 release.
+
+For a detailed list of changes, please refer to the
+[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
+
 ## Release v23.08.1
 ### Hardware Requirements:
 
diff --git a/docs/dev/testing.md b/docs/dev/testing.md
index b7cdbf0e42c..9d92ae4aacf 100644
--- a/docs/dev/testing.md
+++ b/docs/dev/testing.md
@@ -5,5 +5,5 @@ nav_order: 2
 parent: Developer Overview
 ---
 An overview of testing can be found within the repository at:
-* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-23.08/tests#readme)
-* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-23.08/integration_tests#readme)
+* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-23.10/tests#readme)
+* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-23.10/integration_tests#readme)
diff --git a/docs/download.md b/docs/download.md
index 45639efa937..18d873765d3 100644
--- a/docs/download.md
+++ b/docs/download.md
@@ -18,7 +18,7 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or sub
 that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
 guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.
 
-## Release v23.08.2
+## Release v23.10.0
 ### Hardware Requirements:
 
 The plugin is tested on the following architectures:
@@ -40,9 +40,8 @@ The plugin is tested on the following architectures:
         Please be aware that we do not currently support Spark builds with Scala 2.13.
 
     Supported Spark versions:
-        Apache Spark 3.1.1, 3.1.2, 3.1.3
         Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
-        Apache Spark 3.3.0, 3.3.1, 3.3.2
+        Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3
         Apache Spark 3.4.0, 3.4.1
         Apache Spark 3.5.0
 
@@ -61,38 +60,40 @@ for your hardware's minimum driver version.
 *For Cloudera and EMR support, please refer to the
 [Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ.
 
-### Download v23.08.2
+#### RAPIDS Accelerator's Support Policy for Apache Spark
+The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html)
+
+### Download v23.10.0
 * Download the [RAPIDS
-  Accelerator for Apache Spark 23.08.2 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.2/rapids-4-spark_2.12-23.08.2.jar)
+  Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar)
 
 This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
 CUDA 11.8 through CUDA 12.0.
 
-Note that v23.08.0 is deprecated.
-
 ### Verify signature
-* Download the [RAPIDS Accelerator for Apache Spark 23.08.2 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.2/rapids-4-spark_2.12-23.08.2.jar)
-  and [RAPIDS Accelerator for Apache Spark 23.08.2 jars.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.08.2/rapids-4-spark_2.12-23.08.2.jar.asc)
+* Download the [RAPIDS Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar)
+  and [RAPIDS Accelerator for Apache Spark 23.10.0 jars.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar.asc)
 * Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com).
 * Import the public key: `gpg --import PUB_KEY`
-* Verify the signature: `gpg --verify rapids-4-spark_2.12-23.08.2.jar.asc rapids-4-spark_2.12-23.08.2.jar`
+* Verify the signature: `gpg --verify rapids-4-spark_2.12-23.10.0.jar.asc rapids-4-spark_2.12-23.10.0.jar`
 
-The output if signature verify:
+The output of signature verify:
 
     gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) "
 
 ### Release Notes
 New functionality and performance improvements for this release include:
-* Compatibility with Databricks AWS & Azure 12.2 ML LTS.
-* Enhanced stability and support for ORC and Parquet.
-* Reduction of out-of-memory (OOM) occurrences.
-* Corner case evaluation for data formats, operators and expressions
+* Introduced support for Spark 3.5.0.
+* Improved memory management for better control in YARN and K8s on CSP.
+* Strengthened Parquet and ORC tests for enhanced stability and support.
+* Reduce GPU out-of-memory (OOM) occurrences.
+* Enhanced driver log with actionable insights.
 * Qualification and Profiling tool:
-  * Profiling tool now supports Azure Databricks and AWS Databricks.
-  * Qualification tool can provide advice on unaccelerated operations.
-  * Improve user experience through CLI design.
-  * Qualification tool provides configuration and migration recommendations for Dataproc and EMR.
-* Fixes Databricks build issues from the previous 23.08 release.
+  * Enhanced user experience with the availability of the 'ascli' tool for qualification and
+    profiling across all platforms.
+  * The qualification tool now accommodates CPU-fallback transitions and broadens the speedup factor coverage.
+  * Extended diagnostic support for user tools to cover EMR, Databricks AWS, and Databricks Azure.
+  * Introduced support for cluster configuration recommendations in the profiling tool for supported platforms.
 
 For a detailed list of changes, please refer to the
 [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
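
The "Verify signature" steps in the `docs/download.md` hunk can also be scripted. The following is a minimal sketch and not part of this change: the helper names (`jar_name`, `jar_url`, `sig_url`, `verify_release`) are hypothetical, and it assumes `curl` and `gpg` are installed and that the public key has already been saved to a local file (the docs call it `PUB_KEY`). The URL layout is copied from the links in the patch.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of spark-rapids.
# URL layout is taken from the Maven Central links in docs/download.md.

BASE_URL="https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12"

# Print the jar file name for a release version, e.g. 23.10.0.
jar_name() { printf 'rapids-4-spark_2.12-%s.jar\n' "$1"; }

# Print the download URL of the jar for a release version.
jar_url() { printf '%s/%s/%s\n' "$BASE_URL" "$1" "$(jar_name "$1")"; }

# Print the download URL of the detached signature (.asc) for a release version.
sig_url() { printf '%s.asc\n' "$(jar_url "$1")"; }

# Download the jar and its signature into the current directory, import the
# public key file given as the second argument, then verify the signature.
verify_release() {
  local version="$1" pub_key="$2"
  local jar
  jar="$(jar_name "$version")"
  curl -fsSLO "$(jar_url "$version")" || return 1
  curl -fsSLO "$(sig_url "$version")" || return 1
  gpg --import "$pub_key" || return 1
  gpg --verify "${jar}.asc" "$jar"
}
```

For example, `verify_release 23.10.0 PUB_KEY` would fetch the 23.10.0 jar and its `.asc` file and then run the same `gpg --verify` command as the documented steps. Keeping the URL builders separate from the download step makes it easy to check the generated links without touching the network.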