diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1cc52e5472a..f832cd2facd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -130,15 +130,15 @@ mvn -pl dist -PnoSnapshots package -DskipTests Verify that shim-specific classes are hidden from a conventional classloader. ```bash -$ javap -cp dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl +$ javap -cp dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl Error: class not found: com.nvidia.spark.rapids.shims.SparkShimImpl ``` However, its bytecode can be loaded if prefixed with `spark3XY` not contained in the package name ```bash -$ javap -cp dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2 -Warning: File dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl +$ javap -cp dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar spark320.com.nvidia.spark.rapids.shims.SparkShimImpl | head -2 +Warning: File dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar(/spark320/com/nvidia/spark/rapids/shims/SparkShimImpl.class) does not contain class spark320.com.nvidia.spark.rapids.shims.SparkShimImpl Compiled from "SparkShims.scala" public final class com.nvidia.spark.rapids.shims.SparkShimImpl { ``` @@ -181,7 +181,7 @@ mvn package -pl dist -am -Dbuildver=340 -DallowConventionalDistJar=true Verify `com.nvidia.spark.rapids.shims.SparkShimImpl` is conventionally loadable: ```bash -$ javap -cp dist/target/rapids-4-spark_2.12-23.10.0-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2 +$ javap -cp dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar com.nvidia.spark.rapids.shims.SparkShimImpl | head -2 Compiled from "SparkShims.scala" public final class 
com.nvidia.spark.rapids.shims.SparkShimImpl { ``` diff --git a/docs/archive.md b/docs/archive.md index dae04b46bb0..83108f7e200 100644 --- a/docs/archive.md +++ b/docs/archive.md @@ -5,6 +5,86 @@ nav_order: 15 --- Below are archived releases for RAPIDS Accelerator for Apache Spark. +## Release v23.10.0 +### Hardware Requirements: + +The plugin is tested on the following architectures: + + GPU Models: NVIDIA P100, V100, T4, A10/A100, L4 and H100 GPUs + +### Software Requirements: + + OS: Ubuntu 20.04, Ubuntu 22.04, CentOS 7, or Rocky Linux 8 + + NVIDIA Driver*: R470+ + + Runtime: + Scala 2.12 + Python, Java Virtual Machine (JVM) compatible with your spark-version. + + * Check the Spark documentation for Python and Java version compatibility with your specific + Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1. + Please be aware that we do not currently support Spark builds with Scala 2.13. + + Supported Spark versions: + Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4 + Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3 + Apache Spark 3.4.0, 3.4.1 + Apache Spark 3.5.0 + + Supported Databricks runtime versions for Azure and AWS: + Databricks 10.4 ML LTS (GPU, Scala 2.12, Spark 3.2.1) + Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0) + Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2) + + Supported Dataproc versions: + GCP Dataproc 2.0 + GCP Dataproc 2.1 + +*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet +for your hardware's minimum driver version. + +*For Cloudera and EMR support, please refer to the +[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ. 
+ +#### RAPIDS Accelerator's Support Policy for Apache Spark +The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html) + +### Download v23.10.0 +* Download the [RAPIDS + Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar) + +This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with +CUDA 11.8 through CUDA 12.0. + +### Verify signature +* Download the [RAPIDS Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar) + and [RAPIDS Accelerator for Apache Spark 23.10.0 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar.asc) +* Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com). +* Import the public key: `gpg --import PUB_KEY` +* Verify the signature: `gpg --verify rapids-4-spark_2.12-23.10.0.jar.asc rapids-4-spark_2.12-23.10.0.jar` + +The output of the signature verification: + + gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) " + +### Release Notes +New functionality and performance improvements for this release include: +* Introduced support for Spark 3.5.0. +* Improved memory management for better control in YARN and K8s on CSP. +* Strengthened Parquet and ORC tests for enhanced stability and support. +* Reduced GPU out-of-memory (OOM) occurrences. +* Enhanced the driver log with actionable insights. +* Qualification and Profiling tool: + * Enhanced user experience with the availability of the 'ascli' tool for qualification and + profiling across all platforms. + * The qualification tool now accommodates CPU-fallback transitions and broadens the speedup factor coverage.
+ * Extended diagnostic support for user tools to cover EMR, Databricks AWS, and Databricks Azure. + * Introduced support for cluster configuration recommendations in the profiling tool for supported platforms. + +For a detailed list of changes, please refer to the +[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). + ## Release v23.08.2 ### Hardware Requirements: diff --git a/docs/dev/testing.md b/docs/dev/testing.md index 9d92ae4aacf..318d3d0584e 100644 --- a/docs/dev/testing.md +++ b/docs/dev/testing.md @@ -5,5 +5,5 @@ nav_order: 2 parent: Developer Overview --- An overview of testing can be found within the repository at: -* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-23.10/tests#readme) -* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-23.10/integration_tests#readme) +* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-23.12/tests#readme) +* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-23.12/integration_tests#readme) diff --git a/docs/download.md b/docs/download.md index 1c7e26fc090..e68af9c65ae 100644 --- a/docs/download.md +++ b/docs/download.md @@ -18,12 +18,12 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or sub that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/overview.html) for more details. -## Release v23.10.0 +## Release v23.12.0 ### Hardware Requirements: The plugin is tested on the following architectures: - GPU Models: NVIDIA P100, V100, T4, A10/A100, L4 and H100 GPUs + GPU Models: NVIDIA V100, T4, A10/A100, L4 and H100 GPUs ### Software Requirements: @@ -32,12 +32,11 @@ The plugin is tested on the following architectures: NVIDIA Driver*: R470+ Runtime: - Scala 2.12 + Scala 2.12, 2.13 Python, Java Virtual Machine (JVM) compatible with your spark-version. 
* Check the Spark documentation for Python and Java version compatibility with your specific - Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1. - Please be aware that we do not currently support Spark builds with Scala 2.13. + Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1. Supported Spark versions: Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4 @@ -53,6 +52,9 @@ The plugin is tested on the following architectures: Supported Dataproc versions: GCP Dataproc 2.0 GCP Dataproc 2.1 + + Supported Dataproc Serverless versions: + Spark runtime 1.1 LTS *Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet for your hardware's minimum driver version. @@ -60,22 +62,28 @@ for your hardware's minimum driver version. *For Cloudera and EMR support, please refer to the [Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ. 
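As the download and signature links on this page show, the release jars follow Maven Central's standard coordinate layout (`com/nvidia/rapids-4-spark_<scala>/<version>/rapids-4-spark_<scala>-<version>.jar`). A minimal sketch of a helper that derives a download URL from a Scala binary version and release version — the function name is illustrative only, not part of any official tooling:

```shell
# Illustrative helper: build the Maven Central download URL for a
# RAPIDS Accelerator release jar from a Scala binary version and a
# release version, following the coordinate layout shown above.
rapids_jar_url() {
  scala="$1"
  ver="$2"
  echo "https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_${scala}/${ver}/rapids-4-spark_${scala}-${ver}.jar"
}

rapids_jar_url 2.12 23.12.0
rapids_jar_url 2.13 23.12.0
```

Appending `.asc` to either URL yields the matching detached signature file used in the verification steps below.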
-#### RAPIDS Accelerator's Support Policy for Apache Spark +### RAPIDS Accelerator's Support Policy for Apache Spark The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html) -### Download v23.10.0 -* Download the [RAPIDS - Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar) +### Download RAPIDS Accelerator for Apache Spark v23.12.0 +- **Scala 2.12:** + - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar) + - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar.asc) + +- **Scala 2.13:** + - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.0/rapids-4-spark_2.13-23.12.0.jar) + - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.0/rapids-4-spark_2.13-23.12.0.jar.asc) This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with CUDA 11.8 through CUDA 12.0. ### Verify signature -* Download the [RAPIDS Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar) - and [RAPIDS Accelerator for Apache Spark 23.10.0 jars.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar.asc) * Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com). 
* Import the public key: `gpg --import PUB_KEY` -* Verify the signature: `gpg --verify rapids-4-spark_2.12-23.10.0.jar.asc rapids-4-spark_2.12-23.10.0.jar` +* Verify the signature for Scala 2.12 jar: + `gpg --verify rapids-4-spark_2.12-23.12.0.jar.asc rapids-4-spark_2.12-23.12.0.jar` +* Verify the signature for Scala 2.13 jar: + `gpg --verify rapids-4-spark_2.13-23.12.0.jar.asc rapids-4-spark_2.13-23.12.0.jar` The output of signature verify: @@ -83,17 +91,16 @@ The output of signature verify: ### Release Notes New functionality and performance improvements for this release include: -* Introduced support for Spark 3.5.0. -* Improved memory management for better control in YARN and K8s on CSP. -* Strengthened Parquet and ORC tests for enhanced stability and support. -* Reduce GPU out-of-memory (OOM) occurrences. -* Enhanced driver log with actionable insights. +* Introduced support for chunked reading of ORC files. +* Enhanced support for additional time zones and added stack function support. +* Enhanced performance for join and aggregation operations. +* Kernel optimizations have been implemented to improve Parquet read performance. +* The RAPIDS Accelerator is now also built and tested with Scala 2.13. +* This is the last release to support Pascal-based NVIDIA GPUs; support will be discontinued in the next release. * Qualification and Profiling tool: - * Enhanced user experience with the availability of the 'ascli' tool for qualification and - profiling across all platforms. - * The qualification tool now accommodates CPU-fallback transitions and broadens the speedup factor coverage. - * Extended diagnostic support for user tools to cover EMR, Databricks AWS, and Databricks Azure. - * Introduced support for cluster configuration recommendations in the profiling tool for supported platforms. + * The Profiling tool now processes the Spark driver log for GPU runs, enhancing feature analysis. + * Auto-tuner recommendations include AQE settings for optimized performance.
+ * New configurations in the Profiling tool for enabling off-by-default features: udfCompiler, incompatibleDateFormats, hasExtendedYearValues. For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
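The signature-verification steps documented above are identical for both Scala builds; only the `_2.12`/`_2.13` suffix changes. The loop below is a sketch of that flow under the assumption that the jars and their `.asc` signatures have already been downloaded to the current directory and the NVIDIA public key has been imported; if a file is missing, it reports that rather than calling `gpg`:

```shell
# Sketch of the documented signature-verification flow for both Scala builds.
# Assumes the release jars and their detached .asc signatures are in the
# current directory and the public key was imported with `gpg --import`.
ver="23.12.0"
for scala in 2.12 2.13; do
  jar="rapids-4-spark_${scala}-${ver}.jar"
  if [ -f "$jar" ] && [ -f "$jar.asc" ]; then
    # Each release jar is verified against its detached .asc signature.
    gpg --verify "$jar.asc" "$jar"
  else
    echo "skipping $jar: jar or .asc signature not found"
  fi
done
```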