Skip to content

Commit

Permalink
Merge branch 'branch-24.06' into regexpr_trick
Browse files Browse the repository at this point in the history
  • Loading branch information
thirtiseven committed Apr 12, 2024
2 parents 5682864 + 5b38fe8 commit 482248c
Show file tree
Hide file tree
Showing 35 changed files with 376 additions and 141 deletions.
1 change: 1 addition & 0 deletions .github/workflows/blossom-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ jobs:
viadea,\
yinqingh,\
parthosa,\
liurenjie1024,\
', format('{0},', github.actor)) && github.event.comment.body == 'build'
steps:
- name: Check if comment is issued by authorized person
Expand Down
2 changes: 1 addition & 1 deletion docs/additional-functionality/advanced_configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -248,7 +248,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.FromUnixTime"></a>spark.rapids.sql.expression.FromUnixTime|`from_unixtime`|Get the string from a unix timestamp|true|None|
<a name="sql.expression.GetArrayItem"></a>spark.rapids.sql.expression.GetArrayItem| |Gets the field at `ordinal` in the Array|true|None|
<a name="sql.expression.GetArrayStructFields"></a>spark.rapids.sql.expression.GetArrayStructFields| |Extracts the `ordinal`-th fields of all array elements for the data with the type of array of struct|true|None|
<a name="sql.expression.GetJsonObject"></a>spark.rapids.sql.expression.GetJsonObject|`get_json_object`|Extracts a json object from path|true|None|
<a name="sql.expression.GetJsonObject"></a>spark.rapids.sql.expression.GetJsonObject|`get_json_object`|Extracts a json object from path|false|This is disabled by default because Experimental feature that could be unstable or have performance issues.|
<a name="sql.expression.GetMapValue"></a>spark.rapids.sql.expression.GetMapValue| |Gets Value from a Map based on a key|true|None|
<a name="sql.expression.GetStructField"></a>spark.rapids.sql.expression.GetStructField| |Gets the named field of the struct|true|None|
<a name="sql.expression.GetTimestamp"></a>spark.rapids.sql.expression.GetTimestamp| |Gets timestamps from strings using given pattern.|true|None|
Expand Down
91 changes: 91 additions & 0 deletions docs/archive.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,97 @@ nav_order: 15
---
Below are archived releases for RAPIDS Accelerator for Apache Spark.

## Release v24.02.0
### Hardware Requirements:

The plugin is tested on the following architectures:

GPU Models: NVIDIA V100, T4, A10/A100, L4 and H100 GPUs

### Software Requirements:

OS: Ubuntu 20.04, Ubuntu 22.04, CentOS 7, or Rocky Linux 8

NVIDIA Driver*: R470+

Runtime:
Scala 2.12, 2.13
Python, Java Virtual Machine (JVM) compatible with your spark-version.

* Check the Spark documentation for Python and Java version compatibility with your specific
Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1.

Supported Spark versions:
Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3
Apache Spark 3.4.0, 3.4.1
Apache Spark 3.5.0

Supported Databricks runtime versions for Azure and AWS:
Databricks 10.4 ML LTS (GPU, Scala 2.12, Spark 3.2.1)
Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0)
Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2)
Databricks 13.3 ML LTS (GPU, Scala 2.12, Spark 3.4.1)

Supported Dataproc versions:
GCP Dataproc 2.0
GCP Dataproc 2.1

Supported Dataproc Serverless versions:
Spark runtime 1.1 LTS
Spark runtime 2.0
Spark runtime 2.1

*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet
for your hardware's minimum driver version.

*For Cloudera and EMR support, please refer to the
[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ.

### RAPIDS Accelerator's Support Policy for Apache Spark
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html)

### Download RAPIDS Accelerator for Apache Spark v24.02.0

| Processor | Scala Version | Download Jar | Download Signature |
|-----------|---------------|--------------|--------------------|
| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) |
| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) |
| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0-cuda11-arm64.jar.asc) |
| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0-cuda11-arm64.jar.asc) |

This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
CUDA 11.8 through CUDA 12.0.

### Verify signature
* Download the [PUB_KEY](https://keys.openpgp.org/[email protected]).
* Import the public key: `gpg --import PUB_KEY`
* Verify the signature for Scala 2.12 jar:
`gpg --verify rapids-4-spark_2.12-24.02.0.jar.asc rapids-4-spark_2.12-24.02.0.jar`
* Verify the signature for Scala 2.13 jar:
`gpg --verify rapids-4-spark_2.13-24.02.0.jar.asc rapids-4-spark_2.13-24.02.0.jar`

The output of signature verify:

gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <[email protected]>"

### Release Notes
New functionality and performance improvements for this release include:
* Discontinued support for Nvidia GPUs based on Pascal architecture.
* Set get_json_object functionality to disabled by default.
* Implemented string comparison in AST expressions.
* Expanded timezone support to include options beyond UTC.
* Optional checksums for cached files in the file cache.
* Introduced support for Databricks 13.3 ML LTS.
* Added support for parse_url functionality.
* Introducing Lazy Quantifier support for regular expression functions.
* Added support for the format_number function.
* Enhanced batching support for row-based bounded window functions.
* For updates on RAPIDS Accelerator Tools, please visit [this link](https://github.com/NVIDIA/spark-rapids-tools/releases).

For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).

## Release v23.12.2
### Hardware Requirements:

Expand Down
6 changes: 1 addition & 5 deletions docs/compatibility.md
Original file line number Diff line number Diff line change
Expand Up @@ -440,11 +440,7 @@ Known issues are:
### get_json_object

Known issue:
- [Non-string output is not normalized](https://github.com/NVIDIA/spark-rapids/issues/10218)
When returning a result for things other than strings, a number of things are normalized by
Apache Spark, but are not normalized by the GPU, like removing unnecessary white space,
parsing and then serializing floating point numbers, turning single quotes to double quotes,
and removing unneeded escapes for single quotes.
- [Floating-point number normalization error](https://github.com/NVIDIA/spark-rapids-jni/issues/1922). `get_json_object` floating-point number normalization on the GPU could sometimes return incorrect results if the string contains high-precision values, see the String to Float and Float to String section for more details.

## Avro

Expand Down
44 changes: 20 additions & 24 deletions docs/download.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or sub
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/overview.html) for more details.

## Release v24.02.0
## Release v24.04.0
### Hardware Requirements:

The plugin is tested on the following architectures:
Expand All @@ -40,17 +40,16 @@ The plugin is tested on the following architectures:

Supported Spark versions:
Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3
Apache Spark 3.4.0, 3.4.1
Apache Spark 3.5.0
Apache Spark 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4
Apache Spark 3.4.0, 3.4.1, 3.4.2
Apache Spark 3.5.0, 3.5.1

Supported Databricks runtime versions for Azure and AWS:
Databricks 10.4 ML LTS (GPU, Scala 2.12, Spark 3.2.1)
Databricks 11.3 ML LTS (GPU, Scala 2.12, Spark 3.3.0)
Databricks 12.2 ML LTS (GPU, Scala 2.12, Spark 3.3.2)
Databricks 13.3 ML LTS (GPU, Scala 2.12, Spark 3.4.1)

Supported Dataproc versions:
Supported Dataproc versions (Debian/Ubuntu):
GCP Dataproc 2.0
GCP Dataproc 2.1

Expand All @@ -68,14 +67,14 @@ for your hardware's minimum driver version.
### RAPIDS Accelerator's Support Policy for Apache Spark
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html)

### Download RAPIDS Accelerator for Apache Spark v24.02.0
### Download RAPIDS Accelerator for Apache Spark v24.04.0

| Processor | Scala Version | Download Jar | Download Signature |
|-----------|---------------|--------------|--------------------|
| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar.asc) |
| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0.jar.asc) |
| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0-cuda11-arm64.jar.asc) |
| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.02.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.02.0/rapids-4-spark_2.13-24.02.0-cuda11-arm64.jar.asc) |
| x86_64 | Scala 2.12 | [RAPIDS Accelerator v24.04.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar.asc) |
| x86_64 | Scala 2.13 | [RAPIDS Accelerator v24.04.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.04.0/rapids-4-spark_2.13-24.04.0.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.04.0/rapids-4-spark_2.13-24.04.0.jar.asc) |
| arm64 | Scala 2.12 | [RAPIDS Accelerator v24.04.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0-cuda11-arm64.jar.asc) |
| arm64 | Scala 2.13 | [RAPIDS Accelerator v24.04.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.04.0/rapids-4-spark_2.13-24.04.0-cuda11-arm64.jar) | [Signature](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/24.04.0/rapids-4-spark_2.13-24.04.0-cuda11-arm64.jar.asc) |

This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
CUDA 11.8 through CUDA 12.0.
Expand All @@ -84,26 +83,23 @@ CUDA 11.8 through CUDA 12.0.
* Download the [PUB_KEY](https://keys.openpgp.org/[email protected]).
* Import the public key: `gpg --import PUB_KEY`
* Verify the signature for Scala 2.12 jar:
`gpg --verify rapids-4-spark_2.12-24.02.0.jar.asc rapids-4-spark_2.12-24.02.0.jar`
`gpg --verify rapids-4-spark_2.12-24.04.0.jar.asc rapids-4-spark_2.12-24.04.0.jar`
* Verify the signature for Scala 2.13 jar:
`gpg --verify rapids-4-spark_2.13-24.02.0.jar.asc rapids-4-spark_2.13-24.02.0.jar`
`gpg --verify rapids-4-spark_2.13-24.04.0.jar.asc rapids-4-spark_2.13-24.04.0.jar`

The output of signature verify:

gpg: Good signature from "NVIDIA Spark (For the signature of spark-rapids release jars) <[email protected]>"

### Release Notes
New functionality and performance improvements for this release include:
* Discontinued support for Nvidia GPUs based on Pascal architecture.
* Set get_json_object functionality to disabled by default.
* Implemented string comparison in AST expressions.
* Expanded timezone support to include options beyond UTC.
* Optional checksums for cached files in the file cache.
* Introduced support for Databricks 13.3 ML LTS.
* Added support for parse_url functionality.
* Introducing Lazy Quantifier support for regular expression functions.
* Added support for the format_number function.
* Enhanced batching support for row-based bounded window functions.
* New functionality and performance improvements for this release include:
* Performance improvements for S3 reading.
Refer to perfio.s3.enabled in [advanced_configs](./additional-functionality/advanced_configs.md) for more details.
* Performance improvements when doing a joins on unique keys.
* Enhanced decompression kernels for zstd and snappy.
* Enhanced Parquet reading performance with modular kernels.
* Added compatibility with Spark version 3.5.1.
* Deprecated support for Databricks 10.4 ML LTS.
* For updates on RAPIDS Accelerator Tools, please visit [this link](https://github.com/NVIDIA/spark-rapids-tools/releases).

For a detailed list of changes, please refer to the
Expand Down
2 changes: 1 addition & 1 deletion docs/supported_ops.md
Original file line number Diff line number Diff line change
Expand Up @@ -6856,7 +6856,7 @@ are limited.
<td rowSpan="3">GetJsonObject</td>
<td rowSpan="3">`get_json_object`</td>
<td rowSpan="3">Extracts a json object from path</td>
<td rowSpan="3">None</td>
<td rowSpan="3">This is disabled by default because Experimental feature that could be unstable or have performance issues.</td>
<td rowSpan="3">project</td>
<td>json</td>
<td> </td>
Expand Down
Loading

0 comments on commit 482248c

Please sign in to comment.