')").show()
-+----------+
-| entries|
-+----------+
-| {}|
-| null|
-|{A -> 100}|
-+----------+
-```
+ * If the input JSON contains multiple rows, any row containing invalid JSON will be parsed as an empty
+ struct instead of a null value ([#9592](https://github.com/NVIDIA/spark-rapids/issues/9592)).
+
+### `to_json` function
+
+The `to_json` function is disabled by default because it is experimental and has some known incompatibilities
+with Spark. It can be enabled by setting `spark.rapids.sql.expression.StructsToJson=true`.
+
+Known issues are:
+
+- There can be rounding differences when formatting floating-point numbers as strings. For example, Spark may
+ produce `-4.1243574E26` but the GPU may produce `-4.124357351E26`.
- Not all JSON options are respected.
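+
+A minimal example of enabling and exercising `to_json`, assuming a running `SparkSession` named `spark`
+(for example in `spark-shell`) with the RAPIDS Accelerator loaded:
+
+```scala
+import org.apache.spark.sql.functions.{struct, to_json}
+import spark.implicits._
+
+// Enable the experimental GPU implementation (subject to the known issues listed above).
+spark.conf.set("spark.rapids.sql.expression.StructsToJson", "true")
+
+val df = Seq(("A", 100), ("B", 200)).toDF("name", "value")
+df.select(to_json(struct($"name", $"value")).alias("json")).show(false)
+```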
### JSON Floating Point
@@ -640,8 +694,7 @@ leads to restrictions:
* Float values cannot be larger than `1e18` or smaller than `-1e18` after conversion.
* The results produced by the GPU differ slightly from the default results of Spark.
-Starting from 22.06 this conf is enabled, to disable this operation on the GPU when using Spark 3.1.0 or
-later, set
+This configuration is enabled by default. To disable this operation on the GPU, set
[`spark.rapids.sql.castFloatToDecimal.enabled`](additional-functionality/advanced_configs.md#sql.castFloatToDecimal.enabled) to `false`
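+
+For example, assuming a running `SparkSession` named `spark`, the setting can be changed at runtime
+(or passed with `--conf` at startup). The same pattern applies to the other cast-related settings in this guide:
+
+```scala
+// Illustrative only: turn off float-to-decimal casts on the GPU for this session.
+spark.conf.set("spark.rapids.sql.castFloatToDecimal.enabled", "false")
+```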
### Float to Integral Types
@@ -652,12 +705,10 @@ Spark 3.1.0 the MIN and MAX values were floating-point values such as `Int.MaxVa
starting with 3.1.0 these are now integral types such as `Int.MaxValue` so this has slightly
affected the valid range of values and now differs slightly from the behavior on GPU in some cases.
-Starting from 22.06 this conf is enabled, to disable this operation on the GPU when using Spark 3.1.0 or later, set
+This configuration is enabled by default. To disable this operation on the GPU, set
[`spark.rapids.sql.castFloatToIntegralTypes.enabled`](additional-functionality/advanced_configs.md#sql.castFloatToIntegralTypes.enabled)
to `false`.
-This configuration setting is ignored when using Spark versions prior to 3.1.0.
-
### Float to String
The GPU will use different precision than Java's toString method when converting floating-point data
@@ -668,7 +719,7 @@ The `format_number` function will retain 10 digits of precision for the GPU when
point number, but Spark will retain up to 17 digits of precision, i.e. `format_number(1234567890.1234567890, 5)`
will return `1,234,567,890.00000` on the GPU and `1,234,567,890.12346` on the CPU. To enable this on the GPU, set [`spark.rapids.sql.formatNumberFloat.enabled`](additional-functionality/advanced_configs.md#sql.formatNumberFloat.enabled) to `true`.
-Starting from 22.06 this conf is enabled by default, to disable this operation on the GPU, set
+This configuration is enabled by default. To disable this operation on the GPU, set
[`spark.rapids.sql.castFloatToString.enabled`](additional-functionality/advanced_configs.md#sql.castFloatToString.enabled) to `false`.
### String to Float
@@ -682,7 +733,7 @@ default behavior in Apache Spark is to return `+Infinity` and `-Infinity`, respe
Also, the GPU does not support casting from strings containing hex values.
-Starting from 22.06 this conf is enabled by default, to enable this operation on the GPU, set
+This configuration is enabled by default. To disable this operation on the GPU, set
[`spark.rapids.sql.castStringToFloat.enabled`](additional-functionality/advanced_configs.md#sql.castStringToFloat.enabled) to `false`.
### String to Date
diff --git a/docs/configs.md b/docs/configs.md
index 9be096e8c7f..9b7234e13b8 100644
--- a/docs/configs.md
+++ b/docs/configs.md
@@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports.
On startup use: `--conf [conf key]=[conf value]`. For example:
```
-${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.10.0-cuda11.jar \
+${SPARK_HOME}/bin/spark-shell --jars rapids-4-spark_2.12-23.12.0-cuda11.jar \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.rapids.sql.concurrentGpuTasks=2
```
diff --git a/docs/dev/README.md b/docs/dev/README.md
index 5af0d309c3c..edd6c2313f5 100644
--- a/docs/dev/README.md
+++ b/docs/dev/README.md
@@ -13,6 +13,7 @@ following topics:
* [How Spark Executes the Physical Plan](#how-spark-executes-the-physical-plan)
* [How the Plugin Works](#how-the-rapids-plugin-works)
* [Plugin Replacement Rules](#plugin-replacement-rules)
+ * [Working with Data Sources](#working-with-data-sources)
* [Guidelines for Replacing Catalyst Executors and Expressions](#guidelines-for-replacing-catalyst-executors-and-expressions)
* [Setting Up the Class](#setting-up-the-class)
* [Expressions](#expressions)
@@ -131,6 +132,11 @@ executor, expression, etc.), and applying the rule that matches. See the
There is a separate guide for working with
[Adaptive Query Execution](adaptive-query.md).
+### Working with Data Sources
+
+The plugin supports v1 and v2 data sources for file formats such as CSV,
+ORC, JSON, and Parquet. See the [data source guide](data-sources.md) for more information.
+
## Guidelines for Replacing Catalyst Executors and Expressions
Most development work in the plugin involves translating various Catalyst
executor and expression nodes into new nodes that execute on the GPU. This
diff --git a/docs/dev/data-sources.md b/docs/dev/data-sources.md
new file mode 100644
index 00000000000..79bf8b292bd
--- /dev/null
+++ b/docs/dev/data-sources.md
@@ -0,0 +1,68 @@
+---
+layout: page
+title: Working with Spark Data Sources
+nav_order: 2
+parent: Developer Overview
+---
+
+# Working with Spark Data Sources
+
+## Data Source API Versions
+
+Spark has two major versions of its data source APIs, simply known as "v1" and "v2". There is a configuration
+property `spark.sql.sources.useV1SourceList` which determines which API version is used when reading from data
+sources such as CSV, ORC, and Parquet. The default value for this configuration option (as of Spark 3.4.0)
+is `"avro,csv,json,kafka,orc,parquet,text"`, meaning that all of these data sources fall back to v1 by default.
+
+When using Spark SQL (including the DataFrame API), the representation of a read in the physical plan will be
+different depending on the API version being used, and in the plugin we therefore have different code paths
+for tagging and replacing these operations.
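+
+As an illustration, the API version in use can be observed from the physical plan. The sketch below
+assumes a `spark-shell` session (so `spark` and its implicits are available); the exact plan text varies
+by Spark version, and if `spark.sql.sources.useV1SourceList` cannot be changed on a running session it
+can instead be passed with `--conf` at startup.
+
+```scala
+import spark.implicits._
+
+// Write a small Parquet file, then compare the scan operator used by each API version.
+Seq((1, "a"), (2, "b")).toDF("id", "name").write.mode("overwrite").parquet("/tmp/scan-api-demo")
+
+// Parquet is in spark.sql.sources.useV1SourceList by default, so the plan should show a
+// FileSourceScanExec ("FileScan parquet ...").
+spark.read.parquet("/tmp/scan-api-demo").explain()
+
+// Drop parquet from the v1 list to exercise the v2 path; the plan should now show a
+// BatchScanExec ("BatchScan ...") wrapping a ParquetScan.
+spark.conf.set("spark.sql.sources.useV1SourceList", "avro,csv,json,kafka,orc,text")
+spark.read.parquet("/tmp/scan-api-demo").explain()
+```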
+
+## V1 API
+
+In the v1 API, a read from a file-based data source is represented by a `FileSourceScanExec`, which wraps
+a `HadoopFsRelation`.
+
+`HadoopFsRelation` is an important component in Apache Spark. It represents a relation based on data stored in the
+Hadoop FileSystem, which in this context encompasses Hadoop-compatible distributed storage systems such as
+HDFS (Hadoop Distributed FileSystem), Amazon S3, and others.
+
+`HadoopFsRelation` is not tied to a specific file format. Instead, it relies on implementations of the `FileFormat`
+interface to read and write data.
+
+This means that file formats such as CSV, Parquet, and ORC each provide their own implementation of the `FileFormat`
+interface, and `HadoopFsRelation` can work with any of them.
+
+When overriding `FileSourceScanExec` in the plugin, there are a number of different places where tagging code can be
+placed, depending on the file format. We start in `GpuOverrides` with a map entry `GpuOverrides.exec[FileSourceScanExec]`,
+and then the hierarchical flow is typically as follows, although it may vary between shim versions:
+
+```
+FileSourceScanExecMeta.tagPlanForGpu
+ ScanExecShims.tagGpuFileSourceScanExecSupport
+ GpuFileSourceScanExec.tagSupport
+```
+
+`GpuFileSourceScanExec.tagSupport` will inspect the `FileFormat` and then call into one of the following:
+
+- `GpuReadCSVFileFormat.tagSupport`, which calls `GpuCSVScan.tagSupport`
+- `GpuReadOrcFileFormat.tagSupport`, which calls `GpuOrcScan.tagSupport`
+- `GpuReadParquetFileFormat.tagSupport`, which calls `GpuParquetScan.tagSupport`
+
+The classes `GpuCSVScan`, `GpuParquetScan`, `GpuOrcScan`, and `GpuJsonScan` are also called
+from the v2 API, so this is a good place to put code that is not specific to either API
+version. These scan classes also call into `FileFormatChecks.tag`.
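+
+As a purely illustrative, self-contained sketch of what "tagging" means conceptually (the plugin's real
+`RapidsMeta`/`tagSupport` classes carry much more state, and none of the names below are the actual API):
+
+```scala
+// A stand-in for the metadata object that accumulates reasons a node cannot run on the GPU.
+final class NodeMeta(val nodeName: String) {
+  private var reasons = List.empty[String]
+  def willNotWorkOnGpu(because: String): Unit = { reasons = because :: reasons }
+  def canRunOnGpu: Boolean = reasons.isEmpty
+  def explain: String =
+    s"$nodeName: " + (if (canRunOnGpu) "will run on GPU" else reasons.mkString("; "))
+}
+
+// A simplified "tagSupport": inspect the file format and record why the scan must stay on the CPU.
+def tagFileSourceScan(meta: NodeMeta, fileFormat: String): Unit = fileFormat match {
+  case "csv" | "json" | "orc" | "parquet" =>
+    () // a real implementation would delegate to the per-format tagSupport methods listed above
+  case other =>
+    meta.willNotWorkOnGpu(s"unsupported file format: $other")
+}
+
+val meta = new NodeMeta("FileSourceScanExec")
+tagFileSourceScan(meta, "someOtherFormat")
+println(meta.explain) // FileSourceScanExec: unsupported file format: someOtherFormat
+```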
+
+## V2 API
+
+When using the v2 API, the physical plan will contain a `BatchScanExec`, which wraps a scan that implements
+the `org.apache.spark.sql.connector.read.Scan` trait. The scan implementations include `CsvScan`, `ParquetScan`,
+and `OrcScan`. The plugin's tagging code for these scans is shared with the v1 code path and can be
+placed in one of the following methods:
+
+- `GpuCSVScan.tagSupport`
+- `GpuOrcScan.tagSupport`
+- `GpuParquetScan.tagSupport`
+
+When overriding v2 operators in the plugin, we can override both `BatchScanExec` and the individual scans, such
+as `CsvScan`.
diff --git a/docs/dev/shims.md b/docs/dev/shims.md
index a15c6570fd6..cca778382b8 100644
--- a/docs/dev/shims.md
+++ b/docs/dev/shims.md
@@ -68,17 +68,17 @@ Using JarURLConnection URLs we create a Parallel World of the current version wi
Spark 3.0.2's URLs:
```text
-jar:file:/home/spark/rapids-4-spark_2.12-23.10.0.jar!/
-jar:file:/home/spark/rapids-4-spark_2.12-23.10.0.jar!/spark3xx-common/
-jar:file:/home/spark/rapids-4-spark_2.12-23.10.0.jar!/spark302/
+jar:file:/home/spark/rapids-4-spark_2.12-23.12.0.jar!/
+jar:file:/home/spark/rapids-4-spark_2.12-23.12.0.jar!/spark3xx-common/
+jar:file:/home/spark/rapids-4-spark_2.12-23.12.0.jar!/spark302/
```
Spark 3.2.0's URLs :
```text
-jar:file:/home/spark/rapids-4-spark_2.12-23.10.0.jar!/
-jar:file:/home/spark/rapids-4-spark_2.12-23.10.0.jar!/spark3xx-common/
-jar:file:/home/spark/rapids-4-spark_2.12-23.10.0.jar!/spark320/
+jar:file:/home/spark/rapids-4-spark_2.12-23.12.0.jar!/
+jar:file:/home/spark/rapids-4-spark_2.12-23.12.0.jar!/spark3xx-common/
+jar:file:/home/spark/rapids-4-spark_2.12-23.12.0.jar!/spark320/
```
### Late Inheritance in Public Classes
@@ -114,17 +114,19 @@ that the classloader is
[set up at load time](https://github.com/NVIDIA/spark-rapids/blob/main/sql-plugin/src/main/scala/com/nvidia/spark/SQLPlugin.scala#L29)
before the `init` method is called on the DriverPlugin and ExecutorPlugin instances.
-By making a visible class merely a wrapper of the real implementation, extending `scala.Proxy` where `self` is a lazy
-val, we prevent classes from Parallel Worlds to be loaded before they can be, and are actually required.
+By making a visible class merely a wrapper of the real implementation, where the real implementation
+is a `lazy val`, we prevent classes from the Parallel Worlds from being loaded before the classloader
+is set up and they are actually required.
+
For examples see:
-1. `abstract class ProxyRapidsShuffleInternalManagerBase`
+1. `class ProxyRapidsShuffleInternalManagerBase`
2. `class ExclusiveModeGpuDiscoveryPlugin`
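+
+A minimal, self-contained sketch of this wrapper pattern (all names below are illustrative placeholders,
+not the plugin's actual classes):
+
+```Scala
+trait ShuffleLike {
+  def doWork(x: Int): Int
+}
+
+object ShimLoaderStub {
+  // Stands in for the shim classloader lookup that loads the implementation from the
+  // Parallel World matching the detected Spark version.
+  def loadRealImpl(): ShuffleLike = new ShuffleLike {
+    def doWork(x: Int): Int = x * 2
+  }
+}
+
+class ProxyShuffleLike extends ShuffleLike {
+  // Nothing behind the real implementation is loaded until the first call that touches realImpl.
+  private lazy val realImpl: ShuffleLike = ShimLoaderStub.loadRealImpl()
+  // Hand-written delegation method, as noted below.
+  override def doWork(x: Int): Int = realImpl.doWork(x)
+}
+```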
Note that we currently have to hand-write the delegation methods, along the lines of:
```Scala
- def method(x: SomeThing) = self.method(x)
+ def method(x: SomeThing) = realImpl.method(x)
```
This could be automatically generated with a simple tool processing the `scalap` output or Scala macros at
diff --git a/docs/dev/testing.md b/docs/dev/testing.md
index 9d92ae4aacf..318d3d0584e 100644
--- a/docs/dev/testing.md
+++ b/docs/dev/testing.md
@@ -5,5 +5,5 @@ nav_order: 2
parent: Developer Overview
---
An overview of testing can be found within the repository at:
-* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-23.10/tests#readme)
-* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-23.10/integration_tests#readme)
+* [Unit tests](https://github.com/NVIDIA/spark-rapids/tree/branch-23.12/tests#readme)
+* [Integration testing](https://github.com/NVIDIA/spark-rapids/tree/branch-23.12/integration_tests#readme)
diff --git a/docs/download.md b/docs/download.md
index 18d873765d3..e68af9c65ae 100644
--- a/docs/download.md
+++ b/docs/download.md
@@ -16,14 +16,14 @@ The RAPIDS Accelerator For Apache Spark requires each worker node in the cluster
The RAPIDS Accelerator For Apache Spark consists of two jars: a plugin jar along with the RAPIDS
cuDF jar, that is either preinstalled in the Spark classpath on all nodes or submitted with each job
that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started
-guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details.
+guide](https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/overview.html) for more details.
-## Release v23.10.0
+## Release v23.12.0
### Hardware Requirements:
The plugin is tested on the following architectures:
- GPU Models: NVIDIA P100, V100, T4, A10/A100, L4 and H100 GPUs
+ GPU Models: NVIDIA V100, T4, A10/A100, L4 and H100 GPUs
### Software Requirements:
@@ -32,12 +32,11 @@ The plugin is tested on the following architectures:
NVIDIA Driver*: R470+
Runtime:
- Scala 2.12
+ Scala 2.12, 2.13
Python, Java Virtual Machine (JVM) compatible with your spark-version.
* Check the Spark documentation for Python and Java version compatibility with your specific
- Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1.
- Please be aware that we do not currently support Spark builds with Scala 2.13.
+ Spark version. For instance, visit `https://spark.apache.org/docs/3.4.1` for Spark 3.4.1.
Supported Spark versions:
Apache Spark 3.2.0, 3.2.1, 3.2.2, 3.2.3, 3.2.4
@@ -53,6 +52,9 @@ The plugin is tested on the following architectures:
Supported Dataproc versions:
GCP Dataproc 2.0
GCP Dataproc 2.1
+
+ Supported Dataproc Serverless versions:
+ Spark runtime 1.1 LTS
*Some hardware may have a minimum driver version greater than R470. Check the GPU spec sheet
for your hardware's minimum driver version.
@@ -60,22 +62,28 @@ for your hardware's minimum driver version.
*For Cloudera and EMR support, please refer to the
[Distributions](https://docs.nvidia.com/spark-rapids/user-guide/latest/faq.html#which-distributions-are-supported) section of the FAQ.
-#### RAPIDS Accelerator's Support Policy for Apache Spark
+### RAPIDS Accelerator's Support Policy for Apache Spark
The RAPIDS Accelerator maintains support for Apache Spark versions available for download from [Apache Spark](https://spark.apache.org/downloads.html)
-### Download v23.10.0
-* Download the [RAPIDS
- Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar)
+### Download RAPIDS Accelerator for Apache Spark v23.12.0
+- **Scala 2.12:**
+ - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.12 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)
+ - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.12 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar.asc)
+
+- **Scala 2.13:**
+ - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.13 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.0/rapids-4-spark_2.13-23.12.0.jar)
+ - [RAPIDS Accelerator for Apache Spark 23.12.0 - Scala 2.13 jar.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.13/23.12.0/rapids-4-spark_2.13-23.12.0.jar.asc)
This package is built against CUDA 11.8. It is tested on V100, T4, A10, A100, L4 and H100 GPUs with
CUDA 11.8 through CUDA 12.0.
### Verify signature
-* Download the [RAPIDS Accelerator for Apache Spark 23.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar)
- and [RAPIDS Accelerator for Apache Spark 23.10.0 jars.asc](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.10.0/rapids-4-spark_2.12-23.10.0.jar.asc)
* Download the [PUB_KEY](https://keys.openpgp.org/search?q=sw-spark@nvidia.com).
* Import the public key: `gpg --import PUB_KEY`
-* Verify the signature: `gpg --verify rapids-4-spark_2.12-23.10.0.jar.asc rapids-4-spark_2.12-23.10.0.jar`
+* Verify the signature for Scala 2.12 jar:
+ `gpg --verify rapids-4-spark_2.12-23.12.0.jar.asc rapids-4-spark_2.12-23.12.0.jar`
+* Verify the signature for Scala 2.13 jar:
+ `gpg --verify rapids-4-spark_2.13-23.12.0.jar.asc rapids-4-spark_2.13-23.12.0.jar`
The output of the signature verification:
@@ -83,17 +91,16 @@ The output of signature verify:
### Release Notes
New functionality and performance improvements for this release include:
-* Introduced support for Spark 3.5.0.
-* Improved memory management for better control in YARN and K8s on CSP.
-* Strengthened Parquet and ORC tests for enhanced stability and support.
-* Reduce GPU out-of-memory (OOM) occurrences.
-* Enhanced driver log with actionable insights.
+* Introduced support for chunked reading of ORC files.
+* Enhanced support for additional time zones and added stack function support.
+* Enhanced performance for join and aggregation operations.
+* Optimized kernels to improve Parquet read performance.
+* RAPIDS Accelerator is now also built and tested with Scala 2.13.
+* This is the last release to support Pascal-based NVIDIA GPUs; support will be discontinued in the next release.
* Qualification and Profiling tool:
- * Enhanced user experience with the availability of the 'ascli' tool for qualification and
- profiling across all platforms.
- * The qualification tool now accommodates CPU-fallback transitions and broadens the speedup factor coverage.
- * Extended diagnostic support for user tools to cover EMR, Databricks AWS, and Databricks Azure.
- * Introduced support for cluster configuration recommendations in the profiling tool for supported platforms.
+ * Profiling Tool now processes Spark Driver log for GPU runs, enhancing feature analysis.
+ * Auto-tuner recommendations include AQE settings for optimized performance.
+ * New configurations in Profiler for enabling off-default features: udfCompiler, incompatibleDateFormats, hasExtendedYearValues.
For a detailed list of changes, please refer to the
[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md).
diff --git a/docs/supported_ops.md b/docs/supported_ops.md
index 48949ab00ef..414a53c56ac 100644
--- a/docs/supported_ops.md
+++ b/docs/supported_ops.md
@@ -1894,7 +1894,7 @@ are limited.
S |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -1915,7 +1915,7 @@ are limited.
S |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -3333,7 +3333,7 @@ are limited.
S |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -6061,7 +6061,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -6082,7 +6082,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -7162,7 +7162,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -7183,7 +7183,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -7294,7 +7294,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -7315,7 +7315,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -8141,8 +8141,8 @@ are limited.
|
|
NS |
-PS MAP only supports keys and values that are of STRING type; unsupported child types BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT |
-NS |
+PS MAP only supports keys and values that are of STRING type; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, MAP, UDT |
+PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, MAP, UDT |
|
@@ -8826,7 +8826,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -8847,7 +8847,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -8958,7 +8958,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -8979,7 +8979,7 @@ are limited.
NS |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -9142,7 +9142,7 @@ are limited.
S |
S |
PS UTC is only supported TZ for TIMESTAMP |
-NS |
+S |
NS |
NS |
NS |
@@ -13351,22 +13351,22 @@ are limited.
|
-StartsWith |
- |
-Starts with |
+Stack |
+`stack` |
+Separates expr1, ..., exprk into n rows. |
None |
project |
-src |
+n |
|
|
|
+PS Literal value only |
|
|
|
|
|
|
-S |
|
|
|
@@ -13377,29 +13377,28 @@ are limited.
|
-search |
- |
- |
- |
- |
- |
- |
- |
- |
- |
-PS Literal value only |
- |
- |
- |
- |
- |
- |
- |
- |
+expr |
+S |
+S |
+S |
+S |
+S |
+S |
+S |
+S |
+PS UTC is only supported TZ for TIMESTAMP |
+S |
+S |
+S |
+NS |
+NS |
+PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
+PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
+PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
+NS |
result |
-S |
|
|
|
@@ -13414,6 +13413,7 @@ are limited.
|
|
|
+PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT |
|
|
|
@@ -13445,6 +13445,74 @@ are limited.
UDT |
+StartsWith |
+ |
+Starts with |
+None |
+project |
+src |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+
+
+search |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+PS Literal value only |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+
+
+result |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+
+
StringInstr |
`instr` |
Instr string operator |
@@ -14460,6 +14528,79 @@ are limited.
|
+StructsToJson |
+`to_json` |
+Converts structs to JSON text format |
+This is disabled by default because to_json support is experimental. See compatibility guide for more information. |
+project |
+struct |
+S |
+S |
+S |
+S |
+S |
+S |
+S |
+S |
+PS UTC is only supported TZ for TIMESTAMP |
+S |
+S |
+ |
+ |
+ |
+PS UTC is only supported TZ for child TIMESTAMP |
+PS UTC is only supported TZ for child TIMESTAMP |
+PS UTC is only supported TZ for child TIMESTAMP |
+ |
+
+
+result |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+
+
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
Substring |
`substr`, `substring` |
Substring operator |
@@ -14549,32 +14690,6 @@ are limited.
|
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
SubstringIndex |
`substring_index` |
substring_index operator |
@@ -14886,26 +15001,52 @@ are limited.
|
-Tanh |
-`tanh` |
-Hyperbolic tangent |
-None |
-project |
-input |
- |
- |
- |
- |
- |
- |
-S |
- |
- |
- |
- |
- |
- |
- |
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
+Tanh |
+`tanh` |
+Hyperbolic tangent |
+None |
+project |
+input |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
|
|
|
@@ -14976,32 +15117,6 @@ are limited.
|
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
TimeAdd |
|
Adds interval to timestamp |
@@ -15300,6 +15415,32 @@ are limited.
|
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
TransformValues |
`transform_values` |
Transform values in a map using a transform function |
@@ -15368,32 +15509,6 @@ are limited.
|
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
UnaryMinus |
`negative` |
Negate a numeric value |
@@ -15694,6 +15809,32 @@ are limited.
|
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
UnscaledValue |
|
Convert a Decimal to an unscaled long value for some aggregation optimizations |
@@ -15741,32 +15882,6 @@ are limited.
|
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
Upper |
`upper`, `ucase` |
String uppercase operator |
@@ -16091,6 +16206,32 @@ are limited.
|
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
AggregateExpression |
|
Aggregate expression |
@@ -16287,32 +16428,6 @@ are limited.
S |
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
ApproximatePercentile |
`percentile_approx`, `approx_percentile` |
Approximate percentile |
@@ -16487,6 +16602,32 @@ are limited.
|
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
Average |
`avg`, `mean` |
Average aggregate operator |
@@ -16753,32 +16894,6 @@ are limited.
|
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
CollectSet |
`collect_set` |
Collect a set of unique elements, not supported in reduction |
@@ -16912,6 +17027,32 @@ are limited.
|
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
Count |
`count` |
Count aggregate operator |
@@ -17178,32 +17319,6 @@ are limited.
NS |
-Expression |
-SQL Functions(s) |
-Description |
-Notes |
-Context |
-Param/Output |
-BOOLEAN |
-BYTE |
-SHORT |
-INT |
-LONG |
-FLOAT |
-DOUBLE |
-DATE |
-TIMESTAMP |
-STRING |
-DECIMAL |
-NULL |
-BINARY |
-CALENDAR |
-ARRAY |
-MAP |
-STRUCT |
-UDT |
-
-
Last |
`last`, `last_value` |
last aggregate operator |
@@ -17337,6 +17452,32 @@ are limited.
NS |
+Expression |
+SQL Functions(s) |
+Description |
+Notes |
+Context |
+Param/Output |
+BOOLEAN |
+BYTE |
+SHORT |
+INT |
+LONG |
+FLOAT |
+DOUBLE |
+DATE |
+TIMESTAMP |
+STRING |
+DECIMAL |
+NULL |
+BINARY |
+CALENDAR |
+ARRAY |
+MAP |
+STRUCT |
+UDT |
+
+
Max |
`max` |
Max aggregate operator |
@@ -17603,6 +17744,180 @@ are limited.
NS |
+Percentile |
+`percentile` |
+Aggregation computing exact percentile |
+None |
+aggregation |
+input |
+ |
+S |
+S |
+S |
+S |
+S |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+
+
+percentage |
+ |
+ |
+ |
+ |
+ |
+ |
+PS Literal value only |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+
+
+frequency |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+
+
+result |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+
+
+reduction |
+input |
+ |
+S |
+S |
+S |
+S |
+S |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+
+
+percentage |
+ |
+ |
+ |
+ |
+ |
+ |
+PS Literal value only |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+
+
+frequency |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+
+
+result |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+ |
+ |
+ |
+ |
+S |
+ |
+ |
+ |
+
+
Expression |
SQL Functions(s) |
Description |
diff --git a/index.md b/index.md
index 0334ecc5002..724e6b79a82 100644
--- a/index.md
+++ b/index.md
@@ -6,6 +6,9 @@ permalink: /
description: This site serves as a collection of documentation about the RAPIDS accelerator for Apache Spark
---
# Overview
+**If you are a customer looking for information on how to adopt RAPIDS Accelerator for Apache Spark
+for your Spark workloads, please go to our User Guide for more information: [link](https://docs.nvidia.com/spark-rapids/user-guide/latest/index.html).**
+
The RAPIDS Accelerator for Apache Spark leverages GPUs to accelerate processing via the
[RAPIDS libraries](http://rapids.ai).
@@ -19,5 +22,3 @@ the scale of the Spark distributed computing framework. The RAPIDS Accelerator
built-in accelerated shuffle based on [UCX](https://github.com/openucx/ucx/) that can be configured to leverage GPU-to-GPU
communication and RDMA capabilities.
-If you are a customer looking for information on how to adopt RAPIDS Accelerator for Apache Spark
-for your Spark workloads, please go to our User Guide for more information: [link](https://docs.nvidia.com/spark-rapids/user-guide/latest/index.html).