From 36140010e91029f983dae991cf89bfeaad5c7a3d Mon Sep 17 00:00:00 2001 From: Rodney Howeedy Date: Wed, 20 Oct 2021 13:10:06 -0600 Subject: [PATCH 1/9] Update FAQ for gh-pages 21.10 --- docs/FAQ.md | 65 ++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 52 insertions(+), 13 deletions(-) diff --git a/docs/FAQ.md b/docs/FAQ.md index 2b6542efc0c..3875a94b0cb 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -1,7 +1,7 @@ --- layout: page title: Frequently Asked Questions -nav_order: 11 +nav_order: 12 --- # Frequently Asked Questions @@ -10,9 +10,9 @@ nav_order: 11 ### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support? -The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.1.1 or 3.1.2 of Apache -Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers to be -internal the code for those plans can change even between bug fix releases. As a part of our +The RAPIDS Accelerator for Apache Spark requires version 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2 or 3.2.0 of +Apache Spark. Because the plugin replaces parts of the physical plan that Apache Spark considers to +be internal the code for those plans can change even between bug fix releases. As a part of our process, we try to stay on top of these changes and release updates as quickly as possible. ### Which distributions are supported? @@ -30,15 +30,15 @@ to set up testing and validation on their distributions. ### What CUDA versions are supported? -CUDA 11.0 and 11.2 are currently supported. Please look [here](download.md) for download links for -the latest release. +CUDA 11.x is currently supported. Please look [here](download.md) for download links for the latest +release. ### What hardware is supported? The plugin is tested and supported on V100, T4, A10, A30 and A100 datacenter GPUs. It is possible to run the plugin on GeForce desktop hardware with Volta or better architectures. GeForce hardware -does not support [CUDA enhanced -compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#enhanced-compat-minor-releases), +does not support [CUDA forward +compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#forward-compatibility-title), and will need CUDA 11.2 installed. If not, the following error will be displayed: ``` @@ -47,6 +47,9 @@ ai.rapids.cudf.CudaException: forward compatibility was attempted on non support at com.nvidia.spark.rapids.GpuDeviceManager$.findGpuAndAcquire(GpuDeviceManager.scala:78) ``` +More information about cards that support forward compatibility can be found +[here](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#faq). + ### How can I check if the RAPIDS Accelerator is installed and which version is running? On startup the RAPIDS Accelerator will log a warning message on the Spark driver showing the @@ -120,6 +123,17 @@ starts. If you are only going to run a single query that only takes a few second be problematic. In general if you are going to do 30 seconds or more of processing within a single session the overhead can be amortized. +### How long does it take to translate a query to run on the GPU? + +The time it takes to translate the Apache Spark physical plan to one that can run on the GPU +is proportional to the size of the plan. But, it also depends on the CPU you are +running on and if the JVM has optimized that code path yet. The first queries run in a client will +be worse than later queries. 
Small queries can typically be translated in a millisecond or two while
+larger queries can take tens of milliseconds. In all cases tested the translation time is orders of
+magnitude smaller than the total runtime of the query.
+
+See the entry on [explain](#explain) for details on how to measure this for your queries.
+
### How can I tell what will run on the GPU and what will not run on it?


@@ -166,10 +180,27 @@ In this `indicator` is one of the following
* will not run on the GPU with an explanation why
* will be removed from the plan with a reason why

-Generally if an operator is not compatible with Spark for some reason and is off the explanation
+Generally if an operator is not compatible with Spark for some reason and is off, the explanation
will include information about how it is incompatible and what configs to set to enable the
operator if you can accept the incompatibility.

+These messages are logged at the WARN level, so even in `spark-shell`, which by default only logs
+at WARN or above, you should see these messages.
+
+This translation takes place in two steps. The first step looks at the plan, figures out what
+can be translated to the GPU, and then does the translation. The second step optimizes the
+transitions between the CPU and the GPU.
+Explain will also log how long these translations took at the INFO level, with lines like:
+
+```
+INFO GpuOverrides: Plan conversion to the GPU took 3.13 ms
+INFO GpuOverrides: GPU plan transition optimization took 1.66 ms
+```
+
+Because these lines are at the INFO level, the default logging level for `spark-shell` will not
+display them. If you want to monitor these times for your queries you might need to adjust your
+logging configuration.
+
### Why does the plan for the GPU query look different from the CPU query?

Typically, there is a one to one mapping between CPU stages in a plan and GPU stages. There are a
@@ -228,12 +259,18 @@ efficient to stay on the CPU instead of going back and forth.

Yes, DPP still works. It might not be as efficient as it could be, and we are working to improve it.

+DPP is not supported on Databricks with the plugin.
+Queries on Databricks will not fail, but they cannot benefit from DPP.
+
### Is Adaptive Query Execution (AQE) Supported?

In the 0.2 release, AQE is supported but all exchanges will default to the CPU. As of the 0.3
release, running on Spark 3.0.1 and higher any operation that is supported on GPU will now stay on
the GPU when AQE is enabled.

+AQE is not supported on Databricks with the plugin.
+If AQE is enabled on Databricks, queries may fail with a `StackOverflowError`.
+
#### Why does my query show as not on the GPU when Adaptive Query Execution is enabled?

When running an `explain()` on a query where AQE is on, it is possible that AQE has not finalized
@@ -250,13 +287,15 @@ AdaptiveSparkPlan isFinalPlan=false

### Are cache and persist supported?

-Yes cache and persist are supported, but they are not GPU accelerated yet. We are working with
-the Spark community on changes that would allow us to accelerate compression when caching data.
+Yes, cache and persist are supported. The cache is GPU accelerated
+but is still stored in host memory.
+Please refer to [RAPIDS Cache Serializer](./additional-functionality/cache-serializer.md)
+for more details.

### Can I cache data into GPU memory?

-No, that is not currently supported. 
+It would require much larger changes to Apache Spark to be able to support this.

### Is PySpark supported?

From df1172580809a7c51641dd47df4fb5341f37bb00 Mon Sep 17 00:00:00 2001
From: Rodney Howeedy
Date: Thu, 21 Oct 2021 14:50:05 -0600
Subject: [PATCH 2/9] Update gh-pages branch 21.10

---
 docs/additional-functionality/README.md | 2 +-
 .../cache-serializer.md | 12 ++--
 .../rapids-shuffle.md | 51 ++++++++------
 docs/additional-functionality/rapids-udfs.md | 20 +++---
 .../Dockerfile.centos_no_rdma | 8 +--
 .../Dockerfile.centos_rdma | 12 ++--
 .../Dockerfile.ubuntu_no_rdma | 4 +-
 .../Dockerfile.ubuntu_rdma | 4 +-
 .../udf-to-catalyst-expressions.md | 2 +
 docs/compatibility.md | 15 ++++-
 docs/configs.md | 66 ++++++++++---------
 .../generate-init-script-cuda11.ipynb | 2 +-
 .../Databricks/generate-init-script.ipynb | 2 +-
 docs/dev/README.md | 2 +-
 docs/download.md | 58 ++++++++++++++++
 docs/examples.md | 2 +-
 docs/get-started/Dockerfile.cuda | 31 ++++-----
 .../get-started/getting-started-databricks.md | 34 +++++++++-
 docs/get-started/getting-started-on-prem.md | 8 +--
 19 files changed, 225 insertions(+), 110 deletions(-)

diff --git a/docs/additional-functionality/README.md b/docs/additional-functionality/README.md
index d0f0ab31f65..cf4f014122b 100644
--- a/docs/additional-functionality/README.md
+++ b/docs/additional-functionality/README.md
@@ -1,7 +1,7 @@
---
layout: page
title: Additional Functionality
-nav_order: 9
+nav_order: 10
has_children: true
permalink: /additional-functionality/
---
diff --git a/docs/additional-functionality/cache-serializer.md b/docs/additional-functionality/cache-serializer.md
index bf8d8655e38..e45fd387013 100644
--- a/docs/additional-functionality/cache-serializer.md
+++ b/docs/additional-functionality/cache-serializer.md
@@ -29,21 +29,17 @@ nav_order: 2
`spark.sql.inMemoryColumnarStorage.enableVectorizedReader` will not be honored as the GPU data
is always read in as columnar. If `spark.rapids.sql.enabled` is set to false the cached objects
will still be compressed on the CPU as a part of the caching process.
-
- Please note that ParquetCachedBatchSerializer doesn't support negative decimal scale, so if
- `spark.sql.legacy.allowNegativeScaleOfDecimal` is set to true ParquetCachedBatchSerializer
- should not be used. Using the serializer with negative decimal scales will generate
- an error at runtime.

To use this serializer please run Spark with the following conf.

```
- spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.rapids.shims.spark311.ParquetCachedBatchSerializer"
+ spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
```

## Supported Types

- All types are supported on the CPU, on the GPU, ArrayType, MapType and BinaryType are not
- supported. If an unsupported type is encountered the Rapids Accelerator for Apache Spark will fall
+ All types are supported on the CPU.
+ On the GPU, MapType and BinaryType are not supported.
+ If an unsupported type is encountered the Rapids Accelerator for Apache Spark will fall
back to using the CPU for caching.

diff --git a/docs/additional-functionality/rapids-shuffle.md b/docs/additional-functionality/rapids-shuffle.md
index 47b562264df..a96e38651a3 100644
--- a/docs/additional-functionality/rapids-shuffle.md
+++ b/docs/additional-functionality/rapids-shuffle.md
@@ -28,6 +28,13 @@ in these scenarios:

- GPU-to-GPU: Shuffle blocks that were able to fit in GPU memory. 
- Host-to-GPU and Disk-to-GPU: Shuffle blocks that spilled to host (or disk) but will be manifested in the GPU in the downstream Spark task. + +The RAPIDS Shuffle Manager uses the `spark.shuffle.manager` plugin interface in Spark and it relies +on fast connections between executors, where shuffle data is kept in a cache backed by GPU, host, or disk. +As such, it doesn't implement functionality to interact with the External Shuffle Service (ESS). +To enable the RAPIDS Shuffle Manager, users need to disable ESS using `spark.shuffle.service.enabled=false`. +Note that Spark's Dynamic Allocation feature requires ESS to be configured, and must also be +disabled with `spark.dynamicAllocation.enabled=false`. ### System Setup @@ -36,7 +43,7 @@ be installed on the host and inside Docker containers (if not baremetal). A host requirements, like the MLNX_OFED driver and `nv_peer_mem` kernel module. The minimum UCX requirement for the RAPIDS Shuffle Manager is -[UCX 1.11.0](https://github.com/openucx/ucx/releases/tag/v1.11.0). +[UCX 1.11.2](https://github.com/openucx/ucx/releases/tag/v1.11.2). #### Baremetal @@ -66,9 +73,9 @@ The minimum UCX requirement for the RAPIDS Shuffle Manager is further. 2. Fetch and install the UCX package for your OS from: - [UCX 1.11.0](https://github.com/openucx/ucx/releases/tag/v1.11.0). + [UCX 1.11.2](https://github.com/openucx/ucx/releases/tag/v1.11.2). - NOTE: Please install the artifact with the newest CUDA 11.x version (for UCX 1.11.0 please + NOTE: Please install the artifact with the newest CUDA 11.x version (for UCX 1.11.2 please pick CUDA 11.2) as CUDA 11 introduced [CUDA Enhanced Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#enhanced-compat-minor-releases). Starting with UCX 1.12, UCX will stop publishing individual artifacts for each minor version of CUDA. @@ -78,35 +85,35 @@ The minimum UCX requirement for the RAPIDS Shuffle Manager is RDMA packages have extra requirements that should be satisfied by MLNX_OFED. ##### CentOS UCX RPM -The UCX packages for CentOS 7 and 8 are divided into different RPMs. For example, UCX 1.11.0 +The UCX packages for CentOS 7 and 8 are divided into different RPMs. For example, UCX 1.11.2 available at -https://github.com/openucx/ucx/releases/download/v1.11.0/ucx-v1.11.0-centos7-mofed5.x-cuda11.2.tar.bz2 +https://github.com/openucx/ucx/releases/download/v1.11.2/ucx-v1.11.2-centos7-mofed5.x-cuda11.2.tar.bz2 contains: ``` -ucx-devel-1.11.0-1.el7.x86_64.rpm -ucx-debuginfo-1.11.0-1.el7.x86_64.rpm -ucx-1.11.0-1.el7.x86_64.rpm -ucx-cuda-1.11.0-1.el7.x86_64.rpm -ucx-rdmacm-1.11.0-1.el7.x86_64.rpm -ucx-cma-1.11.0-1.el7.x86_64.rpm -ucx-ib-1.11.0-1.el7.x86_64.rpm +ucx-devel-1.11.2-1.el7.x86_64.rpm +ucx-debuginfo-1.11.2-1.el7.x86_64.rpm +ucx-1.11.2-1.el7.x86_64.rpm +ucx-cuda-1.11.2-1.el7.x86_64.rpm +ucx-rdmacm-1.11.2-1.el7.x86_64.rpm +ucx-cma-1.11.2-1.el7.x86_64.rpm +ucx-ib-1.11.2-1.el7.x86_64.rpm ``` For a setup without RoCE or Infiniband networking, the only packages required are: ``` -ucx-1.11.0-1.el7.x86_64.rpm -ucx-cuda-1.11.0-1.el7.x86_64.rpm +ucx-1.11.2-1.el7.x86_64.rpm +ucx-cuda-1.11.2-1.el7.x86_64.rpm ``` If accelerated networking is available, the package list is: ``` -ucx-1.11.0-1.el7.x86_64.rpm -ucx-cuda-1.11.0-1.el7.x86_64.rpm -ucx-rdmacm-1.11.0-1.el7.x86_64.rpm -ucx-ib-1.11.0-1.el7.x86_64.rpm +ucx-1.11.2-1.el7.x86_64.rpm +ucx-cuda-1.11.2-1.el7.x86_64.rpm +ucx-rdmacm-1.11.2-1.el7.x86_64.rpm +ucx-ib-1.11.2-1.el7.x86_64.rpm ``` --- @@ -145,7 +152,7 @@ system if you have RDMA capable hardware. 
Within the Docker container we need to install UCX and its requirements. These are Dockerfile examples for Ubuntu 18.04: -The following are examples of Docker containers with UCX 1.11.0 and cuda-11.2 support. +The following are examples of Docker containers with UCX 1.11.2 and cuda-11.2 support. | OS Type | RDMA | Dockerfile | | ------- | ---- | ---------- | @@ -281,7 +288,6 @@ In this section, we are using a docker container built using the sample dockerfi | Spark Shim | spark.shuffle.manager value | | -----------| -------------------------------------------------------- | | 3.0.1 | com.nvidia.spark.rapids.spark301.RapidsShuffleManager | - | 3.0.1 EMR | com.nvidia.spark.rapids.spark301emr.RapidsShuffleManager | | 3.0.2 | com.nvidia.spark.rapids.spark302.RapidsShuffleManager | | 3.0.3 | com.nvidia.spark.rapids.spark303.RapidsShuffleManager | | 3.0.4 | com.nvidia.spark.rapids.spark304.RapidsShuffleManager | @@ -290,7 +296,7 @@ In this section, we are using a docker container built using the sample dockerfi | 3.1.2 | com.nvidia.spark.rapids.spark312.RapidsShuffleManager | | 3.1.3 | com.nvidia.spark.rapids.spark313.RapidsShuffleManager | -2. Settings for UCX 1.11.0+: +2. Settings for UCX 1.11.2+: Minimum configuration: @@ -326,6 +332,9 @@ Apache Spark 3.1.3 is: `com.nvidia.spark.rapids.spark313.RapidsShuffleManager`. Please note `LD_LIBRARY_PATH` should optionally be set if the UCX library is installed in a non-standard location. +With the RAPIDS Shuffle Manager configured, the setting `spark.rapids.shuffle.enabled` (default on) +can be used to enable or disable the usage of RAPIDS Shuffle Manager during your application. + #### UCX Environment Variables - `UCX_TLS`: - `cuda_copy`, and `cuda_ipc`: enables handling of CUDA memory in UCX, both for copy-based transport diff --git a/docs/additional-functionality/rapids-udfs.md b/docs/additional-functionality/rapids-udfs.md index f36186057b2..0a26e2ce5ec 100644 --- a/docs/additional-functionality/rapids-udfs.md +++ b/docs/additional-functionality/rapids-udfs.md @@ -139,38 +139,38 @@ in the [udf-examples](../../udf-examples) project. 
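As a point of reference, a launch-and-registration sketch for one of the examples listed below
might look like the following. The jar path is a placeholder for whatever artifact a build of the
udf-examples project produces, the class name is taken from the Scala example paths below, and the
exact registration call depends on whether the UDF is a Scala function, a Java UDF, or a Hive UDF.

```bash
# Sketch only: add the plugin jars and the udf-examples build output to the
# classpath, then register and call the Scala URLDecode example from the shell.
${SPARK_HOME}/bin/spark-shell \
  --jars rapids-4-spark_2.12-21.10.0.jar,cudf-21.10.0-cuda11.jar,<path-to-udf-examples-jar> \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin

# From the spark-shell prompt (Scala), assuming the example extends Function1[String, String]:
#   spark.udf.register("urldecode", new com.nvidia.spark.rapids.udf.scala.URLDecode())
#   spark.sql("SELECT urldecode('https%3A%2F%2Fnvidia.com')").show()
```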
### Spark Scala UDF Examples -- [URLDecode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala) +- [URLDecode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala) decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable) -- [URLEncode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala) +- [URLEncode](../../udf-examples/src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala) URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable) ### Spark Java UDF Examples -- [URLDecode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java) +- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java) decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable) -- [URLEncode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java) +- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java) URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable) -- [CosineSimilarity](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java) +- [CosineSimilarity](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java) computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) -between two float vectors using [native code](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/cpp/src) +between two float vectors using [native code](../../udf-examples/src/main/cpp/src) ### Hive UDF Examples -- [URLDecode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java) +- [URLDecode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java) implements a Hive simple UDF using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable) to decode URL-encoded strings -- [URLEncode](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java) +- [URLEncode](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java) implements a Hive generic UDF using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable) to URL-encode strings -- [StringWordCount](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java) +- [StringWordCount](../../udf-examples/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java) implements a Hive simple UDF using -[native code](https://github.com/NVIDIA/spark-rapids/tree/main/udf-examples/src/main/cpp/src) to count words in strings +[native code](../../udf-examples/src/main/cpp/src) to count words in strings ## GPU Support for Pandas UDF diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_no_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_no_rdma index ad21f2bcc00..dca75b835f3 100644 --- 
a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_no_rdma +++ b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_no_rdma @@ -22,7 +22,7 @@ # See: https://github.com/openucx/ucx/releases/ ARG CUDA_VER=11.2.2 -ARG UCX_VER=v1.11.0 +ARG UCX_VER=1.11.2 ARG UCX_CUDA_VER=11.2 FROM nvidia/cuda:${CUDA_VER}-runtime-centos7 @@ -30,8 +30,8 @@ ARG UCX_VER ARG UCX_CUDA_VER RUN yum update -y && yum install -y wget bzip2 -RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/$UCX_VER/ucx-$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2 +RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2 RUN cd /tmp && tar -xvf *.bz2 && \ - yum install -y ucx-1.11.0-1.el7.x86_64.rpm && \ - yum install -y ucx-cuda-1.11.0-1.el7.x86_64.rpm && \ + yum install -y ucx-$UCX_VER-1.el7.x86_64.rpm && \ + yum install -y ucx-cuda-$UCX_VER-1.el7.x86_64.rpm && \ rm -rf /tmp/*.rpm diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_rdma index 733b2e872f6..e59d3ca5f68 100644 --- a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_rdma +++ b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.centos_rdma @@ -29,7 +29,7 @@ ARG RDMA_CORE_VERSION=32.1 ARG CUDA_VER=11.2.2 -ARG UCX_VER=v1.11.0 +ARG UCX_VER=1.11.2 ARG UCX_CUDA_VER=11.2 # Throw away image to build rdma_core @@ -59,12 +59,12 @@ COPY --from=rdma_core /tmp/*.rpm /tmp/ RUN yum update -y RUN yum install -y wget bzip2 -RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/$UCX_VER/ucx-$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2 +RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-centos7-mofed5.x-cuda$UCX_CUDA_VER.tar.bz2 RUN cd /tmp && \ yum install -y *.rpm && \ tar -xvf *.bz2 && \ - yum install -y ucx-1.11.0-1.el7.x86_64.rpm && \ - yum install -y ucx-cuda-1.11.0-1.el7.x86_64.rpm && \ - yum install -y ucx-ib-1.11.0-1.el7.x86_64.rpm && \ - yum install -y ucx-rdmacm-1.11.0-1.el7.x86_64.rpm + yum install -y ucx-$UCX_VER-1.el7.x86_64.rpm && \ + yum install -y ucx-cuda-$UCX_VER-1.el7.x86_64.rpm && \ + yum install -y ucx-ib-$UCX_VER-1.el7.x86_64.rpm && \ + yum install -y ucx-rdmacm-$UCX_VER-1.el7.x86_64.rpm RUN rm -rf /tmp/*.rpm && rm /tmp/*.bz2 diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_no_rdma b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_no_rdma index 3dbf3026764..8270f295f00 100644 --- a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_no_rdma +++ b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_no_rdma @@ -22,7 +22,7 @@ # See: https://github.com/openucx/ucx/releases/ ARG CUDA_VER=11.2.2 -ARG UCX_VER=v1.11.0 +ARG UCX_VER=1.11.2 ARG UCX_CUDA_VER=11.2 FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu18.04 @@ -31,5 +31,5 @@ ARG UCX_CUDA_VER RUN apt update RUN apt-get install -y wget -RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/$UCX_VER/ucx-$UCX_VER-ubuntu18.04-mofed5.x-cuda$UCX_CUDA_VER.deb +RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-ubuntu18.04-mofed5.x-cuda$UCX_CUDA_VER.deb RUN apt install -y /tmp/*.deb && rm -rf /tmp/*.deb diff --git a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_rdma 
b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_rdma index 1d5f5454236..b498239974e 100644 --- a/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_rdma +++ b/docs/additional-functionality/shuffle-docker-examples/Dockerfile.ubuntu_rdma @@ -29,7 +29,7 @@ ARG RDMA_CORE_VERSION=32.1 ARG CUDA_VER=11.2.2 -ARG UCX_VER=v1.11.0 +ARG UCX_VER=1.11.2 ARG UCX_CUDA_VER=11.2 # Throw away image to build rdma_core @@ -50,5 +50,5 @@ COPY --from=rdma_core /*.deb /tmp/ RUN apt update RUN apt-get install -y wget -RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/$UCX_VER/ucx-$UCX_VER-ubuntu18.04-mofed5.x-cuda$UCX_CUDA_VER.deb +RUN cd /tmp && wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-v$UCX_VER-ubuntu18.04-mofed5.x-cuda$UCX_CUDA_VER.deb RUN apt install -y /tmp/*.deb && rm -rf /tmp/*.deb diff --git a/docs/additional-functionality/udf-to-catalyst-expressions.md b/docs/additional-functionality/udf-to-catalyst-expressions.md index d852f93d4ee..950c9e263ac 100644 --- a/docs/additional-functionality/udf-to-catalyst-expressions.md +++ b/docs/additional-functionality/udf-to-catalyst-expressions.md @@ -110,3 +110,5 @@ When translating UDFs to Catalyst expressions, the supported UDF functions are l | | lhs :+ rhs | | Method call | Only if the method being called 1. Consists of operations supported by the UDF compiler, and 2. is one of the folllowing: a final method, a method in a final class, or a method in a final object | +All other expressions, including but not limited to `try` and `catch`, are unsupported and UDFs +with such expressions cannot be compiled. \ No newline at end of file diff --git a/docs/compatibility.md b/docs/compatibility.md index bb36b0e9a2e..e9e51be17be 100644 --- a/docs/compatibility.md +++ b/docs/compatibility.md @@ -539,8 +539,7 @@ Casting from string to timestamp currently has the following limitations. | `"tomorrow"` | Yes | | `"yesterday"` | Yes | -- [1] The timestamp portion must be complete in terms of hours, minutes, seconds, and - milliseconds, with 2 digits each for hours, minutes, and seconds, and 6 digits for milliseconds. +- [1] The timestamp portion must have 6 digits for milliseconds. Only timezone 'Z' (UTC) is supported. Casting unsupported formats will result in null values. Spark is very lenient when casting from string to timestamp because all date and time components @@ -561,3 +560,15 @@ double quotes around strings in JSON data, whereas Spark allows single quotes ar data. The RAPIDS Spark `get_json_object` operation on the GPU will return `None` in PySpark or `Null` in Scala when trying to match a string surrounded by single quotes. This behavior will be updated in a future release to more closely match Spark. + +## Approximate Percentile + +The GPU implementation of `approximate_percentile` uses +[t-Digests](https://arxiv.org/abs/1902.04023) which have high accuracy, particularly near the tails of a +distribution. Because the results are not bit-for-bit identical with the Apache Spark implementation of +`approximate_percentile`, this feature is disabled by default and can be enabled by setting +`spark.rapids.sql.expression.ApproximatePercentile=true`. + +There are known issues with the approximate percentile implementation +([#3706](https://github.com/NVIDIA/spark-rapids/issues/3706), +[#3692](https://github.com/NVIDIA/spark-rapids/issues/3692)) and the feature should be considered experimental. 
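As a quick usage sketch (the config key comes from the paragraph above; the jar paths, table, and
column names are placeholders), opting in and running an approximate percentile aggregation could
look like this:

```bash
# Sketch only: enable the experimental GPU approximate percentile implementation
# and run an aggregation against it; adjust jar paths and data to your setup.
${SPARK_HOME}/bin/spark-shell \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.expression.ApproximatePercentile=true

# From the spark-shell prompt:
#   spark.sql("SELECT approx_percentile(price, array(0.5, 0.9, 0.99)) FROM sales").show()
```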
\ No newline at end of file diff --git a/docs/configs.md b/docs/configs.md index 4408e01fed1..cafed650bd7 100644 --- a/docs/configs.md +++ b/docs/configs.md @@ -10,7 +10,7 @@ The following is the list of options that `rapids-plugin-4-spark` supports. On startup use: `--conf [conf key]=[conf value]`. For example: ``` -${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.08.0.jar,cudf-21.08.2-cuda11.jar' \ +${SPARK_HOME}/bin/spark --jars 'rapids-4-spark_2.12-21.10.0.jar,cudf-21.10.0-cuda11.jar' \ --conf spark.plugins=com.nvidia.spark.SQLPlugin \ --conf spark.rapids.sql.incompatibleOps.enabled=true ``` @@ -31,16 +31,16 @@ Name | Description | Default Value -----|-------------|-------------- spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with corresponding alluxio scheme. Eg, when configureis set to "s3:/foo->alluxio://0.1.2.3:19998/foo,gcs:/bar->alluxio://0.1.2.3:19998/bar", which means: s3:/foo/a.csv will be replaced to alluxio://0.1.2.3:19998/foo/a.csv and gcs:/bar/b.csv will be replaced to alluxio://0.1.2.3:19998/bar/b.csv|None spark.rapids.cloudSchemes|Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: dbfs, s3, s3a, s3n, wasbs, gs. Cloud based stores generally would be total separate from the executors and likely have a higher I/O read cost. Many times the cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type|None -spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9 +spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|1.0 spark.rapids.memory.gpu.debug|Provides a log of GPU memory allocations and frees. If set to STDOUT or STDERR the logging will go there. Setting it to NONE disables logging. All other values are reserved for possible future expansion and in the mean time will disable logging.|NONE spark.rapids.memory.gpu.direct.storage.spill.batchWriteBuffer.size|The size of the GPU memory buffer used to batch small buffers when spilling to GDS. Note that this buffer is mapped to the PCI Base Address Register (BAR) space, which may be very limited on some GPUs (e.g. the NVIDIA T4 only has 256 MiB), and it is also used by UCX bounce buffers.|8388608 spark.rapids.memory.gpu.direct.storage.spill.enabled|Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory `spark.local.dir` must support GDS. This is an experimental feature. For more information on GDS, see https://docs.nvidia.com/gpudirect-storage/.|false spark.rapids.memory.gpu.maxAllocFraction|The fraction of total GPU memory that limits the maximum size of the RMM pool. The value must be greater than or equal to the setting for spark.rapids.memory.gpu.allocFraction. 
Note that this limit will be reduced by the reserve memory configured in spark.rapids.memory.gpu.reserve.|1.0 spark.rapids.memory.gpu.minAllocFraction|The fraction of total GPU memory that limits the minimum size of the RMM pool. The value must be less than or equal to the setting for spark.rapids.memory.gpu.allocFraction.|0.25 spark.rapids.memory.gpu.oomDumpDir|The path to a local directory where a heap dump will be created if the GPU encounters an unrecoverable out-of-memory (OOM) error. The filename will be of the form: "gpu-oom-.hprof" where is the process ID.|None -spark.rapids.memory.gpu.pool|Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", and "NONE". With "DEFAULT", `rmm::mr::pool_memory_resource` is used; with "ARENA", `rmm::mr::arena_memory_resource` is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly. Note: "ARENA" is the recommended pool allocator if CUDF is built with Per-Thread Default Stream (PTDS), as "DEFAULT" is known to be unstable (https://github.com/NVIDIA/spark-rapids/issues/1141)|ARENA +spark.rapids.memory.gpu.pool|Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", "ASYNC", and "NONE". With "DEFAULT", the RMM pool allocator is used; with "ARENA", the RMM arena allocator is used; with "ASYNC", the new CUDA stream-ordered memory allocator in CUDA 11.2+ is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly. Note: "ARENA" is the recommended pool allocator if CUDF is built with Per-Thread Default Stream (PTDS), as "DEFAULT" is known to be unstable (https://github.com/NVIDIA/spark-rapids/issues/1141)|ARENA spark.rapids.memory.gpu.pooling.enabled|Should RMM act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. DEPRECATED: please use spark.rapids.memory.gpu.pool instead.|true -spark.rapids.memory.gpu.reserve|The amount of GPU memory that should remain unallocated by RMM and left for system use such as memory needed for kernels, kernel launches or JIT compilation.|1073741824 +spark.rapids.memory.gpu.reserve|The amount of GPU memory that should remain unallocated by RMM and left for system use such as memory needed for kernels and kernel launches.|1073741824 spark.rapids.memory.gpu.unspill.enabled|When a spilled GPU buffer is needed again, should it be unspilled, or only copied back into GPU memory temporarily. Unspilling may be useful for GPU buffers that are needed frequently, for example, broadcast variables; however, it may also increase GPU memory usage|false spark.rapids.memory.host.spillStorageSize|Amount of off-heap host memory to use for buffering spilled GPU data before spilling to local disk|1073741824 spark.rapids.memory.pinnedPool.size|The size of the pinned memory pool in bytes unless otherwise specified. Use 0 to disable the pool.|0 @@ -48,6 +48,7 @@ Name | Description | Default Value spark.rapids.python.memory.gpu.allocFraction|The fraction of total GPU memory that should be initially allocated for pooled memory for all the Python workers. It supposes to be less than (1 - $(spark.rapids.memory.gpu.allocFraction)), since the executor will share the GPU with its owning Python workers. Half of the rest will be used if not specified|None spark.rapids.python.memory.gpu.maxAllocFraction|The fraction of total GPU memory that limits the maximum size of the RMM pool for all the Python workers. 
It supposes to be less than (1 - $(spark.rapids.memory.gpu.maxAllocFraction)), since the executor will share the GPU with its owning Python workers. when setting to 0 it means no limit.|0.0 spark.rapids.python.memory.gpu.pooling.enabled|Should RMM in Python workers act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. When not specified, It will honor the value of config 'spark.rapids.memory.gpu.pooling.enabled'|None +spark.rapids.shuffle.enabled|Enable or disable the RAPIDS Shuffle Manager at runtime. The [RAPIDS Shuffle Manager](additional-functionality/rapids-shuffle.md) must already be configured. When set to `false`, the built-in Spark shuffle will be used. |true spark.rapids.shuffle.transport.earlyStart|Enable early connection establishment for RAPIDS Shuffle|true spark.rapids.shuffle.transport.earlyStart.heartbeatInterval|Shuffle early start heartbeat interval (milliseconds). Executors will send a heartbeat RPC message to the driver at this interval|5000 spark.rapids.shuffle.transport.earlyStart.heartbeatTimeout|Shuffle early start heartbeat timeout (milliseconds). Executors that don't heartbeat within this timeout will be considered stale. This timeout must be higher than the value for spark.rapids.shuffle.transport.earlyStart.heartbeatInterval|10000 @@ -56,6 +57,7 @@ Name | Description | Default Value spark.rapids.shuffle.ucx.managementServerHost|The host to be used to start the management server|null spark.rapids.shuffle.ucx.useWakeup|When set to true, use UCX's event-based progress (epoll) in order to wake up the progress thread when needed, instead of a hot loop.|true spark.rapids.sql.batchSizeBytes|Set the target number of bytes for a GPU batch. Splits sizes for input data is covered by separate configs. The maximum setting is 2 GB to avoid exceeding the cudf row count limit of a column.|2147483647 +spark.rapids.sql.castDecimalToFloat.enabled|Casting from decimal to floating point types on the GPU returns results that have tiny difference compared to results returned from CPU.|false spark.rapids.sql.castDecimalToString.enabled|When set to true, casting from decimal to string is supported on the GPU. The GPU does NOT produce exact same string as spark produces, but producing strings which are semantically equal. For instance, given input BigDecimal(123, -2), the GPU produces "12300", which spark produces "1.23E+4".|false spark.rapids.sql.castFloatToDecimal.enabled|Casting from floating point types to decimal on the GPU returns results that have tiny difference compared to results returned from CPU.|false spark.rapids.sql.castFloatToIntegralTypes.enabled|Casting from floating point types to integral types on the GPU supports a slightly different range of values when using Spark 3.1.0 or later. Refer to the CAST documentation for more details.|false @@ -64,6 +66,7 @@ Name | Description | Default Value spark.rapids.sql.castStringToFloat.enabled|When set to true, enables casting from strings to float types (float, double) on the GPU. Currently hex values aren't supported on the GPU. 
Also note that casting from string to float types on the GPU returns incorrect results when the string represents any number "1.7976931348623158E308" <= x < "1.7976931348623159E308" and "-1.7976931348623158E308" >= x > "-1.7976931348623159E308" in both these cases the GPU returns Double.MaxValue while CPU returns "+Infinity" and "-Infinity" respectively|false spark.rapids.sql.castStringToTimestamp.enabled|When set to true, casting from string to timestamp is supported on the GPU. The GPU only supports a subset of formats when casting strings to timestamps. Refer to the CAST documentation for more details.|false spark.rapids.sql.concurrentGpuTasks|Set the number of tasks that can execute concurrently per GPU. Tasks may temporarily block when the number of concurrent tasks in the executor exceeds this amount. Allowing too many concurrent tasks on the same GPU may lead to GPU out of memory errors.|1 +spark.rapids.sql.createMap.enabled|The GPU-enabled version of the `CreateMap` expression (`map` SQL function) does not detect duplicate keys in all cases and does not guarantee which key wins if there are duplicates. When this config is set to true, `CreateMap` will be enabled to run on the GPU even when there might be duplicate keys.|false spark.rapids.sql.csv.read.bool.enabled|Parsing an invalid CSV boolean value produces true instead of null|false spark.rapids.sql.csv.read.byte.enabled|Parsing CSV bytes is much more lenient and will return 0 for some malformed values instead of null|false spark.rapids.sql.csv.read.date.enabled|Parsing invalid CSV dates produces different results from Spark|false @@ -91,6 +94,7 @@ Name | Description | Default Value spark.rapids.sql.format.parquet.reader.type|Sets the parquet reader type. We support different types that are optimized for different environments. The original Spark style reader can be selected by setting this to PERFILE which individually reads and copies files to the GPU. Loading many small files individually has high overhead, and using either COALESCING or MULTITHREADED is recommended instead. The COALESCING reader is good when using a local file system where the executors are on the same nodes or close to the nodes the data is being read on. This reader coalesces all the files assigned to a task into a single host buffer before sending it down to the GPU. It copies blocks from a single file into a host buffer in separate threads in parallel, see spark.rapids.sql.format.parquet.multiThreadedRead.numThreads. MULTITHREADED is good for cloud environments where you are reading from a blobstore that is totally separate and likely has a higher I/O read cost. Many times the cloud environments also get better throughput when you have multiple readers in parallel. This reader uses multiple threads to read each file in parallel and each file is sent to the GPU separately. This allows the CPU to keep reading while GPU is also doing work. See spark.rapids.sql.format.parquet.multiThreadedRead.numThreads and spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel to control the number of threads and amount of memory used. By default this is set to AUTO so we select the reader we think is best. This will either be the COALESCING or the MULTITHREADED based on whether we think the file is in the cloud. 
See spark.rapids.cloudSchemes.|AUTO spark.rapids.sql.format.parquet.write.enabled|When set to false disables parquet output acceleration|true spark.rapids.sql.format.parquet.writer.int96.enabled|When set to false, disables accelerated parquet write if the spark.sql.parquet.outputTimestampType is set to INT96|true +spark.rapids.sql.hasExtendedYearValues|Spark 3.2.0+ extended parsing of years in dates and timestamps to support the full range of possible values. Prior to this it was limited to a positive 4 digit year. The Accelerator does not support the extended range yet. This config indicates if your data includes this extended range or not, or if you don't care about getting the correct values on values with the extended range.|true spark.rapids.sql.hasNans|Config to indicate if your data has NaN's. Cudf doesn't currently support NaN's properly so you can get corrupt data if you have NaN's in your data and it runs on the GPU.|true spark.rapids.sql.hashOptimizeSort.enabled|Whether sorts should be inserted after some hashed operations to improve output ordering. This can improve output file sizes when saving to columnar formats.|false spark.rapids.sql.improvedFloatOps.enabled|For some floating point operations spark uses one way to compute the value and the underlying cudf implementation can use an improved algorithm. In some cases this can result in cudf producing an answer when spark overflows. Because this is not as compatible with spark, we have it disabled by default.|false @@ -144,6 +148,9 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.And|`and`|Logical AND|true|None| spark.rapids.sql.expression.AnsiCast| |Convert a column of one type of data into another type|true|None| spark.rapids.sql.expression.ArrayContains|`array_contains`|Returns a boolean if the array contains the passed in key|true|None| +spark.rapids.sql.expression.ArrayMax|`array_max`|Returns the maximum value in the array|true|None| +spark.rapids.sql.expression.ArrayMin|`array_min`|Returns the minimum value in the array|true|None| +spark.rapids.sql.expression.ArrayTransform|`transform`|Transform elements in an array using the transform function. 
This is similar to a `map` in functional programming|true|None| spark.rapids.sql.expression.Asin|`asin`|Inverse sine|true|None| spark.rapids.sql.expression.Asinh|`asinh`|Inverse hyperbolic sine|true|None| spark.rapids.sql.expression.AtLeastNNonNulls| |Checks if number of non null/Nan values is greater than a given value|true|None| @@ -167,7 +174,8 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.Cos|`cos`|Cosine|true|None| spark.rapids.sql.expression.Cosh|`cosh`|Hyperbolic cosine|true|None| spark.rapids.sql.expression.Cot|`cot`|Cotangent|true|None| -spark.rapids.sql.expression.CreateArray|`array`| Returns an array with the given elements|true|None| +spark.rapids.sql.expression.CreateArray|`array`|Returns an array with the given elements|true|None| +spark.rapids.sql.expression.CreateMap|`map`|Create a map|true|None| spark.rapids.sql.expression.CreateNamedStruct|`named_struct`, `struct`|Creates a struct with the given field names and values|true|None| spark.rapids.sql.expression.CurrentRow$| |Special boundary for a window frame, indicating stopping at the current row|true|None| spark.rapids.sql.expression.DateAdd|`date_add`|Returns the date that is num_days after start_date|true|None| @@ -180,12 +188,12 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.DayOfYear|`dayofyear`|Returns the day of the year from a date or timestamp|true|None| spark.rapids.sql.expression.DenseRank|`dense_rank`|Window function that returns the dense rank value within the aggregation window|true|None| spark.rapids.sql.expression.Divide|`/`|Division|true|None| -spark.rapids.sql.expression.ElementAt|`element_at`|Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is map.|true|None| +spark.rapids.sql.expression.ElementAt|`element_at`|Returns element of array at given(1-based) index in value if column is array. 
Returns value for the given key in value if column is map|true|None| spark.rapids.sql.expression.EndsWith| |Ends with|true|None| spark.rapids.sql.expression.EqualNullSafe|`<=>`|Check if the values are equal including nulls <=>|true|None| spark.rapids.sql.expression.EqualTo|`=`, `==`|Check if the values are equal|true|None| spark.rapids.sql.expression.Exp|`exp`|Euler's number e raised to a power|true|None| -spark.rapids.sql.expression.Explode|`explode`, `explode_outer`|Given an input array produces a sequence of rows for each value in the array.|true|None| +spark.rapids.sql.expression.Explode|`explode`, `explode_outer`|Given an input array produces a sequence of rows for each value in the array|true|None| spark.rapids.sql.expression.Expm1|`expm1`|Euler's number e raised to a power minus 1|true|None| spark.rapids.sql.expression.Floor|`floor`|Floor of a number|true|None| spark.rapids.sql.expression.FromUnixTime|`from_unixtime`|Get the string from a unix timestamp|true|None| @@ -212,6 +220,7 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.KnownFloatingPointNormalized| |Tag to prevent redundant normalization|true|None| spark.rapids.sql.expression.KnownNotNull| |Tag an expression as known to not be null|true|None| spark.rapids.sql.expression.Lag|`lag`|Window function that returns N entries behind this one|true|None| +spark.rapids.sql.expression.LambdaFunction| |Holds a higher order SQL function|true|None| spark.rapids.sql.expression.LastDay|`last_day`|Returns the last day of the month which the date belongs to|true|None| spark.rapids.sql.expression.Lead|`lead`|Window function that returns N entries ahead of this one|true|None| spark.rapids.sql.expression.Least|`least`|Returns the least value of all parameters, skipping null values|true|None| @@ -227,6 +236,9 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.Logarithm|`log`|Log variable base|true|None| spark.rapids.sql.expression.Lower|`lower`, `lcase`|String lowercase operator|false|This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly.| spark.rapids.sql.expression.MakeDecimal| |Create a Decimal from an unscaled long value for some aggregation optimizations|true|None| +spark.rapids.sql.expression.MapEntries|`map_entries`|Returns an unordered array of all entries in the given map|true|None| +spark.rapids.sql.expression.MapKeys|`map_keys`|Returns an unordered array containing the keys of the map|true|None| +spark.rapids.sql.expression.MapValues|`map_values`|Returns an unordered array containing the values of the map|true|None| spark.rapids.sql.expression.Md5|`md5`|MD5 hash operator|true|None| spark.rapids.sql.expression.Minute|`minute`|Returns the minute component of the string/timestamp|true|None| spark.rapids.sql.expression.MonotonicallyIncreasingID|`monotonically_increasing_id`|Returns monotonically increasing 64-bit integers|true|None| @@ -234,13 +246,15 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.Multiply|`*`|Multiplication|true|None| spark.rapids.sql.expression.Murmur3Hash|`hash`|Murmur3 hash operator|true|None| spark.rapids.sql.expression.NaNvl|`nanvl`|Evaluates to `left` iff left is not NaN, `right` otherwise|true|None| +spark.rapids.sql.expression.NamedLambdaVariable| |A parameter to a higher order SQL function|true|None| spark.rapids.sql.expression.Not|`!`, `not`|Boolean not 
operator|true|None| spark.rapids.sql.expression.Or|`or`|Logical OR|true|None| spark.rapids.sql.expression.Pmod|`pmod`|Pmod|true|None| -spark.rapids.sql.expression.PosExplode|`posexplode_outer`, `posexplode`|Given an input array produces a sequence of rows for each value in the array.|true|None| +spark.rapids.sql.expression.PosExplode|`posexplode_outer`, `posexplode`|Given an input array produces a sequence of rows for each value in the array|true|None| spark.rapids.sql.expression.Pow|`pow`, `power`|lhs ^ rhs|true|None| +spark.rapids.sql.expression.PreciseTimestampConversion| |Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing|true|None| spark.rapids.sql.expression.PromotePrecision| |PromotePrecision before arithmetic operations between DecimalType data|true|None| -spark.rapids.sql.expression.PythonUDF| |UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated.|true|None| +spark.rapids.sql.expression.PythonUDF| |UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated|true|None| spark.rapids.sql.expression.Quarter|`quarter`|Returns the quarter of the year for date, in the range 1 to 4|true|None| spark.rapids.sql.expression.Rand|`random`, `rand`|Generate a random column with i.i.d. uniformly distributed values in [0, 1)|true|None| spark.rapids.sql.expression.Rank|`rank`|Window function that returns the rank value within the aggregation window|true|None| @@ -267,6 +281,7 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.StringLPad|`lpad`|Pad a string on the left|true|None| spark.rapids.sql.expression.StringLocate|`position`, `locate`|Substring search operator|true|None| spark.rapids.sql.expression.StringRPad|`rpad`|Pad a string on the right|true|None| +spark.rapids.sql.expression.StringRepeat|`repeat`|StringRepeat operator that repeats the given strings with numbers of times given by repeatTimes|true|None| spark.rapids.sql.expression.StringReplace|`replace`|StringReplace operator|true|None| spark.rapids.sql.expression.StringSplit|`split`|Splits `str` around occurrences that match `regex`|true|None| spark.rapids.sql.expression.StringTrim|`trim`|StringTrim operator|true|None| @@ -282,6 +297,8 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.ToDegrees|`degrees`|Converts radians to degrees|true|None| spark.rapids.sql.expression.ToRadians|`radians`|Converts degrees to radians|true|None| spark.rapids.sql.expression.ToUnixTimestamp|`to_unix_timestamp`|Returns the UNIX timestamp of the given time|true|None| +spark.rapids.sql.expression.TransformKeys|`transform_keys`|Transform keys in a map using a transform function|true|None| +spark.rapids.sql.expression.TransformValues|`transform_values`|Transform values in a map using a transform function|true|None| spark.rapids.sql.expression.UnaryMinus|`negative`|Negate a numeric value|true|None| spark.rapids.sql.expression.UnaryPositive|`positive`|A numeric value with a + in front of it|true|None| spark.rapids.sql.expression.UnboundedFollowing$| |Special boundary for a window frame, indicating all rows preceding the current row|true|None| @@ -294,16 +311,21 @@ Name | SQL Function(s) | Description | Default Value | Notes spark.rapids.sql.expression.WindowSpecDefinition| |Specification of a window function, indicating the partitioning-expression, 
the row ordering, and the width of the window|true|None| spark.rapids.sql.expression.Year|`year`|Returns the year from a date or timestamp|true|None| spark.rapids.sql.expression.AggregateExpression| |Aggregate expression|true|None| +spark.rapids.sql.expression.ApproximatePercentile|`percentile_approx`, `approx_percentile`|Approximate percentile|false|This is disabled by default because The GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark. See the compatibility guide for more information.| spark.rapids.sql.expression.Average|`avg`, `mean`|Average aggregate operator|true|None| -spark.rapids.sql.expression.CollectList|`collect_list`|Collect a list of non-unique elements, only supported in rolling window in current.|true|None| -spark.rapids.sql.expression.CollectSet|`collect_set`|Collect a set of unique elements, only supported in rolling window in current.|true|None| +spark.rapids.sql.expression.CollectList|`collect_list`|Collect a list of non-unique elements, not supported in reduction|true|None| +spark.rapids.sql.expression.CollectSet|`collect_set`|Collect a set of unique elements, not supported in reduction|true|None| spark.rapids.sql.expression.Count|`count`|Count aggregate operator|true|None| spark.rapids.sql.expression.First|`first_value`, `first`|first aggregate operator|true|None| spark.rapids.sql.expression.Last|`last`, `last_value`|last aggregate operator|true|None| spark.rapids.sql.expression.Max|`max`|Max aggregate operator|true|None| spark.rapids.sql.expression.Min|`min`|Min aggregate operator|true|None| spark.rapids.sql.expression.PivotFirst| |PivotFirst operator|true|None| +spark.rapids.sql.expression.StddevPop|`stddev_pop`|Aggregation computing population standard deviation|true|None| +spark.rapids.sql.expression.StddevSamp|`stddev_samp`, `std`, `stddev`|Aggregation computing sample standard deviation|true|None| spark.rapids.sql.expression.Sum|`sum`|Sum aggregate operator|true|None| +spark.rapids.sql.expression.VariancePop|`var_pop`|Aggregation computing population variance|true|None| +spark.rapids.sql.expression.VarianceSamp|`var_samp`, `variance`|Aggregation computing sample variance|true|None| spark.rapids.sql.expression.NormalizeNaNAndZero| |Normalize NaN and zero|true|None| spark.rapids.sql.expression.ScalarSubquery| |Subquery that will return only one row and one column|true|None| spark.rapids.sql.expression.HiveGenericUDF| |Hive Generic UDF, support requires the UDF to implement a RAPIDS accelerated interface|true|None| @@ -324,17 +346,18 @@ Name | Description | Default Value | Notes spark.rapids.sql.exec.ProjectExec|The backend for most select, withColumn and dropColumn statements|true|None| spark.rapids.sql.exec.RangeExec|The backend for range operator|true|None| spark.rapids.sql.exec.SortExec|The backend for the sort operator|true|None| -spark.rapids.sql.exec.TakeOrderedAndProjectExec|Take the first limit elements as defined by the sortOrder, and do projection if needed.|true|None| +spark.rapids.sql.exec.TakeOrderedAndProjectExec|Take the first limit elements as defined by the sortOrder, and do projection if needed|true|None| spark.rapids.sql.exec.UnionExec|The backend for the union operator|true|None| spark.rapids.sql.exec.CustomShuffleReaderExec|A wrapper of shuffle query stage|true|None| spark.rapids.sql.exec.HashAggregateExec|The backend for hash based aggregations|true|None| +spark.rapids.sql.exec.ObjectHashAggregateExec|The backend for hash based aggregations supporting TypedImperativeAggregate functions|true|None| 
spark.rapids.sql.exec.SortAggregateExec|The backend for sort based aggregations|true|None| spark.rapids.sql.exec.DataWritingCommandExec|Writing data|true|None| spark.rapids.sql.exec.BatchScanExec|The backend for most file input|true|None| spark.rapids.sql.exec.BroadcastExchangeExec|The backend for broadcast exchange of data|true|None| spark.rapids.sql.exec.ShuffleExchangeExec|The backend for most data being exchanged between processes|true|None| spark.rapids.sql.exec.BroadcastHashJoinExec|Implementation of join using broadcast data|true|None| -spark.rapids.sql.exec.BroadcastNestedLoopJoinExec|Implementation of join using brute force|true|None| +spark.rapids.sql.exec.BroadcastNestedLoopJoinExec|Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported|true|None| spark.rapids.sql.exec.CartesianProductExec|Implementation of join using brute force|true|None| spark.rapids.sql.exec.ShuffledHashJoinExec|Implementation of join using hashed shuffled data|true|None| spark.rapids.sql.exec.SortMergeJoinExec|Sort merge join, replacing with shuffled hash join|true|None| @@ -362,20 +385,3 @@ Name | Description | Default Value | Notes spark.rapids.sql.partitioning.RangePartitioning|Range partitioning|true|None| spark.rapids.sql.partitioning.RoundRobinPartitioning|Round robin partitioning|true|None| spark.rapids.sql.partitioning.SinglePartition$|Single partitioning|true|None| - -### JIT Kernel Cache Path - - CUDF can compile GPU kernels at runtime using a just-in-time (JIT) compiler. The - resulting kernels are cached on the filesystem. The default location for this cache is - under the `.cudf` directory in the user's home directory. When running in an environment - where the user's home directory cannot be written, such as running in a container - environment on a cluster, the JIT cache path will need to be specified explicitly with - the `LIBCUDF_KERNEL_CACHE_PATH` environment variable. - The specified kernel cache path should be specific to the user to avoid conflicts with - others running on the same host. 
For example, the following would specify the path to a - user-specific location under `/tmp`: - - ``` - --conf spark.executorEnv.LIBCUDF_KERNEL_CACHE_PATH="/tmp/cudf-$USER" - ``` - diff --git a/docs/demo/Databricks/generate-init-script-cuda11.ipynb b/docs/demo/Databricks/generate-init-script-cuda11.ipynb index 129e5521fb8..23c9411c64c 100644 --- a/docs/demo/Databricks/generate-init-script-cuda11.ipynb +++ b/docs/demo/Databricks/generate-init-script-cuda11.ipynb @@ -1 +1 @@ -{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.2-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.2/cudf-21.08.2-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0} +{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.10.0/rapids-4-spark_2.12-21.10.0.jar\nsudo wget -O /databricks/jars/cudf-21.10.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.10.0/cudf-21.10.0-cuda11.jar\n\nsudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin\nsudo wget -O ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo dpkg -i ~/cuda-repo-ubuntu1804-11-0-local_11.0.3-450.51.06-1_amd64.deb\nsudo apt-key add /var/cuda-repo-ubuntu1804-11-0-local/7fa2af80.pub\nsudo apt-get update\nsudo apt -y install cuda-toolkit-11-0\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0} diff --git a/docs/demo/Databricks/generate-init-script.ipynb b/docs/demo/Databricks/generate-init-script.ipynb index 3f33f701ddf..b5e2c35efbe 100644 --- 
a/docs/demo/Databricks/generate-init-script.ipynb +++ b/docs/demo/Databricks/generate-init-script.ipynb @@ -1 +1 @@ -{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.08.0/rapids-4-spark_2.12-21.08.0.jar\nsudo wget -O /databricks/jars/cudf-21.08.2-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.08.2/cudf-21.08.2-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0} +{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.10.0/rapids-4-spark_2.12-21.10.0.jar\nsudo wget -O /databricks/jars/cudf-21.10.0-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.10.0/cudf-21.10.0-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0} diff --git a/docs/dev/README.md b/docs/dev/README.md index 72661fb4f37..1cd9dba4512 100644 --- a/docs/dev/README.md +++ b/docs/dev/README.md @@ -1,7 +1,7 @@ --- layout: page title: Developer Overview -nav_order: 10 +nav_order: 11 has_children: true permalink: /developer-overview/ --- diff --git a/docs/download.md b/docs/download.md index 44b6f48785c..23675ad0289 100644 --- a/docs/download.md +++ b/docs/download.md @@ -18,6 +18,64 @@ cuDF jar, that is either preinstalled in the Spark classpath on all nodes or sub that uses the RAPIDS Accelerator For Apache Spark. See the [getting-started guide](https://nvidia.github.io/spark-rapids/Getting-Started/) for more details. +## Release v21.10.0 +Hardware Requirements: + +The plugin is tested on the following architectures: + + GPU Architecture: NVIDIA V100, T4 and A10/A30/A100 GPUs + +Software Requirements: + + OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8 + + CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+ + + Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.2.0, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0 + + Apache Hadoop 2.10+ or 3.1.1+ (3.1.1 for nvidia-docker version 2) + + Python 3.6+, Scala 2.12, Java 8 + +*Some hardware may have a minimum driver version greater than v450.80.02+. Check the GPU spec sheet +for your hardware's minimum driver version. 
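Before installing the plugin you can verify that each node meets the driver requirement above. This is only a quick sanity check and assumes the NVIDIA driver (and therefore `nvidia-smi`) is already installed on the node:

```bash
# Print the installed driver version and the GPU model for every GPU on this node.
nvidia-smi --query-gpu=driver_version,name --format=csv
```

If the reported driver version is lower than the minimum required for your hardware, upgrade the driver before deploying the plugin.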
+ +### Download v21.10.0 +* Download the [RAPIDS + Accelerator for Apache Spark 21.10.0 jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.10.0/rapids-4-spark_2.12-21.10.0.jar) +* Download the [RAPIDS cuDF 21.10.0 jar](https://repo1.maven.org/maven2/ai/rapids/cudf/21.10.0/cudf-21.10.0-cuda11.jar) + +This package is built against CUDA 11.2 and has [CUDA forward +compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) enabled. It is tested +on V100, T4, A30 and A100 GPUs with CUDA 11.0-11.4. For those using other types of GPUs which +do not have CUDA forward compatibility (for example, GeForce), CUDA 11.2 is required. Users will +need to ensure the minimum driver (450.80.02) and CUDA toolkit are installed on each Spark node. + +### Release Notes +New functionality and performance improvements for this release include: +* Support collect_list and collect_set in group-by aggregation +* Support stddev, percentile_approx in group-by aggregation +* RunningWindow operations on map +* HashAggregate on struct and nested struct +* Sorting on nested structs +* Explode on map, array, struct +* Union-all on map, array and struct of maps +* Parquet writing of map +* ORC reader supports reading map/struct columns +* ORC reader support decimal64 +* Spark Qualification Tool + * Add conjunction and disjunction filters + * Filtering specific configuration values + * Filtering user name + * Reporting nested data types + * Reporting write data formats +* Spark Profiling Tool + * Generating structured output format + * Improved profiling tool performance + +For a detailed list of changes, please refer to the +[CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). + ## Release v21.08.0 Hardware Requirements: diff --git a/docs/examples.md b/docs/examples.md index c91552cec73..72026e41cb3 100644 --- a/docs/examples.md +++ b/docs/examples.md @@ -1,7 +1,7 @@ --- layout: page title: Examples -nav_order: 12 +nav_order: 13 --- # Demos diff --git a/docs/get-started/Dockerfile.cuda b/docs/get-started/Dockerfile.cuda index 22b7204019e..2e1f9c23a26 100644 --- a/docs/get-started/Dockerfile.cuda +++ b/docs/get-started/Dockerfile.cuda @@ -24,10 +24,11 @@ ENV PATH $PATH:/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre/bin:/usr/lib/jvm/java-1 # Before building the docker image, first either download Apache Spark 3.0+ from # http://spark.apache.org/downloads.html or build and make a Spark distribution following the -# instructions in http://spark.apache.org/docs/3.0.2/building-spark.html (3.0.1 or 3.1.1 can -# be used as well). If this docker file is being used in the context of building your images from a -# Spark distribution, the docker build command should be invoked from the top level directory of the -# Spark distribution. E.g.: docker build -t spark:3.0.2 -f kubernetes/dockerfiles/spark/Dockerfile . +# instructions in http://spark.apache.org/docs/3.0.2/building-spark.html (see +# https://nvidia.github.io/spark-rapids/docs/download.html for other supported versions). If this +# docker file is being used in the context of building your images from a Spark distribution, the +# docker build command should be invoked from the top level directory of the Spark +# distribution. E.g.: docker build -t spark:3.0.2 -f kubernetes/dockerfiles/spark/Dockerfile . 
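# The COPY commands below resolve their source paths relative to the docker build context,
# so the extracted Spark distribution needs to be reachable as ./spark inside that context.
# For example (adjust the directory name to the Spark version you actually downloaded):
#   ln -s spark-3.1.1-bin-hadoop3.2 spark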
RUN set -ex && \ ln -s /lib /lib64 && \ @@ -42,16 +43,16 @@ RUN set -ex && \ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \ chgrp root /etc/passwd && chmod ug+rw /etc/passwd -COPY spark-3.0.2-bin-hadoop3.2/jars /opt/spark/jars -COPY spark-3.0.2-bin-hadoop3.2/bin /opt/spark/bin -COPY spark-3.0.2-bin-hadoop3.2/sbin /opt/spark/sbin -COPY spark-3.0.2-bin-hadoop3.2/kubernetes/dockerfiles/spark/entrypoint.sh /opt/ -COPY spark-3.0.2-bin-hadoop3.2/examples /opt/spark/examples -COPY spark-3.0.2-bin-hadoop3.2/kubernetes/tests /opt/spark/tests -COPY spark-3.0.2-bin-hadoop3.2/data /opt/spark/data +COPY spark/jars /opt/spark/jars +COPY spark/bin /opt/spark/bin +COPY spark/sbin /opt/spark/sbin +COPY spark/kubernetes/dockerfiles/spark/entrypoint.sh /opt/ +COPY spark/examples /opt/spark/examples +COPY spark/kubernetes/tests /opt/spark/tests +COPY spark/data /opt/spark/data -COPY cudf-21.08.2-cuda11.jar /opt/sparkRapidsPlugin -COPY rapids-4-spark_2.12-21.08.0.jar /opt/sparkRapidsPlugin +COPY cudf-*-cuda11.jar /opt/sparkRapidsPlugin +COPY rapids-4-spark_2.12-*.jar /opt/sparkRapidsPlugin COPY getGpusResources.sh /opt/sparkRapidsPlugin RUN mkdir /opt/spark/python @@ -67,8 +68,8 @@ RUN apt-get update && \ # Removed the .cache to save space rm -r /root/.cache && rm -rf /var/cache/apt/* -COPY spark-3.0.2-bin-hadoop3.2/python/pyspark /opt/spark/python/pyspark -COPY spark-3.0.2-bin-hadoop3.2/python/lib /opt/spark/python/lib +COPY spark/python/pyspark /opt/spark/python/pyspark +COPY spark/python/lib /opt/spark/python/lib ENV SPARK_HOME /opt/spark diff --git a/docs/get-started/getting-started-databricks.md b/docs/get-started/getting-started-databricks.md index aa6cd16068b..615e6c22290 100644 --- a/docs/get-started/getting-started-databricks.md +++ b/docs/get-started/getting-started-databricks.md @@ -21,6 +21,38 @@ runtimes which may impact the behavior of the plugin. The number of GPUs per node dictates the number of Spark executors that can run in that node. +## Limitations + +1. Adaptive query execution(AQE) and Delta optimization write do not work. These should be disabled +when using the plugin. Queries may still see significant speedups even with AQE disabled. + + ```bash + spark.databricks.delta.optimizeWrite.enabled false + spark.sql.adaptive.enabled false + ``` + + See [issue-1059](https://github.com/NVIDIA/spark-rapids/issues/1059) for more detail. + +2. Dynamic partition pruning(DPP) does not work. This results in poor performance for queries which + would normally benefit from DPP. See + [issue-3143](https://github.com/NVIDIA/spark-rapids/issues/3143) for more detail. + +3. When selecting GPU nodes, Databricks requires the driver node to be a GPU node. Outside of + Databricks the plugin can operate with the driver as a CPU node and workers as GPU nodes. + +4. Cannot spin off multiple executors on a multi-GPU node. + Even though it is possible to set `spark.executor.resource.gpu.amount=N` (where N is the number + of GPUs per node) in the in Spark Configuration tab, Databricks overrides this to + `spark.executor.resource.gpu.amount=1`. This will result in failed executors when starting the + cluster. + +5. Databricks makes changes to the runtime without notification. + + Databricks makes changes to existing runtimes, applying patches, without notification. + [Issue-3098](https://github.com/NVIDIA/spark-rapids/issues/3098) is one example of this. We run + regular integration tests on the Databricks environment to catch these issues and fix them once + detected. 
+ ## Start a Databricks Cluster Create a Databricks cluster by going to Clusters, then clicking `+ Create Cluster`. Ensure the cluster meets the prerequisites above by configuring it as follows: @@ -110,7 +142,7 @@ Spark plugin and the CUDA 11 toolkit. ```bash spark.rapids.sql.python.gpu.enabled true spark.python.daemon.module rapids.daemon_databricks - spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.08.0.jar:/databricks/spark/python + spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.10.0.jar:/databricks/spark/python ``` 7. Once you’ve added the Spark config, click “Confirm and Restart”. diff --git a/docs/get-started/getting-started-on-prem.md b/docs/get-started/getting-started-on-prem.md index 37e99917491..8d907669a24 100644 --- a/docs/get-started/getting-started-on-prem.md +++ b/docs/get-started/getting-started-on-prem.md @@ -55,8 +55,8 @@ CUDA and will not run on other versions. The jars use a maven classifier to keep - CUDA 11.0/11.1/11.2 => classifier cuda11 For example, here is a sample version of the jars and cudf with CUDA 11.0 support: -- cudf-21.08.2-cuda11.jar -- rapids-4-spark_2.12-21.08.0.jar +- cudf-21.10.0-cuda11.jar +- rapids-4-spark_2.12-21.10.0.jar jar that your version of the accelerator depends on. @@ -64,8 +64,8 @@ For simplicity export the location to these jars. This example assumes the sampl been placed in the `/opt/sparkRapidsPlugin` directory: ```shell export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin -export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.08.2-cuda11.jar -export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-21.08.0.jar +export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf-21.10.0-cuda11.jar +export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark_2.12-21.10.0.jar ``` ## Install the GPU Discovery Script From 6120d69ea73a91968b06aeb80584c638460768f8 Mon Sep 17 00:00:00 2001 From: Rodney Howeedy Date: Thu, 21 Oct 2021 16:07:17 -0600 Subject: [PATCH 3/9] Update gh-pages branch 21.10 --- .../qualification-profiling-tools.md | 591 +- docs/spark-qualification-tool.md | 353 + docs/supported_ops.md | 15728 ++++++---------- 3 files changed, 6491 insertions(+), 10181 deletions(-) create mode 100644 docs/spark-qualification-tool.md diff --git a/docs/additional-functionality/qualification-profiling-tools.md b/docs/additional-functionality/qualification-profiling-tools.md index 03a6b72a90b..e61d0ac4574 100644 --- a/docs/additional-functionality/qualification-profiling-tools.md +++ b/docs/additional-functionality/qualification-profiling-tools.md @@ -1,325 +1,130 @@ --- layout: page -title: Qualification and Profiling tools -parent: Additional Functionality -nav_order: 7 +title: Spark Profiling tool +nav_order: 9 --- -# Spark Qualification and Profiling tools +# Spark Profiling tool -The qualification tool analyzes applications to determine if the RAPIDS Accelerator for Apache Spark -might be a good fit for those applications. - -The profiling tool generates information which can be used for debugging and profiling applications. -The information contains the Spark version, executor details, properties, etc. This runs on either CPU or -GPU generated event logs. - -This document covers below topics: +The Profiling tool analyzes both CPU or GPU generated event logs and generates information +which can be used for debugging and profiling Apache Spark applications. +The output information contains the Spark version, executor details, properties, etc. 
* TOC {:toc} -## Prerequisites -- Spark 3.0.1 or newer, the Qualification tool just needs the Spark jars and the Profiling tool - runs a Spark application so needs the Spark runtime. -- Java 8 or above -- Spark event log(s) from Spark 2.0 or above version. - Supports both rolled and compressed event logs with `.lz4`, `.lzf`, `.snappy` and `.zstd` suffixes as - well as Databricks-specific rolled and compressed(.gz) eventlogs. - The tool does not support nested directories. Event log files or event log directories should be - at the top level when specifying a directory. +## How to use the Profiling tool + +### Prerequisites +- Java 8 or above, Spark 3.0.1+ jars +- Spark event log(s) from Spark 2.0 or above version. Supports both rolled and compressed event logs + with `.lz4`, `.lzf`, `.snappy` and `.zstd` suffixes as well as + Databricks-specific rolled and compressed(.gz) event logs. +- The tool does not support nested directories. + Event log files or event log directories should be at the top level when specifying a directory. Note: Spark event logs can be downloaded from Spark UI using a "Download" button on the right side, or can be found in the location specified by `spark.eventLog.dir`. See the [Apache Spark Monitoring](http://spark.apache.org/docs/latest/monitoring.html) documentation for more information. -Optional: -- Maven installed - (only if you want to compile the jar yourself) -- hadoop-aws-.jar and aws-java-sdk-.jar - (only if any input event log is from S3) - -## Download the tools jar or compile it -You do not need to compile the jar yourself because you can download it from the Maven repository directly. - -Here are 2 options: -1. Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.08.1/) - -2. Compile the jar from github repo -```bash -git clone https://github.com/NVIDIA/spark-rapids.git -cd spark-rapids -mvn -pl .,tools clean verify -DskipTests -``` -The jar is generated in below directory : - -`./tools/target/rapids-4-spark-tools_2.12-.jar` - -If any input is a S3 file path or directory path, 2 extra steps are needed to access S3 in Spark: -1. Download the matched jars based on the Hadoop version: - - `hadoop-aws-.jar` - - `aws-java-sdk-.jar` - - Take Hadoop 2.7.4 for example, we can download and include below jars in the '--jars' option to spark-shell or spark-submit: - [hadoop-aws-2.7.4.jar](https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-aws/2.7.4/hadoop-aws-2.7.4.jar) and - [aws-java-sdk-1.7.4.jar](https://repo.maven.apache.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar) - -2. In $SPARK_HOME/conf, create `hdfs-site.xml` with below AWS S3 keys inside: - -```xml - - - - fs.s3a.access.key - xxx - - - fs.s3a.secret.key - xxx - - -``` - -Please refer to this [doc](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) on -more options about integrating hadoop-aws module with S3. - -## Qualification Tool - -### Qualification tool functions - -The qualification tool is used to look at a set of applications to determine if the RAPIDS Accelerator for Apache Spark -might be a good fit for those applications. The tool works by processing the CPU generated event logs from Spark. - -This tool is intended to give the users a starting point and does not guarantee the applications it scores highest -will actually be accelerated the most. Currently it works by looking at the amount of time spent in tasks of SQL -Dataframe operations. 
The more total task time doing SQL Dataframe operations the higher the score is and the more -likely the plugin will be able to help accelerate that application. The tool also looks for read data formats and types -that the plugin doesn't support and if it finds any not supported it will take away from the score (based on the -total task time in SQL Dataframe operations). - -Each application(event log) could have multiple SQL queries. If a SQL's plan has Dataset API inside such as keyword - `$Lambda` or `.apply`, that SQL query is categorized as a DataSet SQL query, otherwise it is a Dataframe SQL query. - -Note: the duration(s) reported are in milli-seconds. - -There are 2 output files from running the tool. One is a summary text file printing in order the applications most -likely to be good candidates for the GPU to the ones least likely. It outputs the application ID, duration, -the SQL Dataframe duration and the SQL duration spent when we found SQL queries with potential problems. It also -outputs this same report to STDOUT. -The other file is a CSV file that contains more information and can be used for further post processing. - -Note, potential problems are reported in the CSV file in a separate column, which is not included in the score. This -currently includes some UDFs and some decimal operations. The tool won't catch all UDFs, and some of the UDFs can be -handled with additional steps. Please refer to [supported_ops.md](../supported_ops.md) for more details on UDF. -For decimals, it tries to recognize decimal operations but it may not catch them all. - -The CSV output also contains a `Executor CPU Time Percent` column that is not included in the score. This is an estimate -at how much time the tasks spent doing processing on the CPU vs waiting on IO. This is not always a good indicator -because sometimes you may be doing IO that is encrypted and the CPU has to do work to decrypt it, so the environment -you are running on needs to be taken into account. - -`App Duration Estimated` is used to indicate if we had to estimate the application duration. If we -had to estimate it, it means the event log was missing the application finished event so we will use the last job -or sql execution time we find as the end time used to calculate the duration. - -Note that SQL queries that contain failed jobs are not included. - -Sample output in csv: - -``` -App Name,App ID,Score,Potential Problems,SQL DF Duration,SQL Dataframe Task Duration,App Duration,Executor CPU Time Percent,App Duration Estimated,SQL Duration with Potential Problems,SQL Ids with Failures,Read Score Percent,Read File Format Score,Unsupported Read File Formats and Types -job3,app-20210507174503-1704,4320658.0,"",9569,4320658,26171,35.34,false,0,"",20,100.0,"" -job1,app-20210507174503-2538,19864.04,"",6760,21802,83728,71.3,false,0,"",20,55.56,"Parquet[decimal]" -``` - -Sample output in text: +### Step 1 Download the tools jar and Apache Spark 3 distribution +The Profiling tool requires the Spark 3.x jars to be able to run but do not need an Apache Spark run time. +If you do not already have Spark 3.x installed, +you can download the Spark distribution to any machine and include the jars in the classpath. 
+- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.10.0/) +- [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended +If you want to compile the jars, please refer to the instructions [here](./spark-qualification-tool.md#How-to-compile-the-tools-jar). -``` -=========================================================================== -| App ID|App Duration|SQL DF Duration|Problematic Duration| -=========================================================================== -|app-20210507174503-2538| 26171| 9569| 0| -|app-20210507174503-1704| 83738| 6760| 0| -``` - -### Download the Spark 3 distribution -The Qualification tool requires the Spark 3.x jars to be able to run. If you do not already have -Spark 3.x installed, you can download the Spark distribution to any machine and include the jars -in the classpath. - -1. [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended -2. Extract the Spark distribution into a local directory. -3. Either set `SPARK_HOME` to point to that directory or just put the path inside of the classpath - `java -cp toolsJar:pathToSparkJars/*:...` when you run the qualification tool. See the - [How to use qualification tool](#How-to-use-qualification-tool) section below. - -### How to use qualification tool -This tool parses the Spark CPU event log(s) and creates an output report. +### Step 2 How to run the Profiling tool +This tool parses the Spark CPU or GPU event log(s) and creates an output report. +We need to extract the Spark distribution into a local directory if necessary. +Either set `SPARK_HOME` to point to that directory or just put the path inside of the +classpath `java -cp toolsJar:pathToSparkJars/*:...` when you run the Profiling tool. Acceptable input event log paths are files or directories containing spark events logs -in the local filesystem, HDFS, S3 or mixed. Note that if you are on an HDFS cluster -the default filesystem is likely HDFS for both the input and output so if you want to -point to the local filesystem be sure to include `file:` in the path - -```bash -Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* - com.nvidia.spark.rapids.tool.qualification.QualificationMain [options] - -``` - -Example running on files in HDFS: (include $HADOOP_CONF_DIR in classpath) - -```bash -java -cp ~/rapids-4-spark-tools_2.12-21..jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ - com.nvidia.spark.rapids.tool.qualification.QualificationMain /eventlogDir -``` - -### Qualification tool options - - Note: `--help` should be before the trailing event logs. - -```bash -java -cp ~/rapids-4-spark-tools_2.12-21..jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ - com.nvidia.spark.rapids.tool.qualification.QualificationMain --help - -RAPIDS Accelerator for Apache Spark qualification tool - -Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* - com.nvidia.spark.rapids.tool.qualification.QualificationMain [options] - - - -a, --application-name Filter event logs whose application name - matches exactly or is a substring of input - string. Regular expressions not - supported. For filtering based on complement - of application name, use ~APPLICATION_NAME. - i.e Select all event logs except the ones - which have application name as the input - string. 
- -f, --filter-criteria Filter newest or oldest N eventlogs based on - application start timestamp, unique - application name or filesystem timestamp. - Filesystem based filtering happens before - any application based filtering. For - application based filtering, the order in - which filters are applied is: - application-name, start-app-time, - filter-criteria. Application based - filter-criteria are: 100-newest (for - processing newest 100 event logs based on - timestamp insidethe eventlog) i.e - application start time) 100-oldest (for - processing oldest 100 event logs based on - timestamp insidethe eventlog) i.e - application start time) - 100-newest-per-app-name (select at most 100 - newest log files for each unique application - name) 100-oldest-per-app-name (select at - most 100 oldest log files for each unique - application name). Filesystem based filter - criteria are: 100-newest-filesystem (for - processing newest 100 event logs based on - filesystem timestamp). 100-oldest-filesystem - (for processing oldest 100 event logsbased - on filesystem timestamp). - -m, --match-event-logs Filter event logs whose filenames contain - the input string. Filesystem based filtering - happens before any application based - filtering. - -n, --num-output-rows Number of output rows in the summary report. - Default is 1000. - --num-threads Number of thread to use for parallel - processing. The default is the number of - cores on host divided by 4. - --order Specify the sort order of the report. desc - or asc, desc is the default. desc - (descending) would report applications most - likely to be accelerated at the top and asc - (ascending) would show the least likely to - be accelerated at the top. - -o, --output-directory Base output directory. Default is current - directory for the default filesystem. The - final output will go into a subdirectory - called rapids_4_spark_qualification_output. - It will overwrite any existing directory - with the same name. - -r, --read-score-percent The percent the read format and datatypes - apply to the score. Default is 20 percent. - --report-read-schema Whether to output the read formats and - datatypes to the CSV file. This can be very - long. Default is false. - -s, --start-app-time Filter event logs whose application start - occurred within the past specified time - period. Valid time periods are - min(minute),h(hours),d(days),w(weeks),m(months). - If a period is not specified it defaults to - days. - -t, --timeout Maximum time in seconds to wait for the - event logs to be processed. Default is 24 - hours (86400 seconds) and must be greater - than 3 seconds. If it times out, it will - report what it was able to process up until - the timeout. - -h, --help Show help message - - trailing arguments: - eventlog (required) Event log filenames(space separated) or directories - containing event logs. 
eg: s3a:///eventlog1 - /path/to/eventlog2 -``` - -Example commands: -- Process the 10 newest logs, and only output the top 3 in the output: - -```bash -java -cp ~/rapids-4-spark-tools_2.12-21..jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ - com.nvidia.spark.rapids.tool.qualification.QualificationMain -f 10-newest -n 3 /eventlogDir -``` - -- Process last 100 days' logs: - -```bash -java -cp ~/rapids-4-spark-tools_2.12-21..jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ - com.nvidia.spark.rapids.tool.qualification.QualificationMain -s 100d /eventlogDir -``` - -- Process only the newest log with the same application name: - -```bash -java -cp ~/rapids-4-spark-tools_2.12-21..jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ - com.nvidia.spark.rapids.tool.qualification.QualificationMain -f 1-newest-per-app-name /eventlogDir -``` - -### Qualification tool output -The summary report goes to STDOUT and by default it outputs 2 files under sub-directory -`./rapids_4_spark_qualification_output/` that contain the processed applications. The output will -go into your default filesystem, it supports local filesystem or HDFS. Note that if you are on an -HDFS cluster the default filesystem is likely HDFS for both the input and output so if you want to -point to the local filesystem be sure to include `file:` in the path - -The output location can be changed using the `--output-directory` option. Default is current directory. - -It will output a text file with the name `rapids_4_spark_qualification_output.log` that is a summary report and -it will output a CSV file named `rapids_4_spark_qualification_output.csv` that has more data in it. - -## Profiling Tool - -The profiling tool generates information which can be used for debugging and profiling applications. -It will run a Spark application so requires Spark to be installed and setup. If you have a cluster already setup -you can run it on that, or you can simply run it in local mode as well. See the Apache Spark documentation -for [Downloading Apache Spark 3.x](http://spark.apache.org/downloads.html) - -### Profiling tool functions - -#### A. Collect Information or Compare Information - -Note: Compare mode is enabled by `-c` option if more than 1 event logs are as input. +in the local filesystem, HDFS, S3 or mixed. +Please note, if processing a lot of event logs use combined or compare mode. +Both these modes may need you to increase the java heap size using `-Xmx` option. +For instance, to specify 30 GB heap size `java -Xmx30g`. + +There are 3 modes of operation for the Profiling tool: + 1. Collection Mode: + Collection mode is the default mode when no other options are specified it simply collects information + on each application individually and outputs a file per application + + ```bash + Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.profiling.ProfileMain [options] + + ``` + + 2. Combined Mode: + Combined mode is collection mode but then combines all the applications + together and you get one file for all applications. + + ```bash + Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.profiling.ProfileMain --combined + + ``` + + 3. Compare Mode: + Compare mode will combine all the applications information in the same tables into a single file + and also adds in tables to compare stages and sql ids across all of those applications. + The Compare mode will use more memory if comparing lots of applications. 
+ + ```bash + Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.profiling.ProfileMain --compare + + ``` + Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input and output + so if you want to point to the local filesystem be sure to include `file:` in the path. + + Example running on files in HDFS: (include $HADOOP_CONF_DIR in classpath) + + ```bash + java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ + com.nvidia.spark.rapids.tool.profiling.ProfileMain /eventlogDir + ``` + +## Understanding Profiling tool detailed output and examples +The default output location is the current directory. +The output location can be changed using the `--output-directory` option. +The output goes into a sub-directory named `rapids_4_spark_profile/` inside that output location. +If running in normal collect mode, it processes event log individually and outputs files for each application under +a directory named `rapids_4_spark_profile/{APPLICATION_ID}`. It creates a summary text file named `profile.log`. +If running combine mode the output is put under a directory named `rapids_4_spark_profile/combined/` and creates a summar +text file named `rapids_4_spark_tools_combined.log`. +If running compare mode the output is put under a directory named `rapids_4_spark_profile/compare/` and creates a summary +text file named `rapids_4_spark_tools_compare.log`. +The output will go into your default filesystem, it supports local filesystem or HDFS. +Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input and output +so if you want to point to the local filesystem be sure to include `file:` in the path. +There are separate files that are generated under the same sub-directory when using the options to generate query +visualizations or printing the SQL plans. +Optionally if the `--csv` option is specified then it creates a csv file for each table for each application in the +corresponding sub-directory. + +There is a 100 characters limit for each output column. +If the result of the column exceeds this limit, it is suffixed with ... for that column. + +ResourceProfile ids are parsed for the event logs that are from Spark 3.1 or later. +A ResourceProfile allows the user to specify executor and task requirements +for an RDD that will get applied during a stage. +This allows the user to change the resource requirements between stages. + +Run `--help` for more information. +#### A. Collect Information or Compare Information(if more than 1 event logs are as input and option -c is specified) - Application information +- Data Source information - Executors information +- Job, stage and SQL ID information - Rapids related parameters - Rapids Accelerator Jar and cuDF Jar -- Job, stage and SQL ID information (not in `compare` mode yet) - SQL Plan Metrics +- Compare Mode: Matching SQL IDs Across Applications +- Compare Mode: Matching Stage IDs Across Applications - Optionally : SQL Plan for each SQL query - Optionally : Generates DOT graphs for each SQL query - Optionally : Generates timeline graph for application @@ -332,21 +137,21 @@ We can input multiple Spark event logs and this tool can compare environments, e ``` -### A. Compare Information Collected ### -Compare Application Information: +### A. 
Information Collected ### +Application Information: -+--------+-----------+-----------------------+-------------+-------------+--------+-----------+------------+-------------+ -|appIndex|appName |appId |startTime |endTime |duration|durationStr|sparkVersion|pluginEnabled| -+--------+-----------+-----------------------+-------------+-------------+--------+-----------+------------+-------------+ -|1 |Spark shell|app-20210329165943-0103|1617037182848|1617037490515|307667 |5.1 min |3.0.1 |false | -|2 |Spark shell|app-20210329170243-0018|1617037362324|1617038578035|1215711 |20 min |3.0.1 |true | -+--------+-----------+-----------------------+-------------+-------------+--------+-----------+------------+-------------+ ++--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+------------- +|appIndex|appName |appId |sparkUser|startTime |endTime |duration|durationStr|sparkVersion|pluginEnabled| ++--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+------------- +|1 |Spark shell|app-20210329165943-0103|user1 |1617037182848|1617037490515|307667 |5.1 min |3.0.1 |false | +|2 |Spark shell|app-20210329170243-0018|user1 |1617037362324|1617038578035|1215711 |20 min |3.0.1 |true | ++--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------+ ``` -- Compare Executor information: +- Executor information: ``` -Compare Executor Information: +Executor Information: +--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+ |appIndex|resourceProfileId|numExecutors|executorCores|maxMem |maxOnHeapMem|maxOffHeapMem|executorMemory|numGpusPerExecutor|executorOffHeap|taskCpu|taskGpu| +--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+ @@ -355,6 +160,28 @@ Compare Executor Information: +--------+-----------------+------------+-------------+-----------+------------+-------------+-------------+--------------+------------------+---------------+-------+-------+ ``` +- Data Source information +The details of this output differ between using a Spark Data Source V1 and Data Source V2 reader. +The Data Source V2 truncates the schema, so if you see `...`, then +the full schema is not available. + +``` +Data Source Information: ++--------+-----+-------+---------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------------------------+ +|appIndex|sqlID|format |location |pushedFilters |schema | ++--------+-----+-------+---------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------------------------+ +|1 |0 |Text |InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/integration_tests/src/test/resources/trucks-comments.csv]|[] |value:string | +|1 |1 |csv |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/integration_tests/src/test/re... 
|PushedFilters: []|_c0:string | +|1 |2 |parquet|Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotscolumnsout] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |3 |parquet|Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotscolumnsout] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |4 |orc |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/logscolumsout.orc] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |5 |orc |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/logscolumsout.orc] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |6 |json |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json] |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...| +|1 |7 |json |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json] |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...| +|1 |8 |json |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json] |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...| ++--------+-----+-------+---------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------------------------+ +``` + - Matching SQL IDs Across Applications: ``` @@ -424,19 +251,25 @@ Compare Rapids Properties which are set explicitly: ``` Rapids Accelerator Jar and cuDF Jar: -/path/rapids-4-spark_2.12-0.5.0.jar -/path/cudf-0.19-cuda10-2.jar ++--------+------------------------------------------------------------+ +|appIndex|Rapids4Spark jars | ++--------+------------------------------------------------------------+ +|1 |spark://10.10.10.10:43445/jars/cudf-0.19.2-cuda11.jar | +|1 |spark://10.10.10.10:43445/jars/rapids-4-spark_2.12-0.5.0.jar| +|2 |spark://10.10.10.11:41319/jars/cudf-0.19.2-cuda11.jar | +|2 |spark://10.10.10.11:41319/jars/rapids-4-spark_2.12-0.5.0.jar| ++--------+------------------------------------------------------------+ ``` - Job, stage and SQL ID information(not in `compare` mode yet): ``` -+--------+-----+------------+-----+ -|appIndex|jobID|stageIds |sqlID| -+--------+-----+------------+-----+ -|1 |0 |[0] |null | -|1 |1 |[1, 2, 3, 4]|0 | -+--------+-----+------------+-----+ ++--------+-----+---------+-----+ +|appIndex|jobID|stageIds |sqlID| ++--------+-----+---------+-----+ +|1 |0 |[0] |null | +|1 |1 |[1,2,3,4]|0 | ++--------+-----+---------+-----+ ``` - SQL Plan Metrics for Application for each SQL plan node in each SQL: @@ -457,9 +290,9 @@ SQL Plan Metrics for Application: ``` - Print SQL Plans (-p option): -Prints the SQL plan as a text string to a file prefixed with `planDescriptions-`. +Prints the SQL plan as a text string to a file named `planDescriptions.log`. 
For example if your application id is app-20210507103057-0000, then the -filename will be `planDescriptions-app-20210507103057-0000` +filename will be `planDescriptions.log` - Generate DOT graph for each SQL (-g option): @@ -495,7 +328,7 @@ If a stage hs no metrics, like if the query crashed early, we cannot establish t - Generate timeline for application (--generate-timeline option): The output of this is an [svg](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) file -named `${APPLICATION_ID}-timeline.svg`. Most web browsers can display this file. It is a +named `timeline.svg`. Most web browsers can display this file. It is a timeline view similar Apache Spark's [event timeline](https://spark.apache.org/docs/latest/web-ui.html). @@ -526,7 +359,8 @@ stage. Jobs and SQL are not color coordinated. - SQL duration, application during, if it contains a Dataset operation, potential problems, executor CPU time percent - Shuffle Skew Check: (When task's Shuffle Read Size > 3 * Avg Stage-level size) -Below we will aggregate the task level metrics at different levels to do some analysis such as detecting possible shuffle skew. +Below we will aggregate the task level metrics at different levels +to do some analysis such as detecting possible shuffle skew. - Job + Stage level aggregated task metrics: @@ -576,6 +410,7 @@ Shuffle Skew Check: (When task's Shuffle Read Size > 3 * Avg Stage-level size) #### C. Health Check - List failed tasks, stages and jobs +- Removed BlockManagers and Executors - SQL Plan HealthCheck Below are examples. @@ -629,55 +464,22 @@ Failed jobs: +--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ ``` -### Profiling tool metrics definitions -All the metrics definitions can be found in the [executor task metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-task-metrics) / [executor metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-metrics) or the [SPARK webUI doc](https://spark.apache.org/docs/latest/web-ui.html#content). - -### How to use profiling tool -This tool parses the Spark CPU or GPU event log(s) and creates an output report. -Acceptable input event log paths are files or directories containing spark events logs -in the local filesystem, HDFS, S3 or mixed. Note that if you are on an -HDFS cluster the default filesystem is likely HDFS for both the input and output so if you want to -point to the local filesystem be sure to include `file:` in the path - -#### Use from spark-shell -1. Include `rapids-4-spark-tools_2.12-.jar` in the '--jars' option to spark-shell or spark-submit -2. After starting spark-shell: - -For a single event log analysis: - -```bash -com.nvidia.spark.rapids.tool.profiling.ProfileMain.main(Array("/path/to/eventlog1")) -``` - -For multiple event logs comparison and analysis: +## Profiling tool options ```bash -com.nvidia.spark.rapids.tool.profiling.ProfileMain.main(Array("/path/to/eventlog1", "/path/to/eventlog2")) -``` - -#### Use from spark-submit - -```bash -$SPARK_HOME/bin/spark-submit --class com.nvidia.spark.rapids.tool.profiling.ProfileMain \ -rapids-4-spark-tools_2.12-.jar \ -/path/to/eventlog1 /path/to/eventlog2 /directory/with/eventlogs -``` - -### Profiling tool options - - Note: `--help` should be before the trailing event logs. 
- -```bash -$SPARK_HOME/bin/spark-submit \ ---class com.nvidia.spark.rapids.tool.profiling.ProfileMain \ -rapids-4-spark-tools_2.12-.jar \ ---help - +RAPIDS Accelerator for Apache Spark Profiling tool -For usage see below: +Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.profiling.ProfileMain [options] + - -c, --compare Compare Applications (Recommended to compare - less than 10 applications). Default is false + --combined Collect mode but combine all applications into + the same tables. + -c, --compare Compare Applications (Note this may require + more memory if comparing a large number of + applications). Default is false. + --csv Output each table to a CSV file as well + creating the summary text file. -f, --filter-criteria Filter newest or oldest N eventlogs for processing.eg: 100-newest-filesystem (for processing newest 100 event logs). eg: @@ -691,14 +493,23 @@ For usage see below: input string -n, --num-output-rows Number of output rows for each Application. Default is 1000 + --num-threads Number of thread to use for parallel + processing. The default is the number of cores + on host divided by 4. -o, --output-directory Base output directory. Default is current directory for the default filesystem. The final output will go into a subdirectory called rapids_4_spark_profile. It will overwrite any existing files with the same name. - -p, --print-plans Print the SQL plans to a file starting with - 'planDescriptions-'. Default is false + -p, --print-plans Print the SQL plans to a file named + 'planDescriptions.log'. + Default is false. + -t, --timeout Maximum time in seconds to wait for the event + logs to be processed. Default is 24 hours + (86400 seconds) and must be greater than 3 + seconds. If it times out, it will report what + it was able to process up until the timeout. -h, --help Show help message trailing arguments: @@ -707,41 +518,9 @@ For usage see below: /path/to/eventlog2 ``` -Example commands: -- Process 10 newest logs with filenames containing "local": - -```bash -$SPARK_HOME/bin/spark-submit --class com.nvidia.spark.rapids.tool.profiling.ProfileMain \ -rapids-4-spark-tools_2.12-.jar \ --m "local" -f "10-newest-filesystem" \ -/directory/with/eventlogs/ -``` - -- Print SQL plans, generate dot files and also generate timeline(SVG graph): +## Profiling tool metrics definitions -```bash -$SPARK_HOME/bin/spark-submit --class com.nvidia.spark.rapids.tool.profiling.ProfileMain \ -rapids-4-spark-tools_2.12-.jar \ --p -g --generate-timeline \ -/directory/with/eventlogs/ -``` - -### Profiling tool output -By default this outputs a log file under sub-directory `./rapids_4_spark_profile` named -`rapids_4_spark_tools_output.log` that contains the processed applications. The output will go into your -default filesystem, it supports local filesystem or HDFS. Note that if you are on an -HDFS cluster the default filesystem is likely HDFS for both the input and output so if you want to -point to the local filesystem be sure to include `file:` in the path -There are separate files that are generated under the same sub-directory when using the options to generate -query visualizations or printing the SQL plans. - -The output location can be changed using the `--output-directory` option. Default is current directory. - -There is a 100 characters limit for each output column. If the result of the column exceeds this limit, it is suffixed with ... for that column. 
- -ResourceProfile ids are parsed for the event logs that are from Spark 3.1 or later. ResourceProfileId column is added in the output table for such event logs. -A ResourceProfile allows the user to specify executor and task requirements for an RDD that will get applied during a stage. This allows the user to change the resource requirements between stages. - -Note: We suggest you also save the output of the `spark-submit` or `spark-shell` to a log file for troubleshooting. - -Run `--help` for more information. +All the metrics definitions can be found in the +[executor task metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-task-metrics) / +[executor metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-metrics) or +the [SPARK webUI doc](https://spark.apache.org/docs/latest/web-ui.html#content). \ No newline at end of file diff --git a/docs/spark-qualification-tool.md b/docs/spark-qualification-tool.md new file mode 100644 index 00000000000..e61cfa727a1 --- /dev/null +++ b/docs/spark-qualification-tool.md @@ -0,0 +1,353 @@ +--- +layout: page +title: Spark Qualification tool +nav_order: 8 +--- + +# Spark Qualification tool + +The Qualification tool analyzes event logs generated from CPU based Spark applications to determine +if the RAPIDS Accelerator for Apache Spark might be a good fit for GPU acceleration. + +This tool is intended to give the users a starting point and does not guarantee the +applications with the highest scores will actually be accelerated the most. Currently, +it reports by looking at the amount of time spent in tasks of SQL Dataframe operations. +This document covers below topics: + +* TOC +{:toc} + +## How to use the Qualification tool + +### Prerequisites +- Java 8 or above, Spark 3.0.1+ jars +- Spark event log(s) from Spark 2.0 or above version. Supports both rolled and compressed event logs + with `.lz4`, `.lzf`, `.snappy` and `.zstd` suffixes as well as Databricks-specific rolled and compressed(.gz) event logs. +- The tool does not support nested directories. + Event log files or event log directories should be at the top level when specifying a directory. + +Note: Spark event logs can be downloaded from Spark UI using a "Download" button on the right side, +or can be found in the location specified by `spark.eventLog.dir`. See the +[Apache Spark Monitoring](http://spark.apache.org/docs/latest/monitoring.html) documentation for +more information. + +### Step 1 Download the tools jar and Apache Spark 3 Distribution +The Qualification tools require the Spark 3.x jars to be able to run but do not need an Apache Spark run time. +If you do not already have Spark 3.x installed, you can download the Spark distribution to +any machine and include the jars in the classpath. +- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.10.0/) +- [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended + +### Step 2 Run the Qualification tool +1. Event logs stored on a local machine: + - Extract the Spark distribution into a local directory if necessary. + - Either set SPARK_HOME to point to that directory or just put the path inside of the classpath + `java -cp toolsJar:pathToSparkJars/*:...` when you run the Qualification tool. + + This tool parses the Spark CPU event log(s) and creates an output report. 
Acceptable inputs are either individual or + multiple event logs files or directories containing spark event logs in the local filesystem, HDFS, S3 or mixed. + + ```bash + Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.qualification.QualificationMain [options] + + ``` + + ```bash + Sample: java -cp rapids-4-spark-tools_2.12-21.10.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.qualification.QualificationMain /usr/logs/app-name1 + ``` + +2. Event logs stored on an on-premises HDFS cluster: + + Example running on files in HDFS: (include $HADOOP_CONF_DIR in classpath) + + ```bash + Usage: java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ + com.nvidia.spark.rapids.tool.qualification.QualificationMain /eventlogDir + ``` + + Note, on an HDFS cluster, the default filesystem is likely HDFS for both the input and output + so if you want to point to the local filesystem be sure to include file: in the path. + +## Understanding the Qualification tool Output +After the above command is executed, the summary report goes to STDOUT and by default it outputs 2 files +under `./rapids_4_spark_qualification_output/` that contain the processed applications. +The output will go into your default filesystem and it supports both local filesystem and HDFS. +Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input and output. +If you want to point to the local filesystem be sure to include `file:` in the path + +First output is a summary that is both printed on the STDOUT and saved as a text file. +If there are more than one application that are currently analyzed, +the applications that are reported on top are likely to be good candidates for the GPU +when there is no problematic duration reported for those same applications. +For more information on the ordering, please refer to the section below that explains the “score” for each application. + +It outputs the following information on the terminal: +1. Application ID +2. Application duration +3. SQL/DF duration +4. Problematic Duration, which indicates potential issues for acceleration. + Some of the potential issues include unsupported data formats such as Decimal 128-bit + or User Defined Function (UDF) or any Dataset APIs. + +Note: the duration(s) reported are in milli-seconds. +Sample output in text: +``` +=========================================================================== +| App ID|App Duration|SQL DF Duration|Problematic Duration| +=========================================================================== +|app-20210507174503-2538| 26171| 9569| 0| +|app-20210507174503-1704| 83738| 6760| 0| +``` + +In the above example, two application event logs were analyzed. “app-20210507174503-2538” is rated higher +than the “app-20210507174503-1704” because the score(in the csv output) for “app-20210507174503-2538” +is higher than “app-20210507174503-1704”. +Here the `Problematic Duration` is zero but please keep in mind that we are only able to detect certain issues. +This currently includes some UDFs, some decimal operations and nested complex types. +The tool won't catch all UDFs, and some of the UDFs can be handled with additional steps. + +Please refer to [supported_ops.md](./supported_ops.md) +for more details on UDF. +For decimals, the tool tries to parse for decimal operations but it may not capture all of the decimal operations +if they aren’t in the event logs. 
+ +The second output is a CSV file that contains more information and can be used for further post processing. +Here is a sample output for csv file: +``` +App Name,App ID,Score,Potential Problems,SQL DF Duration,SQL Dataframe Task Duration,App Duration,Executor CPU Time Percent,App Duration Estimated,SQL Duration with Potential Problems,SQL Ids with Failures,Read Score Percent,Read File Format Score,Unsupporte +job3,app-20210507174503-1704,4320658.0,"",9569,4320658,26171,35.34,false,0,"",20,100.0,"",JSON,array>;map,array> +job1,app-20210507174503-2538,19864.04,"",6760,21802,83728,71.3,false,0,"",20,55.56,"Parquet[decimal]",JSON;CSV,"","" +``` + +As you can see on the above sample csv output, we have more columns than the STDOUT summary. +Here is a brief description of each of column that is in the CSV: + +1. App Name: Spark Application Name. +2. App ID: Spark Application ID. +3. Score : A score calculated based on SQL Dataframe Task Duration and gets negatively affected for any unsupported operators. + Please refer to [Qualification tool score algorithm](#Qualification-tool-score-algorithm) for more details. +4. Potential Problems : Some UDFs, some decimal operations and nested complex types. +5. SQL DF Duration: Time duration that includes only SQL/Dataframe queries. +6. SQL Dataframe Task Duration: Amount of time spent in tasks of SQL Dataframe operations. +7. App Duration: Total Application time. +8. Executor CPU Time Percent: This is an estimate at how much time the tasks spent doing processing on the CPU vs waiting on IO. + This is not always a good indicator because sometimes the IO that is encrypted and the CPU has to do work to decrypt it, + so the environment you are running on needs to be taken into account. +9. App Duration Estimated: True or False indicates if we had to estimate the application duration. + If we had to estimate it, the value will be `True` and it means the event log was missing the application finished + event so we will use the last job or sql execution time we find as the end time used to calculate the duration. +10. SQL Duration with Potential Problems : Time duration of any SQL/DF operations that contains + operations we consider potentially problematic. +11. SQL Ids with Failures: SQL Ids of queries with failed jobs. +12. Read Score Percent: The value of the parameter `--read-score-percent` when the Qualification tool was run. Default is 20 percent. +13. Read File Format Score: A score given based on whether the read file formats and types are supported. +14. Unsupported Read File Formats and Types: Looks at the Read Schema and + reports the file formats along with types which may not be fully supported. + Example: Parquet[decimal]. Note that this is based on the current version of the plugin and + future versions may add support for more file formats and types. +15. Unsupported Write Data Format: Reports the data format which we currently don’t support, i.e. + if the result is written in JSON or CSV format. +16. Complex Types: Looks at the Read Schema and reports if there are any complex types(array, struct or maps) in the schema. +17. Unsupported Nested Complex Types: Nested complex types are complex types which + contain other complex types (Example: `array>`). + Note that it can read all the schemas for DataSource V1. The Data Source V2 truncates the schema, + so if you see ..., then the full schema is not available. + For such schemas we read until ... and report if there are any complex types and nested complex types in that. 
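As noted above, the CSV report is intended for further post processing. Below is a minimal sketch of such post processing, assuming the default output location described earlier and that the App Name, App ID and Score fields do not contain embedded commas; any dataframe or spreadsheet tool can be used instead:

```bash
# Show the header, then the first three columns (App Name, App ID, Score) sorted by Score, highest first.
CSV=rapids_4_spark_qualification_output/rapids_4_spark_qualification_output.csv
head -1 "$CSV" | cut -d',' -f1-3
tail -n +2 "$CSV" | sort -t',' -k3,3 -gr | cut -d',' -f1-3
```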
+ +Note that SQL queries that contain failed jobs are not included. + +## Qualification tool options + Note: `--help` should be before the trailing event logs. + +```bash +java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ + com.nvidia.spark.rapids.tool.qualification.QualificationMain --help + +RAPIDS Accelerator for Apache Spark Qualification tool + +Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.qualification.QualificationMain [options] + + + -a, --application-name Filter event logs by application name. The + string specified can be a regular expression, + substring, or exact match. For filtering based + on complement of application name, + use ~APPLICATION_NAME. i.e Select all event + logs except the ones which have application + name as the input string. + --any Apply multiple event log filtering criteria and process + only logs for which any condition is satisfied. + Example: --any -> result + is OR OR + --all Apply multiple event log filtering criteria and process + only logs for which all conditions are satisfied. + Example: --all -> result + is AND AND . Default is all=true. + -f, --filter-criteria Filter newest or oldest N eventlogs based on + application start timestamp, unique + application name or filesystem timestamp. + Filesystem based filtering happens before + any application based filtering. For + application based filtering, the order in + which filters are applied is: + application-name, start-app-time, + filter-criteria. Application based + filter-criteria are: 100-newest (for + processing newest 100 event logs based on + timestamp insidethe eventlog) i.e + application start time) 100-oldest (for + processing oldest 100 event logs based on + timestamp insidethe eventlog) i.e + application start time) + 100-newest-per-app-name (select at most 100 + newest log files for each unique application + name) 100-oldest-per-app-name (select at + most 100 oldest log files for each unique + application name). Filesystem based filter + criteria are: 100-newest-filesystem (for + processing newest 100 event logs based on + filesystem timestamp). 100-oldest-filesystem + (for processing oldest 100 event logsbased + on filesystem timestamp). + -m, --match-event-logs Filter event logs whose filenames contain + the input string. Filesystem based filtering + happens before any application based + filtering. + -n, --num-output-rows Number of output rows in the summary report. + Default is 1000. + --num-threads Number of thread to use for parallel + processing. The default is the number of + cores on host divided by 4. + --order Specify the sort order of the report. desc + or asc, desc is the default. desc + (descending) would report applications most + likely to be accelerated at the top and asc + (ascending) would show the least likely to + be accelerated at the top. + -o, --output-directory Base output directory. Default is current + directory for the default filesystem. The + final output will go into a subdirectory + called rapids_4_spark_qualification_output. + It will overwrite any existing directory + with the same name. + -r, --read-score-percent The percent the read format and datatypes + apply to the score. Default is 20 percent. + --report-read-schema Whether to output the read formats and + datatypes to the CSV file. This can be very + long. Default is false. + -s, --start-app-time Filter event logs whose application start + occurred within the past specified time + period. 
Valid time periods are + min(minute),h(hours),d(days),w(weeks),m(months). + If a period is not specified it defaults to + days. + --spark-property Filter applications based on certain Spark properties that were + set during launch of the application. It can filter based + on key:value pair or just based on keys. Multiple configs + can be provided where the filtering is done if any of the + config is present in the eventlog. + filter on specific configuration(key:value): + --spark-property=spark.eventLog.enabled:true + filter all eventlogs which has config(key): + --spark-property=spark.driver.port + Multiple configs: + --spark-property=spark.eventLog.enabled:true + --spark-property=spark.driver.port. + -t, --timeout Maximum time in seconds to wait for the + event logs to be processed. Default is 24 + hours (86400 seconds) and must be greater + than 3 seconds. If it times out, it will + report what it was able to process up until + the timeout. + -u, --user-name Applications which a particular user has submitted. + -h, --help Show help message + + trailing arguments: + eventlog (required) Event log filenames(space separated) or directories + containing event logs. eg: s3a:///eventlog1 + /path/to/eventlog2 +``` + +Example commands: +- Process the 10 newest logs, and only output the top 3 in the output: + +```bash +java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ + com.nvidia.spark.rapids.tool.qualification.QualificationMain -f 10-newest -n 3 /eventlogDir +``` + +- Process last 100 days' logs: + +```bash +java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ + com.nvidia.spark.rapids.tool.qualification.QualificationMain -s 100d /eventlogDir +``` + +- Process only the newest log with the same application name: + +```bash +java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \ + com.nvidia.spark.rapids.tool.qualification.QualificationMain -f 1-newest-per-app-name /eventlogDir +``` + +Note: The “regular expression” used by -a option is based on +[java.util.regex.Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). + +## Qualification tool score algorithm + +In the Qualification tool’s output, all applications are ranked based on a “score”. + +The score is based on the total time spent in tasks of SQL Dataframe operations. +The tool also looks for read data formats and types that the plugin doesn't fully support and if it finds any, +it will take away from the score. The parameter to control this negative impact of the +score is `-r, --read-score-percent` with the default value as 20(percent). + +The idea behind this algorithm is that the longer the total task time doing SQL Dataframe operations +the higher the score is and the more likely the plugin will be able to help accelerate that application. + +Note: The score does not guarantee there will be GPU acceleration and we are continuously improving +the score algorithm to qualify the best queries for GPU acceleration. + +## How to compile the tools jar +Note: This step is optional. + +```bash +git clone https://github.com/NVIDIA/spark-rapids.git +cd spark-rapids +mvn -Pdefault -pl .,tools clean verify -DskipTests +``` + +The jar is generated in below directory : + +`./tools/target/rapids-4-spark-tools_2.12-.jar` + +If any input is a S3 file path or directory path, 2 extra steps are needed to access S3 in Spark: +1. Download the matched jars based on the Hadoop version: + - `hadoop-aws-.jar` + - `aws-java-sdk-.jar` + +2. 
Take Hadoop 2.7.4 for example, we can download and include below jars in the '--jars' option to spark-shell or spark-submit: + [hadoop-aws-2.7.4.jar](https://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-aws/2.7.4/hadoop-aws-2.7.4.jar) and + [aws-java-sdk-1.7.4.jar](https://repo.maven.apache.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar) + +3. In $SPARK_HOME/conf, create `hdfs-site.xml` with below AWS S3 keys inside: + +```xml + + + + fs.s3a.access.key + xxx + + + fs.s3a.secret.key + xxx + + +``` + +Please refer to this [doc](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) on +more options about integrating hadoop-aws module with S3. \ No newline at end of file diff --git a/docs/supported_ops.md b/docs/supported_ops.md index 71cebfe75c3..8489c220aa4 100644 --- a/docs/supported_ops.md +++ b/docs/supported_ops.md @@ -15,10 +15,11 @@ apply to other versions of Spark, but there may be slight changes. # General limitations ## `Decimal` The `Decimal` type in Spark supports a precision -up to 38 digits (128-bits). The RAPIDS Accelerator stores values up to 64-bits and as such only +up to 38 digits (128-bits). The RAPIDS Accelerator in most cases stores values up to +64-bits and will support 128-bit in the future. As such the accelerator currently only supports a precision up to 18 digits. Note that -decimals are disabled by default in the plugin, because it is supported by a small -number of operations presently, which can result in a lot of data movement to and +decimals are disabled by default in the plugin, because it is supported by a relatively +small number of operations presently. This can result in a lot of data movement to and from the GPU, slowing down processing in some cases. Result `Decimal` precision and scale follow the same rule as CPU mode in Apache Spark: @@ -36,10 +37,15 @@ Result `Decimal` precision and scale follow the same rule as CPU mode in Apache * e1 union e2 max(s1, s2) + max(p1-s1, p2-s2) max(s1, s2) ``` -However Spark inserts `PromotePrecision` to CAST both sides to the same type. +However, Spark inserts `PromotePrecision` to CAST both sides to the same type. GPU mode may fall back to CPU even if the result Decimal precision is within 18 digits. For example, `Decimal(8,2)` x `Decimal(6,3)` resulting in `Decimal (15,5)` runs on CPU, because due to `PromotePrecision`, GPU mode assumes the result is `Decimal(19,6)`. +There are even extreme cases where Spark can temporarily return a Decimal value +larger than what can be stored in 128-bits and then uses the `CheckOverflow` +operator to round it to a desired precision and scale. This means that even when +the accelerator supports 128-bit decimal, we might not be able to support all +operations that Spark can support. ## `Timestamp` Timestamps in Spark will all be converted to the local time zone before processing @@ -90,11 +96,9 @@ the reasons why this particular operator or expression is on the CPU or GPU. |Value|Description| |---------|----------------| -|S| (Supported) Both Apache Spark and the RAPIDS Accelerator support this type.| -|S*| (Supported with limitations) Typically this refers to general limitations with `Timestamp` or `Decimal`| +|S| (Supported) Both Apache Spark and the RAPIDS Accelerator support this type fully.| | | (Not Applicable) Neither Spark not the RAPIDS Accelerator support this type in this situation.| |_PS_| (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. 
An explanation for what is missing will be included with this.| -|_PS*_| (Partial Support with limitations) Like regular Partial Support but with general limitations on `Timestamp` or `Decimal` types.| |**NS**| (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not. # SparkPlan or Executor Nodes @@ -107,6 +111,7 @@ Accelerator supports are described below. Executor Description Notes +Param(s) BOOLEAN BYTE SHORT @@ -127,9 +132,10 @@ Accelerator supports are described below. UDT -CoalesceExec -The backend for the dataframe coalesce method -None +CoalesceExec +The backend for the dataframe coalesce method +None +Input/Output S S S @@ -138,21 +144,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -CollectLimitExec -Reduce to single partition and apply limit -This is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU +CollectLimitExec +Reduce to single partition and apply limit +This is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU +Input/Output S S S @@ -161,21 +168,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS -ExpandExec -The backend for the expand operator -None +ExpandExec +The backend for the expand operator +None +Input/Output S S S @@ -184,21 +192,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -NS -NS -NS +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -FileSourceScanExec -Reading data from files, often from Hive tables -None +FileSourceScanExec +Reading data from files, often from Hive tables +None +Input/Output S S S @@ -207,21 +216,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -FilterExec -The backend for most filter statements -None +FilterExec +The backend for most filter statements +None +Input/Output S S S @@ -230,21 +240,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -GenerateExec -The backend for operations that generate more output rows than input rows like explode -None +GenerateExec +The backend for operations that generate more output rows than input rows like explode +None +Input/Output S S S @@ -253,21 +264,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
+S +PS
max DECIMAL precision of 18
S -S* -NS -NS -NS -PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT) NS NS +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -GlobalLimitExec -Limiting of results across partitions -None +GlobalLimitExec +Limiting of results across partitions +None +Input/Output S S S @@ -276,21 +288,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -NS -NS -NS +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -LocalLimitExec -Per-partition limiting of results -None +LocalLimitExec +Per-partition limiting of results +None +Input/Output S S S @@ -299,21 +312,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -NS -NS -NS +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -ProjectExec -The backend for most select, withColumn and dropColumn statements -None +ProjectExec +The backend for most select, withColumn and dropColumn statements +None +Input/Output S S S @@ -322,21 +336,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -RangeExec -The backend for range operator -None +RangeExec +The backend for range operator +None +Input/Output @@ -357,9 +372,10 @@ Accelerator supports are described below. -SortExec -The backend for the sort operator -None +SortExec +The backend for the sort operator +None +Input/Output S S S @@ -368,21 +384,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, MAP, UDT) -NS -PS* (missing nested BINARY, CALENDAR, MAP, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -TakeOrderedAndProjectExec -Take the first limit elements as defined by the sortOrder, and do projection if needed. -None +TakeOrderedAndProjectExec +Take the first limit elements as defined by the sortOrder, and do projection if needed +None +Input/Output S S S @@ -391,21 +408,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -NS -NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -UnionExec -The backend for the union operator -None +UnionExec +The backend for the union operator +None +Input/Output S S S @@ -414,21 +432,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -NS -NS -PS* (unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields; missing nested BINARY, CALENDAR, ARRAY, MAP, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -CustomShuffleReaderExec -A wrapper of shuffle query stage -None +CustomShuffleReaderExec +A wrapper of shuffle query stage +None +Input/Output S S S @@ -437,21 +456,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) -PS* (missing nested BINARY, CALENDAR, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -HashAggregateExec -The backend for hash based aggregations -None +HashAggregateExec +The backend for hash based aggregations +None +Input/Output S S S @@ -460,21 +480,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (not allowed for grouping expressions; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -PS* (not allowed for grouping expressions; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -PS* (not allowed for grouping expressions; missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
not allowed for grouping expressions;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
not allowed for grouping expressions;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
not allowed for grouping expressions if containing Array or Map as child;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS Executor Description Notes +Param(s) BOOLEAN BYTE SHORT @@ -495,9 +516,10 @@ Accelerator supports are described below. UDT -SortAggregateExec -The backend for sort based aggregations -None +ObjectHashAggregateExec +The backend for hash based aggregations supporting TypedImperativeAggregate functions +None +Input/Output S S S @@ -506,21 +528,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -NS -PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS +PS
not allowed for grouping expressions;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
not allowed for grouping expressions;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
not allowed for grouping expressions if containing Array or Map as child;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -DataWritingCommandExec -Writing data -None +SortAggregateExec +The backend for sort based aggregations +None +Input/Output S S S @@ -529,21 +552,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
+S +PS
max DECIMAL precision of 18
S -PS* (Only supported for Parquet) -NS -NS NS -PS* (Only supported for Parquet; missing nested NULL, BINARY, CALENDAR, MAP, UDT) NS -PS* (Only supported for Parquet; missing nested NULL, BINARY, CALENDAR, MAP, UDT) +PS
not allowed for grouping expressions;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
not allowed for grouping expressions;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
not allowed for grouping expressions if containing Array or Map as child;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -BatchScanExec -The backend for most file input -None +DataWritingCommandExec +Writing data +None +Input/Output S S S @@ -552,22 +576,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
Only supported for Parquet;
max DECIMAL precision of 18
NS NS NS -PS* (missing nested NULL, BINARY, CALENDAR, UDT) -PS* (missing nested NULL, BINARY, CALENDAR, UDT) -PS* (missing nested NULL, BINARY, CALENDAR, UDT) +PS
Only supported for Parquet;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
+PS
Only supported for Parquet;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
+PS
Only supported for Parquet;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
NS -BroadcastExchangeExec -The backend for broadcast exchange of data -None -S +BatchScanExec +The backend for most file input +None +Input/Output S S S @@ -575,21 +599,23 @@ Accelerator supports are described below. S S S -S* S -S* +PS
UTC is only supported TZ for TIMESTAMP
S +PS
max DECIMAL precision of 18
NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, BINARY, CALENDAR, UDT
NS -ShuffleExchangeExec -The backend for most data being exchanged between processes -None +BroadcastExchangeExec +The backend for broadcast exchange of data +None +Input/Output S S S @@ -598,21 +624,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; missing nested BINARY, CALENDAR, MAP, UDT) -PS* (Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; missing nested BINARY, CALENDAR, MAP, UDT) -PS* (Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true; missing nested BINARY, CALENDAR, MAP, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
NS -BroadcastHashJoinExec -Implementation of join using broadcast data -None +ShuffleExchangeExec +The backend for most data being exchanged between processes +None +Input/Output S S S @@ -621,21 +648,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -BroadcastNestedLoopJoinExec -Implementation of join using brute force -None +BroadcastHashJoinExec +Implementation of join using broadcast data +None +leftKeys S S S @@ -644,21 +672,19 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS NS + +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS -CartesianProductExec -Implementation of join using brute force -None +rightKeys S S S @@ -667,21 +693,40 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS NS + +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS -ShuffledHashJoinExec -Implementation of join using hashed shuffled data -None +condition +S + + + + + + + + + + + + + + + + + + + +Input/Output S S S @@ -690,21 +735,43 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -SortMergeJoinExec -Sort merge join, replacing with shuffled hash join -None +BroadcastNestedLoopJoinExec +Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported +None +condition
(A non-inner join only is supported if the condition expression can be converted to a GPU AST expression) +S + + + + + + + + + + + + + + + + + + + +Input/Output S S S @@ -713,21 +780,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS -PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS -AggregateInPandasExec -The backend for an Aggregation Pandas UDF, this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. -None +CartesianProductExec +Implementation of join using brute force +None +Input/Output S S S @@ -736,21 +804,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
+S +PS
max DECIMAL precision of 18
S NS NS -NS -NS -NS +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NS NS NS -ArrowEvalPythonExec -The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled -None +ShuffledHashJoinExec +Implementation of join using hashed shuffled data +None +leftKeys S S S @@ -759,21 +828,19 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
+S +PS
max DECIMAL precision of 18
S NS NS NS -NS -PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) -NS -PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) + +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS -FlatMapGroupsInPandasExec -The backend for Flat Map Groups Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. -None +rightKeys S S S @@ -782,44 +849,40 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
+S +PS
max DECIMAL precision of 18
S NS NS NS -NS -NS -NS -NS + +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS -MapInPandasExec -The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. -None -S -S -S -S -S -S -S +condition S -S* -S -NS -NS -NS -NS -PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) -NS -PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT) -NS + + + + + + + + + + + + + + + + + -WindowInPandasExec -The backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame. -This is disabled by default because it only supports row based frame for now +Input/Output S S S @@ -828,21 +891,22 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
+S +PS
max DECIMAL precision of 18
S NS NS -NS -NS -PS* (missing nested DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) -NS -NS +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
+PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS Executor Description Notes +Param(s) BOOLEAN BYTE SHORT @@ -863,9 +927,10 @@ Accelerator supports are described below. UDT -WindowExec -Window-operator backend -None +SortMergeJoinExec +Sort merge join, replacing with shuffled hash join +None +leftKeys S S S @@ -874,133 +939,41 @@ Accelerator supports are described below. S S S -S* +PS
UTC is only supported TZ for TIMESTAMP
S -S* +PS
max DECIMAL precision of 18
S NS NS -PS* (Not supported as a partition by key; missing nested BINARY, CALENDAR, MAP, UDT) NS -PS* (Not supported as a partition by key; missing nested BINARY, CALENDAR, MAP, UDT) + +PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS - -* As was stated previously Decimal is only supported up to a precision of -18 and Timestamp is only supported in the -UTC time zone. Decimals are off by default due to performance impact in -some cases. - -# Expression and SQL Functions -Inside each node in the DAG there can be one or more trees of expressions -that describe various types of processing that happens in that part of the plan. -These can be things like adding two numbers together or checking for null. -These expressions can have multiple input parameters and one output value. -These expressions also can happen in different contexts. Because of how the -accelerator works different contexts have different levels of support. - -The most common expression context is `project`. In this context values from a single -input row go through the expression and the result will also be use to produce -something in the same row. Be aware that even in the case of aggregation and window -operations most of the processing is still done in the project context either before -or after the other processing happens. - -Aggregation operations like count or sum can take place in either the `aggregation`, -`reduction`, or `window` context. `aggregation` is when the operation was done while -grouping the data by one or more keys. `reduction` is when there is no group by and -there is a single result for an entire column. `window` is for window operations. - -The final expression context is `lambda` which happens primarily for higher order -functions in SQL. -Accelerator support is described below. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - - - - - + + - - - - - - - - - - - - - - - - - - - - - + + + + + @@ -1008,20 +981,9 @@ Accelerator support is described below. - - - - - - - - - - - @@ -1031,23 +993,272 @@ Accelerator support is described below. - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
Abs`abs`Absolute valueNoneprojectinput SSrightKeys S S S S S*
result S S S SPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S S*
lambdainput NS NS NSNSNSNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
conditionS
result NSNSNSNSNSNS NS
Acos`acos`Inverse cosineInput/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
AggregateInPandasExecThe backend for an Aggregation Pandas UDF, this accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.NoneInput/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SNSNSNSNSNSNSNSNS
ArrowEvalPythonExecThe backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabledNoneInput/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SNSNSNSNSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
FlatMapGroupsInPandasExecThe backend for Flat Map Groups Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.NoneInput/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SNSNSNSNSNSNSNSNS
MapInPandasExecThe backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled.NoneInput/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SNSNSNSNSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
WindowInPandasExecThe backend for Window Aggregation Pandas UDF, Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row based window frame.This is disabled by default because it only supports row based frame for nowInput/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SNSNSNSNSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
NSNSNS
WindowExecWindow-operator backendNonepartitionSpecSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNSNSNSNS
Input/OutputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
+ +# Expression and SQL Functions +Inside each node in the DAG there can be one or more trees of expressions +that describe various types of processing that happens in that part of the plan. +These can be things like adding two numbers together or checking for null. +These expressions can have multiple input parameters and one output value. +These expressions also can happen in different contexts. Because of how the +accelerator works different contexts have different levels of support. + +The most common expression context is `project`. In this context values from a single +input row go through the expression and the result will also be use to produce +something in the same row. Be aware that even in the case of aggregation and window +operations most of the processing is still done in the project context either before +or after the other processing happens. + +Aggregation operations like count or sum can take place in either the `aggregation`, +`reduction`, or `window` context. `aggregation` is when the operation was done while +grouping the data by one or more keys. `reduction` is when there is no group by and +there is a single result for an entire column. `window` is for window operations. + +The final expression context is `AST` or Abstract Syntax Tree. +Before explaining AST we first need to explain in detail how project context operations +work. Generally for a project context operation the plan Spark developed is read +on the CPU and an appropriate set of GPU kernels are selected to do those +operations. For example `a >= b + 1`. Would result in calling a GPU kernel to add +`1` to `b`, followed by another kernel that is called to compare `a` to that result. +The interpretation is happening on the CPU, and the GPU is used to do the processing. +For AST the interpretation for some reason cannot happen on the CPU and instead must +be done in the GPU kernel itself. An example of this is conditional joins. If you +want to join on `A.a >= B.b + 1` where `A` and `B` are separate tables or data +frames, the `+` and `>=` operations cannot run as separate independent kernels +because it is done on a combination of rows in both `A` and `B`. Instead part of the +plan that Spark developed is turned into an abstract syntax tree and sent to the GPU +where it can be interpreted. The number and types of operations supported in this +are limited. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - + + + + + - + @@ -1059,16 +1270,16 @@ Accelerator support is described below. - - - - - + + + + + - + @@ -1078,19 +1289,19 @@ Accelerator support is described below. - + - - - - - + + + + + - + @@ -1102,16 +1313,16 @@ Accelerator support is described below. - - - - - + + + + + - + @@ -1121,9 +1332,9 @@ Accelerator support is described below. - - - + + + @@ -1168,7 +1379,7 @@ Accelerator support is described below. - + @@ -1176,7 +1387,7 @@ Accelerator support is described below. - + @@ -1197,7 +1408,7 @@ Accelerator support is described below. - + @@ -1211,15 +1422,105 @@ Accelerator support is described below. - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -1227,7 +1528,7 @@ Accelerator support is described below. - + @@ -1248,7 +1549,7 @@ Accelerator support is described below. - + @@ -1269,7 +1570,7 @@ Accelerator support is described below. - + @@ -1279,15 +1580,15 @@ Accelerator support is described below. 
- + - - - - + + + + @@ -1305,10 +1606,10 @@ Accelerator support is described below. - - - - + + + + @@ -1326,10 +1627,10 @@ Accelerator support is described below. - - - - + + + + @@ -1383,15 +1684,15 @@ Accelerator support is described below. - + - + - - - + + + @@ -1404,29 +1705,29 @@ Accelerator support is described below. - + - + - - - + + + - + + + + + + + + - - - - - - - - + @@ -1439,15 +1740,15 @@ Accelerator support is described below. + + + + + + + - - - - - - - - + @@ -1527,9 +1828,9 @@ Accelerator support is described below. - + - + @@ -1550,7 +1851,7 @@ Accelerator support is described below. - + @@ -1571,7 +1872,7 @@ Accelerator support is described below. - + @@ -1591,10 +1892,10 @@ Accelerator support is described below. - - - - + + + + @@ -1611,7 +1912,7 @@ Accelerator support is described below. - + @@ -1623,10 +1924,10 @@ Accelerator support is described below. + + - - - + @@ -1659,8 +1960,12 @@ Accelerator support is described below. - - + + + + + + @@ -1675,52 +1980,31 @@ Accelerator support is described below. - + - - - - - - - - - - - - - - - + + + + + + + + + + + + + + - - - - - - - - - - - - - - - - - - - - @@ -1749,10 +2033,10 @@ Accelerator support is described below. - - - - + + + + @@ -1761,7 +2045,6 @@ Accelerator support is described below. - @@ -1770,41 +2053,45 @@ Accelerator support is described below. + - - - - - - + + + + + + + + + + + + + + - - - - - - - - - - + + - - + + + + + + - @@ -1813,19 +2100,40 @@ Accelerator support is described below. + - - + + + + + + + + + + + + + + + + + + + + + + + - @@ -1834,14 +2142,15 @@ Accelerator support is described below. + - - - + + + @@ -1886,7 +2195,7 @@ Accelerator support is described below. - + @@ -1894,7 +2203,7 @@ Accelerator support is described below. - + @@ -1915,7 +2224,7 @@ Accelerator support is described below. - + @@ -1929,40 +2238,83 @@ Accelerator support is described below. - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - + + + + + + + + + + + - + @@ -1976,30 +2328,34 @@ Accelerator support is described below. - + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - + + + - + @@ -2019,6 +2375,32 @@ Accelerator support is described below. + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -2066,7 +2448,7 @@ Accelerator support is described below. - + @@ -2074,7 +2456,7 @@ Accelerator support is described below. - + @@ -2095,7 +2477,7 @@ Accelerator support is described below. - + @@ -2109,32 +2491,6 @@ Accelerator support is described below. - - - - - - - - - - - - - - - - - - - - - - - - - - @@ -2182,7 +2538,7 @@ Accelerator support is described below. - + @@ -2190,7 +2546,7 @@ Accelerator support is described below. - + @@ -2211,7 +2567,7 @@ Accelerator support is described below. - + @@ -2239,29 +2595,29 @@ Accelerator support is described below. - + - + - - - + + + - + + + + + + + + - - - - - - - - + @@ -2273,10 +2629,10 @@ Accelerator support is described below. - - - - + + + + @@ -2284,12 +2640,12 @@ Accelerator support is described below. - - + + - + @@ -2331,7 +2687,7 @@ Accelerator support is described below. - + @@ -2341,19 +2697,20 @@ Accelerator support is described below. - - + + + + + + - - - - - - + + + + - @@ -2361,14 +2718,17 @@ Accelerator support is described below. - - - - + + + + + + + @@ -2386,16 +2746,16 @@ Accelerator support is described below. - - - - - - + + + + + + + - @@ -2405,81 +2765,13 @@ Accelerator support is described below. 
[The hunks in this part of the patch rewrite the per-expression support tables under "Accelerator support is described below"; the HTML table markup did not survive extraction, so only the recoverable content is summarized here.]

Each table row gives an Expression, its SQL function(s), a description, notes, the context in which it can run (project, AST, lambda), the Param/Output being described (input, result, lhs, rhs, value, scale, array, key, argument, function, predicate, param), and one support indicator per data type: BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, STRING, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT and UDT. The indicators are S (supported), NS (not supported) and PS (partially supported). Recurring PS notes in these hunks include "max DECIMAL precision of 18", "UTC is only supported TZ for TIMESTAMP", "unsupported child types BINARY, CALENDAR, UDT", "result may round slightly differently" (BRound), and, for `array_max`/`array_min` on floating-point data, "NaN literals are not supported. Columnar input must not contain NaNs and spark.rapids.sql.hasNans must be false."

Expressions covered by these hunks:

| Expression | SQL function(s) | Description |
|------------|-----------------|-------------|
| Abs | `abs` | Absolute value |
| Acos | `acos` | Inverse cosine |
| Acosh | `acosh` | Inverse hyperbolic cosine |
| Add | `+` | Addition |
| ArrayContains | `array_contains` | Returns a boolean if the array contains the passed in key |
| ArrayMax | `array_max` | Returns the maximum value in the array |
| ArrayMin | `array_min` | Returns the minimum value in the array |
| ArrayTransform | `transform` | Transform elements in an array using the transform function. This is similar to a `map` in functional programming |
| Asin | `asin` | Inverse sine |
| Asinh | `asinh` | Inverse hyperbolic sine |
| AtLeastNNonNulls | | Checks if number of non null/NaN values is greater than a given value |
| Atan | `atan` | Inverse tangent |
| Atanh | `atanh` | Inverse hyperbolic tangent |
| BRound | `bround` | Round an expression to d decimal places using HALF_EVEN rounding mode |
| BitwiseAnd | `&` | Returns the bitwise AND of the operands |
| CaseWhen | `when` | CASE WHEN expression |
| Ceil | `ceiling`, `ceil` | Ceiling of a number |
| CheckOverflow | | CheckOverflow after arithmetic operations between DecimalType data |
| Coalesce | `coalesce` | Returns the first non-null argument if exists. Otherwise, null |
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
Concat`concat`List/String concatenateNone project input S* S NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
S*
lambdainput NS
result NS S NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
UDT
Coalesce`coalesce`Returns the first non-null argument if exists. Otherwise, nullNoneprojectparamSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
resultSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
lambdaparamNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
resultNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
Concat`concat`List/String concatenateNoneConcatWs`concat_ws`Concatenates multiple input strings or array of strings into a single string using a given separatorNone project input S NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT) S S NS PS* (missing nested BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT)
lambdainput NS NS NS
resultContains ContainsNoneprojectsrc NSS NS NS
ConcatWs`concat_ws`Concatenates multiple input strings or array of strings into a single string using a given separatorNoneprojectinputsearch SPS
Literal value only
S
resultS S
lambdaCos`cos`CosineNoneproject input S NS NS S NS
Contains ContainsNoneprojectsrcASTinput S S
searchresult S PS (Literal value only)
resultSCosh`cosh`Hyperbolic cosineNoneprojectinput S
lambdasrcresult S NS
searchASTinput S NS
resultNS S
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
Cos`cos`CosineCot`cot`Cotangent None project input
lambdaAST input NSS NS
Cosh`cosh`Hyperbolic cosineNoneprojectinput S
result ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
CreateArray`array`Returns an array with the given elementsNoneprojectargSSS SSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, STRUCT, UDT
NSNSNS
result PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, STRUCT, UDT
lambdainputCreateMap`map`Create a mapNoneprojectkeySSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP
valueSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP
CreateNamedStruct`named_struct`, `struct`Creates a struct with the given field names and valuesNoneprojectname NS S
valueSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
Cot`cot`CotangentNoneprojectinputCurrentRow$ Special boundary for a window frame, indicating stopping at the current rowNoneprojectresult S S
resultDateAdd`date_add`Returns the date that is num_days after start_dateNoneprojectstartDate S S
lambdainput days SSS NS NS S
CreateArray`array` Returns an array with the given elementsNoneprojectargSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT)NSNSNS
resultDateAddInterval Adds interval to dateNoneprojectstart S PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT)
lambdaargNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
resultinterval PS
month intervals are not supported;
Literal value only
NS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
CreateNamedStruct`named_struct`, `struct`Creates a struct with the given field names and valuesNoneprojectnameresult S S
valueSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)NS
resultDateDiff`datediff`Returns the number of days from startDate to endDateNoneprojectlhs S PS* (missing nested BINARY, CALENDAR, UDT)
lambdanamerhs S NS
valueNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
result S NS
CurrentRow$ Special boundary for a window frame, indicating stopping at the current rowNoneprojectresultExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
DateFormatClass`date_format`Converts timestamp to a value of string in the format specified by the date formatNoneprojecttimestamp PS
UTC is only supported TZ for TIMESTAMP
S
DateAdd`date_add`Returns the date that is num_days after start_dateNoneprojectstartDatestrfmt S PS
A limited number of formats are supported;
Literal value only
daysresult SSS S
resultDateSub`date_sub`Returns the date that is num_days before start_dateNoneprojectstartDate
lambdastartDate days SSS NS
daysresult NSNSNS S
resultDayOfMonth`dayofmonth`, `day`Returns the day of the month from a date or timestampNoneprojectinput NSS
DateAddInterval Adds interval to dateNoneprojectstartresult S S
intervalDayOfWeek`dayofweek`Returns the day of the week (1 = Sunday...7=Saturday)Noneprojectinput S PS (month intervals are not supported; Literal value only) S S
lambdastartDayOfYear`dayofyear`Returns the day of the year from a date or timestampNoneprojectinput NSS
intervalresult S NS
DenseRank`dense_rank`Window function that returns the dense rank value within the aggregation windowNonewindoworderingSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNSNSNSNS
result S NS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
DateDiff`datediff`Returns the number of days from startDate to endDateNoneDivide`/`DivisionNone project lhs S PS
max DECIMAL precision of 18
S PS
max DECIMAL precision of 18
S S PS
Because of Spark's inner workings the full range of decimal precision (even for 64-bit values) is not supported.;
max DECIMAL precision of 18
lambdalhs ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
ElementAt`element_at`Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is mapNoneprojectarray/map NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
If it's map, only string is supported.;
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
rhsindex/keyNSNSNSPS
ints are only supported as array indexes, not as maps keys;
Literal value only
NSNSNSNSNSPS
strings are only supported as map keys, not array indexes;
Literal value only
NSNSNSNSNSNSNSNS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
EndsWith Ends withNoneprojectsrc NS S
resultsearch NS PS
Literal value only
DateFormatClass`date_format`Converts timestamp to a value of string in the format specified by the date formatNoneprojecttimestampresultS S*
strfmt PS (A limited number of formats are supported; Literal value only) EqualNullSafe`<=>`Check if the values are equal including nulls <=>NoneprojectlhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
rhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
resultS S
lambdatimestamp EqualTo`=`, `==`Check if the values are equalNoneprojectlhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
rhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
resultS
strfmt
ASTlhsSSSSSNSNSNSPS
UTC is only supported TZ for TIMESTAMP
NSNSNSNSNS NS NSNS
rhsSSSSSNSNSNSPS
UTC is only supported TZ for TIMESTAMP
NSNSNSNSNSNS NSNS
resultS NS
DateSub`date_sub`Returns the date that is num_days before start_dateNoneprojectstartDate ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
Exp`exp`Euler's number e raised to a powerNoneprojectinput
daysresult SSS S
result ASTinput
lambdastartDateresult S NS
daysExplode`explode`, `explode_outer`Given an input array produces a sequence of rows for each value in the arrayNoneprojectinput NSNSNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
DayOfMonth`dayofmonth`, `day`Returns the day of the month from a date or timestampExpm1`expm1`Euler's number e raised to a power minus 1 None project input S
result S S
lambdaAST input S NS NS S
DayOfWeek`dayofweek`Returns the day of the week (1 = Sunday...7=Saturday)NoneFloor`floor`Floor of a numberNone project input S S PS
max DECIMAL precision of 18
S S S PS
max DECIMAL precision of 18
lambdainputFromUnixTime`from_unixtime`Get the string from a unix timestampNoneprojectsec S NS
resultformat NS PS
Only a limited number of formats are supported;
Literal value only
DayOfYear`dayofyear`Returns the day of the year from a date or timestampNoneprojectinputresult S S
result S ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
lambdainputGetArrayItem Gets the field at `ordinal` in the ArrayNoneprojectarray NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
resultordinal NSPS
Literal value only
DenseRank`dense_rank`Window function that returns the dense rank value within the aggregation windowNonewindoworderingresult S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSNSNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
resultGetJsonObject`get_json_object`Extracts a json object from pathNoneprojectjson S S
Divide`/`DivisionNoneprojectlhspath S S*PS
Literal value only
rhsresult S S*S
resultGetMapValue Gets Value from a Map based on a keyNoneprojectmap S PS* (Because of Spark's inner workings the full range of decimal precision (even for 64-bit values) is not supported.) PS
unsupported child types BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
lambdalhskeyNSNSNSNSNSNSNSNSNSPS
Literal value only
NSNSNSNSNSNSNSNS
resultNSNSNSNSNSNSNSNSNSSNSNSNSNSNSNSNSNS
GetStructField Gets the named field of the structNoneprojectinput NS NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
rhs resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
GetTimestamp Gets timestamps from strings using given pattern.NoneprojecttimeExp NS SPS
UTC is only supported TZ for TIMESTAMP
S NS
resultformat NS NSPS
A limited number of formats are supported;
Literal value only
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
ElementAt`element_at`Returns element of array at given(1-based) index in value if column is array. Returns value for the given key in value if column is map.Noneprojectarray/mapresult PS
UTC is only supported TZ for TIMESTAMP
PS* (missing nested BINARY, CALENDAR, UDT)PS* (If it's map, only string is supported.; missing nested BINARY, CALENDAR, UDT)
index/keyNSNSNSPS (ints are only supported as array indexes, not as maps keys; Literal value only)NSNSNSNSNSPS (strings are only supported as map keys, not array indexes; Literal value only)NSNSNSGreaterThan`>`> operatorNoneprojectlhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S NS NS NS NS NS
resultrhs S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)NS NS NS
lambdaarray/map resultS NSNS
index/keyNSNSNSNSNSNSNSASTlhsSSSSS NS NS NSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS NS NS NS NS NS
resultNSNSNSNSNSNSNSrhsSSSSS NS NS NSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS NS NS NS NS NS
EndsWith Ends withNoneprojectsrc S
search PS (Literal value only)
result S
lambdasrc NS
search NS
resultNS ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
EqualNullSafe`<=>`Check if the values are equal including nulls <=>GreaterThanOrEqual`>=`>= operator None project lhsS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NS
lambdaAST lhsSSSSS NS NS NSNSNSNSNSNSNSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS
rhsSSSSS NS NS NSNSNSNSNSNSNSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS
resultNSS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
EqualTo`=`, `==`Check if the values are equalNoneprojectlhsGreatest`greatest`Returns the greatest value of all parameters, skipping null valuesNoneprojectparam S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSNS
rhsresult S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSNS
resultSHour`hour`Returns the hour component of the string/timestampNoneprojectinput PS
UTC is only supported TZ for TIMESTAMP
lambdalhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
rhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNSresult S
Exp`exp`Euler's number e raised to a powerIf`if`IF expression NoneprojectinputprojectpredicateS S
trueValueSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
falseValueSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
result SSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
In`in`IN operatorNoneprojectvalueSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
listPS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
Literal value only
PS
UTC is only supported TZ for TIMESTAMP;
Literal value only
PS
Literal value only
PS
max DECIMAL precision of 18;
Literal value only
NSNSNSNS NSNS
lambdainputresultS NS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
InSet INSET operatorNoneprojectinputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
resultS NS
Explode`explode`, `explode_outer`Given an input array produces a sequence of rows for each value in the array.NoneInitCap`initcap`Returns str with the first letter of each word in uppercase. All other letters are in lowercaseThis is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. project input S PS* (missing nested BINARY, CALENDAR, MAP, STRUCT, UDT)NS
S PS* (missing nested BINARY, CALENDAR, MAP, UDT)
Expm1`expm1`Euler's number e raised to a power minus 1NoneprojectinputInputFileBlockLength`input_file_block_length`Returns the length of the block being read, or -1 if not availableNoneprojectresult S S
InputFileBlockStart`input_file_block_start`Returns the start offset of the block being read, or -1 if not availableNoneproject result S S
lambdainputInputFileName`input_file_name`Returns the name of the file being read, or empty string if not availableNoneprojectresult NS S
result IntegralDivide`div`Division with a integer resultNoneprojectlhs S NS PS
max DECIMAL precision of 18
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
Floor`floor`Floor of a numberNoneprojectinputrhs S S S* PS
max DECIMAL precision of 18
S S S*
lambdaIsNaN`isnan`Checks if a value is NaNNoneproject input NS NSSS NS
resultS NS NS NS
FromUnixTime`from_unixtime`Get the string from a unix timestampNoneprojectsec
IsNotNull`isnotnull`Checks if a value is not nullNoneprojectinput S SSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
formatresultS PS (Only a limited number of formats are supported; Literal value only)
IsNull`isnull`Checks if a value is nullNoneprojectinputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
resultS S
lambdasec ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
KnownFloatingPointNormalized Tag to prevent redundant normalizationNoneprojectinput NS SS
format result SS NS
KnownNotNull Tag an expression as known to not be nullNoneprojectinputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
NSSSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
NSSSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types NULL, UDT
NS
Lag`lag`Window function that returns N entries behind this oneNonewindowinputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
offset S NS
GetArrayItem Gets the field at `ordinal` in the ArrayNonedefaultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
LambdaFunction Holds a higher order SQL functionNone projectarrayfunctionSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
argumentsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
LastDay`last_day`Returns the last day of the month which the date belongs toNoneprojectinput S PS* (missing nested BINARY, CALENDAR, UDT)
ordinalresult PS (Literal value only) S
resultSSLead`lead`Window function that returns N entries ahead of this oneNonewindowinputSS S S S S S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
lambdaarray PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
ordinaloffset NSS
resultNSNSNSNSNSNSNSNSNSNSdefaultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S NS NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S NS NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, MAP, UDT
NS
UDT
GetJsonObject`get_json_object`Extracts a json object from pathNoneprojectjson Least`least`Returns the least value of all parameters, skipping null valuesNoneprojectparam SSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
path PS (Literal value only) resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
resultLength`length`, `character_length`, `char_length`String character length or binary byte lengthNoneprojectinput S NS
GetMapValue Gets Value from a Map based on a keyNoneprojectmapresult S PS (missing nested BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT)
keyNSNSNSNSNSNSNSNSNSPS (Literal value only)NSNSNSLessThan`<`< operatorNoneprojectlhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S NS NS NS NS NS
resultNSNSNSNSNSNSNSNSNSrhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
S NS NS NSNSNSNS NS NS
lambdamapresultS NS
keyNSNSNSNSNSNSNSASTlhsSSSSS NS NS NSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS NS NS NS NS NS
resultNSNSNSNSNSNSNSrhsSSSSS NS NS NSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS NS NS NS NS NS
GetStructField Gets the named field of the structNoneprojectinputresultS PS* (missing nested BINARY, CALENDAR, UDT)
resultLessThanOrEqual`<=`<= operatorNoneprojectlhs S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)NS NS NS
lambdainputrhsSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS NSNS
resultS NS
resultASTlhsSSSSSNS NS NSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS NS NS NS NS NS
rhsSSSSS NS NS NSPS
UTC is only supported TZ for TIMESTAMP
NS NS NS NS NS NS
GetTimestamp Gets timestamps from strings using given pattern.NoneprojecttimeExp SS*S NSNS
formatresultS PS (A limited number of formats are supported; Literal value only)
result S* ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
lambdatimeExpLike`like`LikeNoneprojectsrc NSNSNS S
formatsearch NSPS
Literal value only
resultS NS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
GreaterThan`>`> operatorNoneprojectlhsLiteral Holds a static value from the queryNoneprojectresult S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NSNSNS NSSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
rhsSSSASTresult S S S S SS* SS* S NSPS
UTC is only supported TZ for TIMESTAMP
NSNSNSNSNS NS NS NS NS
resultSLog`ln`Natural logNoneprojectinput S
lambdalhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSresult S NSNS
rhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNSLog10`log10`Log base 10Noneprojectinput S
GreaterThanOrEqual`>=`>= operatorNoneprojectlhsSSSSSSSSS*SS*result SNSNSNS NSNS
rhsSSSSSSSSS*SS*Log1p`log1p`Natural log 1 + exprNoneprojectinput SNSNSNS NSNS
resultS S
lambdalhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSLog2`log2`Log base 2Noneprojectinput S NSNS
rhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNS S
Greatest`greatest`Returns the greatest value of all parameters, skipping null valuesNoneprojectparamSSSSSSSSS*SS*Logarithm`log`Log variable baseNoneprojectvalue SNSNSNS NSNS
resultSSSSSSSSS*SS*base SNSNSNS NSNS
lambdaparamNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS S
ExpressionUDT
Hour`hour`Returns the hour component of the string/timestampNoneLower`lower`, `lcase`String lowercase operatorThis is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. project input S* S S S
lambdaMakeDecimal Create a Decimal from an unscaled long value for some aggregation optimizationsNoneproject input S NS NS PS
max DECIMAL precision of 18
If`if`IF expressionNoneprojectpredicateSMapEntries`map_entries`Returns an unordered array of all entries in the given mapNoneprojectinput PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
trueValueSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
falseValueSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
resultSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
lambdapredicateNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
trueValueNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
falseValueNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
resultNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
In`in`IN operatorNoneprojectvalueSSSSSSSSS*SS*SNSNSNS NSNS
listPS (Literal value only)PS (Literal value only)PS (Literal value only)PS (Literal value only)PS (Literal value only)PS (Literal value only)PS (Literal value only)PS (Literal value only)PS* (Literal value only)PS (Literal value only)PS* (Literal value only)NSNSNSNS NSNS
resultS MapKeys`map_keys`Returns an unordered array containing the keys of the mapNoneprojectinput
lambdavalueNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NSNS
listNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
InSet INSET operatorNoneMapValues`map_values`Returns an unordered array containing the values of the mapNone project inputSSSSSSSSS*SS*SNSNSNS NSNS
resultS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
lambdainputNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
InitCap`initcap`Returns str with the first letter of each word in uppercase. All other letters are in lowercaseThis is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly.Md5`md5`MD5 hash operatorNone project input S S
lambdaMinute`minute`Returns the minute component of the string/timestampNoneproject input PS
UTC is only supported TZ for TIMESTAMP
NS S NS
InputFileBlockLength`input_file_block_length`Returns the length of the block being read, or -1 if not availableNoneMonotonicallyIncreasingID`monotonically_increasing_id`Returns monotonically increasing 64-bit integersNone project result
lambdaresultExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
Month`month`Returns the month from a date or timestampNoneprojectinput NS S
InputFileBlockStart`input_file_block_start`Returns the start offset of the block being read, or -1 if not availableNoneproject result S
lambdaresultMultiply`*`MultiplicationNoneprojectlhs SSSSSS PS
max DECIMAL precision of 18
NS
rhs SSSSSS PS
max DECIMAL precision of 18
InputFileName`input_file_name`Returns the name of the file being read, or empty string if not availableNoneproject result SSSSSS PS
Because of Spark's inner workings the full range of decimal precision (even for 64-bit values) is not supported.;
max DECIMAL precision of 18
ASTlhs NSNSSSSS NS
rhs NSNSSSS S NS
lambda result NSNSSSSS NS
Murmur3Hash`hash`Murmur3 hash operatorNoneprojectinputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNS NSNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT
NS
result S
IntegralDivide`div`Division with a integer resultNoneNaNvl`nanvl`Evaluates to `left` iff left is not NaN, `right` otherwiseNone project lhs S SS S* S SS S* SS
NamedLambdaVariable A parameter to a higher order SQL functionNoneprojectresultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
NS
Not`!`, `not`Boolean not operatorNoneprojectinputS
lambdalhsresultS NS NS
rhsASTinputS NS NS
resultS NS UDT
IsNaN`isnan`Checks if a value is NaNNoneprojectinputOr`or`Logical ORNoneprojectlhsS SS
resultrhs S
lambdainput resultS NSNS
resultNSASTlhsS
IsNotNull`isnotnull`Checks if a value is not nullNoneprojectinputSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)NS
resultrhs S
lambdainputNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
resultNSS
IsNull`isnull`Checks if a value is nullNoneprojectinputPmod`pmod`PmodNoneprojectlhs S S S S S S NS
rhs SS S SS* SS* S NSNSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)NS
result SS SSSS NS
PosExplode`posexplode_outer`, `posexplode`Given an input array produces a sequence of rows for each value in the arrayNoneprojectinput
lambdainputNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
resultNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
KnownFloatingPointNormalized Tag to prevent redundant normalizationNoneprojectinputPow`pow`, `power`lhs ^ rhsNoneprojectlhs S
rhs S S
ASTlhs S
lambdainputrhs NSNS S NSNS S UDT
KnownNotNull Tag an expression as known to not be nullNonePreciseTimestampConversion Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowingNone project input SSSSSSSSS*SS*NSSSPS* (missing nested NULL, UDT)PS* (missing nested NULL, UDT)PS* (missing nested NULL, UDT)NS PS
UTC is only supported TZ for TIMESTAMP
result SSSSSSSSS*SS*NSSSPS* (missing nested NULL, UDT)PS* (missing nested NULL, UDT)PS* (missing nested NULL, UDT)NS PS
UTC is only supported TZ for TIMESTAMP
lambdaPromotePrecision PromotePrecision before arithmetic operations between DecimalType dataNoneproject input PS
max DECIMAL precision of 18
result PS
max DECIMAL precision of 18
PythonUDF UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be acceleratedNoneaggregationparamSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
reductionparamSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NSNSNSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
S NS NS NS PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
windowparamSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
S NS NS NS NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NSNSNSNSNSNSNSNSNSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
Lag`lag`Window function that returns N entries behind this oneNonewindowinputSresult S S SS S SS* SS*PS
UTC is only supported TZ for TIMESTAMP
S NS NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT) NS
offset S PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
defaultSprojectparam S S SS S SS* SS*PS
UTC is only supported TZ for TIMESTAMP
S NS NSPS* (missing nested BINARY, CALENDAR, MAP, UDT) NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT
NS
S S SS*SS*PS
UTC is only supported TZ for TIMESTAMP
S NS NSPS* (missing nested BINARY, CALENDAR, MAP, UDT) NSPS* (missing nested BINARY, CALENDAR, MAP, UDT) PS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
NSPS
UTC is only supported TZ for child TIMESTAMP;
unsupported child types DECIMAL, NULL, BINARY, MAP
LastDay`last_day`Returns the last day of the month which the date belongs toNoneQuarter`quarter`Returns the quarter of the year for date, in the range 1 to 4None project input S S
lambdainput Rand`random`, `rand`Generate a random column with i.i.d. uniformly distributed values in [0, 1)Noneprojectseed SS NS NS
Lead`lead`Window function that returns N entries ahead of this oneNonewindowinputSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
offset S
defaultSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
resultSSSSSSSSS*SS*SNSNSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NSPS* (missing nested BINARY, CALENDAR, MAP, UDT)NS
ExpressionUDT
Least`least`Returns the least value of all parameters, skipping null valuesNoneprojectparamRank`rank`Window function that returns the rank value within the aggregation windowNonewindowordering S S SS S SS*PS
UTC is only supported TZ for TIMESTAMP
SS*PS
max DECIMAL precision of 18
S NS NS NS NS NS NS
result SSSSSSSSS*SS*SNSNSNS NSNS
lambdaparamNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
Length`length`, `character_length`, `char_length`String character length or binary byte lengthRegExpReplace`regexp_replace`RegExpReplace support for string literal input patterns Noneprojectinputprojectstr S NS
resultregex S PS
very limited regex support;
Literal value only
lambdainputrep NSPS
Literal value only
NS NS S
LessThan`<`< operatorNoneRemainder`%`, `mod`Remainder or moduloNone project lhs S S S S S SSSS*SS*SNSNSNS NSNS
rhs S S S S S SSSS*SS*SNSNSNS NSNS
result SSSSS S NS
Rint`rint`Rounds up a double value to the nearest double equal to an integerNoneprojectinput S
lambdalhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
rhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNS S
LessThanOrEqual`<=`<= operatorNoneprojectlhsSSSSSSSSS*SS*SNSNSNSASTinput NSNS
rhsSSSSSSSSS*SS* SNSNSNS NSNS
resultS S
lambdalhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSRound`round`Round an expression to d decimal places using HALF_UP rounding modeNoneprojectvalue SSSSPS
result may round slightly differently
PS
result may round slightly differently
PS
max DECIMAL precision of 18
NSNS
rhsNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS NSNS
resultNSscale S
result SSSSSS PS
max DECIMAL precision of 18
UDT
Like`like`LikeNoneprojectsrcRowNumber`row_number`Window function that returns the index for the row within the aggregation windowNonewindowresult S S
searchScalaUDF User Defined Function, support requires the UDF to implement a RAPIDS accelerated interfaceNoneprojectparamSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SSSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SSSPS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types UDT
NS
Second`second`Returns the second component of the string/timestampNoneprojectinput PS
UTC is only supported TZ for TIMESTAMP
PS (Literal value only)
resultS S
lambdasrc ShiftLeft`shiftleft`Bitwise shift left (<<)Noneprojectvalue SS NS
searchamount S NS
resultNS SS
Literal Holds a static value from the queryNoneprojectresultSSSSSSSSS*SS*SNSSPS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)PS* (missing nested BINARY, CALENDAR, UDT)NS
lambdaresultNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNS
Log`ln`Natural logNoneprojectinput ShiftRight`shiftright`Bitwise shift right (>>)Noneprojectvalue SS S
resultamount S S
lambdainput result SS NS
result ShiftRightUnsigned`shiftrightunsigned`Bitwise unsigned shift right (>>>)Noneprojectvalue SS NS
Log10`log10`Log base 10Noneprojectinputamount S S SS S
lambdaSignum`sign`, `signum`Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positiveNoneproject input NSS NSS UDT
Log1p`log1p`Natural log 1 + exprSin`sin`Sine None project input
lambdaAST input NSS NSS
Log2`log2`Log base 2Sinh`sinh`Hyperbolic sine None project input
lambdaAST input NSS NSS
Logarithm`log`Log variable baseNoneprojectvalue Size`size`, `cardinality`The size of an array or a mapNoneprojectinput S PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, UDT
baseresult S S
resultSortArray`sort_array`Returns a sorted array with the input array and the ascending / descending orderNoneprojectarray S PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
lambdavalueascendingOrderS NS
baseresult NS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT
SortOrder Sort orderNoneprojectinputSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
resultSSSSSSSSPS
UTC is only supported TZ for TIMESTAMP
SPS
max DECIMAL precision of 18
SNSNSNS PS
max child DECIMAL precision of 18;
UTC is only supported TZ for child TIMESTAMP;
unsupported child types BINARY, CALENDAR, ARRAY, UDT
NS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
SparkPartitionID`spark_partition_id`Returns the current partition idNoneproject result S NS
Lower`lower`, `lcase`String lowercase operatorThis is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly.projectinputSpecifiedWindowFrame Specification of the width of the group (or "frame") of input rows around which a window function is evaluatedNoneprojectlower SSSSNSNS NS S
upper SSSSNSNS NS S
result SSSSNSNS NS S
lambdaSqrt`sqrt`Square rootNoneproject input S NS S NS
ExpressionSQL Functions(s)DescriptionNotesContextParam/OutputBOOLEANBYTESHORTINTLONGFLOATDOUBLEDATETIMESTAMPSTRINGDECIMALNULLBINARYCALENDARARRAYMAPSTRUCTUDT
MakeDecimal Create a Decimal from an unscaled long value for some aggregation optimizationsNoneprojectAST input S S S S*
[The bulk of this patch updates the generated expression-support tables (apparently
docs/supported_ops.md). The interleaved table markup is not legible in this extract, so it is
summarized here rather than reproduced. For the expressions covered by these tables (string, math,
date/time and window expressions such as Md5, Multiply, StringSplit, Substring, TimeAdd and Year,
the aggregate expressions, and the Scala/Python/Hive UDF entries), each row keeps the per-type
columns (BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, STRING, DECIMAL, NULL,
BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT), but the previous `S*` and `PS* (missing nested ...)`
markers are replaced by `PS` entries whose notes are spelled out inline, most commonly
"max DECIMAL precision of 18", "UTC is only supported TZ for TIMESTAMP",
"Input must not contain NaNs and spark.rapids.sql.hasNans must be false", and lists of
unsupported nested child types such as BINARY, CALENDAR and UDT.]
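Several of the partially supported (`PS`) entries summarized above hinge on a single setting,
`spark.rapids.sql.hasNans`, with which the user asserts that the data contains no NaN values. The
following is an illustrative sketch only, not part of this patch; the plugin class and the
spark-shell invocation are the standard way of enabling the RAPIDS Accelerator and are assumptions
here:

```bash
# Sketch: start a Spark shell with the RAPIDS Accelerator and assert that the
# input data contains no NaNs, so NaN-sensitive operators (for example max/min)
# can be placed on the GPU.
spark-shell \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.hasNans=false
```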
The patch also removes the footnote that previously explained the `*` marker, since the new `PS`
entries carry their notes inline:

-* as was stated previously Decimal is only supported up to a precision of 18 and Timestamp is
-only supported in the UTC time zone. Decimals are off by default due to performance impact in
-some cases.

## Casting

The above table does not show what is and is not supported for cast.

[The cast-support tables that follow are updated in the same way and are not legible in this
extract. The previous `S*` markers become `PS` entries with notes such as "UTC is only supported
TZ for TIMESTAMP", "max DECIMAL precision of 18" and "Conversion may produce different results and
requires spark.rapids.sql.castFloatToString.enabled to be true". Several entries in the DECIMAL
source-type row change from NS to S, and casts of ARRAY, MAP and STRUCT become partially supported
provided their child types also support being cast to the desired child types (CALENDAR and UDT
children remain unsupported). Similar `S*`-to-`PS` updates appear in the partitioning and
input/output tables later in the file.]
From ff8f28e0374ef81ac8c7ed0a7bf84eb36f6feaac Mon Sep 17 00:00:00 2001
From: Rodney Howeedy
Date: Fri, 22 Oct 2021 09:49:39 -0600
Subject: [PATCH 4/9] Update documentation for spark-rapids 21.10 release to the gh-pages.

Signed-off-by: Rodney Howeedy
---
 .../qualification-profiling-tools.md | 526 ------------------
 1 file changed, 526 deletions(-)
 delete mode 100644 docs/additional-functionality/qualification-profiling-tools.md

diff --git a/docs/additional-functionality/qualification-profiling-tools.md b/docs/additional-functionality/qualification-profiling-tools.md
deleted file mode 100644
index e61d0ac4574..00000000000
--- a/docs/additional-functionality/qualification-profiling-tools.md
+++ /dev/null
@@ -1,526 +0,0 @@
---
layout: page
title: Spark Profiling tool
nav_order: 9
---
# Spark Profiling tool

The Profiling tool analyzes both CPU and GPU generated event logs and generates information
which can be used for debugging and profiling Apache Spark applications.
The output information contains the Spark version, executor details, properties, etc.

* TOC
{:toc}

## How to use the Profiling tool

### Prerequisites
- Java 8 or above, Spark 3.0.1+ jars
- Spark event log(s) from Spark 2.0 or above. Both rolled and compressed event logs
  with `.lz4`, `.lzf`, `.snappy` and `.zstd` suffixes are supported, as well as
  Databricks-specific rolled and compressed (.gz) event logs.
- The tool does not support nested directories.
  Event log files or event log directories should be at the top level when specifying a directory.

Note: Spark event logs can be downloaded from the Spark UI using the "Download" button on the right side,
or can be found in the location specified by `spark.eventLog.dir`. See the
[Apache Spark Monitoring](http://spark.apache.org/docs/latest/monitoring.html) documentation for
more information.

### Step 1 Download the tools jar and Apache Spark 3 distribution
The Profiling tool requires the Spark 3.x jars to be able to run, but it does not need an Apache Spark runtime.
If you do not already have Spark 3.x installed,
you can download the Spark distribution to any machine and include the jars in the classpath.
- Download the jar file from the [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.10.0/)
- [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended

If you want to compile the jars, please refer to the instructions [here](./spark-qualification-tool.md#How-to-compile-the-tools-jar).

### Step 2 How to run the Profiling tool
This tool parses the Spark CPU or GPU event log(s) and creates an output report.
Extract the Spark distribution into a local directory if necessary.
Either set `SPARK_HOME` to point to that directory or put the path on the
classpath, `java -cp toolsJar:pathToSparkJars/*:...`, when you run the Profiling tool.
Acceptable input event log paths are files or directories containing Spark event logs
in the local filesystem, HDFS, S3 or mixed.
Please note that if you are processing a lot of event logs using combined or compare mode, you
may need to increase the Java heap size with the `-Xmx` option, for instance `java -Xmx30g` for a
30 GB heap, as in the sketch below.
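The following invocation is an illustrative sketch only; the jar version, heap size and event log
paths are placeholders rather than values from the original document. It shows how the `-Xmx`
setting combines with the classpath described above when running compare mode over many event
logs:

```bash
# Sketch: run compare mode over several event logs with a 30 GB heap.
# Replace <version> and the event log paths with real values.
java -Xmx30g \
  -cp ~/rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/* \
  com.nvidia.spark.rapids.tool.profiling.ProfileMain --compare \
  /path/to/eventlog1 /path/to/eventlog2
```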
There are 3 modes of operation for the Profiling tool:

 1. Collection Mode:
    Collection mode is the default mode when no other options are specified. It simply collects
    information on each application individually and outputs a file per application.

    ```bash
    Usage: java -cp rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*
           com.nvidia.spark.rapids.tool.profiling.ProfileMain [options]
           <eventlogs | eventlog directories ...>
    ```

 2. Combined Mode:
    Combined mode is collection mode, but it then combines all the applications together and you
    get one file for all applications.

    ```bash
    Usage: java -cp rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*
           com.nvidia.spark.rapids.tool.profiling.ProfileMain --combined
           <eventlogs | eventlog directories ...>
    ```

 3. Compare Mode:
    Compare mode combines all the applications' information in the same tables into a single file
    and also adds tables to compare stages and SQL IDs across all of those applications.
    Compare mode will use more memory if comparing lots of applications.

    ```bash
    Usage: java -cp rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*
           com.nvidia.spark.rapids.tool.profiling.ProfileMain --compare
           <eventlogs | eventlog directories ...>
    ```

Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input
and output, so if you want to point to the local filesystem be sure to include `file:` in the path.

Example running on files in HDFS (include `$HADOOP_CONF_DIR` in the classpath):

```bash
java -cp ~/rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \
  com.nvidia.spark.rapids.tool.profiling.ProfileMain /eventlogDir
```

## Understanding Profiling tool detailed output and examples

The default output location is the current directory.
The output location can be changed using the `--output-directory` option.
The output goes into a sub-directory named `rapids_4_spark_profile/` inside that output location.
If running in normal collection mode, each event log is processed individually and files for each
application are written under a directory named `rapids_4_spark_profile/{APPLICATION_ID}`, along
with a summary text file named `profile.log`.
If running in combined mode, the output is put under a directory named
`rapids_4_spark_profile/combined/` and a summary text file named
`rapids_4_spark_tools_combined.log` is created.
If running in compare mode, the output is put under a directory named
`rapids_4_spark_profile/compare/` and a summary text file named
`rapids_4_spark_tools_compare.log` is created.
The output goes to your default filesystem; the local filesystem and HDFS are supported.
Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input
and output, so if you want to point to the local filesystem be sure to include `file:` in the path.
Separate files are generated under the same sub-directory when using the options to generate query
visualizations or to print the SQL plans.
Optionally, if the `--csv` option is specified, a csv file is created for each table for each
application in the corresponding sub-directory (see the example command below).

There is a 100 character limit for each output column.
If the result of a column exceeds this limit, it is suffixed with ... for that column.

ResourceProfile ids are parsed for event logs that are from Spark 3.1 or later.
A ResourceProfile allows the user to specify executor and task requirements
for an RDD that will get applied during a stage.
This allows the user to change the resource requirements between stages.

Run `--help` for more information.
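As an illustrative example only (the output directory and event log path are placeholders, and this
particular combination of flags is an assumption rather than something shown in the original
document), the output options above can be combined with the visualization options described later
on this page:

```bash
# Sketch: write per-table CSV files, DOT graphs and a timeline for an application
# into a chosen output directory on the local filesystem.
java -cp ~/rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/* \
  com.nvidia.spark.rapids.tool.profiling.ProfileMain \
  --output-directory file:/tmp/profile-out \
  --csv -g --generate-timeline \
  /path/to/eventlog
```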
#### A. Collect Information or Compare Information (if more than 1 event log is given as input and option `-c` is specified)
- Application information
- Data Source information
- Executors information
- Job, stage and SQL ID information
- Rapids related parameters
- Rapids Accelerator Jar and cuDF Jar
- SQL Plan Metrics
- Compare Mode: Matching SQL IDs Across Applications
- Compare Mode: Matching Stage IDs Across Applications
- Optionally: SQL Plan for each SQL query
- Optionally: Generates DOT graphs for each SQL query
- Optionally: Generates timeline graph for application

This is useful, for example, for a GPU run vs CPU run performance comparison, or for comparing
different runs with different parameters. We can input multiple Spark event logs and this tool can
compare environments, executors and RAPIDS-related Spark parameters.

- Compare the durations/versions/gpuMode on or off:

```
### A. Information Collected ###
Application Information:

+--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------+
|appIndex|appName    |appId                  |sparkUser|startTime    |endTime      |duration|durationStr|sparkVersion|pluginEnabled|
+--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------+
|1       |Spark shell|app-20210329165943-0103|user1    |1617037182848|1617037490515|307667  |5.1 min    |3.0.1       |false        |
|2       |Spark shell|app-20210329170243-0018|user1    |1617037362324|1617038578035|1215711 |20 min     |3.0.1       |true         |
+--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------+
```

- Executor information:

```
Executor Information:
+--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+
|appIndex|resourceProfileId|numExecutors|executorCores|maxMem     |maxOnHeapMem|maxOffHeapMem|executorMemory|numGpusPerExecutor|executorOffHeap|taskCpu|taskGpu|
+--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+
|1       |0                |1           |4            |11264537395|11264537395 |0            |20480         |1                 |0              |1      |0.0    |
|1       |1                |2           |2            |3247335014 |3247335014  |0            |6144          |2                 |0              |2      |2.0    |
+--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+
```

- Data Source information
The details of this output differ between using a Spark Data Source V1 and Data Source V2 reader.
The Data Source V2 truncates the schema, so if you see `...`, then
the full schema is not available.

```
Data Source Information:
+--------+-----+-------+----------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------------------------------------------------------------------------+
|appIndex|sqlID|format |location                                                                                                                    |pushedFilters    |schema                                                                                        |
+--------+-----+-------+----------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------------------------------------------------------------------------+
|1       |0    |Text   |InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/integration_tests/src/test/resources/trucks-comments.csv]|[]               |value:string                                                                                  |
|1       |1    |csv    |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/integration_tests/src/test/re...               |PushedFilters: []|_c0:string                                                                                    |
|1       |2    |parquet|Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotscolumnsout]                                |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...|
|1       |3    |parquet|Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotscolumnsout]                                |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...|
|1       |4    |orc    |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/logscolumsout.orc]                             |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...|
|1       |5    |orc    |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/logscolumsout.orc]                             |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...|
|1       |6    |json   |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json]                         |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...|
|1       |7    |json   |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json]                         |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...|
|1       |8    |json   |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json]                         |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...|
+--------+-----+-------+----------------------------------------------------------------------------------------------------------------------------+-----------------+----------------------------------------------------------------------------------------------+
```

- Matching SQL IDs Across Applications:

```
Matching SQL IDs Across Applications:
+-----------------------+-----------------------+
|app-20210329165943-0103|app-20210329170243-0018|
+-----------------------+-----------------------+
|0                      |0                      |
|1                      |1                      |
|2                      |2                      |
|3                      |3                      |
|4                      |4                      |
+-----------------------+-----------------------+
```

There is one column per application. There is a row per SQL ID. The SQL IDs are matched
primarily on the structure of the SQL query run, and then on the order in which they were
run. Be aware that this is truly the structure of the query: two queries that do similar
things, but on different data, are likely to match as the same.
An effort is made to -also match between CPU plans and GPU plans so in most cases the same query run on the -CPU and on the GPU will match. - -- Matching Stage IDs Across Applications: - -``` -Matching Stage IDs Across Applications: -+-----------------------+-----------------------+ -|app-20210329165943-0103|app-20210329170243-0018| -+-----------------------+-----------------------+ -|31 |31 | -|32 |32 | -|33 |33 | -|39 |38 | -|40 |40 | -|41 |41 | -+-----------------------+-----------------------+ -``` - -There is one column per application. There is a row per stage ID. If a SQL query matches -between applications, see Matching SQL IDs Across Applications, then an attempt is made -to match stages within that application to each other. This has the same issues with -stages when generating a dot graph. This can be especially helpful when trying to compare -large queries and Spark happened to assign the stage IDs slightly differently, or in some -cases there are a different number of stages because of slight differences in the plan. This -is a best effort, and it is not guaranteed to match up all stages in a plan. - -- Compare Rapids related Spark properties side-by-side: - -``` -Compare Rapids Properties which are set explicitly: -+-------------------------------------------+----------+----------+ -|propertyName |appIndex_1|appIndex_2| -+-------------------------------------------+----------+----------+ -|spark.rapids.memory.pinnedPool.size |null |2g | -|spark.rapids.sql.castFloatToDecimal.enabled|null |true | -|spark.rapids.sql.concurrentGpuTasks |null |2 | -|spark.rapids.sql.decimalType.enabled |null |true | -|spark.rapids.sql.enabled |false |true | -|spark.rapids.sql.explain |null |NOT_ON_GPU| -|spark.rapids.sql.hasNans |null |FALSE | -|spark.rapids.sql.incompatibleOps.enabled |null |true | -|spark.rapids.sql.variableFloatAgg.enabled |null |TRUE | -+-------------------------------------------+----------+----------+ -``` - -- List rapids-4-spark and cuDF jars based on classpath: - -``` -Rapids Accelerator Jar and cuDF Jar: -+--------+------------------------------------------------------------+ -|appIndex|Rapids4Spark jars | -+--------+------------------------------------------------------------+ -|1 |spark://10.10.10.10:43445/jars/cudf-0.19.2-cuda11.jar | -|1 |spark://10.10.10.10:43445/jars/rapids-4-spark_2.12-0.5.0.jar| -|2 |spark://10.10.10.11:41319/jars/cudf-0.19.2-cuda11.jar | -|2 |spark://10.10.10.11:41319/jars/rapids-4-spark_2.12-0.5.0.jar| -+--------+------------------------------------------------------------+ -``` - -- Job, stage and SQL ID information(not in `compare` mode yet): - -``` -+--------+-----+---------+-----+ -|appIndex|jobID|stageIds |sqlID| -+--------+-----+---------+-----+ -|1 |0 |[0] |null | -|1 |1 |[1,2,3,4]|0 | -+--------+-----+---------+-----+ -``` - -- SQL Plan Metrics for Application for each SQL plan node in each SQL: - -These are also called accumulables in Spark. 
- -``` -SQL Plan Metrics for Application: -+--------+-----+------+-----------------------------------------------------------+-------------+-----------------------+-------------+----------+ -|appIndex|sqlID|nodeID|nodeName |accumulatorId|name |max_value |metricType| -+--------+-----+------+-----------------------------------------------------------+-------------+-----------------------+-------------+----------+ -|1 |0 |1 |GpuColumnarExchange |111 |output rows |1111111111 |sum | -|1 |0 |1 |GpuColumnarExchange |112 |output columnar batches|222222 |sum | -|1 |0 |1 |GpuColumnarExchange |113 |data size |333333333333 |size | -|1 |0 |1 |GpuColumnarExchange |114 |shuffle bytes written |444444444444 |size | -|1 |0 |1 |GpuColumnarExchange |115 |shuffle records written|555555 |sum | -|1 |0 |1 |GpuColumnarExchange |116 |shuffle write time |666666666666 |nsTiming | -``` - -- Print SQL Plans (-p option): -Prints the SQL plan as a text string to a file named `planDescriptions.log`. -For example if your application id is app-20210507103057-0000, then the -filename will be `planDescriptions.log` - -- Generate DOT graph for each SQL (-g option): - -``` -Generated DOT graphs for app app-20210507103057-0000 to /path/. in 17 second(s) -``` - -Once the DOT file is generated, you can install [graphviz](http://www.graphviz.org) to convert the DOT file -as a graph in pdf format using below command: - -```bash -dot -Tpdf ./app-20210507103057-0000-query-0/0.dot > app-20210507103057-0000.pdf -``` - -Or to svg using - -```bash -dot -Tsvg ./app-20210507103057-0000-query-0/0.dot > app-20210507103057-0000.svg -``` - -The pdf or svg file has the SQL plan graph with metrics. The svg file will act a little -more like the Spark UI and include extra information for nodes when hovering over it with -a mouse. - -As a part of this an effort is made to associate parts of the graph with the Spark stage it is a -part of. This is not 100% accurate. Some parts of the plan like `TakeOrderedAndProject` may -be a part of multiple stages and only one of the stages will be selected. `Exchanges` are purposely -left out of the sections associated with a stage because they cover at least 2 stages and possibly -more. In other cases we may not be able to determine what stage something was a part of. In those -cases we mark it as `UNKNOWN STAGE`. This is because we rely on metrics to link a node to a stage. -If a stage hs no metrics, like if the query crashed early, we cannot establish that link. - -- Generate timeline for application (--generate-timeline option): - -The output of this is an [svg](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) file -named `timeline.svg`. Most web browsers can display this file. It is a -timeline view similar Apache Spark's -[event timeline](https://spark.apache.org/docs/latest/web-ui.html). - -This displays several data sections. - -1. **Tasks** This shows all tasks in the application divided by executor. Please note that this - tries to pack the tasks in the graph. It does not represent actual scheduling on CPU cores. - The tasks are labeled with the time it took for them to run, but there is no breakdown about - different aspects of each task, like there is in Spark's timeline. -2. **STAGES** This shows the stages times reported by Spark. It starts with when the stage was - scheduled and ends when Spark considered the stage done. -3. **STAGE RANGES** This shows the time from the start of the first task to the end of the last - task. 
Often a stage is scheduled, but there are not enough resources in the cluster to run it. - This helps to show. How long it takes for a task to start running after it is scheduled, and in - many cases how long it took to run all of the tasks in the stage. This is not always true because - Spark can intermix tasks from different stages. -4. **JOBS** This shows the time range reported by Spark from when a job was scheduled to when it - completed. -5. **SQL** This shows the time range reported by Spark from when a SQL statement was scheduled to - when it completed. - -Tasks and stages all are color coordinated to help know what tasks are associated with a given -stage. Jobs and SQL are not color coordinated. - -#### B. Analysis -- Job + Stage level aggregated task metrics -- SQL level aggregated task metrics -- SQL duration, application during, if it contains a Dataset operation, potential problems, executor CPU time percent -- Shuffle Skew Check: (When task's Shuffle Read Size > 3 * Avg Stage-level size) - -Below we will aggregate the task level metrics at different levels -to do some analysis such as detecting possible shuffle skew. - -- Job + Stage level aggregated task metrics: - -``` -### B. Analysis ### - -Job + Stage level aggregated task metrics: -+--------+-------+--------+--------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+ -|appIndex|ID |numTasks|Duration|diskBytesSpilled_sum|duration_sum|duration_max|duration_min|duration_avg|executorCPUTime_sum|executorDeserializeCPUTime_sum|executorDeserializeTime_sum|executorRunTime_sum|gettingResultTime_sum|input_bytesRead_sum|input_recordsRead_sum|jvmGCTime_sum|memoryBytesSpilled_sum|output_bytesWritten_sum|output_recordsWritten_sum|peakExecutionMemory_max|resultSerializationTime_sum|resultSize_max|sr_fetchWaitTime_sum|sr_localBlocksFetched_sum|sr_localBytesRead_sum|sr_remoteBlocksFetched_sum|sr_remoteBytesRead_sum|sr_remoteBytesReadToDisk_sum|sr_totalBytesRead_sum|sw_bytesWritten_sum|sw_recordsWritten_sum|sw_writeTime_sum| -+--------+-------+--------+--------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+ -|1 |job_0 |3333 |222222 |0 |11111111 |111111 |111 |1111.1 |6666666 |55555 |55555 |55555555 |0 |222222222222 |22222222222 |111111 |0 |0 |0 |222222222 |1 |11111 |11111 |99999 |22222222222 |2222221 |222222222222 |0 |222222222222 |222222222222 |5555555 |444444 | -``` - - -- SQL level aggregated task metrics: - -``` 
-SQL level aggregated task metrics: -+--------+------------------------------+-----+--------------------+--------+--------+---------------+---------------+----------------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+ -|appIndex|appID |sqlID|description |numTasks|Duration|executorCPUTime|executorRunTime|executorCPURatio|diskBytesSpilled_sum|duration_sum|duration_max|duration_min|duration_avg|executorCPUTime_sum|executorDeserializeCPUTime_sum|executorDeserializeTime_sum|executorRunTime_sum|gettingResultTime_sum|input_bytesRead_sum|input_recordsRead_sum|jvmGCTime_sum|memoryBytesSpilled_sum|output_bytesWritten_sum|output_recordsWritten_sum|peakExecutionMemory_max|resultSerializationTime_sum|resultSize_max|sr_fetchWaitTime_sum|sr_localBlocksFetched_sum|sr_localBytesRead_sum|sr_remoteBlocksFetched_sum|sr_remoteBytesRead_sum|sr_remoteBytesReadToDisk_sum|sr_totalBytesRead_sum|sw_bytesWritten_sum|sw_recordsWritten_sum|sw_writeTime_sum| -+--------+------------------------------+-----+--------------------+--------+--------+---------------+---------------+----------------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+ -|1 |application_1111111111111_0001|0 |show at :11|1111 |222222 |6666666 |55555555 |55.55 |0 |13333333 |111111 |999 |3333.3 |6666666 |55555 |66666 |11111111 |0 |111111111111 |11111111111 |111111 |0 |0 |0 |888888888 |8 |11111 |11111 |99999 |11111111111 |2222222 |222222222222 |0 |222222222222 |444444444444 |5555555 |444444 | -``` - -- SQL duration, application during, if it contains a Dataset operation, potential problems, executor CPU time percent: - -``` -SQL Duration and Executor CPU Time Percent -+--------+------------------------------+-----+------------+-------------------+------------+------------------+-------------------------+ -|appIndex|App ID |sqlID|SQL Duration|Contains Dataset Op|App Duration|Potential Problems|Executor CPU Time Percent| -+--------+------------------------------+-----+------------+-------------------+------------+------------------+-------------------------+ -|1 |application_1603128018386_7759|0 |11042 |false |119990 |null |68.48 | -+--------+------------------------------+-----+------------+-------------------+------------+------------------+-------------------------+ -``` - -- Shuffle Skew Check: - -``` -Shuffle Skew Check: (When task's Shuffle Read Size > 3 * Avg Stage-level size) 
-+--------+-------+--------------+------+-------+---------------+--------------+-----------------+----------------+----------------+----------+----------------------------------------------------------------------------------------------------+ -|appIndex|stageId|stageAttemptId|taskId|attempt|taskDurationSec|avgDurationSec|taskShuffleReadMB|avgShuffleReadMB|taskPeakMemoryMB|successful|reason | -+--------+-------+--------------+------+-------+---------------+--------------+-----------------+----------------+----------------+----------+----------------------------------------------------------------------------------------------------+ -|1 |2 |0 |2222 |0 |111.11 |7.7 |2222.22 |111.11 |0.01 |false |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /dddd/xxxxxxx/ccccc/bbbbbbbbb/aaaaaaa| -|1 |2 |0 |2224 |1 |222.22 |8.8 |3333.33 |111.11 |0.01 |false |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /dddd/xxxxxxx/ccccc/bbbbbbbbb/aaaaaaa| -+--------+-------+--------------+------+-------+---------------+--------------+-----------------+----------------+----------------+----------+----------------------------------------------------------------------------------------------------+ -``` - -#### C. Health Check -- List failed tasks, stages and jobs -- Removed BlockManagers and Executors -- SQL Plan HealthCheck - -Below are examples. -- Print failed tasks: - -``` -Failed tasks: -+--------+-------+--------------+------+-------+----------------------------------------------------------------------------------------------------+ -|appIndex|stageId|stageAttemptId|taskId|attempt|failureReason | -+--------+-------+--------------+------+-------+----------------------------------------------------------------------------------------------------+ -|3 |4 |0 |2842 |0 |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /home/jenkins/agent/workspace/jenkins| -|3 |4 |0 |2858 |0 |TaskKilled(another attempt succeeded,List(AccumulableInfo(453,None,Some(22000),None,false,true,None)| -|3 |4 |0 |2884 |0 |TaskKilled(another attempt succeeded,List(AccumulableInfo(453,None,Some(21148),None,false,true,None)| -|3 |4 |0 |2908 |0 |TaskKilled(another attempt succeeded,List(AccumulableInfo(453,None,Some(20420),None,false,true,None)| -|3 |4 |0 |3410 |1 |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /home/jenkins/agent/workspace/jenkins| -|4 |1 |0 |1948 |1 |TaskKilled(another attempt succeeded,List(AccumulableInfo(290,None,Some(1107),None,false,true,None),| -+--------+-------+--------------+------+-------+----------------------------------------------------------------------------------------------------+ -``` - -- Print failed stages: - -``` -Failed stages: -+--------+-------+---------+-------------------------------------+--------+---------------------------------------------------+ -|appIndex|stageId|attemptId|name |numTasks|failureReason | -+--------+-------+---------+-------------------------------------+--------+---------------------------------------------------+ -|3 |4 |0 |attachTree at Spark300Shims.scala:624|1000 |Job 0 cancelled as part of cancellation of all jobs| -+--------+-------+---------+-------------------------------------+--------+---------------------------------------------------+ -``` - -- Print failed jobs: - -``` -Failed jobs: -+--------+-----+---------+------------------------------------------------------------------------+ -|appIndex|jobID|jobResult|failureReason | 
-+--------+-----+---------+------------------------------------------------------------------------+ -|3 |0 |JobFailed|java.lang.Exception: Job 0 cancelled as part of cancellation of all j...| -+--------+-----+---------+------------------------------------------------------------------------+ -``` - -- SQL Plan HealthCheck: - - Prints possibly unsupported query plan nodes such as `$Lambda` key word means dataset API. - -``` -+--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ -|appIndex|sqlID|nodeID|nodeName|nodeDescription | -+--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ -|3 |1 |8 |Filter |Filter $line21.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$4578/0x00000008019f1840@4b63e04c.apply| -+--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ -``` - -## Profiling tool options - -```bash -RAPIDS Accelerator for Apache Spark Profiling tool - -Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* - com.nvidia.spark.rapids.tool.profiling.ProfileMain [options] - - - --combined Collect mode but combine all applications into - the same tables. - -c, --compare Compare Applications (Note this may require - more memory if comparing a large number of - applications). Default is false. - --csv Output each table to a CSV file as well - creating the summary text file. - -f, --filter-criteria Filter newest or oldest N eventlogs for - processing.eg: 100-newest-filesystem (for - processing newest 100 event logs). eg: - 100-oldest-filesystem (for processing oldest - 100 event logs) - -g, --generate-dot Generate query visualizations in DOT format. - Default is false - --generate-timeline Write an SVG graph out for the full - application timeline. - -m, --match-event-logs Filter event logs whose filenames contain the - input string - -n, --num-output-rows Number of output rows for each Application. - Default is 1000 - --num-threads Number of thread to use for parallel - processing. The default is the number of cores - on host divided by 4. - -o, --output-directory Base output directory. Default is current - directory for the default filesystem. The - final output will go into a subdirectory - called rapids_4_spark_profile. It will - overwrite any existing files with the same - name. - -p, --print-plans Print the SQL plans to a file named - 'planDescriptions.log'. - Default is false. - -t, --timeout Maximum time in seconds to wait for the event - logs to be processed. Default is 24 hours - (86400 seconds) and must be greater than 3 - seconds. If it times out, it will report what - it was able to process up until the timeout. - -h, --help Show help message - - trailing arguments: - eventlog (required) Event log filenames(space separated) or directories - containing event logs. eg: s3a:///eventlog1 - /path/to/eventlog2 -``` - -## Profiling tool metrics definitions - -All the metrics definitions can be found in the -[executor task metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-task-metrics) / -[executor metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-metrics) or -the [SPARK webUI doc](https://spark.apache.org/docs/latest/web-ui.html#content). 
\ No newline at end of file
From 451f65d7b10f6143af56fd255d22ad7a16194fac Mon Sep 17 00:00:00 2001
From: Rodney Howeedy
Date: Fri, 22 Oct 2021 13:33:16 -0600
Subject: [PATCH 5/9] Update and rename profiling-tool documentation for
 spark-rapids 21.10 release of gh-pages.

Signed-off-by: Rodney Howeedy
---
 docs/spark-profiling-tool.md | 526 +++++++++++++++++++++++++++++++++++
 1 file changed, 526 insertions(+)
 create mode 100644 docs/spark-profiling-tool.md

diff --git a/docs/spark-profiling-tool.md b/docs/spark-profiling-tool.md
new file mode 100644
index 00000000000..e61d0ac4574
--- /dev/null
+++ b/docs/spark-profiling-tool.md
@@ -0,0 +1,526 @@
+---
+layout: page
+title: Spark Profiling tool
+nav_order: 9
+---
+# Spark Profiling tool

+The Profiling tool analyzes both CPU and GPU generated event logs and generates information
+which can be used for debugging and profiling Apache Spark applications.
+The output information contains the Spark version, executor details, properties, etc.
+
+* TOC
+{:toc}
+
+## How to use the Profiling tool
+
+### Prerequisites
+- Java 8 or above, Spark 3.0.1+ jars
+- Spark event log(s) from Spark 2.0 or above. Supports both rolled and compressed event logs
+ with `.lz4`, `.lzf`, `.snappy` and `.zstd` suffixes as well as
+ Databricks-specific rolled and compressed (.gz) event logs.
+- The tool does not support nested directories.
+ Event log files or event log directories should be at the top level when specifying a directory.
+
+Note: Spark event logs can be downloaded from the Spark UI using a "Download" button on the right side,
+or can be found in the location specified by `spark.eventLog.dir`. See the
+[Apache Spark Monitoring](http://spark.apache.org/docs/latest/monitoring.html) documentation for
+more information.
+
+### Step 1 Download the tools jar and Apache Spark 3 distribution
+The Profiling tool requires the Spark 3.x jars to run but does not need an Apache Spark runtime.
+If you do not already have Spark 3.x installed,
+you can download the Spark distribution to any machine and include the jars in the classpath.
+- Download the jar file from [Maven repository](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/21.10.0/)
+- [Download Apache Spark 3.x](http://spark.apache.org/downloads.html) - Spark 3.1.1 for Apache Hadoop is recommended.
+If you want to compile the jars, please refer to the instructions [here](./spark-qualification-tool.md#How-to-compile-the-tools-jar).
+
+### Step 2 How to run the Profiling tool
+This tool parses the Spark CPU or GPU event log(s) and creates an output report.
+Extract the Spark distribution into a local directory if necessary.
+Either set `SPARK_HOME` to point to that directory or put the path on the
+classpath, `java -cp toolsJar:pathToSparkJars/*:...`, when you run the Profiling tool.
+Acceptable input event log paths are files or directories containing Spark event logs
+in the local filesystem, HDFS, S3 or mixed.
+Please note that when processing a lot of event logs in combined or compare mode,
+you may need to increase the Java heap size using the `-Xmx` option.
+For instance, to specify a 30 GB heap size use `java -Xmx30g`.
+
+There are 3 modes of operation for the Profiling tool:
+ 1. Collection Mode:
+ Collection mode is the default mode when no other options are specified. It simply collects information
+ on each application individually and outputs a file per application.
+
+ ```bash
+ Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*
+ com.nvidia.spark.rapids.tool.profiling.ProfileMain [options]
+
+ ```
+
+ 2. Combined Mode:
+ Combined mode is collection mode, but it then combines all the applications
+ together so that you get one file for all applications.
+
+ ```bash
+ Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*
+ com.nvidia.spark.rapids.tool.profiling.ProfileMain --combined
+
+ ```
+
+ 3. Compare Mode:
+ Compare mode combines all the applications' information in the same tables into a single file
+ and also adds tables to compare stages and SQL IDs across all of those applications.
+ Compare mode will use more memory when comparing a large number of applications.
+
+ ```bash
+ Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*
+ com.nvidia.spark.rapids.tool.profiling.ProfileMain --compare
+
+ ```
+ Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input and output,
+ so if you want to point to the local filesystem be sure to include `file:` in the path.
+
+ Example running on files in HDFS: (include $HADOOP_CONF_DIR in classpath)
+
+ ```bash
+ java -cp ~/rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/*:$HADOOP_CONF_DIR/ \
+ com.nvidia.spark.rapids.tool.profiling.ProfileMain /eventlogDir
+ ```
+
+## Understanding Profiling tool detailed output and examples
+The default output location is the current directory.
+The output location can be changed using the `--output-directory` option.
+The output goes into a sub-directory named `rapids_4_spark_profile/` inside that output location.
+If running in normal collect mode, it processes each event log individually and outputs files for each application under
+a directory named `rapids_4_spark_profile/{APPLICATION_ID}`. It creates a summary text file named `profile.log`.
+If running in combined mode, the output is put under a directory named `rapids_4_spark_profile/combined/` along with a summary
+text file named `rapids_4_spark_tools_combined.log`.
+If running in compare mode, the output is put under a directory named `rapids_4_spark_profile/compare/` along with a summary
+text file named `rapids_4_spark_tools_compare.log`.
+The output goes to your default filesystem; both the local filesystem and HDFS are supported.
+Note that if you are on an HDFS cluster the default filesystem is likely HDFS for both the input and output,
+so if you want to point to the local filesystem be sure to include `file:` in the path.
+Separate files are generated under the same sub-directory when using the options to generate query
+visualizations or to print the SQL plans.
+Optionally, if the `--csv` option is specified, a CSV file is created for each table for each application in the
+corresponding sub-directory.
+
+There is a 100-character limit for each output column.
+If the value of a column exceeds this limit, it is suffixed with `...` for that column.
+
+ResourceProfile IDs are parsed for event logs from Spark 3.1 or later.
+A ResourceProfile allows the user to specify executor and task requirements
+for an RDD that will get applied during a stage.
+This allows the user to change the resource requirements between stages.
+
+Run `--help` for more information.
+
+#### A. Collect Information or Compare Information (if more than one event log is given as input and the -c option is specified)
+- Application information
+- Data Source information
+- Executors information
+- Job, stage and SQL ID information
+- Rapids-related parameters
+- Rapids Accelerator Jar and cuDF Jar
+- SQL Plan Metrics
+- Compare Mode: Matching SQL IDs Across Applications
+- Compare Mode: Matching Stage IDs Across Applications
+- Optionally: SQL Plan for each SQL query
+- Optionally: Generates DOT graphs for each SQL query
+- Optionally: Generates timeline graph for application
+
+For example, this can be used for a GPU run vs CPU run performance comparison, or to compare runs with different parameters.
+
+We can input multiple Spark event logs, and the tool can compare environments, executors, and Rapids-related Spark parameters:
+
+- Compare the durations/versions/gpuMode on or off:
+
+
+```
+### A. Information Collected ###
+Application Information:
+
++--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------
+|appIndex|appName |appId |sparkUser|startTime |endTime |duration|durationStr|sparkVersion|pluginEnabled|
++--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------
+|1 |Spark shell|app-20210329165943-0103|user1 |1617037182848|1617037490515|307667 |5.1 min |3.0.1 |false |
+|2 |Spark shell|app-20210329170243-0018|user1 |1617037362324|1617038578035|1215711 |20 min |3.0.1 |true |
++--------+-----------+-----------------------+---------+-------------+-------------+--------+-----------+------------+-------------+
+```
+
+- Executor information:
+
+```
+Executor Information:
++--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+
+|appIndex|resourceProfileId|numExecutors|executorCores|maxMem |maxOnHeapMem|maxOffHeapMem|executorMemory|numGpusPerExecutor|executorOffHeap|taskCpu|taskGpu|
++--------+-----------------+------------+-------------+-----------+------------+-------------+--------------+------------------+---------------+-------+-------+
+|1 |0 |1 |4 |11264537395|11264537395 |0 |20480 |1 |0 |1 |0.0 |
+|1 |1 |2 |2 |3247335014 |3247335014 |0 |6144 |2 |0 |2 |2.0 |
++--------+-----------------+------------+-------------+-----------+------------+-------------+-------------+--------------+------------------+---------------+-------+-------+
+```
+
+- Data Source information
+The details of this output differ between using a Spark Data Source V1 and Data Source V2 reader.
+The Data Source V2 output truncates the schema, so if you see `...`, then
+the full schema is not available.
+ +``` +Data Source Information: ++--------+-----+-------+---------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------------------------+ +|appIndex|sqlID|format |location |pushedFilters |schema | ++--------+-----+-------+---------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------------------------+ +|1 |0 |Text |InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/integration_tests/src/test/resources/trucks-comments.csv]|[] |value:string | +|1 |1 |csv |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/integration_tests/src/test/re... |PushedFilters: []|_c0:string | +|1 |2 |parquet|Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotscolumnsout] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |3 |parquet|Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotscolumnsout] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |4 |orc |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/logscolumsout.orc] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |5 |orc |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/logscolumsout.orc] |PushedFilters: []|loan_id:bigint,monthly_reporting_period:string,servicer:string,interest_rate:double,curren...| +|1 |6 |json |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json] |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...| +|1 |7 |json |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json] |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...| +|1 |8 |json |Location: InMemoryFileIndex[file:/home/user1/workspace/spark-rapids-another/lotsofcolumnsout.json] |PushedFilters: []|adj_remaining_months_to_maturity:double,asset_recovery_costs:double,credit_enhancement_pro...| ++--------+-----+-------+---------------------------------------------------------------------------------------------------------------------------+-----------------+---------------------------------------------------------------------------------------------+ +``` + +- Matching SQL IDs Across Applications: + +``` +Matching SQL IDs Across Applications: ++-----------------------+-----------------------+ +|app-20210329165943-0103|app-20210329170243-0018| ++-----------------------+-----------------------+ +|0 |0 | +|1 |1 | +|2 |2 | +|3 |3 | +|4 |4 | ++-----------------------+-----------------------+ +``` + +There is one column per application. There is a row per SQL ID. The SQL IDs are matched +primarily on the structure of the SQL query run, and then on the order in which they were +run. Be aware that this is truly the structure of the query. Two queries that do similar +things, but on different data are likely to match as the same. 
An effort is made to +also match between CPU plans and GPU plans so in most cases the same query run on the +CPU and on the GPU will match. + +- Matching Stage IDs Across Applications: + +``` +Matching Stage IDs Across Applications: ++-----------------------+-----------------------+ +|app-20210329165943-0103|app-20210329170243-0018| ++-----------------------+-----------------------+ +|31 |31 | +|32 |32 | +|33 |33 | +|39 |38 | +|40 |40 | +|41 |41 | ++-----------------------+-----------------------+ +``` + +There is one column per application. There is a row per stage ID. If a SQL query matches +between applications, see Matching SQL IDs Across Applications, then an attempt is made +to match stages within that application to each other. This has the same issues with +stages when generating a dot graph. This can be especially helpful when trying to compare +large queries and Spark happened to assign the stage IDs slightly differently, or in some +cases there are a different number of stages because of slight differences in the plan. This +is a best effort, and it is not guaranteed to match up all stages in a plan. + +- Compare Rapids related Spark properties side-by-side: + +``` +Compare Rapids Properties which are set explicitly: ++-------------------------------------------+----------+----------+ +|propertyName |appIndex_1|appIndex_2| ++-------------------------------------------+----------+----------+ +|spark.rapids.memory.pinnedPool.size |null |2g | +|spark.rapids.sql.castFloatToDecimal.enabled|null |true | +|spark.rapids.sql.concurrentGpuTasks |null |2 | +|spark.rapids.sql.decimalType.enabled |null |true | +|spark.rapids.sql.enabled |false |true | +|spark.rapids.sql.explain |null |NOT_ON_GPU| +|spark.rapids.sql.hasNans |null |FALSE | +|spark.rapids.sql.incompatibleOps.enabled |null |true | +|spark.rapids.sql.variableFloatAgg.enabled |null |TRUE | ++-------------------------------------------+----------+----------+ +``` + +- List rapids-4-spark and cuDF jars based on classpath: + +``` +Rapids Accelerator Jar and cuDF Jar: ++--------+------------------------------------------------------------+ +|appIndex|Rapids4Spark jars | ++--------+------------------------------------------------------------+ +|1 |spark://10.10.10.10:43445/jars/cudf-0.19.2-cuda11.jar | +|1 |spark://10.10.10.10:43445/jars/rapids-4-spark_2.12-0.5.0.jar| +|2 |spark://10.10.10.11:41319/jars/cudf-0.19.2-cuda11.jar | +|2 |spark://10.10.10.11:41319/jars/rapids-4-spark_2.12-0.5.0.jar| ++--------+------------------------------------------------------------+ +``` + +- Job, stage and SQL ID information(not in `compare` mode yet): + +``` ++--------+-----+---------+-----+ +|appIndex|jobID|stageIds |sqlID| ++--------+-----+---------+-----+ +|1 |0 |[0] |null | +|1 |1 |[1,2,3,4]|0 | ++--------+-----+---------+-----+ +``` + +- SQL Plan Metrics for Application for each SQL plan node in each SQL: + +These are also called accumulables in Spark. 
+
+```
+SQL Plan Metrics for Application:
++--------+-----+------+-----------------------------------------------------------+-------------+-----------------------+-------------+----------+
+|appIndex|sqlID|nodeID|nodeName |accumulatorId|name |max_value |metricType|
++--------+-----+------+-----------------------------------------------------------+-------------+-----------------------+-------------+----------+
+|1 |0 |1 |GpuColumnarExchange |111 |output rows |1111111111 |sum |
+|1 |0 |1 |GpuColumnarExchange |112 |output columnar batches|222222 |sum |
+|1 |0 |1 |GpuColumnarExchange |113 |data size |333333333333 |size |
+|1 |0 |1 |GpuColumnarExchange |114 |shuffle bytes written |444444444444 |size |
+|1 |0 |1 |GpuColumnarExchange |115 |shuffle records written|555555 |sum |
+|1 |0 |1 |GpuColumnarExchange |116 |shuffle write time |666666666666 |nsTiming |
+```
+
+- Print SQL Plans (-p option):
+Prints the SQL plan as a text string to a file named `planDescriptions.log`.
+For example, if your application ID is app-20210507103057-0000, then the
+filename will be `planDescriptions.log`.
+
+- Generate DOT graph for each SQL (-g option):
+
+```
+Generated DOT graphs for app app-20210507103057-0000 to /path/. in 17 second(s)
+```
+
+Once the DOT file is generated, you can install [graphviz](http://www.graphviz.org) to convert the DOT file
+to a graph in PDF format using the command below:
+
+```bash
+dot -Tpdf ./app-20210507103057-0000-query-0/0.dot > app-20210507103057-0000.pdf
+```
+
+Or to SVG using:
+
+```bash
+dot -Tsvg ./app-20210507103057-0000-query-0/0.dot > app-20210507103057-0000.svg
+```
+
+The PDF or SVG file has the SQL plan graph with metrics. The SVG file will act a little
+more like the Spark UI and include extra information for nodes when hovering over them with
+a mouse.
+
+As a part of this, an effort is made to associate parts of the graph with the Spark stage they are a
+part of. This is not 100% accurate. Some parts of the plan like `TakeOrderedAndProject` may
+be a part of multiple stages and only one of the stages will be selected. `Exchanges` are purposely
+left out of the sections associated with a stage because they cover at least 2 stages and possibly
+more. In other cases we may not be able to determine what stage something was a part of. In those
+cases we mark it as `UNKNOWN STAGE`. This is because we rely on metrics to link a node to a stage.
+If a stage has no metrics, like if the query crashed early, we cannot establish that link.
+
+- Generate timeline for application (--generate-timeline option):
+
+The output of this is an [svg](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics) file
+named `timeline.svg`. Most web browsers can display this file. It is a
+timeline view similar to Apache Spark's
+[event timeline](https://spark.apache.org/docs/latest/web-ui.html).
+
+This displays several data sections.
+
+1. **Tasks** This shows all tasks in the application divided by executor. Please note that this
+ tries to pack the tasks in the graph. It does not represent actual scheduling on CPU cores.
+ The tasks are labeled with the time it took for them to run, but there is no breakdown about
+ different aspects of each task, like there is in Spark's timeline.
+2. **STAGES** This shows the stage times reported by Spark. It starts with when the stage was
+ scheduled and ends when Spark considered the stage done.
+3. **STAGE RANGES** This shows the time from the start of the first task to the end of the last
+ task. Often a stage is scheduled, but there are not enough resources in the cluster to run it.
+ This helps to show how long it takes for a task to start running after it is scheduled, and in
+ many cases how long it took to run all of the tasks in the stage. This is not always true because
+ Spark can intermix tasks from different stages.
+4. **JOBS** This shows the time range reported by Spark from when a job was scheduled to when it
+ completed.
+5. **SQL** This shows the time range reported by Spark from when a SQL statement was scheduled to
+ when it completed.
+
+Tasks and stages are all color coordinated to help you know which tasks are associated with a given
+stage. Jobs and SQL are not color coordinated.
+
+#### B. Analysis
+- Job + Stage level aggregated task metrics
+- SQL level aggregated task metrics
+- SQL duration, application duration, if it contains a Dataset operation, potential problems, executor CPU time percent
+- Shuffle Skew Check: (When task's Shuffle Read Size > 3 * Avg Stage-level size)
+
+Below we will aggregate the task level metrics at different levels
+to do some analysis such as detecting possible shuffle skew.
+
+- Job + Stage level aggregated task metrics:
+
+```
+### B. Analysis ###
+
+Job + Stage level aggregated task metrics:
++--------+-------+--------+--------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+
+|appIndex|ID |numTasks|Duration|diskBytesSpilled_sum|duration_sum|duration_max|duration_min|duration_avg|executorCPUTime_sum|executorDeserializeCPUTime_sum|executorDeserializeTime_sum|executorRunTime_sum|gettingResultTime_sum|input_bytesRead_sum|input_recordsRead_sum|jvmGCTime_sum|memoryBytesSpilled_sum|output_bytesWritten_sum|output_recordsWritten_sum|peakExecutionMemory_max|resultSerializationTime_sum|resultSize_max|sr_fetchWaitTime_sum|sr_localBlocksFetched_sum|sr_localBytesRead_sum|sr_remoteBlocksFetched_sum|sr_remoteBytesRead_sum|sr_remoteBytesReadToDisk_sum|sr_totalBytesRead_sum|sw_bytesWritten_sum|sw_recordsWritten_sum|sw_writeTime_sum|
++--------+-------+--------+--------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+
+|1 |job_0 |3333 |222222 |0 |11111111 |111111 |111 |1111.1 |6666666 |55555 |55555 |55555555 |0 |222222222222 |22222222222 |111111 |0 |0 |0 |222222222 |1 |11111 |11111 |99999 |22222222222 |2222221 |222222222222 |0 |222222222222 |222222222222 |5555555 |444444 |
+```
+
+- SQL level aggregated task metrics:
+
+```
+SQL level aggregated task metrics: ++--------+------------------------------+-----+--------------------+--------+--------+---------------+---------------+----------------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+ +|appIndex|appID |sqlID|description |numTasks|Duration|executorCPUTime|executorRunTime|executorCPURatio|diskBytesSpilled_sum|duration_sum|duration_max|duration_min|duration_avg|executorCPUTime_sum|executorDeserializeCPUTime_sum|executorDeserializeTime_sum|executorRunTime_sum|gettingResultTime_sum|input_bytesRead_sum|input_recordsRead_sum|jvmGCTime_sum|memoryBytesSpilled_sum|output_bytesWritten_sum|output_recordsWritten_sum|peakExecutionMemory_max|resultSerializationTime_sum|resultSize_max|sr_fetchWaitTime_sum|sr_localBlocksFetched_sum|sr_localBytesRead_sum|sr_remoteBlocksFetched_sum|sr_remoteBytesRead_sum|sr_remoteBytesReadToDisk_sum|sr_totalBytesRead_sum|sw_bytesWritten_sum|sw_recordsWritten_sum|sw_writeTime_sum| ++--------+------------------------------+-----+--------------------+--------+--------+---------------+---------------+----------------+--------------------+------------+------------+------------+------------+-------------------+------------------------------+---------------------------+-------------------+---------------------+-------------------+---------------------+-------------+----------------------+-----------------------+-------------------------+-----------------------+---------------------------+--------------+--------------------+-------------------------+---------------------+--------------------------+----------------------+----------------------------+---------------------+-------------------+---------------------+----------------+ +|1 |application_1111111111111_0001|0 |show at :11|1111 |222222 |6666666 |55555555 |55.55 |0 |13333333 |111111 |999 |3333.3 |6666666 |55555 |66666 |11111111 |0 |111111111111 |11111111111 |111111 |0 |0 |0 |888888888 |8 |11111 |11111 |99999 |11111111111 |2222222 |222222222222 |0 |222222222222 |444444444444 |5555555 |444444 | +``` + +- SQL duration, application during, if it contains a Dataset operation, potential problems, executor CPU time percent: + +``` +SQL Duration and Executor CPU Time Percent ++--------+------------------------------+-----+------------+-------------------+------------+------------------+-------------------------+ +|appIndex|App ID |sqlID|SQL Duration|Contains Dataset Op|App Duration|Potential Problems|Executor CPU Time Percent| ++--------+------------------------------+-----+------------+-------------------+------------+------------------+-------------------------+ +|1 |application_1603128018386_7759|0 |11042 |false |119990 |null |68.48 | ++--------+------------------------------+-----+------------+-------------------+------------+------------------+-------------------------+ +``` + +- Shuffle Skew Check: + +``` +Shuffle Skew Check: (When task's Shuffle Read Size > 3 * Avg Stage-level size) 
++--------+-------+--------------+------+-------+---------------+--------------+-----------------+----------------+----------------+----------+----------------------------------------------------------------------------------------------------+ +|appIndex|stageId|stageAttemptId|taskId|attempt|taskDurationSec|avgDurationSec|taskShuffleReadMB|avgShuffleReadMB|taskPeakMemoryMB|successful|reason | ++--------+-------+--------------+------+-------+---------------+--------------+-----------------+----------------+----------------+----------+----------------------------------------------------------------------------------------------------+ +|1 |2 |0 |2222 |0 |111.11 |7.7 |2222.22 |111.11 |0.01 |false |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /dddd/xxxxxxx/ccccc/bbbbbbbbb/aaaaaaa| +|1 |2 |0 |2224 |1 |222.22 |8.8 |3333.33 |111.11 |0.01 |false |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /dddd/xxxxxxx/ccccc/bbbbbbbbb/aaaaaaa| ++--------+-------+--------------+------+-------+---------------+--------------+-----------------+----------------+----------------+----------+----------------------------------------------------------------------------------------------------+ +``` + +#### C. Health Check +- List failed tasks, stages and jobs +- Removed BlockManagers and Executors +- SQL Plan HealthCheck + +Below are examples. +- Print failed tasks: + +``` +Failed tasks: ++--------+-------+--------------+------+-------+----------------------------------------------------------------------------------------------------+ +|appIndex|stageId|stageAttemptId|taskId|attempt|failureReason | ++--------+-------+--------------+------+-------+----------------------------------------------------------------------------------------------------+ +|3 |4 |0 |2842 |0 |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /home/jenkins/agent/workspace/jenkins| +|3 |4 |0 |2858 |0 |TaskKilled(another attempt succeeded,List(AccumulableInfo(453,None,Some(22000),None,false,true,None)| +|3 |4 |0 |2884 |0 |TaskKilled(another attempt succeeded,List(AccumulableInfo(453,None,Some(21148),None,false,true,None)| +|3 |4 |0 |2908 |0 |TaskKilled(another attempt succeeded,List(AccumulableInfo(453,None,Some(20420),None,false,true,None)| +|3 |4 |0 |3410 |1 |ExceptionFailure(ai.rapids.cudf.CudfException,cuDF failure at: /home/jenkins/agent/workspace/jenkins| +|4 |1 |0 |1948 |1 |TaskKilled(another attempt succeeded,List(AccumulableInfo(290,None,Some(1107),None,false,true,None),| ++--------+-------+--------------+------+-------+----------------------------------------------------------------------------------------------------+ +``` + +- Print failed stages: + +``` +Failed stages: ++--------+-------+---------+-------------------------------------+--------+---------------------------------------------------+ +|appIndex|stageId|attemptId|name |numTasks|failureReason | ++--------+-------+---------+-------------------------------------+--------+---------------------------------------------------+ +|3 |4 |0 |attachTree at Spark300Shims.scala:624|1000 |Job 0 cancelled as part of cancellation of all jobs| ++--------+-------+---------+-------------------------------------+--------+---------------------------------------------------+ +``` + +- Print failed jobs: + +``` +Failed jobs: ++--------+-----+---------+------------------------------------------------------------------------+ +|appIndex|jobID|jobResult|failureReason | 
++--------+-----+---------+------------------------------------------------------------------------+ +|3 |0 |JobFailed|java.lang.Exception: Job 0 cancelled as part of cancellation of all j...| ++--------+-----+---------+------------------------------------------------------------------------+ +``` + +- SQL Plan HealthCheck: + + Prints possibly unsupported query plan nodes such as `$Lambda` key word means dataset API. + +``` ++--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ +|appIndex|sqlID|nodeID|nodeName|nodeDescription | ++--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ +|3 |1 |8 |Filter |Filter $line21.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$Lambda$4578/0x00000008019f1840@4b63e04c.apply| ++--------+-----+------+--------+---------------------------------------------------------------------------------------------------+ +``` + +## Profiling tool options + +```bash +RAPIDS Accelerator for Apache Spark Profiling tool + +Usage: java -cp rapids-4-spark-tools_2.12-.jar:$SPARK_HOME/jars/* + com.nvidia.spark.rapids.tool.profiling.ProfileMain [options] + + + --combined Collect mode but combine all applications into + the same tables. + -c, --compare Compare Applications (Note this may require + more memory if comparing a large number of + applications). Default is false. + --csv Output each table to a CSV file as well + creating the summary text file. + -f, --filter-criteria Filter newest or oldest N eventlogs for + processing.eg: 100-newest-filesystem (for + processing newest 100 event logs). eg: + 100-oldest-filesystem (for processing oldest + 100 event logs) + -g, --generate-dot Generate query visualizations in DOT format. + Default is false + --generate-timeline Write an SVG graph out for the full + application timeline. + -m, --match-event-logs Filter event logs whose filenames contain the + input string + -n, --num-output-rows Number of output rows for each Application. + Default is 1000 + --num-threads Number of thread to use for parallel + processing. The default is the number of cores + on host divided by 4. + -o, --output-directory Base output directory. Default is current + directory for the default filesystem. The + final output will go into a subdirectory + called rapids_4_spark_profile. It will + overwrite any existing files with the same + name. + -p, --print-plans Print the SQL plans to a file named + 'planDescriptions.log'. + Default is false. + -t, --timeout Maximum time in seconds to wait for the event + logs to be processed. Default is 24 hours + (86400 seconds) and must be greater than 3 + seconds. If it times out, it will report what + it was able to process up until the timeout. + -h, --help Show help message + + trailing arguments: + eventlog (required) Event log filenames(space separated) or directories + containing event logs. eg: s3a:///eventlog1 + /path/to/eventlog2 +``` + +## Profiling tool metrics definitions + +All the metrics definitions can be found in the +[executor task metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-task-metrics) / +[executor metrics doc](https://spark.apache.org/docs/latest/monitoring.html#executor-metrics) or +the [SPARK webUI doc](https://spark.apache.org/docs/latest/web-ui.html#content). 
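+
+As a quick end-to-end illustration, the sketch below combines several of the options described
+above: it runs the tool in compare mode, writes each table out as a CSV file, and places the results
+under a custom output directory. The jar version, Spark location and event log paths are
+placeholders and should be replaced with the values for your environment.
+
+```bash
+# Compare two applications, emit CSV tables, and write results under /tmp/profile-output
+java -Xmx30g -cp ~/rapids-4-spark-tools_2.12-<version>.jar:$SPARK_HOME/jars/* \
+  com.nvidia.spark.rapids.tool.profiling.ProfileMain \
+  --compare --csv \
+  --output-directory /tmp/profile-output \
+  /path/to/eventlog1 /path/to/eventlog2
+```
+
+With these options the summary text file and the per-table CSV files should appear under
+`/tmp/profile-output/rapids_4_spark_profile/compare/`.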
\ No newline at end of file From a265bdb213eade5ed1d1a80fae1d782d7a7a0636 Mon Sep 17 00:00:00 2001 From: Rodney Howeedy Date: Mon, 25 Oct 2021 09:27:15 -0600 Subject: [PATCH 6/9] Corrections and updates to supported platforms and GPUs. Signed-off-by: Rodney Howeedy --- docs/FAQ.md | 3 ++- docs/download.md | 2 +- docs/get-started/getting-started-on-prem.md | 2 +- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/FAQ.md b/docs/FAQ.md index 3875a94b0cb..d8341aa78df 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -19,9 +19,10 @@ process, we try to stay on top of these changes and release updates as quickly a The RAPIDS Accelerator for Apache Spark officially supports: - [Apache Spark](get-started/getting-started-on-prem.md) -- [AWS EMR 6.2.0, 6.3.0](get-started/getting-started-aws-emr.md) +- [AWS EMR 6.2+](get-started/getting-started-aws-emr.md) - [Databricks Runtime 7.3, 8.2](get-started/getting-started-databricks.md) - [Google Cloud Dataproc 2.0](get-started/getting-started-gcp.md) +- [Cloudera CDP 7.1.6+] Most distributions based on a supported Apache Spark version should work, but because the plugin replaces parts of the physical plan that Apache Spark considers to be internal the code for those diff --git a/docs/download.md b/docs/download.md index 23675ad0289..288a667efa1 100644 --- a/docs/download.md +++ b/docs/download.md @@ -23,7 +23,7 @@ Hardware Requirements: The plugin is tested on the following architectures: - GPU Architecture: NVIDIA V100, T4 and A10/A30/A100 GPUs + GPU Architecture: NVIDIA V100, T4 and A2/A10/A30/A100 GPUs Software Requirements: diff --git a/docs/get-started/getting-started-on-prem.md b/docs/get-started/getting-started-on-prem.md index 8d907669a24..f2b069e8d61 100644 --- a/docs/get-started/getting-started-on-prem.md +++ b/docs/get-started/getting-started-on-prem.md @@ -52,7 +52,7 @@ Download the RAPIDS Accelerator for Apache Spark plugin jar. Then download the v jar that your version of the accelerator depends on. Each cudf jar is for a specific version of CUDA and will not run on other versions. The jars use a maven classifier to keep them separate. -- CUDA 11.0/11.1/11.2 => classifier cuda11 +- CUDA 11.x => classifier cuda11 For example, here is a sample version of the jars and cudf with CUDA 11.0 support: - cudf-21.10.0-cuda11.jar From 00cb280905a7d3a8a10136ded4bfc614d85a710b Mon Sep 17 00:00:00 2001 From: Rodney Howeedy Date: Mon, 25 Oct 2021 09:28:44 -0600 Subject: [PATCH 7/9] gh-pages formatting correction. Signed-off-by: Rodney Howeedy --- docs/FAQ.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/FAQ.md b/docs/FAQ.md index d8341aa78df..0ddc1471130 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -22,7 +22,7 @@ The RAPIDS Accelerator for Apache Spark officially supports: - [AWS EMR 6.2+](get-started/getting-started-aws-emr.md) - [Databricks Runtime 7.3, 8.2](get-started/getting-started-databricks.md) - [Google Cloud Dataproc 2.0](get-started/getting-started-gcp.md) -- [Cloudera CDP 7.1.6+] +- Cloudera CDP 7.1.6+ Most distributions based on a supported Apache Spark version should work, but because the plugin replaces parts of the physical plan that Apache Spark considers to be internal the code for those From 19d12bdb86dbb27c941b4772ac9bd439f058aa7f Mon Sep 17 00:00:00 2001 From: Rodney Howeedy Date: Mon, 25 Oct 2021 12:12:20 -0600 Subject: [PATCH 8/9] Correct code to invoke cache-serializer. 
Signed-off-by: Rodney Howeedy --- docs/additional-functionality/cache-serializer.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/additional-functionality/cache-serializer.md b/docs/additional-functionality/cache-serializer.md index e45fd387013..09a3f37dfc6 100644 --- a/docs/additional-functionality/cache-serializer.md +++ b/docs/additional-functionality/cache-serializer.md @@ -32,7 +32,7 @@ nav_order: 2 To use this serializer please run Spark with the following conf. ``` - spark-shell --conf spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer + spark-shell --conf spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer ``` From 4bc8daa1df0dd53b72ebe6aebd311c6334c6e121 Mon Sep 17 00:00:00 2001 From: Rodney Howeedy Date: Mon, 25 Oct 2021 13:07:28 -0600 Subject: [PATCH 9/9] Correct case references across retroactive release download.md. Signed-off-by: Rodney Howeedy --- docs/download.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/docs/download.md b/docs/download.md index 288a667efa1..4350064ee09 100644 --- a/docs/download.md +++ b/docs/download.md @@ -29,7 +29,7 @@ Software Requirements: OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8 - CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+ + CUDA & NVIDIA Drivers*: 11.0-11.4 & v450.80.02+ Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.2.0, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0 @@ -87,7 +87,7 @@ Software Requirements: OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8 - CUDA & Nvidia Drivers*: 11.0-11.4 & v450.80.02+ + CUDA & NVIDIA Drivers*: 11.0-11.4 & v450.80.02+ Apache Spark 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, Cloudera CDP 7.1.6, 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0 @@ -142,7 +142,7 @@ Software Requirements: OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8 - CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+ + CUDA & NVIDIA Drivers*: 11.0 or 11.2 & v450.80.02+ Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, Databricks 7.3 ML LTS or 8.2 ML Runtime, and GCP Dataproc 2.0 @@ -184,7 +184,7 @@ Software Requirements: OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8 - CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+ + CUDA & NVIDIA Drivers*: 11.0 or 11.2 & v450.80.02+ Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, and GCP Dataproc 2.0 @@ -230,7 +230,7 @@ Software Requirements: OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS 8 - CUDA & Nvidia Drivers*: 11.0 or 11.2 & v450.80.02+ + CUDA & NVIDIA Drivers*: 11.0 or 11.2 & v450.80.02+ Apache Spark 3.0.1, 3.0.2, 3.1.1, 3.1.2, Cloudera CDP 7.1.7, Databricks 8.2 ML Runtime, and GCP Dataproc 2.0 @@ -296,7 +296,7 @@ Software Requirements: OS: Ubuntu 18.04, Ubuntu 20.04 or CentOS 7, CentOS8 - CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ + CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ Apache Spark 3.0.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0 @@ -344,7 +344,7 @@ Software Requirements: OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7 - CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ + CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ Apache Spark 3.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0 @@ -364,7 +364,7 @@ The list of all supported operations is provided 
[here](supported_ops.md). For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). -**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the +**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer driver. This issue is resolved in the 0.5.0 and higher releases. @@ -386,7 +386,7 @@ Software Requirements: OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7 - CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ + CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ Apache Spark 3.0, 3.0.1, 3.0.2, 3.1.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0 @@ -418,7 +418,7 @@ The list of all supported operations is provided [here](supported_ops.md). For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). -**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the +**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer driver. This issue is resolved in the 0.5.0 and higher releases. @@ -440,7 +440,7 @@ Software Requirements: OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7 - CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ + CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ Apache Spark 3.0, 3.0.1, Databricks 7.3 ML LTS Runtime, or GCP Dataproc 2.0 @@ -469,7 +469,7 @@ The list of all supported operations is provided [here](supported_ops.md). For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). -**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the +**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer driver. This issue is resolved in the 0.5.0 and higher releases. @@ -491,7 +491,7 @@ Software Requirements: OS: Ubuntu 16.04, Ubuntu 18.04 or CentOS 7 - CUDA & Nvidia Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ + CUDA & NVIDIA Drivers: 10.1.2 & v418.87+, 10.2 & v440.33+ or 11.0 & v450.36+ Apache Spark 3.0, 3.0.1 @@ -523,7 +523,7 @@ The list of all supported operations is provided For a detailed list of changes, please refer to the [CHANGELOG](https://github.com/NVIDIA/spark-rapids/blob/main/CHANGELOG.md). -**_Note:_** Using Nvidia driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the +**_Note:_** Using NVIDIA driver release 450.80.02, 450.102.04 or 460.32.03 in combination with the CUDA 10.1 or 10.2 toolkit may result in long read times when reading a file that is snappy compressed. In those cases we recommend either running with the CUDA 11.0 toolkit or using a newer driver. This issue is resolved in the 0.5.0 and higher releases.