[Doc] Update readme, ipynb files for 24.02 version [skip ci] #373

Merged: 15 commits merged on Apr 7, 2024.
10 changes: 5 additions & 5 deletions .github/workflows/auto-merge.yml
@@ -1,4 +1,4 @@
# Copyright (c) 2022-2023, NVIDIA CORPORATION.
# Copyright (c) 2022-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
on:
pull_request_target:
branches:
- branch-23.12
- branch-24.02
types: [closed]

jobs:
@@ -29,14 +29,14 @@ jobs:
steps:
- uses: actions/checkout@v3
with:
ref: branch-23.12 # force to fetch from latest upstream instead of PR ref
ref: branch-24.02 # force to fetch from latest upstream instead of PR ref

- name: auto-merge job
uses: ./.github/workflows/auto-merge
env:
OWNER: NVIDIA
REPO_NAME: spark-rapids-examples
HEAD: branch-23.12
BASE: branch-24.02
HEAD: branch-24.02
BASE: branch-24.04
AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
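The HEAD/BASE pair in this workflow advances by one release whenever the repo is rebranched (23.12 → 24.02 before this PR, 24.02 → 24.04 after it). The sketch below is a hypothetical helper, not part of the repo, that derives the next branch name. It assumes the bimonthly `branch-YY.MM` cadence visible in this diff, which is an observation rather than a documented policy, and could be used to sanity-check the values before committing a bump.

```python
import re


def next_release_branch(branch: str) -> str:
    """Return the branch that follows `branch`, e.g. branch-23.12 -> branch-24.02.

    Assumes the bimonthly branch-YY.MM cadence seen in this workflow.
    """
    match = re.fullmatch(r"branch-(\d{2})\.(\d{2})", branch)
    if match is None:
        raise ValueError(f"unexpected branch name: {branch!r}")
    year, month = int(match.group(1)), int(match.group(2))
    month += 2
    if month > 12:  # roll over into the next year
        month -= 12
        year += 1
    return f"branch-{year:02d}.{month:02d}"


def automerge_env(head: str) -> dict:
    """Hypothetical helper: the HEAD/BASE values the auto-merge job would use."""
    return {"HEAD": head, "BASE": next_release_branch(head)}
```

For example, `automerge_env("branch-24.02")` yields `{"HEAD": "branch-24.02", "BASE": "branch-24.04"}`, matching the new values above.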

3 changes: 1 addition & 2 deletions .github/workflows/markdown-links-check.yml
@@ -1,4 +1,4 @@
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2022-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -30,6 +30,5 @@ jobs:
with:
max-depth: -1
use-verbose-mode: 'yes'
check-modified-files-only: 'yes'
config-file: '.github/workflows/markdown-links-check/markdown-links-check-config.json'
base-branch: 'main'
@@ -1,4 +1,18 @@
{
"ignorePatterns": [
{
"pattern": "/docs"
},
{
"pattern": "/datasets"
},
{
"pattern": "/dockerfile"
},
{
"pattern": "/examples"
}
],
"timeout": "15s",
"retryOn429": true,
"retryCount":30,
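As far as we understand markdown-link-check, each `ignorePatterns` entry is treated as an unanchored regular expression and tested against every link. The minimal Python sketch below mimics that matching with `re.search`, on the assumption that it behaves like the tool's JavaScript `RegExp.test` for these simple patterns. One consequence worth noting: an unanchored `/docs` also matches absolute URLs such as `https://docs.rapids.ai/...`, because the `//docs` after the scheme contains `/docs`.

```python
import re

# Patterns copied from the ignorePatterns list in the link-checker config above.
IGNORE_PATTERNS = ["/docs", "/datasets", "/dockerfile", "/examples"]


def is_ignored(link: str, patterns=IGNORE_PATTERNS) -> bool:
    """True if any pattern matches anywhere in the link (unanchored search)."""
    return any(re.search(pattern, link) for pattern in patterns)
```

If the regex assumption holds, anchoring a pattern (for example `^/docs`) would limit it to links that start with that path instead of any link containing it.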
4 changes: 2 additions & 2 deletions README.md
@@ -37,7 +37,7 @@ can be built for running on GPU with RAPIDS Accelerator in this repo:
| 3 | XGBoost | Taxi (Scala) | End-to-end ETL + XGBoost example to predict taxi trip fare amount with [NYC taxi trips data set](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
| 4 | ML/DL | PCA End-to-End | Spark MLlib based PCA example to train and transform with a synthetic dataset
| 5 | UDF | cuSpatial - Point in Polygon | Spark cuSpatial example for Point in Polygon function using NYC Taxi pickup location dataset
| 6 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable/)
| 7 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable/)
| 6 | UDF | URL Decode | Decodes URL-encoded strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
| 7 | UDF | URL Encode | URL-encodes strings using the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy/)
| 8 | UDF | [CosineSimilarity](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java) | Computes the cosine similarity between two float vectors using [native code](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src)
| 9 | UDF | [StringWordCount](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java) | Implements a Hive simple UDF using [native code](./examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/src) to count words in strings
@@ -21,7 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
create an `init.sh` script with contents:
```bash
#!/bin/bash
sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.12.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar
sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.02.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar
```
1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
Prerequisites section.
@@ -68,7 +68,7 @@ create an `init.sh` scripts with contents:
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-23.12.1.jar:/databricks/spark/python
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-24.02.0.jar:/databricks/spark/python
```
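The jar version appears both in `init.sh` (the `wget -O` target) and in `spark.executorEnv.PYTHONPATH`, so the two have to be bumped together, as this PR does. The hypothetical helper below, not part of the repo, renders the three Spark config lines shown above from a single version string as one way to keep them in sync. The `/databricks/jars` location and the `rapids-4-spark_2.12-<version>.jar` naming are taken from the snippets in this diff.

```python
def rapids_python_confs(version: str, scala_binary: str = "2.12",
                        jar_dir: str = "/databricks/jars") -> dict:
    """Spark confs for GPU Python execution on Databricks, derived from one jar version."""
    # Must match the file name written by `wget -O` in init.sh.
    jar = f"{jar_dir}/rapids-4-spark_{scala_binary}-{version}.jar"
    return {
        "spark.rapids.sql.python.gpu.enabled": "true",
        "spark.python.daemon.module": "rapids.daemon_databricks",
        "spark.executorEnv.PYTHONPATH": f"{jar}:/databricks/spark/python",
    }


def render_spark_config(confs: dict) -> str:
    """Render confs as space-separated `key value` lines, as in the snippet above."""
    return "\n".join(f"{key} {value}" for key, value in confs.items())
```

`render_spark_config(rapids_python_confs("24.02.0"))` reproduces the three post-change lines above.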
Note that because the Python memory pool requires the cudf library, you need to install the cudf library on
each worker node with `pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com` or disable the Python memory pool
2 changes: 1 addition & 1 deletion docs/get-started/xgboost-examples/csp/databricks/init.sh
@@ -1,7 +1,7 @@
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar

sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.12.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar
sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.02.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar
sudo wget -O /databricks/jars/xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar
sudo wget -O /databricks/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar
ls -ltr
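Every download URL in these scripts follows the standard Maven repository layout: the group ID with dots turned into slashes, then the artifact ID, the version, and finally `<artifactId>-<version>.jar`. The small sketch below (the function name is ours, not the repo's) reproduces both the RAPIDS and the XGBoost URLs above, which can help catch a typo when bumping versions.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_central_jar_url(group_id: str, artifact_id: str, version: str,
                          repo: str = MAVEN_CENTRAL) -> str:
    """Build a jar URL using the standard Maven repository directory layout."""
    group_path = group_id.replace(".", "/")  # com.nvidia -> com/nvidia
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"
```

To fetch a jar from Python rather than `wget`, the resulting URL can be passed to `urllib.request.urlretrieve(url, dest_path)`.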
@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
export SPARK_DOCKER_TAG=<spark docker image tag>

pushd ${SPARK_HOME}
wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-23.12/dockerfile/Dockerfile
wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-24.02/dockerfile/Dockerfile

# Optionally install additional jars into ${SPARK_HOME}/jars/

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

Download the RAPIDS Accelerator for Apache Spark plugin jar
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar)

### Build XGBoost Python Examples

@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

1. Download the RAPIDS Accelerator for Apache Spark plugin jar
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar)

### Build XGBoost Scala Examples

2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-DL/criteo_train/README.md
@@ -7,7 +7,7 @@ _Please note: The following demo is dedicated for DGX-2 machine(with V100 GPUs).
## Dataset

The dataset used here is from Criteo clicklog dataset.
It's preprocessed by [DLRM](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM/preproc)
It's preprocessed by [DLRM](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM_and_DCNv2/preproc)
ETL job on Spark. We also provide a small sample dataset in the `sample_data` folder.
All 40 columns (1 label + 39 features) are already numeric.

@@ -9,7 +9,7 @@
"\n",
"This notebook contains the same content as \"criteo_keras.py\" but in a notebook(interactive) form.\n",
"\n",
"The dataset used here is from Criteo clicklog dataset. It's preprocessed by DLRM(https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM/preproc) ETL job on Spark.\n",
"The dataset used here is from Criteo clicklog dataset. It's preprocessed by DLRM(https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/Recommendation/DLRM_and_DCNv2/preproc) ETL job on Spark.\n",
"\n",
"We provide a small size sample data in `sample_data` folder.\n",
"\n",
2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-cuML/pca/Dockerfile
@@ -18,7 +18,7 @@
ARG CUDA_VER=11.8.0
FROM nvidia/cuda:${CUDA_VER}-devel-ubuntu20.04
# Please do not update the BRANCH_VER version
ARG BRANCH_VER=23.12
ARG BRANCH_VER=24.02

RUN apt-get update
RUN apt-get install -y wget ninja-build git
6 changes: 3 additions & 3 deletions examples/ML+DL-Examples/Spark-cuML/pca/README.md
@@ -12,9 +12,9 @@ User can also download the release jar from Maven central:

[rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)

[rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)
[rapids-4-spark_2.12-24.02.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar)

Note: This demo could only work with v22.02.0 spark-ml version, and only compatible with spark-rapids versions prior to 23.12.1 . Please do not update the version in release.
Note: This demo only works with spark-ml version v22.02.0 and is only compatible with spark-rapids versions prior to 24.02.0. Please do not update the version in release.

## Sample code

@@ -49,7 +49,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER

``` bash
RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.12.1.jar
PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-24.02.0.jar

jupyter toree install \
--spark_home=${SPARK_HOME} \
2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-cuML/pca/pom.xml
@@ -21,7 +21,7 @@
<groupId>com.nvidia</groupId>
<artifactId>PCAExample</artifactId>
<packaging>jar</packaging>
<version>23.12.0-SNAPSHOT</version>
<version>24.02.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>8</maven.compiler.source>
4 changes: 2 additions & 2 deletions examples/ML+DL-Examples/Spark-cuML/pca/spark-submit.sh
@@ -17,7 +17,7 @@

# Note that the last rapids-4-spark-ml release version is 22.02.0 and the snapshot version is 23.04.0-SNAPSHOT; please do not update the version in release
ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0.jar
PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar
PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar
# Note: The last rapids-4-spark-ml release version is 22.02.0, snapshot version is 23.04.0-SNAPSHOT.

$SPARK_HOME/bin/spark-submit \
@@ -40,4 +40,4 @@ $SPARK_HOME/bin/spark-submit \
--conf spark.network.timeout=1000s \
--jars $ML_JAR,$PLUGIN_JAR \
--class com.nvidia.spark.examples.pca.Main \
/workspace/target/PCAExample-23.12.1-SNAPSHOT.jar
/workspace/target/PCAExample-24.02.0-SNAPSHOT.jar
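The application jar at the end of `spark-submit.sh` has to match the file Maven builds from `pom.xml`. Unless the POM overrides `finalName`, Maven names it `<artifactId>-<version>.jar`, which is why `PCAExample-24.02.0-SNAPSHOT.jar` tracks the `<version>` bump above. The hypothetical Python sketch below assembles the same command line from those pieces, using only the `spark-submit` options visible in the script (`--conf`, `--jars`, `--class`).

```python
from typing import Dict, List, Sequence


def app_jar_path(artifact_id: str, version: str, target_dir: str = "/workspace/target") -> str:
    """Default Maven jar name ${artifactId}-${version}.jar (assumes no custom finalName)."""
    return f"{target_dir}/{artifact_id}-{version}.jar"


def spark_submit_argv(spark_home: str, confs: Dict[str, str], jars: Sequence[str],
                      main_class: str, app_jar: str) -> List[str]:
    """Build a spark-submit argument list; extra jars are passed comma-separated."""
    argv = [f"{spark_home}/bin/spark-submit"]
    for key, value in confs.items():
        argv += ["--conf", f"{key}={value}"]
    argv += ["--jars", ",".join(jars), "--class", main_class, app_jar]
    return argv
```

The list can be run with `subprocess.run(argv, check=True)`; passing a list rather than a shell string avoids quoting issues in paths.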
@@ -22,7 +22,7 @@
"import os\n",
"# Change to your cluster ip:port and directories\n",
"SPARK_MASTER_URL = os.getenv(\"SPARK_MASTER_URL\", \"spark://your-ip:port\")\n",
"RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-23.12.1.jar\")\n"
"RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-24.02.0.jar\")\n"
]
},
{
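The notebook reads `SPARK_MASTER_URL` and `RAPIDS_JAR` from the environment and falls back to placeholders. Spark standalone master URLs have the form `spark://HOST:PORT` (7077 is the default port), so a quick check like the hypothetical helper below can catch an unedited placeholder before the session is created. It only handles a single master, not a comma-separated HA list.

```python
from urllib.parse import urlparse


def is_standalone_master_url(url: str) -> bool:
    """True if `url` looks like a single Spark standalone master, spark://HOST:PORT."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or not parsed.hostname:
        return False
    try:
        port = parsed.port  # raises ValueError for a non-numeric port such as "port"
    except ValueError:
        return False
    return port is not None
```

For example, `is_standalone_master_url(os.getenv("SPARK_MASTER_URL", ""))` returns `False` for the placeholder value, which makes a good point to raise a clear error.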
17 changes: 16 additions & 1 deletion examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile
@@ -1,5 +1,5 @@
#
# Copyright (c) 2021-2023, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -70,3 +70,18 @@ RUN cd /tmp \
&& make install -j${PARALLEL_LEVEL} \
&& cd /tmp && rm -rf /tmp/cmake-$CMAKE_VERSION*

# Install ccache
ARG CCACHE_VERSION=4.6
RUN cd /tmp && wget --quiet https://github.com/ccache/ccache/releases/download/v${CCACHE_VERSION}/ccache-${CCACHE_VERSION}.tar.gz && \
tar zxf ccache-${CCACHE_VERSION}.tar.gz && \
rm ccache-${CCACHE_VERSION}.tar.gz && \
cd ccache-${CCACHE_VERSION} && \
mkdir build && \
cd build && \
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DZSTD_FROM_INTERNET=ON \
-DREDIS_STORAGE_BACKEND=OFF && \
cmake --build . --parallel ${PARALLEL_LEVEL} --target install && \
cd ../.. && \
rm -rf ccache-${CCACHE_VERSION}
56 changes: 38 additions & 18 deletions examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
@@ -18,7 +18,7 @@ which provides a single method we need to override called
evaluateColumnar returns a cudf ColumnVector, because the GPU gets its speed by performing operations
on many rows at a time. In the `evaluateColumnar` function, there is a cudf implementation of URL
decode that we're leveraging, so we don't need to write any native C++ code. This is all done
through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable). The benefit to
through the [Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy). The benefit to
implement via the Java API is ease of development, but the memory model is not friendly for doing
GPU operations because the JVM makes the assumption that everything we're trying to do is in heap
memory. We need to free the GPU resources in a timely manner with try-finally blocks. Note that we
@@ -27,10 +27,10 @@ involving the RAPIDS accelerated UDF falls back to the CPU.

- [URLDecode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLDecode.scala)
decodes URL-encoded strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [URLEncode](src/main/scala/com/nvidia/spark/rapids/udf/scala/URLEncode.scala)
URL-encodes strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
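When comparing GPU output against the CPU, a plain reference implementation helps. The sketch below assumes the UDF's CPU path follows `java.net.URLDecoder` semantics: `%XX` escapes decoded as UTF-8, `+` decoded to a space, and null in giving null out. Python's `urllib.parse.unquote_plus` behaves the same way for well-formed input. The cudf kernel's exact handling of `+` and of malformed escapes is not verified here.

```python
from typing import List, Optional
from urllib.parse import unquote_plus


def url_decode_reference(values: List[Optional[str]]) -> List[Optional[str]]:
    """CPU reference for a URL-decode UDF: nulls pass through, '+' becomes a space."""
    return [None if v is None else unquote_plus(v, encoding="utf-8") for v in values]
```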

## Spark Java UDF Examples

@@ -53,10 +53,10 @@ significant effort.

- [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/java/URLDecode.java)
decodes URL-encoded strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [URLEncode](src/main/java/com/nvidia/spark/rapids/udf/java/URLEncode.java)
URL-encodes strings using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
- [CosineSimilarity](src/main/java/com/nvidia/spark/rapids/udf/java/CosineSimilarity.java)
computes the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)
between two float vectors using [native code](src/main/cpp/src)
@@ -67,11 +67,11 @@ Below are some examples for implementing RAPIDS accelerated Hive UDF via JNI and

- [URLDecode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLDecode.java)
implements a Hive simple UDF using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
to decode URL-encoded strings
- [URLEncode](src/main/java/com/nvidia/spark/rapids/udf/hive/URLEncode.java)
implements a Hive generic UDF using the
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/stable)
[Java APIs of RAPIDS cudf](https://docs.rapids.ai/api/cudf-java/legacy)
to URL-encode strings
- [StringWordCount](src/main/java/com/nvidia/spark/rapids/udf/hive/StringWordCount.java)
implements a Hive simple UDF using
@@ -118,8 +118,6 @@ and other settings. See the top of the `Dockerfile` for details.

First install docker and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker)

Run the following commands to build and start a docker

```bash
cd spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs
docker build -t my-local:my-udf-example-ubuntu .
@@ -133,11 +131,34 @@ In the Docker container, clone the code and compile.
```bash
git clone https://github.com/NVIDIA/spark-rapids-examples.git
cd spark-rapids-examples/examples/UDF-Examples/RAPIDS-accelerated-UDFs
export LOCAL_CCACHE_DIR="$HOME/.ccache"
mkdir -p "$LOCAL_CCACHE_DIR"
export CCACHE_DIR="$LOCAL_CCACHE_DIR"
export CMAKE_C_COMPILER_LAUNCHER="ccache"
export CMAKE_CXX_COMPILER_LAUNCHER="ccache"
export CMAKE_CUDA_COMPILER_LAUNCHER="ccache"
export CMAKE_CXX_LINKER_LAUNCHER="ccache"
mvn clean package -Pudf-native-examples
```

The build could take a long time (e.g.: 1.5 hours). Then the rapids-4-spark-udf-examples*.jar is
The Docker container has ccache 4.6 installed to speed up incremental builds.
You can point `LOCAL_CCACHE_DIR` at a mounted folder so that the cache persists.
If you don't want to use ccache, remove or unset the ccache environment variables:

```bash
unset CCACHE_DIR
unset CMAKE_C_COMPILER_LAUNCHER
unset CMAKE_CXX_COMPILER_LAUNCHER
unset CMAKE_CUDA_COMPILER_LAUNCHER
unset CMAKE_CXX_LINKER_LAUNCHER
```

The first build can take a long time (e.g., 1.5 hours). Then the rapids-4-spark-udf-examples*.jar is
generated under the RAPIDS-accelerated-UDFs/target directory.
Subsequent builds can benefit from ccache if you enable it.

If you want to build with ccache on your own system,
refer to the commands in the Dockerfile that build ccache from source.
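The `export`/`unset` steps above can also be scripted when Maven is driven from a script rather than an interactive shell. In the minimal Python sketch below, `build_env` is a hypothetical helper; the variable names are exactly those used in the shell snippets, and the default cache directory mirrors `LOCAL_CCACHE_DIR="$HOME/.ccache"`.

```python
from typing import Dict, Mapping, Optional

# Launcher variables exported in the README's ccache block.
CCACHE_LAUNCHER_VARS = (
    "CMAKE_C_COMPILER_LAUNCHER",
    "CMAKE_CXX_COMPILER_LAUNCHER",
    "CMAKE_CUDA_COMPILER_LAUNCHER",
    "CMAKE_CXX_LINKER_LAUNCHER",
)


def build_env(base: Mapping[str, str], use_ccache: bool,
              cache_dir: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of `base` with the ccache variables set or removed."""
    env = dict(base)
    if use_ccache:
        # Pass a mounted path as cache_dir to keep the cache across containers.
        env["CCACHE_DIR"] = cache_dir or f"{env.get('HOME', '.')}/.ccache"
        for var in CCACHE_LAUNCHER_VARS:
            env[var] = "ccache"
    else:
        # Equivalent of the `unset` block.
        env.pop("CCACHE_DIR", None)
        for var in CCACHE_LAUNCHER_VARS:
            env.pop(var, None)
    return env
```

A typical use is `env = build_env(os.environ, use_ccache=True)`, then `os.makedirs(env["CCACHE_DIR"], exist_ok=True)` and `subprocess.run(["mvn", "clean", "package", "-Pudf-native-examples"], env=env, check=True)`.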

### Run all the examples including native examples in the docker

@@ -163,7 +184,7 @@ then do the following inside the Docker container.

### Get jars from Maven Central

[rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)
[rapids-4-spark_2.12-24.02.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.02.0/rapids-4-spark_2.12-24.02.0.jar)


### Launch a local mode Spark
@@ -192,18 +213,17 @@ schema = StructType([
StructField("c2", IntegerType()),
])
data = [
("s1",1),
("s2",2),
("s1",3),
("s2",3),
("s1",3),
("a b c d",1),
("",2),
(None,3),
("the quick brown fox jumped over the lazy dog",3),
]
df = spark.createDataFrame(
SparkContext.getOrCreate().parallelize(data, numSlices=2),
schema)
df.createOrReplaceTempView("tab")

spark.sql("CREATE TEMPORARY FUNCTION {} AS '{}'".format("wordcount", "com.nvidia.spark.rapids.udf.hive.StringWordCount"))
spark.sql("select wordcount(c1) from tab group by c1").show()
spark.sql("select wordcount(c1) from tab group by c1").explain()
spark.sql("select c1, wordcount(c1) from tab").show()
spark.sql("select c1, wordcount(c1) from tab").explain()
```
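The new test rows exercise several words, an empty string, and a null. A plain-Python reference gives the values the GPU query should reproduce. It assumes that `StringWordCount` counts whitespace-separated tokens and returns null for null input; check the Java and native sources for the exact rule.

```python
from typing import List, Optional, Tuple

# Same rows as `data` in the PySpark example above.
DATA: List[Tuple[Optional[str], int]] = [
    ("a b c d", 1),
    ("", 2),
    (None, 3),
    ("the quick brown fox jumped over the lazy dog", 3),
]


def word_count(s: Optional[str]) -> Optional[int]:
    """Assumed CPU semantics: null -> null, otherwise count whitespace-separated words."""
    if s is None:
        return None
    return len(s.split())


expected = [word_count(c1) for c1, _ in DATA]
print(expected)  # [4, 0, None, 9]
```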
4 changes: 2 additions & 2 deletions examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml
@@ -25,7 +25,7 @@
user defined functions for use with the RAPIDS Accelerator
for Apache Spark
</description>
<version>23.12.0-SNAPSHOT</version>
<version>24.02.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>1.8</maven.compiler.source>
@@ -37,7 +37,7 @@
<cuda.version>cuda11</cuda.version>
<scala.binary.version>2.12</scala.binary.version>
<!-- Depends on release version, Snapshot version is not published to the Maven Central -->
<rapids4spark.version>23.12.1</rapids4spark.version>
<rapids4spark.version>24.02.0</rapids4spark.version>
<spark.version>3.1.1</spark.version>
<scala.version>2.12.15</scala.version>
<udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>
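The comment on `rapids4spark.version` says only release versions work, because snapshots are not published to Maven Central. A hypothetical pre-build check, sketched below with the standard-library XML parser, reads that property from `pom.xml` and rejects a `-SNAPSHOT` value. The plain `X.Y.Z` release pattern is an assumption based on the versions in this diff.

```python
import re
import xml.etree.ElementTree as ET


def find_text(pom_xml: str, tag: str) -> str:
    """Return the text of the first element named `tag`, ignoring the Maven XML namespace."""
    root = ET.fromstring(pom_xml)
    for elem in root.iter():
        # Namespaced tags look like '{http://maven.apache.org/POM/4.0.0}version'.
        if elem.tag.rsplit("}", 1)[-1] == tag and elem.text:
            return elem.text.strip()
    raise KeyError(tag)


def is_release_version(version: str) -> bool:
    """True for plain X.Y.Z versions such as 24.02.0; False for X.Y.Z-SNAPSHOT."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version) is not None
```

Calling `is_release_version(find_text(open("pom.xml").read(), "rapids4spark.version"))` before `mvn package` would flag a snapshot plugin version early.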