Merge branch-24.04 into main #31

Merged
merged 99 commits into from
Mar 10, 2024

Conversation

nvauto
Collaborator

@nvauto nvauto commented Mar 10, 2024

Change version to 24.04.0

Note: merge this PR using the "Create a merge commit" option.

nvauto and others added 30 commits January 24, 2024 17:20
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
To fix: NVIDIA#10256

Bump up dependency version to 24.04.0-SNAPSHOT

Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
revans2 and others added 26 commits February 28, 2024 08:27
…#10466)

* remove leading space for json path in GetJsonObject

Signed-off-by: Haoyang Li <[email protected]>

* Update comments

Signed-off-by: Haoyang Li <[email protected]>

* Use JsonPathParser to normalize path

Signed-off-by: Haoyang Li <[email protected]>

* Update compatibility doc

Signed-off-by: Haoyang Li <[email protected]>

* clean up

Signed-off-by: Haoyang Li <[email protected]>

* Fallback json paths containing  in GetJsonObject

Signed-off-by: Haoyang Li <[email protected]>

* cache normalizeJsonPath and prevent memory leak

Signed-off-by: Haoyang Li <[email protected]>

* clean up

Signed-off-by: Haoyang Li <[email protected]>

* ready to merge

Signed-off-by: Haoyang Li <[email protected]>

* Use parser to check whether to fallback

Signed-off-by: Haoyang Li <[email protected]>

* Add a special case

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
* Move 351 shims into noSnapshot buildvers

Move 351 shims into noSnapshot buildvers as Spark has released it.

Follow up of NVIDIA#10465 (comment)

Signed-off-by: Tim Liu <[email protected]>

* 351 shim for scala 2.13

Signed-off-by: Tim Liu <[email protected]>

---------

Signed-off-by: Tim Liu <[email protected]>
…0500)

Fixes NVIDIA#8208.

This commit adds support for `WindowGroupLimitExec` to run on GPU.  This optimization was added in Apache Spark 3.5, to reduce the number of rows that participate in shuffles, for queries that contain filters on the result of ranking functions. For example:

```sql
SELECT foo, bar FROM (
  SELECT foo, bar, 
         RANK() OVER (PARTITION BY foo ORDER BY bar) AS rnk
  FROM mytable )
WHERE rnk < 10
```

Such a query requires a shuffle so that all rows in a window group are available in the same task.
In Spark 3.5, an optimization was added in [SPARK-37099](https://issues.apache.org/jira/browse/SPARK-37099) to take advantage of the `rnk < 10` predicate to reduce shuffle load.
Specifically, since only ranks 1 through 9 (i.e. below 10) can satisfy the predicate, only rows with those ranks need to be shuffled into the task, per input batch.  By pre-filtering rows that can't possibly satisfy the condition, the number of shuffled records can be reduced.

The GPU implementation (i.e. `GpuWindowGroupLimitExec`) differs slightly from the CPU implementation, because it needs to execute on the entire input column batch.  As a result, `GpuWindowGroupLimitExec` runs the rank scan on each input batch, and then filters out ranks that exceed the limit specified in the predicate (`rnk < 10`). After the shuffle, the `RANK()` is calculated again by `GpuRunningWindowExec`, to produce the final result.
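
The safety of filtering before the shuffle can be checked with a small model. The sketch below is plain Python, not the plugin's code: the helper names (`rank_within_groups`, `prefilter_batch`, `query`, `query_with_group_limit`) are hypothetical, and batches are modeled as lists of `(foo, bar)` tuples. It relies on the fact that a row's rank within one batch is never greater than its rank within the full group, so a row dropped by the per-batch filter cannot appear in the final result.

```python
from collections import defaultdict


def rank_within_groups(rows):
    """Return (foo, bar, rnk) per row, using RANK() semantics:
    PARTITION BY foo ORDER BY bar."""
    groups = defaultdict(list)
    for foo, bar in rows:
        groups[foo].append(bar)
    ranked = []
    for foo, bars in groups.items():
        for bar in bars:
            # RANK() is 1 + the number of rows in the group that sort
            # strictly before this row, so ties share the same rank.
            rnk = 1 + sum(1 for other in bars if other < bar)
            ranked.append((foo, bar, rnk))
    return ranked


def prefilter_batch(batch, limit):
    """Model of the pre-shuffle step: rank inside a single batch and
    drop rows whose batch-local rank already fails `rnk < limit`."""
    return [(foo, bar) for foo, bar, rnk in rank_within_groups(batch) if rnk < limit]


def query(rows, limit):
    """Model of the original query: rank over all rows, then filter."""
    return sorted(r for r in rank_within_groups(rows) if r[2] < limit)


def query_with_group_limit(batches, limit):
    """Pre-filter each batch, "shuffle" the survivors together, then
    rank again to produce the final result.
    Returns (result_rows, number_of_rows_shuffled)."""
    survivors = [row for batch in batches for row in prefilter_batch(batch, limit)]
    return query(survivors, limit), len(survivors)
```

For any split of the rows into batches, `query_with_group_limit(batches, 10)` should return the same rows as `query(all_rows, 10)` while passing fewer rows through the shuffle step.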

The current implementation addresses the `RANK()` and `DENSE_RANK()` window functions.  Other ranking functions (like `ROW_NUMBER()`) can be added at a later date.

Signed-off-by: MithunR <[email protected]>
This PR adds a new metric for the pre-projection in GpuExpand.

Signed-off-by: Firestarman <[email protected]>
* Update rapids jni and private dependency version to 24.02.1 (NVIDIA#10511)

Signed-off-by: Tim Liu <[email protected]>

* Add missed shims for scala2.13 (NVIDIA#10465)

* Add missed shims for scala2.13

Signed-off-by: Tim Liu <[email protected]>

* Add 351 snapshot shim for the scala2.13 version of plugin jar

Signed-off-by: Tim Liu <[email protected]>

* Remove 351 snapshot shim as spark 3.5.1 has been released

Signed-off-by: Tim Liu <[email protected]>

* Remove scala2.13 351 snapshot shim

Signed-off-by: Tim Liu <[email protected]>

* Remove 351 shim's JSON string

Ran `mvn generate-sources -Dshimplify=true -Dshimplify.move=true -Dshimplify.remove.shim=351`

to remove the 351 shim's JSON string, and fix some unnecessary empty lines that were introduced

Signed-off-by: Tim Liu <[email protected]>

* Update Copyright 2024

Copyright headers were updated automatically with the script below:
```
export SPARK_RAPIDS_AUTO_COPYRIGHTER=ON

./scripts/auto-copyrighter.sh $(git diff --name-only origin/branch-24.04..HEAD)
```

Signed-off-by: Tim Liu <[email protected]>

* Revert "Update Copyright 2024"

This reverts commit 8482847.

* Revert "Remove 351 shim's JSON string"

This reverts commit 78d1f00.

* skip 351 from strict checking

* Align scala2.13/pom.xml with the scala2.12 one

Run the script `bash build/make-scala-version-build-files.sh 2.13`

Signed-off-by: Tim Liu <[email protected]>

* pretend 351 is a snapshot in 24.02

Signed-off-by: Gera Shegalov <[email protected]>

* pretend 351 is a SNAPSHOT version

* Revert change of build/shimplify.py

Signed-off-by: Tim Liu <[email protected]>

---------

Signed-off-by: Tim Liu <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Raza Jafri <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>

* Update changelog for v24.02.0 release (NVIDIA#10525)

Signed-off-by: Tim Liu <[email protected]>

---------

Signed-off-by: Tim Liu <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Raza Jafri <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Fix merge conflict from branch-24.02
Update to latest branch-24.02 [skip ci]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
* Distinct inner join

Signed-off-by: Jason Lowe <[email protected]>

* Distinct left join

Signed-off-by: Jason Lowe <[email protected]>

* Update to new API

* Fix test

---------

Signed-off-by: Jason Lowe <[email protected]>
…ment (NVIDIA#10564)

* WIP

Signed-off-by: Gera Shegalov <[email protected]>

* WIP

Signed-off-by: Gera Shegalov <[email protected]>

* Enable specifying the pytest using file_or_dir args

```bash
TEST_PARALLEL=0 \
SPARK_HOME=~/dist/spark-3.1.1-bin-hadoop3.2 \
TEST_FILE_OR_DIR=~/gits/NVIDIA/spark-rapids/integration_tests/src/main/python/arithmetic_ops_test.py::test_addition  \
./integration_tests/run_pyspark_from_build.sh --collect-only

<Module src/main/python/arithmetic_ops_test.py>
  <Function test_addition[Byte]>
  <Function test_addition[Short]>
  <Function test_addition[Integer]>
  <Function test_addition[Long]>
  <Function test_addition[Float]>
  <Function test_addition[Double]>
  <Function test_addition[Decimal(7,3)]>
  <Function test_addition[Decimal(12,2)]>
  <Function test_addition[Decimal(18,0)]>
  <Function test_addition[Decimal(20,2)]>
  <Function test_addition[Decimal(30,2)]>
  <Function test_addition[Decimal(36,5)]>
  <Function test_addition[Decimal(38,10)]>
  <Function test_addition[Decimal(38,0)]>
  <Function test_addition[Decimal(7,7)]>
  <Function test_addition[Decimal(7,-3)]>
  <Function test_addition[Decimal(36,-5)]>
  <Function test_addition[Decimal(38,-10)]>
```

Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Raza Jafri <[email protected]>

* Changing to TESTS=module::method

Signed-off-by: Gera Shegalov <[email protected]>
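
As an illustration of the `module::method` form, the sketch below shows one way such a spec could be expanded into a pytest node ID of the `path::name` shape seen in the collected output above. This is an assumption for illustration only, not the logic in `run_pyspark_from_build.sh`: the helper `to_pytest_node_id` and the `TEST_ROOT` default are hypothetical.

```python
# Hypothetical default: the directory that holds the integration test modules.
TEST_ROOT = "src/main/python"


def to_pytest_node_id(spec, test_root=TEST_ROOT):
    """Expand 'module' or 'module::method' into a pytest node ID.

    pytest addresses a single test as '<file path>::<function name>';
    a bare file path selects every test in that file.
    """
    module, sep, method = spec.partition("::")
    if not module or (sep and not method):
        raise ValueError(f"expected 'module' or 'module::method', got {spec!r}")
    # Accept the module name with or without the '.py' suffix.
    filename = module if module.endswith(".py") else module + ".py"
    path = f"{test_root}/{filename}"
    return f"{path}::{method}" if sep else path
```

With this helper, `to_pytest_node_id("arithmetic_ops_test::test_addition")` yields a node ID that could be passed to `pytest --collect-only` to list the parametrized cases of one test.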

---------

Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Raza Jafri <[email protected]>
Signed-off-by: jenkins <jenkins@localhost>
@nvauto nvauto requested a review from NvTimLiu as a code owner March 10, 2024 13:19
@nvauto
Collaborator Author

nvauto commented Mar 10, 2024

[Skip CI] as branch-24.04 has already passed the build

@nvauto nvauto merged commit c1e3abb into main Mar 10, 2024
@nvauto nvauto deleted the merge-branch-24.04-to-main branch March 10, 2024 13:20