
Merge branch-24.08 into main #115

Closed

wants to merge 98 commits into main from merge-branch-24.08-to-main
Conversation

nvauto
Collaborator

@nvauto nvauto commented Jul 14, 2024

Change version to 24.08.0

Note: merge this PR using the "Create a merge commit" option.

NvTimLiu and others added 30 commits May 22, 2024 23:06
Keep dependencies (JNI + private) as 24.06-SNAPSHOT until they're available.

Filed an issue (NVIDIA#10867) to remind us to bump up dependencies to 24.08.0-SNAPSHOT.

Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Fixed Databricks build

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* Removed unused import

---------

Signed-off-by: Raza Jafri <[email protected]>
…IA#10871)

Add classloader diagnostics to initShuffleManager error message

---------

Signed-off-by: Zach Puller <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Co-authored-by: Alessandro Bellina <[email protected]>
…ricks] (NVIDIA#10945)

* Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…"

This reverts commit bb05b17.

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

---------

Signed-off-by: Raza Jafri <[email protected]>
Closes NVIDIA#10875
Contributes to NVIDIA#10773
    
Unjar, cache, and share the test jar content among all test suites from the same jar

Test:
```bash
mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite'
```

Signed-off-by: Gera Shegalov <[email protected]>
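The commit above unjars the test jar once and shares the extracted content across all suites from the same jar. A minimal Python sketch of that caching idea (the function name and layout are hypothetical, not the plugin's actual Scala implementation):

```python
import atexit
import functools
import shutil
import tempfile
import zipfile

@functools.lru_cache(maxsize=None)
def extracted_jar_dir(jar_path: str) -> str:
    """Extract the jar once; later calls for the same path reuse the directory."""
    target = tempfile.mkdtemp(prefix="testjar-")
    # Clean up the shared directory when the test process exits.
    atexit.register(shutil.rmtree, target, ignore_errors=True)
    with zipfile.ZipFile(jar_path) as zf:
        zf.extractall(target)
    return target
```

Each suite asks for `extracted_jar_dir(path)`; only the first call pays the extraction cost, which is the point of the change.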
…A#10944)

* Added shim for BatchScanExec to support Spark 4.0

Signed-off-by: Raza Jafri <[email protected]>

* fixed the failing shim

---------

Signed-off-by: Raza Jafri <[email protected]>
…hange. (NVIDIA#10863)

* Account for `CommandUtils.uncacheTableOrView` signature change.

Fixes NVIDIA#10710.

This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView`
in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).)

Signed-off-by: MithunR <[email protected]>

* Removed unnecessary base class.

---------

Signed-off-by: MithunR <[email protected]>
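The shims referenced in commits like this one are Scala classes selected per Spark version; as a loose, standalone illustration only (all names below are hypothetical, not the plugin's mechanism), a signature change such as a method gaining a parameter can be bridged by dispatching on the runtime version:

```python
# Hypothetical illustration of shimming a signature change across versions.
def uncache_pre40(session, name):
    # pre-4.0 style: two arguments
    return ("uncached", name)

def uncache_40(session, name, cascade):
    # 4.0 style: an extra 'cascade' flag
    return ("uncached", name, cascade)

def uncache_table_or_view(version: str, session, name, cascade: bool = True):
    """Single call site; the shim picks the matching underlying signature."""
    major = int(version.split(".")[0])
    if major >= 4:
        return uncache_40(session, name, cascade)
    return uncache_pre40(session, name)
```

The real plugin resolves the shim class at load time rather than branching per call, but the contract is the same: callers see one stable signature.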
This is a new feature adding Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the tests newly added in this PR.

---------

Signed-off-by: Firestarman <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
…ange. (NVIDIA#10857)

* Account for PartitionedFileUtil.splitFiles signature change.

Fixes NVIDIA#10299.

In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed
to remove unused parameters (apache/spark@eabea643c74).  This causes the Spark RAPIDS
plugin build to break with Spark 4.0.

This commit introduces a shim to account for the signature change.

Signed-off-by: MithunR <[email protected]>

* Common base for PartitionFileUtilsShims.

Signed-off-by: MithunR <[email protected]>

* Reusing existing PartitionedFileUtilsShims.

* More refactor, for pre-3.5 compile.

* Updated Copyright date.

* Fixed style error.

* Re-fixed the copyright year.

* Added missing import.

---------

Signed-off-by: MithunR <[email protected]>
To fix: NVIDIA#10867

Change rapids private and jni dependency version to 24.08.0-SNAPSHOT

Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Add support for the renaming of PythonMapInArrow to MapInArrow

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* Removed the unnecessary base class from 400

* addressed review comments

---------

Signed-off-by: Raza Jafri <[email protected]>
…itten [skip ci] (NVIDIA#10966)

* DO NOT REVIEW

Signed-off-by: Peixin Li <[email protected]>

* Add default value for REF to avoid overwritten while unexpected manual trigger

Signed-off-by: Peixin Li <[email protected]>

---------

Signed-off-by: Peixin Li <[email protected]>
* AnalysisException child class

Signed-off-by: Raza Jafri <[email protected]>

* Use errorClass for reporting AnalysisException

* POM changes

Signed-off-by: Raza Jafri <[email protected]>

* Reuse the RapidsErrorUtils to throw the AnalysisException

* Revert "POM changes"

This reverts commit 0f765c9.

* Updated copyrights

* Added the TrampolineUtil method back to handle cases which don't use errorClass

* Add doc to the RapidsAnalysisException

* addressed review comments

* Fixed imports

* Moved the RapidsAnalysisException out of TrampolineUtil

* fixed imports

* addressed review comments

* fixed unused import

* Removed the TrampolineUtil method for throwing RapidsAnalysisException

---------

Signed-off-by: Raza Jafri <[email protected]>
…icks] (NVIDIA#10970)

* Incomplete impl of RaiseError for 400

* Removed RaiseError from 400

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

---------

Signed-off-by: Raza Jafri <[email protected]>
This is a bug fix for the hive write tests. In some of the tests on Spark 351,
ProjectExec falls back to the CPU because the GPU version of the MapFromArrays expression is missing.

This PR adds ProjectExec to the allowed fallback list for Spark 351 and later.

Signed-off-by: Firestarman <[email protected]>
mythrocks and others added 26 commits July 1, 2024 13:58
* Spark 4: Handle ANSI mode in sort_test.py

Fixes NVIDIA#11027.

With ANSI mode enabled (like the default in Spark 4), one sees that some
tests in `sort_test.py` fail, because they expect ANSI mode to be off.

This commit disables running those tests with ANSI enabled, and adds a
separate test for ANSI on/off.

Signed-off-by: MithunR <[email protected]>

* Refactored not to use disable_ansi_mode.

These tests need not be revisited.  They test all combinations of ANSI mode,
including overflow failures.

Signed-off-by: MithunR <[email protected]>

---------

Signed-off-by: MithunR <[email protected]>
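The pattern described above, running the same test under both ANSI settings, can be sketched in plain Python (the real tests live in `sort_test.py` and use pytest's parametrize marker; everything here is a hypothetical stand-in):

```python
def run_sort(data, ansi_enabled: bool):
    """Stand-in for running a sort job with spark.sql.ansi.enabled set.

    Under ANSI mode an overflowing 32-bit value raises instead of wrapping,
    which is why tests written for ANSI-off can fail on Spark 4 defaults.
    """
    if ansi_enabled and any(x > 2**31 - 1 for x in data):
        raise OverflowError("ANSI mode: integer overflow")
    return sorted(data)

def test_sort_basic():
    # Mirrors @pytest.mark.parametrize over both ANSI settings: the data is
    # chosen so the test passes whether ANSI is on or off.
    for ansi_enabled in (True, False):
        assert run_sort([3, 1, 2], ansi_enabled) == [1, 2, 3]
```

Separating "works under both modes" from "deliberately overflows" is the refactor the commit describes: overflow cases get their own ANSI-on test instead of being skipped.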
* Introduce lore id

* Introduce lore id

* Fix type

* Fix type

* Conf

* style

* part

* Dump

* Introduce lore framework

* Add tests.

* Rename test case

Signed-off-by: liurenjie1024 <[email protected]>

* Fix AQE test

* Fix style

* Use args to display lore info.

* Fix build break

* Fix path in loreinfo

* Remove path

* Fix comments

* Update configs

* Fix comments

* Fix config

---------

Signed-off-by: liurenjie1024 <[email protected]>
To fix issue: NVIDIA#11113

To support Spark 4.0+ shims, we change the Scala 2.13 build and tests to run against JDK 17.

Signed-off-by: Tim Liu <[email protected]>
* Skip cast tests that throw exceptions on CPU

Signed-off-by: Navin Kumar <[email protected]>

* This Exec doesn't actually run on the GPU so this should be added to this mark

Signed-off-by: Navin Kumar <[email protected]>

* Update README.md with dataproc_serverless runtime_env value

Signed-off-by: Navin Kumar <[email protected]>

---------

Signed-off-by: Navin Kumar <[email protected]>
* Prep miscellaneous integration tests for Spark 4

Fixes NVIDIA#11020. (grouping_sets_test.py)
Fixes NVIDIA#11023. (dpp_test.py)
Fixes NVIDIA#11025. (date_time_test.py)
Fixes NVIDIA#11026. (map_test.py)

This commit prepares miscellaneous integration tests to be run on Spark
4.

Certain integration tests fail on Spark 4 because of ANSI mode being
enabled by default.  This commit disables ANSI on the failing tests, or
introduces other fixes so that the tests may pass correctly.

Signed-off-by: MithunR <[email protected]>
Fixes NVIDIA#11119.

`test_window_group_limits_fallback_for_row_number` can fail non-deterministically
when run against a multi-node Spark cluster.  This is because the ordering of the
input is non-deterministic when multiple null rows are included.

The tests have been changed to use deterministic ordering, by including a unique
order-by column.

Signed-off-by: MithunR <[email protected]>
* improve MetricsSuite to allow more gc jitter

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix comment

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
This PR adds HiveHash support on the GPU for some common types; the
supported types are [bool, byte, short, int, long, string, date, timestamp, float, double].

It also supports bucketed writes for the write commands, leveraging HiveHash to generate bucket IDs.
---------

Signed-off-by: Firestarman <[email protected]>
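For strings, HiveHash follows Java's `String.hashCode` (h = 31*h + c with 32-bit wrap), and Hive derives the bucket ID as `(hash & Integer.MAX_VALUE) % numBuckets`. A small Python sketch of just those cases, as an illustration rather than the GPU implementation:

```python
def hive_hash_string(s: str) -> int:
    """HiveHash of a string: Java String.hashCode semantics (32-bit wrap)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret as a signed 32-bit value, as the JVM would.
    return h - 0x100000000 if h >= 0x80000000 else h

def hive_hash_int(v: int) -> int:
    # For a 32-bit int, HiveHash is the value itself.
    return v

def bucket_id(hash_value: int, num_buckets: int) -> int:
    # Hive's bucketing rule: (hashCode & Integer.MAX_VALUE) % numBuckets.
    return (hash_value & 0x7FFFFFFF) % num_buckets
```

Matching this exact rule on the GPU is what makes GPU-written bucketed tables readable by CPU Hive, since both sides must route a given row to the same bucket file.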
…[databricks] (NVIDIA#11137)

* Handle the change for UnaryPositive now extending RuntimeReplaceable

* signing off

Signed-off-by: Raza Jafri <[email protected]>

* Revert "Handle the change for UnaryPositive now extending RuntimeReplaceable"

This reverts commit 414db71.

* Replace UnaryExprMeta with ExprMeta for UnaryPositive

* Replace UnaryExprMeta with ExprMeta for UnaryPositive

* override isFoldableNonLitAllowed

---------

Signed-off-by: Raza Jafri <[email protected]>
…VIDIA#11138)

* update fastparquet version to 2024.5.0 for numpy2 compatibility

Signed-off-by: Peixin Li <[email protected]>

* Revert "WAR numpy2 failed fastparquet compatibility issue (NVIDIA#11072)"

This reverts commit 6eb854d.

* update databricks pip install command

* include py3.8 compatibility

* copyright

---------

Signed-off-by: Peixin Li <[email protected]>
* Fix ANSI mode failures in subquery_test.py.

Fixes NVIDIA#11029.

Some tests in subquery_test.py fail when run with ANSI mode enabled,
because certain array columns are accessed with invalid indices. These
tests predate the availability of ANSI mode in Spark.

This commit modifies the tests so that the generated data is
appropriately sized for the query.

There is no loss of test coverage; failure cases for invalid index
values in array columns are already tested as part of
`array_test::test_array_item_ansi_fail_invalid_index`

Signed-off-by: MithunR <[email protected]>
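The fix above ties the generated array lengths to the largest index the query touches, so ANSI mode never sees an out-of-bounds access. A hedged sketch of that data-sizing idea (the generator below is hypothetical, not the plugin's data-gen framework):

```python
import random

def gen_array_rows(num_rows: int, max_index_used: int, seed: int = 0):
    """Generate arrays long enough that a 1-based access at max_index_used
    is always valid.

    Under ANSI mode an out-of-bounds array index raises, so the generator's
    minimum length must match the largest index the query will touch.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(num_rows):
        length = rng.randint(max_index_used, max_index_used + 3)
        rows.append([rng.randint(0, 100) for _ in range(length)])
    return rows
```

The invalid-index failure path itself is not lost: as the commit notes, it is already covered by `array_test::test_array_item_ansi_fail_invalid_index`.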
* Fix oom

* Remove unused code

Signed-off-by: liurenjie1024 <[email protected]>

* Update copy right

---------

Signed-off-by: liurenjie1024 <[email protected]>
* Add deletion vector metrics

* Add for databricks

Signed-off-by: liurenjie1024 <[email protected]>

* Fix comments

* Fix comments

---------

Signed-off-by: liurenjie1024 <[email protected]>
…VIDIA#11165)

* Fix some GpuBroadcastToRowExec by not dropping columns

Signed-off-by: Robert (Bobby) Evans <[email protected]>

* Review comments

---------

Signed-off-by: Robert (Bobby) Evans <[email protected]>
…cks] (NVIDIA#10951)

* case when improvement: avoid copy_if_else

Signed-off-by: Chong Gao <[email protected]>

Signed-off-by: Chong Gao <[email protected]>
Co-authored-by: Chong Gao <[email protected]>
* Remove spark31x json lines and shim files

1, Remove spark31x json lines from the source code

2, Remove the files that exist only for the spark31x shims

3, Move the files for spark31x and spark32x+ shims into sql-plugin/src/main/spark320 folder

Signed-off-by: Tim Liu <[email protected]>

* Drop spark31x shims in the build scripts and pom files

Signed-off-by: Tim Liu <[email protected]>

* Restore the accidentally deleted file: OrcStatisticShim.scala

    tests/src/test/spark311/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala
     -->
    tests/src/test/spark321cdh/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala

Check whether we can merge this file into the following one:

    tests/src/test/spark320/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala
Signed-off-by: Tim Liu <[email protected]>

* Update Copyright to 2024

Signed-off-by: Tim Liu <[email protected]>

* Remove the 31x in ShimLoader.scala according to the review comments

Signed-off-by: Tim Liu <[email protected]>

* Update the file scala2.13/pom.xml

Signed-off-by: Tim Liu <[email protected]>

* Drop 3.1.x shims in docs, source code and build scripts

    Change the default shim to spark320 from spark311
    in the shims in docs, source code and build scripts

Signed-off-by: Tim Liu <[email protected]>

* Updating the docs for the dropping 31x shims

Signed-off-by: Tim Liu <[email protected]>

* Clean up unused and duplicated 'org/roaringbitmap' folder

To fix: NVIDIA#11175

Clean up the unused and duplicated 'org/roaringbitmap' folder in the spark320 shim to work around the JaCoCo error 'different class with same name', after we drop the 31x shims and change the default shim to spark320

Signed-off-by: Tim Liu <[email protected]>

---------

Signed-off-by: Tim Liu <[email protected]>
@nvauto nvauto requested a review from NvTimLiu as a code owner July 14, 2024 03:48
@NvTimLiu NvTimLiu deleted the branch main July 24, 2024 06:51
@NvTimLiu NvTimLiu closed this Jul 24, 2024
@NvTimLiu NvTimLiu deleted the merge-branch-24.08-to-main branch July 24, 2024 06:51