forked from NVIDIA/spark-rapids
Merge branch-24.08 into main #115
Closed
Keep dependencies (JNI + private) as 24.06-SNAPSHOT until they're available. Filed an issue (NVIDIA#10867) to remind us to bump up dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
Signed-off-by: Zach Puller <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Fixed Databricks build * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed unused import --------- Signed-off-by: Raza Jafri <[email protected]>
…IA#10871) Add classloader diagnostics to initShuffleManager error message --------- Signed-off-by: Zach Puller <[email protected]> Co-authored-by: Jason Lowe <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> Co-authored-by: Alessandro Bellina <[email protected]>
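For readers unfamiliar with this kind of diagnostic, here is a rough, hypothetical sketch of how classloader information can be appended to an initialization error message; the helper name and message text are illustrative and are not the plugin's actual code.

```scala
object ClassLoaderDiagnostics {
  // Walk the classloader chain of a class and render it for an error message.
  def describe(cls: Class[_]): String = {
    val chain = Iterator
      .iterate(cls.getClassLoader)(_.getParent)
      .takeWhile(_ != null)
      .map(_.toString)
      .toList
    s"class ${cls.getName} loaded by: ${chain.mkString(" -> ")}"
  }
}

// Hypothetical usage when raising the initialization error:
// throw new IllegalStateException(
//   "Unable to initialize the shuffle manager. " +
//     ClassLoaderDiagnostics.describe(getClass))
```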
…ricks] (NVIDIA#10945) * Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…" This reverts commit bb05b17. * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>
Closes NVIDIA#10875. Contributes to NVIDIA#10773. Unjar, cache, and share the test jar content among all test suites from the same jar.

Test:
```bash
mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite'
```
Signed-off-by: Gera Shegalov <[email protected]>
…A#10944) * Added shim for BatchScanExec to support Spark 4.0 Signed-off-by: Raza Jafri <[email protected]> * fixed the failing shim --------- Signed-off-by: Raza Jafri <[email protected]>
…hange. (NVIDIA#10863) * Account for `CommandUtils.uncacheTableOrView` signature change. Fixes NVIDIA#10710. This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView` in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).) Signed-off-by: MithunR <[email protected]> * Removed unnecessary base class. --------- Signed-off-by: MithunR <[email protected]>
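The Spark 4.0 signature-change commits in this merge (this one and the PartitionedFileUtil.splitFiles one below) rely on the plugin's shim layer. What follows is a minimal, hypothetical sketch of that pattern: the trait and object names are made up, and the real shims call the actual per-version Spark APIs from version-specific source directories.

```scala
// Common interface that version-agnostic plugin code compiles against.
trait UncacheShim {
  def uncacheTableOrView(tableName: String): Unit
}

// Hypothetical pre-Spark-4.0 implementation, compiled only into older shims.
object PreSpark400UncacheShim extends UncacheShim {
  def uncacheTableOrView(tableName: String): Unit = {
    // The real shim would call the old CommandUtils.uncacheTableOrView signature here.
    println(s"uncache (pre-4.0 path): $tableName")
  }
}

// Hypothetical Spark 4.0 implementation, compiled only into the spark400 shim.
object Spark400UncacheShim extends UncacheShim {
  def uncacheTableOrView(tableName: String): Unit = {
    // The real shim would call the new Spark 4.0 signature here.
    println(s"uncache (4.0 path): $tableName")
  }
}
```

Each shim build selects exactly one implementation, so callers never see the signature difference.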
This is a new feature adding Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the tests newly added in this PR. --------- Signed-off-by: Firestarman <[email protected]> Co-authored-by: Jason Lowe <[email protected]>
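As a usage illustration only (the table and column names are made up, and a Hive-enabled Spark build plus metastore is assumed), the kind of write path this feature targets looks like:

```scala
import org.apache.spark.sql.SparkSession

object HiveParquetWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-parquet-write-sketch")
      .enableHiveSupport() // assumes a Hive-enabled Spark build and a metastore
      .getOrCreate()

    // Inserting into a Hive table stored as Parquet goes through
    // InsertIntoHiveTable, the command this change adds GPU Parquet support for.
    spark.sql("CREATE TABLE IF NOT EXISTS demo_parquet (id INT, name STRING) STORED AS PARQUET")
    spark.sql("INSERT INTO demo_parquet VALUES (1, 'a'), (2, 'b')")
    spark.sql("SELECT * FROM demo_parquet").show()

    spark.stop()
  }
}
```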
…ange. (NVIDIA#10857)
* Account for PartitionedFileUtil.splitFiles signature change. Fixes NVIDIA#10299. In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed to remove unused parameters (apache/spark@eabea643c74). This causes the Spark RAPIDS plugin build to break with Spark 4.0. This commit introduces a shim to account for the signature change. Signed-off-by: MithunR <[email protected]>
* Common base for PartitionFileUtilsShims. Signed-off-by: MithunR <[email protected]>
* Reusing existing PartitionedFileUtilsShims.
* More refactoring, for the pre-3.5 compile.
* Updated copyright date.
* Fixed style error.
* Re-fixed the copyright year.
* Added missing import.
---------
Signed-off-by: MithunR <[email protected]>
To fix NVIDIA#10867: change the rapids private and JNI dependency versions to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Add support for the renaming of PythonMapInArrow to MapInArrow * Signing off Signed-off-by: Raza Jafri <[email protected]> * Removed the unnecessary base class from 400 * addressed review comments --------- Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Peixin Li <[email protected]>
…itten [skip ci] (NVIDIA#10966) * DO NOT REVIEW Signed-off-by: Peixin Li <[email protected]> * Add a default value for REF to avoid it being overwritten by an unexpected manual trigger Signed-off-by: Peixin Li <[email protected]> --------- Signed-off-by: Peixin Li <[email protected]>
* AnalysisException child class Signed-off-by: Raza Jafri <[email protected]>
* Use errorClass for reporting AnalysisException
* POM changes Signed-off-by: Raza Jafri <[email protected]>
* Reuse the RapidsErrorUtils to throw the AnalysisException
* Revert "POM changes" This reverts commit 0f765c9.
* Updated copyrights
* Added the TrampolineUtil method back to handle cases which don't use errorClass
* Add doc to the RapidsAnalysisException
* Addressed review comments
* Fixed imports
* Moved the RapidsAnalysisException out of TrampolineUtil
* Fixed imports
* Addressed review comments
* Fixed unused import
* Removed the TrampolineUtil method for throwing RapidsAnalysisException
---------
Signed-off-by: Raza Jafri <[email protected]>
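For orientation, here is a purely illustrative sketch of the "exception child class plus centralized error utils" shape described above. The real RapidsAnalysisException extends Spark's AnalysisException (with errorClass-based reporting where applicable); the stand-in below extends RuntimeException to stay self-contained.

```scala
// Stand-in child class; the real one extends org.apache.spark.sql.AnalysisException.
class RapidsAnalysisExceptionSketch(message: String) extends RuntimeException(message)

object RapidsErrorUtilsSketch {
  // Centralizing the throw lets per-version shims adapt to different
  // AnalysisException constructors (e.g. errorClass-based ones) in one place.
  def throwAnalysisError(message: String): Nothing =
    throw new RapidsAnalysisExceptionSketch(message)
}
```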
…icks] (NVIDIA#10970) * Incomplete impl of RaiseError for 400 * Removed RaiseError from 400 * Signing off Signed-off-by: Raza Jafri <[email protected]> --------- Signed-off-by: Raza Jafri <[email protected]>
This is a bug fix for the Hive write tests. In some of the tests on Spark 351, the ProjectExec falls back to CPU because the GPU version of the MapFromArrays expression is missing. This PR adds ProjectExec to the allowed fallback list for Spark 351 and later. Signed-off-by: Firestarman <[email protected]>
* Spark 4: Handle ANSI mode in sort_test.py Fixes NVIDIA#11027. With ANSI mode enabled (as is the default in Spark 4), some tests in `sort_test.py` fail because they expect ANSI mode to be off. This commit disables running those tests with ANSI enabled, and adds a separate test for ANSI on/off. Signed-off-by: MithunR <[email protected]> * Refactored not to use disable_ansi_mode. These tests need not be revisited. They test all combinations of ANSI mode, including overflow failures. Signed-off-by: MithunR <[email protected]> --------- Signed-off-by: MithunR <[email protected]>
* Introduce lore id * Introduce lore id * Fix type * Fix type * Conf * style * part * Dump * Introduce lore framework * Add tests. * Rename test case Signed-off-by: liurenjie1024 <[email protected]> * Fix AQE test * Fix style * Use args to display lore info. * Fix build break * Fix path in loreinfo * Remove path * Fix comments * Update configs * Fix comments * Fix config --------- Signed-off-by: liurenjie1024 <[email protected]>
To fix issue NVIDIA#11113: to support Spark 4.0+ shims, we change the scala2.13 build to build and test against JDK 17. Signed-off-by: Tim Liu <[email protected]>
…IA#10965) Signed-off-by: Jason Lowe <[email protected]>
* Skip cast tests that throw exceptions on CPU Signed-off-by: Navin Kumar <[email protected]> * This Exec doesn't actually run on the GPU so this should be added to this mark Signed-off-by: Navin Kumar <[email protected]> * Update README.md with dataproc_serverless runtime_env value Signed-off-by: Navin Kumar <[email protected]> --------- Signed-off-by: Navin Kumar <[email protected]>
…NVIDIA#11118) Signed-off-by: Robert (Bobby) Evans <[email protected]>
* Prep miscellaneous integration tests for Spark 4 Fixes NVIDIA#11020. (grouping_sets_test.py) Fixes NVIDIA#11023. (dpp_test.py) Fixes NVIDIA#11025. (date_time_test.py) Fixes NVIDIA#11026. (map_test.py) This commit prepares miscellaneous integration tests to be run on Spark 4. Certain integration tests fail on Spark 4 because of ANSI mode being enabled by default. This commit disables ANSI on the failing tests, or introduces other fixes so that the tests may pass correctly. Signed-off-by: MithunR <[email protected]>
Fixes NVIDIA#11119. `test_window_group_limits_fallback_for_row_number` can fail non-deterministically when run against a multi-node Spark cluster. This is because the ordering of the input is non-deterministic when multiple null rows are included. The tests have been changed for deterministic ordering, by including a unique order by column. Signed-off-by: MithunR <[email protected]>
* improve MetricsSuite to allow more gc jitter Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix comment Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
This PR adds HiveHash support on GPU for some common types; the supported types are [bool, byte, short, int, long, string, date, timestamp, float, double]. It also supports bucketed writes for the write commands, leveraging HiveHash to generate bucket IDs. --------- Signed-off-by: Firestarman <[email protected]>
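A sketch of the kind of Hive bucketed write this targets, assuming a Hive-enabled SparkSession and a Spark version that supports writing Hive bucketed tables; the table, columns, and bucket count are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object HiveBucketedWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-bucketed-write-sketch")
      .enableHiveSupport() // assumes a Hive-enabled Spark build and a metastore
      .getOrCreate()

    // Rows inserted into a Hive bucketed table are assigned bucket IDs derived
    // from HiveHash over the bucketing column(s).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS demo_bucketed (id INT, name STRING)
        |CLUSTERED BY (id) INTO 4 BUCKETS
        |STORED AS PARQUET""".stripMargin)
    // Whether the insert is accepted and bucketed depends on the Spark version
    // and its Hive bucketing settings.
    spark.sql("INSERT INTO demo_bucketed VALUES (1, 'a'), (2, 'b'), (3, 'c')")

    spark.stop()
  }
}
```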
…[databricks] (NVIDIA#11137) * Handle the change for UnaryPositive now extending RuntimeReplaceable * signing off Signed-off-by: Raza Jafri <[email protected]> * Revert "Handle the change for UnaryPositive now extending RuntimeReplaceable" This reverts commit 414db71. * Replace UnaryExprMeta with ExprMeta for UnaryPositive * Replace UnaryExprMeta with ExprMeta for UnaryPositive * override isFoldableNonLitAllowed --------- Signed-off-by: Raza Jafri <[email protected]>
…VIDIA#11138) * update fastparquet version to 2024.5.0 for numpy2 compatibility Signed-off-by: Peixin Li <[email protected]> * Revert "WAR numpy2 failed fastparquet compatibility issue (NVIDIA#11072)" This reverts commit 6eb854d. * update databricks pip install command * include py3.8 compatibility * copyright --------- Signed-off-by: Peixin Li <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Firestarman <[email protected]>
* Fix ANSI mode failures in subquery_test.py. Fixes NVIDIA#11029. Some tests in subquery_test.py fail when run with ANSI mode enabled, because certain array columns are accessed with invalid indices. These tests predate the availability of ANSI mode in Spark. This commit modifies the tests so that the generated data is appropriately sized for the query. There is no loss of test coverage; failure cases for invalid index values in array columns are already tested as part of `array_test::test_array_item_ansi_fail_invalid_index` Signed-off-by: MithunR <[email protected]>
* Fix OOM * Remove unused code Signed-off-by: liurenjie1024 <[email protected]> * Update copyright --------- Signed-off-by: liurenjie1024 <[email protected]>
…es (NVIDIA#11156) Signed-off-by: Jason Lowe <[email protected]>
* Add deletion vector metrics * Add for databricks Signed-off-by: liurenjie1024 <[email protected]> * Fix comments * Fix comments --------- Signed-off-by: liurenjie1024 <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
…VIDIA#11165) * Fix some GpuBroadcastToRowExec by not dropping columns Signed-off-by: Robert (Bobby) Evans <[email protected]> * Review comments --------- Signed-off-by: Robert (Bobby) Evans <[email protected]>
…cks] (NVIDIA#10951) * case when improvement: avoid copy_if_else Signed-off-by: Chong Gao <[email protected]> Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
* Remove spark31x json lines and shim files:
  1. Remove spark31x json lines from the source code.
  2. Remove the files that exist only for spark31x shims.
  3. Move the files shared by spark31x and spark32x+ shims into the sql-plugin/src/main/spark320 folder.
* Drop spark31x shims in the build scripts and pom files.
* Restore the accidentally deleted file OrcStatisticShim.scala: tests/src/test/spark311/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala --> tests/src/test/spark321cdh/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala. Check whether we can merge this file into tests/src/test/spark320/scala/com/nvidia/spark/rapids/shims/OrcStatisticShim.scala.
* Update copyright to 2024.
* Remove the 31x entries in ShimLoader.scala according to the review comments.
* Update the file scala2.13/pom.xml.
* Drop 3.1.x shims in docs, source code, and build scripts; change the default shim from spark311 to spark320.
* Update the docs for dropping the 31x shims.
* Clean up the unused and duplicated 'org/roaringbitmap' folder. To fix NVIDIA#11175: clean up the unused and duplicated 'org/roaringbitmap' entries in the spark320 shim folder to work around the JaCoCo error 'different class with same name' after dropping the 31x shims and changing the default shim to spark320.
---------
Signed-off-by: Tim Liu <[email protected]>
…VIDIA#11170) Signed-off-by: Jason Lowe <[email protected]>
Signed-off-by: nvauto <[email protected]>
Change version to 24.08.0
Note: merge this PR using the "Create a merge commit" option.