forked from NVIDIA/spark-rapids
Merge branch-24.10 into main #121
Closed
Conversation
Keep dependencies (JNI + private) as 24.06-SNAPSHOT until they're available. Filed an issue (NVIDIA#10867) to remind us to bump up dependencies to 24.08.0-SNAPSHOT. Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
Signed-off-by: Zach Puller <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Fixed Databricks build
* Signing off. Signed-off-by: Raza Jafri <[email protected]>
* Removed unused import

Signed-off-by: Raza Jafri <[email protected]>
…IA#10871) Add classloader diagnostics to initShuffleManager error message --------- Signed-off-by: Zach Puller <[email protected]> Co-authored-by: Jason Lowe <[email protected]> Co-authored-by: Gera Shegalov <[email protected]> Co-authored-by: Alessandro Bellina <[email protected]>
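The classloader diagnostics described above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration of the technique (walking the classloader parent chain into an error message), not the plugin's actual `initShuffleManager` code:

```java
// Sketch: render the classloader parent chain as a string so it can be
// appended to an initialization error message. Names are illustrative.
public class ClassLoaderDiagnostics {
    // Walk the parent chain from the given loader up to bootstrap (null).
    public static String describe(ClassLoader cl) {
        StringBuilder sb = new StringBuilder();
        while (cl != null) {
            sb.append(cl.getClass().getName()).append(" -> ");
            cl = cl.getParent();
        }
        sb.append("bootstrap");
        return sb.toString();
    }

    public static void main(String[] args) {
        String chain = describe(ClassLoaderDiagnostics.class.getClassLoader());
        System.out.println("Failed to init shuffle manager; classloader chain: " + chain);
    }
}
```

Including the chain in the message makes "wrong classloader" failures diagnosable from the log alone, which is the point of the commit above.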
…ricks] (NVIDIA#10945)

* Revert "Revert "Add Support for Multiple Filtering Keys for Subquery Broadcas…". This reverts commit bb05b17.
* Signing off. Signed-off-by: Raza Jafri <[email protected]>

Signed-off-by: Raza Jafri <[email protected]>
Closes NVIDIA#10875
Contributes to NVIDIA#10773

Unjar, cache, and share the test jar content among all test suites from the same jar.

Test:
```bash
mvn package -Dbuildver=330 -pl tests -am -Dsuffixes='.*\.RapidsJsonSuite'
```

Signed-off-by: Gera Shegalov <[email protected]>
…A#10944)

* Added shim for BatchScanExec to support Spark 4.0. Signed-off-by: Raza Jafri <[email protected]>
* Fixed the failing shim

Signed-off-by: Raza Jafri <[email protected]>
…hange. (NVIDIA#10863)

* Account for `CommandUtils.uncacheTableOrView` signature change. Fixes NVIDIA#10710. This commit accounts for the changes in the signature of `CommandUtils.uncacheTableOrView` in Apache Spark 4.0. (See [SPARK-47191](apache/spark#45289).) Signed-off-by: MithunR <[email protected]>
* Removed unnecessary base class.

Signed-off-by: MithunR <[email protected]>
This is a new feature adding Parquet support to GpuInsertIntoHiveTable, which currently supports only text writes. The feature is covered by the tests newly added in this PR.

Signed-off-by: Firestarman <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
…ange. (NVIDIA#10857)

* Account for `PartitionedFileUtil.splitFiles` signature change. Fixes NVIDIA#10299. In Apache Spark 4.0, the signature of `PartitionedFileUtil.splitFiles` was changed to remove unused parameters (apache/spark@eabea643c74). This causes the Spark RAPIDS plugin build to break with Spark 4.0. This commit introduces a shim to account for the signature change. Signed-off-by: MithunR <[email protected]>
* Common base for PartitionFileUtilsShims. Signed-off-by: MithunR <[email protected]>
* Reusing existing PartitionedFileUtilsShims.
* More refactoring, for pre-3.5 compile.
* Updated copyright date.
* Fixed style error.
* Re-fixed the copyright year.
* Added missing import.

Signed-off-by: MithunR <[email protected]>
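The version-shim pattern used to absorb such signature changes can be illustrated with a small sketch: callers depend on one stable interface, and a per-Spark-version implementation is selected at build time. All class and method names below are hypothetical, not the plugin's actual shim classes:

```java
// Sketch of the version-shim pattern for an upstream signature change.
// Callers program against the interface; the build picks the shim that
// matches the Spark version being compiled against.
interface PartitionedFileUtilShimSketch {
    java.util.List<String> splitFiles(String path, long length, long maxSplitBytes);
}

// Hypothetical shim for the newer API that dropped unused parameters.
class Spark400ShimSketch implements PartitionedFileUtilShimSketch {
    @Override
    public java.util.List<String> splitFiles(String path, long length, long maxSplitBytes) {
        // In the real shim this would delegate to the version-specific
        // upstream method; here we just compute byte-range splits.
        java.util.List<String> splits = new java.util.ArrayList<>();
        for (long start = 0; start < length; start += maxSplitBytes) {
            long end = Math.min(start + maxSplitBytes, length);
            splits.add(path + "[" + start + "," + end + ")");
        }
        return splits;
    }
}

public class ShimDemo {
    public static void main(String[] args) {
        PartitionedFileUtilShimSketch shim = new Spark400ShimSketch();
        System.out.println(shim.splitFiles("part-0001.parquet", 300L, 128L));
        // -> [part-0001.parquet[0,128), part-0001.parquet[128,256), part-0001.parquet[256,300)]
    }
}
```

The design choice is that only the thin shim classes are compiled per Spark version, so the bulk of the plugin code never sees the upstream signature change.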
To fix: NVIDIA#10867 Change rapids private and jni dependency version to 24.08.0-SNAPSHOT Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
* Add support for the renaming of PythonMapInArrow to MapInArrow
* Signing off. Signed-off-by: Raza Jafri <[email protected]>
* Removed the unnecessary base class from 400
* Addressed review comments

Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Peixin Li <[email protected]>
…itten [skip ci] (NVIDIA#10966)

* DO NOT REVIEW. Signed-off-by: Peixin Li <[email protected]>
* Add a default value for REF to avoid it being overwritten by an unexpected manual trigger. Signed-off-by: Peixin Li <[email protected]>

Signed-off-by: Peixin Li <[email protected]>
* AnalysisException child class. Signed-off-by: Raza Jafri <[email protected]>
* Use errorClass for reporting AnalysisException
* POM changes. Signed-off-by: Raza Jafri <[email protected]>
* Reuse the RapidsErrorUtils to throw the AnalysisException
* Revert "POM changes". This reverts commit 0f765c9.
* Updated copyrights
* Added the TrampolineUtil method back to handle cases which don't use errorClass
* Add doc to the RapidsAnalysisException
* Addressed review comments
* Fixed imports
* Moved the RapidsAnalysisException out of TrampolineUtil
* Fixed imports
* Addressed review comments
* Fixed unused import
* Removed the TrampolineUtil method for throwing RapidsAnalysisException

Signed-off-by: Raza Jafri <[email protected]>
…icks] (NVIDIA#10970)

* Incomplete impl of RaiseError for 400
* Removed RaiseError from 400
* Signing off. Signed-off-by: Raza Jafri <[email protected]>

Signed-off-by: Raza Jafri <[email protected]>
This is a bug fix for the Hive write tests. In some of the tests on Spark 351, ProjectExec falls back to the CPU because the GPU version of the MapFromArrays expression is missing. This PR adds ProjectExec to the allowed fallback list for Spark 351 and later.

Signed-off-by: Firestarman <[email protected]>
…s] (NVIDIA#10994)

* POM changes for Spark 4.0.0. Signed-off-by: Raza Jafri <[email protected]>
* Validate buildver and Scala versions
* More POM changes
* Fixed the scala-2.12 comment
* More fixes for the scala-2.13 POM
* Addressed comments
* Add in shim check to account for 400
* Add 400 for premerge tests against JDK 17
* Temporarily remove 400 from snapshotScala213
* Fixed 2.13 POM
* Remove 400 from jdk17 as it will compile with Scala 2.12
* GitHub workflow changes
* Added quotes to pom-directory
* Update version defs to include Scala 2.13 JDK 17
* Cross-compile all shims from JDK17 to JDK8. Eliminate Logging inheritance to prevent shimming of unshimmable API classes. Signed-off-by: Gera Shegalov <[email protected]>
* dummy
* Undo api POM change. Signed-off-by: Gera Shegalov <[email protected]>
* Add preview1 to the allowed shim versions. Signed-off-by: Gera Shegalov <[email protected]>
* Scala 2.13 to require JDK17. Signed-off-by: Gera Shegalov <[email protected]>
* Removed unused import left over from razajafri#3
* Setup JAVA_HOME before caching
* Only upgrade the Scala plugin for Scala 2.13
* Regenerate Scala 2.13 POMs
* Remove 330 from JDK17 builds for Scala 2.12
* Revert "Remove 330 from JDK17 builds for Scala 2.12". This reverts commit 1faabd4.
* Downgrade scala.plugin.version for Cloudera
* Updated comment to include the issue
* Upgrade the scala.maven.plugin version to 4.9.1, which is the same as Spark 4.0.0
* Downgrade scala-maven-plugin for Cloudera
* Revert mvn verify changes
* Avoid cache for JDK 17
* Removed cache dep from Scala 213
* Added Scala 2.13 specific checks
* Handle the change for UnaryPositive now extending RuntimeReplaceable
* Removing 330 from jdk17.buildvers as we only support Scala 2.13, and fixing the environment variable in version-defs.sh that we read for building against JDK17 with Scala 213
* Update Scala 2.13 POMs
* Fixed scala2.13 verify to actually use the scala2.13/pom.xml
* Added missing CSV files
* Skip Opcode tests. There is a bytecode incompatibility, which is why we are skipping these until we add support for it. For details, see NVIDIA#11174 and NVIDIA#10203.
* Upmerged and fixed the new compile error introduced
* Addressed review comments
* Removed the jdk17 Cloudera check and moved it inside the 321, 330 and 332 Cloudera profiles
* Fixed upmerge conflicts
* Reverted renaming of id
* Fixed HiveGenericUDFShim
* Addressed review comments
* Reverted the debugging code
* Generated Scala 2.13 POMs

Signed-off-by: Raza Jafri <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Signed-off-by: Jason Lowe <[email protected]>
…11197) Signed-off-by: Chong Gao <[email protected]>
[auto-merge] branch-24.06 to branch-24.08 [skip ci] [bot]
To fix issue: NVIDIA#11114. To support Spark 3.3+ and 4.0+ shims, we change to build the Scala 2.13 nightly dist jar with JDK17. Signed-off-by: Tim Liu <[email protected]>
…arquet IDs (NVIDIA#11202) Signed-off-by: Jason Lowe <[email protected]>
…huffleThreadedWriterBase (NVIDIA#11180)

* Exclude the processing time in records.hasNext from the serialization time estimation. Signed-off-by: Jihoon Son <[email protected]>
* Exclude the wait time on the limiter
* Exclude batch size computing time as well
* Fix outdated comment; add more comments
* Add a function that takes a TimeTrackingIterator
* Make stuff private

Signed-off-by: Jihoon Son <[email protected]>
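The idea of excluding the records iterator's own processing time from the serialization estimate can be sketched like this. This is a hypothetical Java illustration of the time-tracking-iterator technique; the actual change is in the plugin's Scala code and the names here are not its real ones:

```java
import java.util.Arrays;
import java.util.Iterator;

// Wraps an iterator and accumulates the time spent inside its own
// hasNext()/next() calls, so a caller timing the overall write loop
// can subtract that and keep only the serialization time.
class TimeTrackingIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private long elapsedNanos = 0L;

    TimeTrackingIteratorSketch(Iterator<T> inner) { this.inner = inner; }

    @Override public boolean hasNext() {
        long start = System.nanoTime();
        try { return inner.hasNext(); }
        finally { elapsedNanos += System.nanoTime() - start; }
    }

    @Override public T next() {
        long start = System.nanoTime();
        try { return inner.next(); }
        finally { elapsedNanos += System.nanoTime() - start; }
    }

    long elapsedNanos() { return elapsedNanos; }
}

public class SerializationTimingDemo {
    public static void main(String[] args) {
        TimeTrackingIteratorSketch<Integer> it =
            new TimeTrackingIteratorSketch<>(Arrays.asList(1, 2, 3).iterator());
        long loopStart = System.nanoTime();
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        long loopNanos = System.nanoTime() - loopStart;
        // The adjusted estimate excludes the iterator's own processing time.
        System.out.println("records=" + count
            + " adjustedNanos=" + (loopNanos - it.elapsedNanos()));
    }
}
```

Without the subtraction, slow upstream record production would be misattributed to serialization, skewing the estimate the threaded writer uses.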
* from_json invalid data handling in RAPIDS added
* Add a logging message when parsing invalid JSON
* Remove unwanted test
* Setting changed to catch one more exception
* Formatting
* Style changed
* Change exception catch logic
* Adding new exception class
* Removed logging
* Removed logging
* Line recovered

Signed-off-by: fejiang <[email protected]>
…DIA#11219) * Fix hash-aggregate tests failing in ANSI mode Fixes NVIDIA#11018. This commit fixes the tests in `hash_aggregate_test.py` to run correctly when run with ANSI enabled. This is essential for running the tests with Spark 4.0, where ANSI mode is on by default. A vast majority of the tests here happen to exercise aggregations like `SUM`, `COUNT`, `AVG`, etc. which fall to CPU, on account of NVIDIA#5114. These tests have been marked with `@disable_ansi_mode`, so that they run to completion correctly. These may be revisited after NVIDIA#5114 has been addressed. In cases where NVIDIA#5114 does not apply, the tests have been modified to run with ANSI on and off. --------- Signed-off-by: MithunR <[email protected]>
Support MapFromArrays on GPU --------- Signed-off-by: Suraj Aralihalli <[email protected]>
…ted. [databricks] (NVIDIA#11129)

Fixes NVIDIA#11031. This PR addresses tests that fail on Spark 4.0 in the following files:

1. `integration_tests/src/main/python/datasourcev2_read_test.py`
2. `integration_tests/src/main/python/expand_exec_test.py`
3. `integration_tests/src/main/python/get_json_test.py`
4. `integration_tests/src/main/python/hive_delimited_text_test.py`
5. `integration_tests/src/main/python/logic_test.py`
6. `integration_tests/src/main/python/repart_test.py`
7. `integration_tests/src/main/python/time_window_test.py`
8. `integration_tests/src/main/python/json_matrix_test.py`
9. `integration_tests/src/main/python/misc_expr_test.py`
10. `integration_tests/src/main/python/orc_write_test.py`

Signed-off-by: MithunR <[email protected]>
…-11212 Fix auto merge conflict 11212
…s] (NVIDIA#11220)

* Avoid hitting Spark bug SPARK-44242 while generating run_dir. Signed-off-by: Peixin Li <[email protected]>
* Update integration_tests/run_pyspark_from_build.sh: apply suggestion. Co-authored-by: Jason Lowe <[email protected]>

Signed-off-by: Peixin Li <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
…IDIA#11230) Signed-off-by: Jason Lowe <[email protected]>
* Add cache dependencies step for Scala 2.13. Signed-off-by: YanxuanLiu <[email protected]>
* Add populate script
* Move yml
* Fix error in script shell
* Hardcode buildvers
* Update .github/workflows/mvn-verify-check.yml for extra new line. Co-authored-by: Gera Shegalov <[email protected]>
* Update .github/workflows/mvn-verify-check.yml for extra new line. Co-authored-by: Gera Shegalov <[email protected]>
* Update .github/workflows/mvn-verify-check.yml to differentiate the cache key clearly. Co-authored-by: Gera Shegalov <[email protected]>
* Fix nit

Signed-off-by: YanxuanLiu <[email protected]>
Co-authored-by: Peixin <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
Maybe addresses NVIDIA#11225 Signed-off-by: Gera Shegalov <[email protected]>
* Remove the unused var CUDF_VER from the CI script. Signed-off-by: Tim Liu <[email protected]>
* Update for the review comment

Signed-off-by: Tim Liu <[email protected]>
* Clear up the regex logic
* Local change of substring index
* stringFunctions Scala
* stringFunctions import conflict resolved
* doColumnar calling
* Delimiter changed to scalar type
* Delimiter changed to scalar type
* Changed delimiter type
* Comment removed
* Removed unwanted test case
* IT test added
* Remove RapidExpressionsSuite
* Adding evaluating logic when using GpuScalar
* Formatting
* Formatting
* Remove the single delim note in gpuoverride
* Doc generated

Signed-off-by: fejiang <[email protected]>
) Signed-off-by: Tim Liu <[email protected]>
* Test. Signed-off-by: Gera Shegalov <[email protected]>
* Reviews: fix typo

Signed-off-by: Gera Shegalov <[email protected]>
Keep the rapids JNI and private dependency versions until the nightly CI for branch-24.10 is done. Track the dependency update at: https://gitlab-master.nvidia.com/timl/spark-rapids-private/-/issues/14 Signed-off-by: NVTIMLIU <[email protected]>
Signed-off-by: NVTIMLIU <[email protected]>
Change version to 24.10.0
Note: merge this PR using "Create a merge commit".