forked from NVIDIA/spark-rapids
Merge branch-24.12 into main #141
Merged
Keep the rapids JNI and private dependency version at 24.10.0-SNAPSHOT until the nightly CI for the branch-24.12 branch is complete. Track the dependency update process at: NVIDIA#11492 Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Support legacy mode for yyyymmdd format Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
NVIDIA#11542) * Update rapids JNI and private dependency to 24.12.0-SNAPSHOT To fix: https://github.com/NVIDIA/spark-rapids/issues/11492 Wait for the pre-merge CI job to SUCCEED Signed-off-by: nvauto <[email protected]> * Fix the missing '}' Signed-off-by: timl <[email protected]> --------- Signed-off-by: nvauto <[email protected]> Signed-off-by: timl <[email protected]> Co-authored-by: timl <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Fix the latest merge conflict in integration tests
* Spark 4: Fix parquet_test.py. Fixes NVIDIA#11015. (Spark 4 failure.) Also fixes NVIDIA#11531. (Databricks 14.3 failure.) Contributes to NVIDIA#11004. This commit addresses the tests that fail in parquet_test.py, when run on Spark 4. 1. Some of the tests were failing as a result of NVIDIA#5114. Those tests have been disabled, at least until we get around to supporting aggregations with ANSI mode enabled. 2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless of ANSI mode, because it tests implicit type promotions where the read schema includes wider columns than the write schema. This will require new code. The test is disabled until NVIDIA#11512 is addressed. 3. `test_parquet_int32_downcast` had an erroneous setup phase that fails in ANSI mode. This has been corrected. The test was refactored to run in ANSI and non-ANSI mode. Signed-off-by: MithunR <[email protected]>
…CI (NVIDIA#11544) Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
…IA#11561) Signed-off-by: Robert (Bobby) Evans <[email protected]>
* implement watermark Signed-off-by: Zach Puller <[email protected]> * consolidate/fix disk spill metric Signed-off-by: Zach Puller <[email protected]> --------- Signed-off-by: Zach Puller <[email protected]>
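The watermark idea in the commit above can be illustrated with a minimal Python sketch: a counter that records the highest value a running byte count has reached. The class and method names here are hypothetical and are not taken from the plugin, which is written in Scala; the lock is an assumption to keep the sketch safe if shared across threads.

```python
import threading


class WatermarkTracker:
    """Hypothetical tracker: running byte count plus the peak it has reached."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.watermark = 0

    def add(self, nbytes):
        # Grow the running total and raise the watermark if a new peak is hit.
        with self._lock:
            self.current += nbytes
            if self.current > self.watermark:
                self.watermark = self.current

    def release(self, nbytes):
        # Shrinking the total never lowers the watermark.
        with self._lock:
            self.current -= nbytes


tracker = WatermarkTracker()
tracker.add(100)
tracker.add(50)
tracker.release(120)
tracker.add(10)
```

After this sequence the running total has dropped, but the watermark still reports the peak, which is the value a spill metric of this kind would surface.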
Signed-off-by: Jason Lowe <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
Fix merge conflict with branch-24.10
…A#11559) * Spark 4: Addressed cast_test.py failures. Fixes NVIDIA#11009 and NVIDIA#11530. This commit addresses the test failures in cast_test.py, on Spark 4.0. These generally have to do with changes in behaviour of Spark when ANSI mode is enabled. In these cases, the tests have been split out into ANSI=on and ANSI=off. The bugs uncovered from the tests have been spun into their own issues; fixing all of them was beyond the scope of this change. Signed-off-by: MithunR <[email protected]>
* use task id as tie breaker Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * save threadlocal lookup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Fix collection_ops_tests for Spark 4.0. Fixes NVIDIA#11011. This commit fixes the failures in `collection_ops_tests` on Spark 4.0. On all versions of Spark, when a Sequence is collected with rows that exceed MAX_INT, an exception is thrown indicating that the collected Sequence/array is larger than permissible. The different versions of Spark vary in the contents of the exception message. On Spark 4, one sees that the error message now contains more information than all prior versions, including: 1. The name of the op causing the error 2. The errant sequence size This commit introduces a shim to make this new information available in the exception. Note that this shim does not fit cleanly in RapidsErrorUtils, because there are differences within major Spark versions. For instance, Spark 3.4.0 and 3.4.1 have a different message from 3.4.2 and 3.4.3. Likewise, there are differences among 3.5.0, 3.5.1, and 3.5.2. Signed-off-by: MithunR <[email protected]> * Fixed formatting error. * Review comments. This moves the construction of the long-sequence error strings into RapidsErrorUtils. The process involved introducing many new RapidsErrorUtils classes, and using mix-ins of concrete implementations for the error-string construction. * Added missing shim tag for 3.5.2. * Review comments: Fixed code style. * Reformatting, per project guideline. * Fixed missed whitespace problem. --------- Signed-off-by: MithunR <[email protected]>
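The mix-in approach described in the commit above might look roughly like the following Python sketch. Everything in it is an assumption for illustration: the class names, the method names, and especially the message wording are placeholders, not the actual Spark or RapidsErrorUtils strings (the real code is Scala).

```python
MAX_SEQUENCE_LENGTH = 2**31 - 1  # placeholder limit used only by this sketch


class ErrorUtilsBase:
    """Shared helpers; the message builder is expected to come from a mix-in."""

    def too_long_sequence_message(self, size, function_name):
        raise NotImplementedError

    def too_long_sequence_error(self, size, function_name):
        return ValueError(self.too_long_sequence_message(size, function_name))


class BriefSequenceMessage:
    """Placeholder for versions whose message omits the op name and size."""

    def too_long_sequence_message(self, size, function_name):
        return f"Sequence is too long; limit is {MAX_SEQUENCE_LENGTH}"


class DetailedSequenceMessage:
    """Placeholder for versions whose message names the op and the size."""

    def too_long_sequence_message(self, size, function_name):
        return (f"{function_name}: sequence of size {size} "
                f"exceeds the limit of {MAX_SEQUENCE_LENGTH}")


# Each hypothetical shim picks one concrete message builder via a mix-in.
class OlderShimErrorUtils(BriefSequenceMessage, ErrorUtilsBase):
    pass


class NewerShimErrorUtils(DetailedSequenceMessage, ErrorUtilsBase):
    pass


older = str(OlderShimErrorUtils().too_long_sequence_error(2**31, "sequence"))
newer = str(NewerShimErrorUtils().too_long_sequence_error(2**31, "sequence"))
```

The point of the design is that a shim for a given version only chooses which builder to mix in, so versions that share a message share one implementation.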
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
…-11604 Fix auto merge conflict 11604 [skip ci]
* xfail regexp tests to unblock CI Signed-off-by: Jason Lowe <[email protected]> * Disable failing regexp unit test to unblock CI --------- Signed-off-by: Jason Lowe <[email protected]>
…NVIDIA#11739) * Update to Spark 4.0 changing signature of SupportsV1Write.writeWithV1 Signed-off-by: Jason Lowe <[email protected]> * Update Spark 4 tools support files --------- Signed-off-by: Jason Lowe <[email protected]>
…VIDIA#11744) * Do not package the Databricks 14.3 shim into the dist jar The 350db143 shim will be packaged into the dist jar in branch-25.02 Signed-off-by: timl <[email protected]> * Add a follow-up issue Signed-off-by: timl <[email protected]> --------- Signed-off-by: timl <[email protected]>
Signed-off-by: Nghia Truong <[email protected]> Signed-off-by: Robert (Bobby) Evans <[email protected]> Co-authored-by: Nghia Truong <[email protected]> Co-authored-by: Nghia Truong <[email protected]>
Files without a Copyright header found with
```
git grep -L 'Copyright (c) .*NVIDIA' '*.scala' '*.java' | grep -v com/nvidia/spark/rapids/format
```
Signed-off-by: Gera Shegalov <[email protected]>
* batch limit removed Signed-off-by: Zach Puller <[email protected]> --------- Signed-off-by: Zach Puller <[email protected]>
* host watermark metric Signed-off-by: Zach Puller <[email protected]> * make disk and host trackers global Signed-off-by: Zach Puller <[email protected]> --------- Signed-off-by: Zach Puller <[email protected]>
…ucts` (NVIDIA#11618) * Migrate `castJsonStringToBool` to `JSONUtils.castStringsToBooleans` Signed-off-by: Nghia Truong <[email protected]> * Migrate `undoKeepQuotes` to use `JSONUtils.removeQuote` Signed-off-by: Nghia Truong <[email protected]> * Migrate `fixupQuotedStrings` to `JSONUtils.removeQuotes` Signed-off-by: Nghia Truong <[email protected]> * Use `castStringsToDecimals` Signed-off-by: Nghia Truong <[email protected]> * Use `removeQuotesForFloats` for implementing `castStringToFloat` Signed-off-by: Nghia Truong <[email protected]> * Use `JSONUtils.castStringsToIntegers` Signed-off-by: Nghia Truong <[email protected]> * Throw if not supported type Signed-off-by: Nghia Truong <[email protected]> * Use `JSONUtils.castStringsToDates` for non-legacy conversion Signed-off-by: Nghia Truong <[email protected]> * Revert "Use `JSONUtils.castStringsToDates` for non-legacy conversion" This reverts commit b3dcffc. * Use `JSONUtils.castStringsToFloats` Signed-off-by: Nghia Truong <[email protected]> * Fix compile error Signed-off-by: Nghia Truong <[email protected]> * Adopting `fromJSONToStructs` Signed-off-by: Nghia Truong <[email protected]> * Fix style Signed-off-by: Nghia Truong <[email protected]> * Adopt `JSONUtils.convertDataType` Signed-off-by: Nghia Truong <[email protected]> * Cleanup Signed-off-by: Nghia Truong <[email protected]> * Fix import Signed-off-by: Nghia Truong <[email protected]> * Revert unrelated change Signed-off-by: Nghia Truong <[email protected]> * Remove empty lines Signed-off-by: Nghia Truong <[email protected]> * Change function name Signed-off-by: Nghia Truong <[email protected]> * Add more data to test Signed-off-by: Nghia Truong <[email protected]> * Fix test pattern Signed-off-by: Nghia Truong <[email protected]> * Add test Signed-off-by: Nghia Truong <[email protected]> --------- Signed-off-by: Nghia Truong <[email protected]>
…VIDIA#11733) closes NVIDIA#11732 This PR adds the support to print out the current attempt object being processed when OOM happens in the retry block. This is designed for the better OOM issues triage. --------- Signed-off-by: Firestarman <[email protected]>
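As a rough illustration of the triage aid described above, the Python sketch below logs the object being processed each time a retry block hits an out-of-memory condition. The exception class, the `with_retry` helper, its parameters, and the log wording are all hypothetical stand-ins, not the plugin's retry framework.

```python
class SimulatedOom(Exception):
    """Hypothetical stand-in for a GPU out-of-memory error."""


def with_retry(attempt, fn, max_tries=3, log=print):
    # Run fn(attempt); on OOM, report which attempt object was in flight.
    for n in range(1, max_tries + 1):
        try:
            return fn(attempt)
        except SimulatedOom:
            log(f"OOM on try {n} while processing attempt: {attempt!r}")
    raise SimulatedOom(f"gave up after {max_tries} tries on {attempt!r}")


calls = {"count": 0}


def flaky_double(batch):
    # Fails twice, then succeeds, to exercise the logging path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise SimulatedOom()
    return batch * 2


messages = []
result = with_retry("batch-7", flaky_double, log=messages.append)
```

Here `messages` ends up holding one line per failed try, each naming the attempt, which is the information one would want when triaging an OOM.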
Signed-off-by: Robert (Bobby) Evans <[email protected]> Co-authored-by: Nghia Truong <[email protected]>
* Fix aqe_test failures on [databricks] 14.3. Fixes NVIDIA#11643. This commit fixes the AQE/DPP tests that were reported in NVIDIA#11643 to be failing on Databricks 14.3. This is the result of a deficient shim for GpuSubqueryBroadcastMeta being active for Databricks 14.3. The deficient shim errantly extended the non-Databricks base shim. This commit moves the commonality in Databricks shims to a common base class that is then customized for the changes in Databricks 14.3. Signed-off-by: MithunR <[email protected]>
* Support async writing for query output Signed-off-by: Jihoon Son <[email protected]> * doc change * use a long timeout * fix test failure due to a race * fix flaky test * address comments * fix the config name for hold gpu * Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/io/async/AsyncOutputStream.scala Simplify case arm Co-authored-by: Gera Shegalov <[email protected]> * address comments * missing doc change * use trampoline --------- Signed-off-by: Jihoon Son <[email protected]> Co-authored-by: Gera Shegalov <[email protected]>
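For readers unfamiliar with the pattern, asynchronous output writing can be sketched in Python as a stream that queues writes for a background thread. This is only an assumption-laden illustration of the general technique; the class name and behavior below are hypothetical and do not describe the actual `AsyncOutputStream.scala`.

```python
import io
import queue
import threading


class AsyncOutputStreamSketch:
    """Hypothetical wrapper: write() enqueues, a worker thread does the I/O."""

    _STOP = object()

    def __init__(self, sink):
        self._sink = sink
        self._queue = queue.Queue()
        self._error = None
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # A single worker keeps writes in submission order.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            if self._error is not None:
                continue  # skip remaining work after a failure
            try:
                self._sink.write(item)
            except Exception as exc:
                self._error = exc

    def write(self, data):
        # Surface an earlier background failure to the caller.
        if self._error is not None:
            raise self._error
        self._queue.put(bytes(data))

    def close(self):
        # Wait for queued writes to finish, then report any failure.
        self._queue.put(self._STOP)
        self._worker.join()
        if self._error is not None:
            raise self._error


sink = io.BytesIO()
out = AsyncOutputStreamSketch(sink)
for chunk in (b"row-1\n", b"row-2\n"):
    out.write(chunk)
out.close()
written = sink.getvalue()
```

The caller returns from `write` immediately and only blocks in `close`, which is where a background write error would be raised in this sketch.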
Signed-off-by: Jason Lowe <[email protected]>
Signed-off-by: Gera Shegalov <[email protected]>
NVIDIA#11771) * Fix query hang when using kudo and multi thread shuffle manager Signed-off-by: liurenjie1024 <[email protected]> * Fix NPE --------- Signed-off-by: liurenjie1024 <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]> Signed-off-by: Firestarman <[email protected]> Co-authored-by: Firestarman <[email protected]>
* Append knoguchi to blossom-ci whitelist [skip ci] * Fixing the typo in username. Signed-off-by: Koji Noguchi <[email protected]> --------- Signed-off-by: Koji Noguchi <[email protected]>
…ks] (NVIDIA#11752) * Ability to decompress Parquet data on CPU Signed-off-by: Jason Lowe <[email protected]> * Add tests * Refactor to reduce duplicated code * scala2.13 fix * Address review comments * Fix Databricks build * Update scala2.13 poms --------- Signed-off-by: Jason Lowe <[email protected]>
Fixes NVIDIA#11536. This commit fixes the tests in `dpp_test.py` that were failing on Databricks 14.3. The failures were largely a result of an erroneous shim implementation, that was fixed as part of NVIDIA#11750. This commit accounts for the remaining failures that result from there being a `CollectLimitExec` in certain DPP query plans (that include broadcast joins, for example). The tests have been made more permissive, in allowing the `CollectLimitExec` to run on the CPU. The `CollectLimitExec` based plans will be further explored as part of NVIDIA#11764. Signed-off-by: MithunR <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
…1794) Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Update change log with CLI: scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.10,24.12 Signed-off-by: nvauto <[email protected]>
Signed-off-by: nvauto <[email protected]>
[skip ci] as branch-24.12 already PASSED the build
Change version to 24.12.0
Note: merge this PR using the "Create a merge commit" option.