Merge branch-24.12 into main #141

Merged
merged 119 commits into main from merge-branch-24.12-to-main on Dec 8, 2024
Conversation

@nvauto (Collaborator) commented on Dec 8, 2024

Change version to 24.12.0

Note: merge this PR using the "Create a merge commit" option

nvauto and others added 30 commits September 24, 2024 07:10
Keep the rapids JNI and private dependency version at 24.10.0-SNAPSHOT until the nightly CI for the branch-24.12 branch is complete. Track the dependency update process at: NVIDIA#11492

Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Support legacy mode for yyyymmdd format
Signed-off-by: Chong Gao <[email protected]>
Co-authored-by: Chong Gao <[email protected]>
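A minimal sketch of the Spark-side behavior this legacy mode matches (illustrative only; the plugin's exact code path may differ): with `spark.sql.legacy.timeParserPolicy=LEGACY`, Spark falls back to `SimpleDateFormat` semantics when parsing a `yyyyMMdd` pattern.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

object LegacyYyyymmddExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("legacy-yyyymmdd").getOrCreate()
    import spark.implicits._

    // LEGACY falls back to java.text.SimpleDateFormat semantics; this is the
    // "legacy mode" behavior the commit above adds GPU support for.
    spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

    Seq("20241208", "20240131")
      .toDF("s")
      .select(to_date($"s", "yyyyMMdd").as("d"))
      .show()

    spark.stop()
  }
}
```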
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
NVIDIA#11542)

* Update rapids JNI and private dependency to 24.12.0-SNAPSHOT

To fix: https://github.com/NVIDIA/spark-rapids/issues/11492
Wait for the pre-merge CI job to SUCCEED

Signed-off-by: nvauto <[email protected]>

* Fix the missing '}'

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: nvauto <[email protected]>
Signed-off-by: timl <[email protected]>
Co-authored-by: timl <[email protected]>
Fix the latest merge conflict in integration tests
* Spark 4:  Fix parquet_test.py.

Fixes NVIDIA#11015. (Spark 4 failure.)
Also fixes NVIDIA#11531. (Databricks 14.3 failure.)
Contributes to NVIDIA#11004.

This commit addresses the tests that fail in parquet_test.py when run on Spark 4.

1. Some of the tests were failing as a result of NVIDIA#5114.  Those tests
have been disabled, at least until we get around to supporting
aggregations with ANSI mode enabled.

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless
of ANSI mode, because it tests implicit type promotions where the read
schema includes wider columns than the write schema.  This will require
new code.  The test is disabled until NVIDIA#11512 is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that failed
   in ANSI mode. This has been corrected, and the test was refactored to
run in both ANSI and non-ANSI mode (see the sketch after this commit message).

Signed-off-by: MithunR <[email protected]>
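As a minimal sketch of the ANSI issue described in item 3 (a hypothetical standalone example, not the actual test): with ANSI enabled, a narrowing cast that overflows raises an error instead of silently wrapping, so setup data cast down to INT must stay within range.

```scala
import org.apache.spark.sql.SparkSession

object AnsiDowncastOverflow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ansi-downcast").getOrCreate()
    import spark.implicits._

    spark.conf.set("spark.sql.ansi.enabled", "true")
    // 3000000000 does not fit in an INT: with ANSI off this wraps silently,
    // with ANSI on Spark raises an overflow error for the cast.
    try {
      Seq(3000000000L).toDF("x").selectExpr("CAST(x AS INT)").collect()
    } catch {
      case e: Exception => println(s"ANSI cast overflow raised: ${e.getClass.getSimpleName}")
    }

    spark.stop()
  }
}
```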
* implement watermark

Signed-off-by: Zach Puller <[email protected]>

* consolidate/fix disk spill metric

Signed-off-by: Zach Puller <[email protected]>

---------

Signed-off-by: Zach Puller <[email protected]>
Fix merge conflict with branch-24.10
…A#11559)

* Spark 4:  Addressed cast_test.py failures.

Fixes NVIDIA#11009 and NVIDIA#11530.

This commit addresses the test failures in cast_test.py on Spark 4.0.
These generally have to do with changes in Spark's behaviour when
ANSI mode is enabled. In these cases, the tests have been split out into ANSI=on and ANSI=off variants.

The bugs uncovered from the tests have been spun into their own issues;  fixing all of them was beyond the scope of this change.

Signed-off-by: MithunR <[email protected]>
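A minimal sketch (illustrative only) of the ANSI behavior change driving the ANSI=on / ANSI=off split: the same cast yields NULL with ANSI disabled and raises an error with ANSI enabled, so each mode needs its own expected results.

```scala
import org.apache.spark.sql.SparkSession

object AnsiCastSplit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ansi-cast-split").getOrCreate()

    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST('not_a_number' AS INT)").show()  // prints NULL

    spark.conf.set("spark.sql.ansi.enabled", "true")
    try {
      // With ANSI on, the same cast raises an invalid-input error instead of returning NULL.
      spark.sql("SELECT CAST('not_a_number' AS INT)").show()
    } catch {
      case e: Exception => println(s"ANSI mode raised: ${e.getClass.getSimpleName}")
    }

    spark.stop()
  }
}
```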
* use task id as tie breaker

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* save threadlocal lookup

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Fix collection_ops_tests for Spark 4.0.

Fixes NVIDIA#11011.

This commit fixes the failures in `collection_ops_tests` on Spark 4.0.

On all versions of Spark, when a Sequence with more rows than MAX_INT is collected,
an exception is thrown indicating that the collected Sequence/array is
larger than permissible. The different versions of Spark vary in the
contents of the exception message.

On Spark 4, one sees that the error message now contains more
information than all prior versions, including:
1. The name of the op causing the error
2. The errant sequence size

This commit introduces a shim to make this new information available in
the exception.

Note that this shim does not fit cleanly in RapidsErrorUtils, because
there are differences within major Spark versions. For instance, Spark
3.4.0 and 3.4.1 have a different message from 3.4.2 and 3.4.3.
Likewise, the message differs among 3.5.0, 3.5.1, and 3.5.2.

Signed-off-by: MithunR <[email protected]>

* Fixed formatting error.

* Review comments.

This moves the construction of the long-sequence error strings into
RapidsErrorUtils.  The process involved introducing many new RapidsErrorUtils
classes, and using mix-ins of concrete implementations for the error-string
construction.
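A minimal sketch of the mix-in pattern described above (trait names and message text are hypothetical, not the plugin's actual RapidsErrorUtils hierarchy): each Spark-version shim mixes in the trait whose error string matches its Spark release.

```scala
// Hypothetical names; only the mix-in structure mirrors the description above.
trait SequenceSizeErrorBuilder {
  def tooLongSequenceError(size: Long, functionName: String, maxLength: Int): RuntimeException
}

// Older Spark releases: message without the op name or the errant size.
trait SequenceSizeTooLongErrorBuilder extends SequenceSizeErrorBuilder {
  override def tooLongSequenceError(size: Long, functionName: String, maxLength: Int): RuntimeException =
    new IllegalArgumentException(s"Too long sequence found. Should be <= $maxLength")
}

// Spark 4 style: message carries the op name and the errant sequence size.
trait SequenceSizeExceededLimitErrorBuilder extends SequenceSizeErrorBuilder {
  override def tooLongSequenceError(size: Long, functionName: String, maxLength: Int): RuntimeException =
    new IllegalArgumentException(
      s"Can't create array with $size elements in $functionName: limit is $maxLength")
}

// A per-Spark-version error-utils object then mixes in the matching trait.
object ErrorUtilsForSpark400 extends SequenceSizeExceededLimitErrorBuilder
```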

* Added missing shim tag for 3.5.2.

* Review comments: Fixed code style.

* Reformatting, per project guideline.

* Fixed missed whitespace problem.

---------

Signed-off-by: MithunR <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
…-11604

Fix auto merge conflict 11604 [skip ci]
* xfail regexp tests to unblock CI

Signed-off-by: Jason Lowe <[email protected]>

* Disable failing regexp unit test to unblock CI

---------

Signed-off-by: Jason Lowe <[email protected]>
jlowe and others added 26 commits November 20, 2024 22:19
…NVIDIA#11739)

* Update to Spark 4.0 changing signature of SupportsV1Write.writeWithV1

Signed-off-by: Jason Lowe <[email protected]>

* Update Spark 4 tools support files

---------

Signed-off-by: Jason Lowe <[email protected]>
…VIDIA#11744)

* Do not package the Databricks 14.3 shim into the dist jar

The 350db143 shim will be packaged into the dist jar in branch-25.02

Signed-off-by: timl <[email protected]>

* Add a follow-up issue

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: timl <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
Files without a Copyright header found with

```
git grep -L 'Copyright (c) .*NVIDIA' '*.scala' '*.java' | grep -v com/nvidia/spark/rapids/format
```

Signed-off-by: Gera Shegalov <[email protected]>
* batch limit removed

Signed-off-by: Zach Puller <[email protected]>

---------

Signed-off-by: Zach Puller <[email protected]>
* host watermark metric

Signed-off-by: Zach Puller <[email protected]>

* make disk and host trackers global

Signed-off-by: Zach Puller <[email protected]>

---------

Signed-off-by: Zach Puller <[email protected]>
…ucts` (NVIDIA#11618)

* Migrate `castJsonStringToBool` to `JSONUtils.castStringsToBooleans`

Signed-off-by: Nghia Truong <[email protected]>

* Migrate `undoKeepQuotes` to use `JSONUtils.removeQuote`

Signed-off-by: Nghia Truong <[email protected]>

* Migrate `fixupQuotedStrings` to `JSONUtils.removeQuotes`

Signed-off-by: Nghia Truong <[email protected]>

* Use `castStringsToDecimals`

Signed-off-by: Nghia Truong <[email protected]>

* Use `removeQuotesForFloats` for implementing `castStringToFloat`

Signed-off-by: Nghia Truong <[email protected]>

* Use `JSONUtils.castStringsToIntegers`

Signed-off-by: Nghia Truong <[email protected]>

* Throw if the type is not supported

Signed-off-by: Nghia Truong <[email protected]>

* Use `JSONUtils.castStringsToDates` for non-legacy conversion

Signed-off-by: Nghia Truong <[email protected]>

* Revert "Use `JSONUtils.castStringsToDates` for non-legacy conversion"

This reverts commit b3dcffc.

* Use `JSONUtils.castStringsToFloats`

Signed-off-by: Nghia Truong <[email protected]>

* Fix compile error

Signed-off-by: Nghia Truong <[email protected]>

* Adopting `fromJSONToStructs`

Signed-off-by: Nghia Truong <[email protected]>

* Fix style

Signed-off-by: Nghia Truong <[email protected]>

* Adopt `JSONUtils.convertDataType`

Signed-off-by: Nghia Truong <[email protected]>

* Cleanup

Signed-off-by: Nghia Truong <[email protected]>

* Fix import

Signed-off-by: Nghia Truong <[email protected]>

* Revert unrelated change

Signed-off-by: Nghia Truong <[email protected]>

* Remove empty lines

Signed-off-by: Nghia Truong <[email protected]>

* Change function name

Signed-off-by: Nghia Truong <[email protected]>

* Add more data to test

Signed-off-by: Nghia Truong <[email protected]>

* Fix test pattern

Signed-off-by: Nghia Truong <[email protected]>

* Add test

Signed-off-by: Nghia Truong <[email protected]>

---------

Signed-off-by: Nghia Truong <[email protected]>
…VIDIA#11733)

closes NVIDIA#11732

This PR adds support for printing out the current attempt object being processed
when an OOM occurs in the retry block.
This is designed to make OOM issues easier to triage.
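A minimal sketch of the idea (a hypothetical helper, not the plugin's actual retry framework, which retries on its own retry-OOM exception types; plain `OutOfMemoryError` stands in here for illustration): catch the OOM inside the retry loop, log the attempt object currently being processed, and retry or rethrow.

```scala
import java.util.logging.Logger

object RetryWithOomLogging {
  private val log = Logger.getLogger("RetryWithOomLogging")

  // Runs `body` on `attempt`, logging the attempt object whenever an OOM is hit,
  // and retrying up to `maxRetries` times before giving up.
  def withRetry[A, B](attempt: A, maxRetries: Int = 3)(body: A => B): B = {
    var tries = 0
    while (true) {
      try {
        return body(attempt)
      } catch {
        case oom: OutOfMemoryError if tries < maxRetries =>
          // Print the current attempt object so the failing input is visible in logs.
          log.warning(s"OOM on try #$tries while processing attempt: $attempt")
          tries += 1
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
```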
---------

Signed-off-by: Firestarman <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
* Fix aqe_test failures on [databricks] 14.3.

Fixes NVIDIA#11643.

This commit fixes the AQE/DPP tests that were reported in NVIDIA#11643 to
be failing on Databricks 14.3.

This is the result of a deficient shim for GpuSubqueryBroadcastMeta
being active for Databricks 14.3.  The deficient shim errantly
extended the non-Databricks base shim.

This commit moves the commonality in Databricks shims to a common
base class that is then customized for the changes in Databricks 14.3.

Signed-off-by: MithunR <[email protected]>
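A minimal sketch of the restructuring described above (hypothetical class names; the real GpuSubqueryBroadcastMeta shims are more involved): shared Databricks behavior moves into a common base class, and the 14.3 shim overrides only what changed.

```scala
// Hypothetical shim layering; only the inheritance shape mirrors the description above.
abstract class SubqueryBroadcastMetaDatabricksBase {
  // Checks and conversions common to all Databricks shims.
  def tagPlanForGpu(): Unit = {}
}

// Pre-14.3 Databricks shims reuse the common base unchanged.
class SubqueryBroadcastMeta330DB extends SubqueryBroadcastMetaDatabricksBase

// The Databricks 14.3 shim customizes just the pieces whose plan shape changed.
class SubqueryBroadcastMeta143DB extends SubqueryBroadcastMetaDatabricksBase {
  override def tagPlanForGpu(): Unit = {
    super.tagPlanForGpu()
    // Extra handling specific to Databricks 14.3.
  }
}
```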
* Support async writing for query output

Signed-off-by: Jihoon Son <[email protected]>

* doc change

* use a long timeout

* fix test failure due to a race

* fix flaky test

* address comments

* fix the config name for hold gpu

* Update sql-plugin/src/main/scala/com/nvidia/spark/rapids/io/async/AsyncOutputStream.scala

Simplify case arm

Co-authored-by: Gera Shegalov <[email protected]>

* address comments

* missing doc change

* use trampoline

---------

Signed-off-by: Jihoon Son <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
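A minimal sketch of the async-write idea (a hypothetical class, not the plugin's AsyncOutputStream): writes are handed to a single background thread so the task thread does not block on the underlying stream, and close() drains the pending work.

```scala
import java.io.OutputStream
import java.util.concurrent.{Executors, TimeUnit}

class SimpleAsyncOutputStream(delegate: OutputStream) extends OutputStream {
  // One background thread preserves write ordering.
  private val executor = Executors.newSingleThreadExecutor()

  override def write(b: Int): Unit = {
    executor.submit(new Runnable {
      override def run(): Unit = delegate.write(b)
    })
  }

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    // Copy the slice so the caller may reuse its buffer immediately.
    val copy = java.util.Arrays.copyOfRange(b, off, off + len)
    executor.submit(new Runnable {
      override def run(): Unit = delegate.write(copy)
    })
  }

  override def flush(): Unit = {
    executor.submit(new Runnable {
      override def run(): Unit = delegate.flush()
    })
  }

  override def close(): Unit = {
    // Drain queued writes before closing the underlying stream.
    // A real implementation must also propagate failures from the background writes.
    executor.shutdown()
    executor.awaitTermination(10, TimeUnit.MINUTES)
    delegate.close()
  }
}
```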
NVIDIA#11771)

* Fix query hang when using kudo and multi thread shuffle manager

Signed-off-by: liurenjie1024 <[email protected]>

* Fix NPE

---------

Signed-off-by: liurenjie1024 <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Firestarman <[email protected]>
Co-authored-by: Firestarman <[email protected]>
* Append knoguchi to blossom-ci whitelist [skip ci]

* Fixing the typo in username.

Signed-off-by: Koji Noguchi <[email protected]>

---------

Signed-off-by: Koji Noguchi <[email protected]>
…ks] (NVIDIA#11752)

* Ability to decompress Parquet data on CPU

Signed-off-by: Jason Lowe <[email protected]>

* Add tests

* Refactor to reduce duplicated code

* scala2.13 fix

* Address review comments

* Fix Databricks build

* Update scala2.13 poms

---------

Signed-off-by: Jason Lowe <[email protected]>
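A minimal sketch of host-side decompression (illustrative only; the actual change handles Parquet's page codecs, not gzip): the compressed buffer is expanded entirely on the CPU before the data is handed to the GPU.

```scala
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream

object CpuDecompressExample {
  // Decompress a gzip-compressed buffer on the host (CPU).
  def decompressOnCpu(compressed: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(compressed))
    try {
      in.readAllBytes() // requires Java 9+
    } finally {
      in.close()
    }
  }
}
```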
Fixes NVIDIA#11536.

This commit fixes the tests in `dpp_test.py` that were failing on
Databricks 14.3.

The failures were largely a result of an erroneous shim implementation
that was fixed as part of NVIDIA#11750.

This commit accounts for the remaining failures that result from there
being a `CollectLimitExec` in certain DPP query plans (those that include
broadcast joins, for example). The tests have been made more
permissive, allowing the `CollectLimitExec` to run on the CPU.

The `CollectLimitExec` based plans will be further explored as part of
NVIDIA#11764.

Signed-off-by: MithunR <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Update change log with CLI:

    scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.10,24.12

Signed-off-by: nvauto <[email protected]>
@nvauto nvauto requested a review from NvTimLiu as a code owner December 8, 2024 09:07
@nvauto (Collaborator, Author) commented on Dec 8, 2024

[skip ci] as branch-24.12 has already passed the build

@nvauto nvauto merged commit de7ca19 into main Dec 8, 2024
@nvauto nvauto deleted the merge-branch-24.12-to-main branch December 8, 2024 09:08