Merge branch-24.12 into main #11848

Merged
merged 129 commits into from
Dec 16, 2024
Conversation

nvauto
Collaborator

@nvauto nvauto commented Dec 10, 2024

Change version to 24.12.0

Note: merge this PR with Create a merge commit to merge

nvauto and others added 30 commits September 24, 2024 07:10
Keep the rapids JNI and private dependency version at 24.10.0-SNAPSHOT until the nightly CI for the branch-24.12 branch is complete. Track the dependency update process at: #11492

Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Support legacy mode for yyyymmdd format
Signed-off-by: Chong Gao <[email protected]>
Co-authored-by: Chong Gao <[email protected]>
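The legacy-mode commit above concerns parsing dates in the yyyyMMdd pattern. As a rough illustration only (the function name and the lenience rules here are hypothetical, not the plugin's actual implementation), strict parsing rejects anything that is not exactly eight digits, while a legacy-style mode is more permissive:

```python
from datetime import datetime

def parse_yyyymmdd(s, legacy=False):
    """Parse a yyyyMMdd date string.

    Hypothetical sketch: strict mode requires exactly 8 digits; a
    legacy-style mode (loosely mirroring Spark's LEGACY time-parser
    policy) skips that pre-check and lets strptime decide.
    """
    if not legacy and (len(s) != 8 or not s.isdigit()):
        raise ValueError(f"'{s}' is not a strict yyyyMMdd date")
    return datetime.strptime(s, "%Y%m%d").date()

print(parse_yyyymmdd("20241216"))  # 2024-12-16
```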
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
#11542)

* Update rapids JNI and private dependency to 24.12.0-SNAPSHOT

To fix: https://github.com/NVIDIA/spark-rapids/issues/11492

Wait for the pre-merge CI job to SUCCEED

Signed-off-by: nvauto <[email protected]>

* Fix the missing '}'

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: nvauto <[email protected]>
Signed-off-by: timl <[email protected]>
Co-authored-by: timl <[email protected]>
Fix the latest merge conflict in integration tests
* Spark 4:  Fix parquet_test.py.

Fixes #11015. (Spark 4 failure.)
Also fixes #11531. (Databricks 14.3 failure.)
Contributes to #11004.

This commit addresses the tests that fail in parquet_test.py, when
run on Spark 4.

1. Some of the tests were failing as a result of #5114.  Those tests
have been disabled, at least until we get around to supporting
aggregations with ANSI mode enabled.

2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless
of ANSI mode, because it tests implicit type promotions where the read
schema includes wider columns than the write schema.  This will require
new code.  The test is disabled until #11512 is addressed.

3. `test_parquet_int32_downcast` had an erroneous setup phase that fails
   in ANSI mode.  This has been corrected. The test was refactored to
run in ANSI and non-ANSI mode.

Signed-off-by: MithunR <[email protected]>
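The ANSI/non-ANSI split described in point 3 can be sketched in plain Python (the helper name and wrap-around behavior here are illustrative; the real test uses the plugin's pytest harness and Spark's `spark.sql.ansi.enabled` conf):

```python
# Hypothetical sketch of running the same downcast check under ANSI on
# and off, mirroring how the test was refactored to cover both modes.
def downcast_int32_to_byte(value, ansi_enabled):
    """Downcast an int to a signed byte; ANSI mode fails on overflow."""
    if -128 <= value <= 127:
        return value
    if ansi_enabled:
        raise OverflowError(f"{value} does not fit in a byte (ANSI mode)")
    # Non-ANSI Spark wraps around on overflow, like a two's-complement cast.
    return ((value + 128) % 256) - 128

for ansi in (True, False):
    try:
        result = downcast_int32_to_byte(300, ansi)
        assert not ansi and result == 44  # 300 wraps to 44 in non-ANSI mode
    except OverflowError:
        assert ansi  # ANSI mode surfaces the overflow instead
```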
* implement watermark

Signed-off-by: Zach Puller <[email protected]>

* consolidate/fix disk spill metric

Signed-off-by: Zach Puller <[email protected]>

---------

Signed-off-by: Zach Puller <[email protected]>
Fix merge conflict with branch-24.10
* Spark 4:  Addressed cast_test.py failures.

Fixes #11009 and #11530.

This commit addresses the test failures in cast_test.py, on Spark 4.0.
These generally have to do with changes in behaviour of Spark when
ANSI mode is enabled.  In these cases, the tests have been split out into ANSI=on and ANSI=off.  

The bugs uncovered from the tests have been spun into their own issues;  fixing all of them was beyond the scope of this change.

Signed-off-by: MithunR <[email protected]>
* use task id as tie breaker

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* save threadlocal lookup

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
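The "use task id as tie breaker" change is about making ordering deterministic when two entries compare equal on their primary priority. A minimal sketch of the idea (names are hypothetical, not the plugin's real classes):

```python
# Order entries by priority, breaking ties on task id so the result is
# deterministic even when priorities collide.
def make_key(priority, task_id):
    # Higher priority first; among equal priorities, lower task id first.
    return (-priority, task_id)

entries = [(5, 12), (5, 3), (7, 9)]  # (priority, task_id)
ordered = sorted(entries, key=lambda e: make_key(*e))
print(ordered)  # [(7, 9), (5, 3), (5, 12)]
```

Without the tie breaker, the relative order of the two priority-5 entries would depend on insertion order rather than on anything stable.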
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Fix collection_ops_tests for Spark 4.0.

Fixes #11011.

This commit fixes the failures in `collection_ops_tests` on Spark 4.0.

On all versions of Spark, when a Sequence is collected with rows that exceed MAX_INT,
an exception is thrown indicating that the collected Sequence/array is
larger than permissible. The different versions of Spark vary in the
contents of the exception message.

On Spark 4, one sees that the error message now contains more
information than all prior versions, including:
1. The name of the op causing the error
2. The errant sequence size

This commit introduces a shim to make this new information available in
the exception.

Note that this shim does not fit cleanly in RapidsErrorUtils, because
there are differences within major Spark versions. For instance, Spark
3.4.0 and 3.4.1 have a different message as compared to 3.4.2 and 3.4.3.
Likewise, the message differs among 3.5.0, 3.5.1, and 3.5.2.

Signed-off-by: MithunR <[email protected]>

* Fixed formatting error.

* Review comments.

This moves the construction of the long-sequence error strings into
RapidsErrorUtils.  The process involved introducing many new RapidsErrorUtils
classes, and using mix-ins of concrete implementations for the error-string
construction.

* Added missing shim tag for 3.5.2.

* Review comments: Fixed code style.

* Reformatting, per project guideline.

* Fixed missed whitespace problem.

---------

Signed-off-by: MithunR <[email protected]>
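The mix-in arrangement described in the review-comments note above can be sketched as follows. This is a hypothetical illustration only: the class names and message strings are invented, not the actual contents of RapidsErrorUtils, but it shows how a shared base can combine with per-version mix-ins that supply the error-string shape:

```python
# Base class holds the shared entry point; mix-ins supply the
# version-specific message format for the "sequence too long" error.
class SequenceSizeErrorBase:
    def too_long_sequence(self, size, op):
        return self.format_message(size, op)

class Pre34MessageMixin:
    def format_message(self, size, op):
        # Older Spark versions: no op name, no errant size in the message.
        return "array exceeds the permissible size limit"

class Spark40MessageMixin:
    def format_message(self, size, op):
        # Spark 4.0 includes the op name and the errant sequence size.
        return f"[{op}] created an array with {size} elements, exceeding the limit"

class ErrorUtilsPre34(SequenceSizeErrorBase, Pre34MessageMixin):
    pass

class ErrorUtils40(SequenceSizeErrorBase, Spark40MessageMixin):
    pass

msg = ErrorUtils40().too_long_sequence(2_200_000_000, "sequence")
```

Each shim then exposes the concrete class matching its Spark version, so callers see one API while the message text varies per version.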
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* xfail regexp tests to unblock CI

Signed-off-by: Jason Lowe <[email protected]>

* Disable failing regexp unit test to unblock CI

---------

Signed-off-by: Jason Lowe <[email protected]>
@YanxuanLiu YanxuanLiu added the build Related to CI / CD or cleanly building label Dec 10, 2024
@YanxuanLiu YanxuanLiu self-assigned this Dec 10, 2024
@YanxuanLiu YanxuanLiu marked this pull request as draft December 10, 2024 09:14
Collaborator

@abellina abellina left a comment

No diffs with 24.12. SNAPSHOT changes look good to me.

NvTimLiu and others added 5 commits December 11, 2024 13:21
I've seen several cases of PRs timing out after 4 hours, even though we did a re-balance for 25.02 recently (#11826).

We'll make additional efforts to balance the pre-merge CI's duration; let's increase the timeout to 6 hours first.

Signed-off-by: Tim Liu <[email protected]>
* update download page

Signed-off-by: liyuan <[email protected]>

* update download page

Signed-off-by: liyuan <[email protected]>

* update download page

Signed-off-by: liyuan <[email protected]>

* update download page

Signed-off-by: liyuan <[email protected]>

* update download page

Signed-off-by: liyuan <[email protected]>

---------

Signed-off-by: liyuan <[email protected]>
Wait for the pre-merge CI job to SUCCEED

Signed-off-by: nvauto <[email protected]>
* Update latest changelog [skip ci]

Update change log with CLI:

    scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.10,24.12

Signed-off-by: nvauto <[email protected]>

* Update changelog

Signed-off-by: Tim Liu <[email protected]>

* update changelog to involve new changes.

Signed-off-by: Yanxuan Liu <[email protected]>

---------

Signed-off-by: nvauto <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
Co-authored-by: Tim Liu <[email protected]>
Co-authored-by: Yanxuan Liu <[email protected]>
@YanxuanLiu YanxuanLiu marked this pull request as ready for review December 16, 2024 02:02
@YanxuanLiu
Collaborator

build

1 similar comment
@NvTimLiu
Collaborator

build

NvTimLiu and others added 3 commits December 16, 2024 11:22
Skip the build of the 350db143 shim, as v24.12.0 will not contain it.

Moreover, the v24.12.0 private dependency jar has not been released.

To fix below error:

    [ERROR] Failed to execute goal on project rapids-4-spark-sql_2.12: Could not resolve dependencies for
     project com.nvidia:rapids-4-spark-sql_2.12:jar:24.12.0: Failure to find com.nvidia:rapids-4-spark-private_2.12:jar:spark350db143:24.12.0
     in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced

Signed-off-by: Tim Liu <[email protected]>
Update change log with CLI:

    scripts/generate-changelog --token=<GIT_TOKEN> --releases=24.10,24.12

Signed-off-by: nvauto <[email protected]>
@YanxuanLiu
Collaborator

build

1 similar comment
@YanxuanLiu
Collaborator

build

@YanxuanLiu YanxuanLiu merged commit 9be8079 into main Dec 16, 2024
48 of 49 checks passed