Test ONLY: if modified files #15

Closed
wants to merge 61 commits into from
Conversation

YanxuanLiu
Owner

No description provided.

nvauto and others added 30 commits November 25, 2024 06:15
Keep the rapids JNI and private dependency version at 24.12.0-SNAPSHOT until the nightly CI for the branch-25.02 branch is complete. Track the dependency update process at: NVIDIA#11755

Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* remove excluded release shim and TODO

Signed-off-by: YanxuanLiu <[email protected]>

* remove shim from 2.13 properties

Signed-off-by: YanxuanLiu <[email protected]>

* Fix error: 'NoneType' object has no attribute 'split' for excluded_shims

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: timl <[email protected]>
Co-authored-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
NVIDIA#11788)

The CI_PART1 job uploads the built Spark Rapids tar file to Databricks DBFS storage.

The CI_PART2 job retrieves the built tar file from DBFS storage and runs integration tests against it.

Signed-off-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
…ip ci] (NVIDIA#11791)

* replace date with jni&private timestamp for cache key

Signed-off-by: YanxuanLiu <[email protected]>

* use the date if querying the timestamp fails

Signed-off-by: YanxuanLiu <[email protected]>

* add bash script to get timestamp

Signed-off-by: YanxuanLiu <[email protected]>

* replace timestamp with sha1

Signed-off-by: YanxuanLiu <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
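The commits above replace the date-based cache key with one derived from the jni/private dependency sha1, keeping the date only as a fallback when the query fails. A minimal sketch of that fallback logic — the `spark-rapids-deps-` key prefix is an assumption for illustration, and `query_sha1` stands in for whatever the real script runs (e.g. a `git ls-remote` wrapper):

```python
from datetime import date

def cache_key(query_sha1) -> str:
    """Build a CI cache key from the jni/private dependency sha1, falling
    back to today's date when the query fails (the fallback behavior the
    commits describe). `query_sha1` is any callable returning the sha1,
    e.g. a wrapper around `git ls-remote`."""
    try:
        suffix = query_sha1()
    except Exception:
        # Querying the sha1 failed; fall back to the (coarser) date key.
        suffix = date.today().isoformat()
    return f"spark-rapids-deps-{suffix}"
```

A sha1-based key only changes when the dependencies actually change, so cache hits survive across days, while the date fallback at worst restores the old daily-key behavior.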
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
* Add the 'test_type' parameter for Databricks script

For fixing: NVIDIA#11818

'nightly' is for the nightly CI; 'pre-commit' is for the pre-merge CI.

The pre-merge CI does not need to copy the built Rapids plugin tar from the Databricks cluster back to the local host;

only the nightly build needs to copy spark-rapids-built.tgz back.

Signed-off-by: timl <[email protected]>

* Update copyright

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: timl <[email protected]>
…ace (NVIDIA#11813)

* Support some escape characters in search list when rewriting regexp_replace to string replace

Signed-off-by: Haoyang Li <[email protected]>

* add a case

Signed-off-by: Haoyang Li <[email protected]>

* address comment

Signed-off-by: Haoyang Li <[email protected]>

* update datagen

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
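The rewrite above is only safe when the `regexp_replace` search pattern is effectively a literal string, possibly containing escaped metacharacters. As a hedged illustration of the kind of check involved — this is not the plugin's Scala implementation, and the accepted escape set here is an assumption:

```python
def literal_search_string(pattern: str):
    """If `pattern` is a plain literal (allowing simple escapes such as
    \\( or \\.), return the unescaped string it matches; otherwise return
    None, meaning the pattern is a real regex and cannot be rewritten as a
    plain string replace."""
    specials = set(".^$*+?()[]{}|")
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                return None  # trailing backslash: malformed
            nxt = pattern[i + 1]
            if nxt in specials or nxt == "\\":
                out.append(nxt)  # escaped metacharacter -> literal char
                i += 2
                continue
            return None  # escapes like \d are real regex constructs
        if c in specials:
            return None  # unescaped metacharacter: not a literal
        out.append(c)
        i += 1
    return "".join(out)
```

When the check succeeds, `regexp_replace(col, pattern, repl)` can be executed as a cheaper plain string replace on the unescaped literal.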
Signed-off-by: Nghia Truong <[email protected]>
* Fix TrafficController numTasks check

Signed-off-by: Jihoon Son <[email protected]>

* rename weights properly

* simplify the loop condition

* Rename the condition variable for readability

Co-authored-by: Gera Shegalov <[email protected]>

* missing renames

* add test for when all tasks are big

---------

Signed-off-by: Jihoon Son <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
* Add support for kudo write metrics

* Refactor

Signed-off-by: liurenjie1024 <[email protected]>

* Address comments

* Resolve comments

* Fix compiler

* Fix build break

* Fix build break

* Fix build break

* Fix build break

---------

Signed-off-by: liurenjie1024 <[email protected]>
…IA#11826)

* Balance the pre-merge CI job's time for the ci_1 and ci_2 tests

To fix: NVIDIA#11825

The pre-merge CI job is divided into CI_1 (mvn_verify) and CI_2.

We run these two parts in parallel to speed up the pre-merge CI.

Currently, CI_1 takes about 2 hours, while CI_2 takes approximately 4 hours.

Mark some tests as CI_1 to balance the time between CI_1 and CI_2.

After remarking the tests, both the CI_1 and CI_2 jobs should finish in about 3 hours.

Signed-off-by: timl <[email protected]>

* Separate the pre-merge CI job into two parts

To balance the duration, separate the pre-merge CI job into two parts:
    premergeUT1 (2 shims' UTs + 1/3 of the integration tests)
    premergeUT2 (1 shim's UTs + 2/3 of the integration tests)

Signed-off-by: timl <[email protected]>

---------

Signed-off-by: timl <[email protected]>
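One common way to get the deterministic 1/3 vs 2/3 split described above is to bucket tests by a stable hash of their name; this is an illustrative sketch of the general technique, not the project's actual mechanism (which marks tests explicitly):

```python
import hashlib

def premerge_bucket(test_name: str) -> str:
    """Assign a test to premergeUT1 (~1/3 of tests) or premergeUT2 (~2/3)
    using a stable hash, so every CI run produces the same partition."""
    h = int(hashlib.sha1(test_name.encode()).hexdigest(), 16)
    return "premergeUT1" if h % 3 == 0 else "premergeUT2"
```

A hash-based split stays stable as tests are added or removed, though unlike explicit marking it balances only test *count*, not runtime.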
revans2 and others added 29 commits December 6, 2024 14:48
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Some minor improvements identified during benchmark

Signed-off-by: liurenjie1024 <[email protected]>

* Fix late initialization

---------

Signed-off-by: liurenjie1024 <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Optimize Databricks Jenkins scripts

Remove duplicate try/catch/container script blocks

Move default Databricks parameters into the common Groovy library

Signed-off-by: timl <[email protected]>

* Fix merge conflict

Fix merge conflict with https://github.com/NVIDIA/spark-rapids/pull/11819/files#diff-6c8e5cceR72

Signed-off-by: Tim Liu <[email protected]>

---------

Signed-off-by: timl <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
* correct arg of get_buildvers.py

Signed-off-by: YanxuanLiu <[email protected]>

* output fail info

Signed-off-by: YanxuanLiu <[email protected]>

* fail the script when errors occur

Signed-off-by: YanxuanLiu <[email protected]>

* test error

Signed-off-by: YanxuanLiu <[email protected]>

* test error

Signed-off-by: YanxuanLiu <[email protected]>

* split the command to avoid masking errors

Signed-off-by: YanxuanLiu <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
This PR addresses some issues found during my local split-retry triage, to improve stability.

It includes:

- replacing map with safeMap for the conversion between Table and ColumnarBatch.
- reducing GPU peak memory by closing unnecessary batches as soon as possible in the Generate exec.
- adding retry support for the Table splitting operation in GPU write.
- eliminating a potential memory leak in the BroadcastNestedLoop join.

The existing tests should already cover these changes.

Signed-off-by: Firestarman <[email protected]>
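The "retry support for Table splitting" item follows the plugin's general split-retry pattern: attempt an operation, and on a memory failure split the input and retry each half. A language-agnostic sketch of that idea — the exception type, split strategy, and depth limit here are illustrative, not the plugin's Scala API:

```python
def with_retry_split(batch, attempt, split, max_depth=3):
    """Run `attempt(batch)`; on MemoryError, split the batch in two and
    retry each half recursively, up to `max_depth` levels of splitting.
    Returns the list of per-piece results in order."""
    try:
        return [attempt(batch)]
    except MemoryError:
        if max_depth == 0:
            raise  # cannot split further; propagate the failure
        left, right = split(batch)
        results = []
        for part in (left, right):
            results.extend(with_retry_split(part, attempt, split, max_depth - 1))
        return results
```

The key property is that a transient allocation failure degrades into smaller work units instead of failing the task outright, at the cost of producing more, smaller output pieces.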
… by CI_PART1 [databricks] (NVIDIA#11840)

* Support running Databricks CI_PART2 integration tests with JARs built by CI_PART1

To fix: NVIDIA#11838

The CI_PART1 job uploads the built Spark Rapids tar file to Databricks DBFS storage.

The CI_PART2 job retrieves the built tar file from DBFS storage and runs integration tests against it.

The CI_PART2 job then doesn't need to duplicate the build of the Spark Rapids jars, saving about 1 hour of Databricks time.

Signed-off-by: timl <[email protected]>

* Check rapids plugin built tar in Databricks Jenkinsfile

Signed-off-by: Tim Liu <[email protected]>

* Check that the comma-separated files exist in the Databricks DBFS path within the timeout (in minutes)

Signed-off-by: Tim Liu <[email protected]>

* CI_PART2 builds the plugin jars itself after the timeout

Signed-off-by: Tim Liu <[email protected]>

* Let CI2 do the eventual cleanup

Signed-off-by: Tim Liu <[email protected]>

---------

Signed-off-by: timl <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
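The "check that the comma-separated files exist within the timeout" step amounts to a polling loop. A hedged sketch with the existence check injected — in the real Jenkins job the check would wrap a Databricks CLI call (an assumption; the exact invocation is not shown in this PR), and the clock/sleep hooks exist only to make the sketch testable:

```python
import time

def wait_for_files(paths_csv, exists, timeout_minutes=30, poll_seconds=60,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until every comma-separated path satisfies `exists`, or raise
    TimeoutError after `timeout_minutes`. `exists(path)` should return True
    once the file is visible in DBFS."""
    paths = [p.strip() for p in paths_csv.split(",") if p.strip()]
    deadline = clock() + timeout_minutes * 60
    while True:
        missing = [p for p in paths if not exists(p)]
        if not missing:
            return
        if clock() >= deadline:
            raise TimeoutError(f"still missing after timeout: {missing}")
        sleep(poll_seconds)
```

On timeout the commits above have CI_PART2 fall back to building the plugin jars itself, so the TimeoutError here would be caught rather than failing the job.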
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
This commit documents the serialization format checks for writing
Hive text, and why they differ from the read side.

`spark-rapids` supports only '^A'-separated Hive text files for read
and write. This format tends to be denoted in a Hive table's Storage
Properties with `serialization.format=1`.

If a Hive table is written with a different/custom delimiter, it is
denoted with a different value of `serialization.format`.  For instance,
a CSV table might be denoted by `serialization.format='',
field.delim=','`.

It was noticed in NVIDIA#11803
that:
1. On the [read
   side](https://github.com/NVIDIA/spark-rapids/blob/aa2da410511d8a737e207257769ec662a79174fe/sql-plugin/src/main/scala/org/apache/spark/sql/hive/rapids/HiveProviderImpl.scala#L155-L161), `spark-rapids` treats an empty `serialization.format` as `''`.
2. On the [write
   side](https://github.com/NVIDIA/spark-rapids/blob/aa2da410511d8a737e207257769ec662a79174fe/sql-plugin/src/main/scala/org/apache/spark/sql/hive/rapids/GpuHiveFileFormat.scala#L130-L136),
an empty `serialization.format` is seen as `1`.

The reason for the read side value is to be conservative.  Since the
table is pre-existing, its value should have been set already.

The reason for the write side is that there are legitimate cases where a
table might not have its `serialization.format` set.  (CTAS, for one.)

This commit documents all the scenarios that need to be considered on
the write side.

Signed-off-by: MithunR <[email protected]>
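The read/write asymmetry described above boils down to two defaulting rules for `serialization.format`. This tiny helper restates them for clarity; the function and its return convention are illustrative only, not the plugin's Scala code:

```python
def effective_serialization_format(storage_props, side):
    """Return the serialization.format value spark-rapids assumes.

    Read side: an absent/empty value is taken literally as '' (and thus
    rejected as unsupported), since a pre-existing table should already
    have the value set.
    Write side: an absent/empty value defaults to '1' ('^A'-separated),
    because legitimate cases such as CTAS may not have set it yet.
    """
    value = storage_props.get("serialization.format", "")
    if value:
        return value
    return "" if side == "read" else "1"
```

So a table with no storage properties at all is treated as unsupported on read but as standard '^A'-separated Hive text on write, which is exactly the discrepancy NVIDIA#11803 observed.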
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
@YanxuanLiu YanxuanLiu closed this Jan 3, 2025