forked from NVIDIA/spark-rapids
Test ONLY: if modified files #15
Closed
Keep the rapids JNI and private dependency version at 24.12.0-SNAPSHOT until the nightly CI for branch-25.02 is complete. Track the dependency update process at: NVIDIA#11755
Signed-off-by: nvauto <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Remove excluded release shim and TODO
* Remove shim from 2.13 properties
* Fix error: 'NoneType' object has no attribute 'split' for excluded_shims

Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: timl <[email protected]>
Co-authored-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
…11772) To fix: https://github.com/NVIDIA/spark-rapids/issues/11755
Wait for the pre-merge CI job to SUCCEED
Signed-off-by: nvauto <[email protected]>
Signed-off-by: Chong Gao <[email protected]> Co-authored-by: Chong Gao <[email protected]>
…NVIDIA#11778) Signed-off-by: Jason Lowe <[email protected]>
NVIDIA#11788) The CI_PART1 job uploads the built Spark Rapids tar file to Databricks DBFS storage. The CI_PART2 job retrieves the built tar file from DBFS storage and runs integration tests against it. Signed-off-by: timl <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
…ip ci] (NVIDIA#11791)
* Replace date with JNI & private timestamp for cache key
* Use date if querying the timestamp failed
* Add bash script to get timestamp
* Replace timestamp with sha1

Signed-off-by: YanxuanLiu <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
* Add the 'test_type' parameter for the Databricks script
For fixing: NVIDIA#11818
'nightly' is for the nightly CI; 'pre-commit' is for the pre-merge CI. The pre-merge CI does not need to copy the built Rapids plugin tar from the Databricks cluster back to the local host; only the nightly build needs to copy spark-rapids-built.tgz back.
* Update copyright

Signed-off-by: timl <[email protected]>
…ace (NVIDIA#11813)
* Support some escape characters in the search list when rewriting regexp_replace to string replace
* Add a case
* Address comment
* Update datagen

Signed-off-by: Haoyang Li <[email protected]>
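The rewrite above only applies when the search pattern matches nothing but itself, so escape sequences have to be unescaped into literal characters first. A minimal Python sketch of that eligibility check (the real plugin does this in Scala, and its exact supported escape set may differ; all names here are illustrative):

```python
# Regex metacharacters that prevent a plain string-replace rewrite
# when they appear unescaped in the search pattern.
_META = set(".^$*+?{}[]|()")

# Escape sequences we can safely turn into their literal characters
# (an illustrative subset, not the plugin's definitive list).
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

def rewrite_as_literal(pattern):
    """Return the literal string if `pattern` matches only itself,
    else None (meaning a real regex engine is still required)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                return None          # trailing backslash: malformed
            nxt = pattern[i + 1]
            if nxt in _ESCAPES:
                out.append(_ESCAPES[nxt])   # e.g. \n -> newline
            elif nxt in _META:
                out.append(nxt)             # escaped metachar is literal
            else:
                return None          # unknown escape: keep the regex path
            i += 2
        elif c in _META:
            return None              # unescaped metachar: real regex
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, `rewrite_as_literal(r"a\.b")` yields `"a.b"` (safe to rewrite), while `rewrite_as_literal("a.b")` yields `None` because the unescaped `.` needs a regex engine.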
Signed-off-by: Nghia Truong <[email protected]>
* Fix TrafficController numTasks check
* Rename weights properly
* Simplify the loop condition
* Rename the condition variable for readability
* Missing renames
* Add test for when all tasks are big

Signed-off-by: Jihoon Son <[email protected]>
Co-authored-by: Gera Shegalov <[email protected]>
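The "all tasks are big" test above hints at the shape of the fix: a weight-based throttle must still admit a task when nothing is in flight, even if that task alone exceeds the budget, or it deadlocks. A hedged Python sketch of that condition, assuming a simple weight-budget throttle (class and method names are hypothetical, not the plugin's actual Scala API):

```python
import threading

class TrafficController:
    """Illustrative throttle: admits a task when its weight fits under
    `max_in_flight`, but always admits when nothing is running, so a
    task bigger than the whole budget cannot block forever."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight_weight = 0
        self.num_tasks = 0
        self._cond = threading.Condition()

    def _can_admit(self, weight):
        # Key condition: admit when no tasks are running OR the new
        # weight fits. Without the num_tasks == 0 escape hatch, a
        # single "big" task (weight > max_in_flight) would wait forever.
        return self.num_tasks == 0 or \
            self.in_flight_weight + weight <= self.max_in_flight

    def acquire(self, weight):
        with self._cond:
            while not self._can_admit(weight):
                self._cond.wait()
            self.in_flight_weight += weight
            self.num_tasks += 1

    def release(self, weight):
        with self._cond:
            self.in_flight_weight -= weight
            self.num_tasks -= 1
            self._cond.notify_all()
```

With `max_in_flight = 10`, a first task of weight 50 is still admitted immediately; a second task then waits until it is released.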
* Add support for kudo write metrics
* Refactor
* Address comments
* Resolve comments
* Fix compiler
* Fix build break
* Fix build break
* Fix build break
* Fix build break

Signed-off-by: liurenjie1024 <[email protected]>
…IA#11826)
* Balance the pre-merge CI job's time between the CI_1 and CI_2 tests
To fix: NVIDIA#11825
The pre-merge CI job is divided into CI_1 (mvn_verify) and CI_2, run in parallel to speed up the pre-merge CI. Currently CI_1 takes about 2 hours while CI_2 takes approximately 4 hours. Mark some tests as CI_1 to balance the time between the two; after remarking the tests, both jobs should finish in roughly 3 hours.
* Separate the pre-merge CI job into two parts
To balance the duration, split it into premergeUT1 (2 shims' unit tests + 1/3 of the integration tests) and premergeUT2 (1 shim's unit tests + 2/3 of the integration tests).

Signed-off-by: timl <[email protected]>
) Signed-off-by: Robert (Bobby) Evans <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
* Some minor improvements identified during benchmark
* Fix late initialization

Signed-off-by: liurenjie1024 <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
* Optimize Databricks Jenkins scripts
Remove duplicate try/catch/container script blocks; move default Databricks parameters into the common Groovy library.
* Fix merge conflict with https://github.com/NVIDIA/spark-rapids/pull/11819/files#diff-6c8e5cceR72

Signed-off-by: timl <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
* Correct the argument of get_buildvers.py
* Output failure info
* Fail the script when an error occurs
* Test error
* Test error
* Split the command to avoid masking errors

Signed-off-by: YanxuanLiu <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
This PR addresses some issues found during my local split-retry triage, to improve stability. It includes:
- replacing map with safeMap for the conversion between Table and ColumnarBatch
- reducing GPU peak memory by closing unnecessary batches as soon as possible in the Generate exec
- adding retry support for the Table splitting operation in the GPU write path
- eliminating a potential memory leak in the BroadcastNestedLoop join

The existing tests should already cover these changes.
Signed-off-by: Firestarman <[email protected]>
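The map-to-safeMap change above is about resource safety: if building the Nth column fails, the N-1 already-built ones must be closed or they leak GPU memory. A minimal Python sketch of the idea, assuming AutoCloseable-style resources (the plugin's real `safeMap` is a Scala helper over Seq; names here are illustrative):

```python
def safe_map(items, make_resource):
    """Map `make_resource` over `items`; if any call raises, close every
    resource already created before re-raising, so nothing leaks."""
    results = []
    try:
        for item in items:
            results.append(make_resource(item))
        return results
    except Exception:
        # Best-effort cleanup of the partial results; suppress close()
        # failures so the original error propagates.
        for r in results:
            try:
                r.close()
            except Exception:
                pass
        raise
```

A plain `map` would drop the partial results on the floor when an exception fires mid-way; `safe_map` guarantees they are closed first.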
… by CI_PART1 [databricks] (NVIDIA#11840)
* Support running Databricks CI_PART2 integration tests with JARs built by CI_PART1
To fix: NVIDIA#11838
The CI_PART1 job uploads the built Spark Rapids tar file to Databricks DBFS storage. The CI_PART2 job retrieves the built tar file from DBFS storage and runs integration tests against it. CI_PART2 then no longer needs to duplicate the building of the Spark Rapids jars, saving about 1 hour of Databricks time.
* Check the built Rapids plugin tar in the Databricks Jenkinsfile
* Check whether the comma-separated files exist in the Databricks DBFS path within the timeout (minutes)
* Have CI_PART2 build the plugin jars after the timeout
* Let CI_PART2 do the eventual cleanup

Signed-off-by: timl <[email protected]>
Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.12 to branch-25.02 [skip ci] [bot]
Signed-off-by: Nghia Truong <[email protected]>
This commit documents the serialization format checks for writing Hive text, and why it differs from the read-side. `spark-rapids` supports only '^A'-separated Hive text files for read and write. This format tends to be denoted in a Hive table's Storage Properties with `serialization.format=1`. If a Hive table is written with a different/custom delimiter, it is denoted with a different value of `serialization.format`. For instance, a CSV table might be denoted by `serialization.format='', field.delim=','`. It was noticed in NVIDIA#11803 that: 1. On the [read side](https://github.com/NVIDIA/spark-rapids/blob/aa2da410511d8a737e207257769ec662a79174fe/sql-plugin/src/main/scala/org/apache/spark/sql/hive/rapids/HiveProviderImpl.scala#L155-L161), `spark-rapids` treats an empty `serialization.format` as `''`. 2. On the [write side](https://github.com/NVIDIA/spark-rapids/blob/aa2da410511d8a737e207257769ec662a79174fe/sql-plugin/src/main/scala/org/apache/spark/sql/hive/rapids/GpuHiveFileFormat.scala#L130-L136), an empty `serialization.format` is seen as `1`. The reason for the read side value is to be conservative. Since the table is pre-existing, its value should have been set already. The reason for the write side is that there are legitimate cases where a table might not have its `serialization.format` set. (CTAS, for one.) This commit documents all the scenarios that need to be considered on the write side. Signed-off-by: MithunR <[email protected]>
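The write-side rules described above can be condensed into a small decision function. This is a hedged Python sketch of the documented behavior, not the plugin's actual Scala code; the property names (`serialization.format`, `field.delim`) come from the commit text, and the exact defaulting logic is an assumption based on it:

```python
CTRL_A = "\u0001"  # the '^A' field delimiter of default Hive text

def gpu_can_write_hive_text(storage_props):
    """Return True if the GPU write path supports this table's
    storage properties, per the write-side rules described above."""
    # Write side: an absent OR empty serialization.format is treated
    # as "1", since legitimately created tables (e.g. via CTAS) may
    # never have had it set.
    ser_format = storage_props.get("serialization.format") or "1"
    if ser_format != "1":
        return False  # custom serde format: off the GPU path
    # format 1 implies '^A' separation; an explicit different
    # delimiter (e.g. ',' for a CSV-style table) still rules it out.
    delim = storage_props.get("field.delim", CTRL_A)
    return delim == CTRL_A
```

So a default Hive text table (no properties set) is accepted, while a table carrying `serialization.format='', field.delim=','` is rejected. The read side is stricter by design: there an empty `serialization.format` is not defaulted, because a pre-existing table should already have its value set.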
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
No description provided.