Merge branch-24.06 into main #110

Closed
wants to merge 82 commits into from

nvauto
Collaborator

@nvauto nvauto commented Jun 3, 2024

Change version to 24.06.0

Note: merge this PR with the "Create a merge commit" option.

NvTimLiu and others added 30 commits March 22, 2024 17:13
Keep the dependencies (JNI + private) at 24.04-SNAPSHOT until they're available next week.

Added TODO (NVIDIA#10256) to remind us to bump up deps version to 24.06.0-SNAPSHOT.

Signed-off-by: Tim Liu <[email protected]>
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
Signed-off-by: liurenjie1024 <[email protected]>
Fix merge conflict with branch-24.04 [skip ci]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
…-10704

Fix auto merge conflict 10704 [skip ci]
Signed-off-by: Chong Gao <[email protected]>
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Co-authored-by: Chong Gao <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* Removing some authorizations for departed users

Signed-off-by: Mike Wilson <[email protected]>
Co-authored-by: Sameer Raheja <[email protected]>
* Upgrade to jucx 1.16.0

Signed-off-by: Alessandro Bellina <[email protected]>

---------

Signed-off-by: Alessandro Bellina <[email protected]>
…10718)

* Refactor Parquet reader

Signed-off-by: Nghia Truong <[email protected]>

* Update config

* Add back the deprecated config

Signed-off-by: Nghia Truong <[email protected]>

* Fix config

Signed-off-by: Nghia Truong <[email protected]>

* Change message for the deprecated config

Signed-off-by: Nghia Truong <[email protected]>

* Rename variable

Signed-off-by: Nghia Truong <[email protected]>

* Change the logic of reading conf

Signed-off-by: Nghia Truong <[email protected]>

* Add example and mark conf as `internal()`

Signed-off-by: Nghia Truong <[email protected]>

* Reformat code

Signed-off-by: Nghia Truong <[email protected]>

* Update docs

Signed-off-by: Nghia Truong <[email protected]>

* Change configs

Signed-off-by: Nghia Truong <[email protected]>

* Update docs

Signed-off-by: Nghia Truong <[email protected]>

* Change variables into functions

Signed-off-by: Nghia Truong <[email protected]>

* Change functions back into `lazy val`

Signed-off-by: Nghia Truong <[email protected]>

---------

Signed-off-by: Nghia Truong <[email protected]>
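
For readers following the switch between functions and `lazy val` in the commits above, here is a minimal sketch of the difference. It is not the plugin's config code, and `LazyValSketch` with its members is a hypothetical name chosen for illustration. A `def` re-evaluates its body on every access, while a `lazy val` evaluates once on first access and caches the result.

```scala
object LazyValSketch {
  // Counts how many times either body below has been evaluated.
  var evaluations = 0

  // A method: the body runs on every call.
  def asFunction: Int = { evaluations += 1; 42 }

  // A lazy val: the body runs once, on first access, and the value is cached.
  lazy val asLazyVal: Int = { evaluations += 1; 42 }
}
```

Reading `asFunction` twice bumps the counter twice; reading `asLazyVal` twice bumps it only once.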
…debugging UT in IDEA (NVIDIA#10733)

* wip for test

Signed-off-by: Haoyang Li <[email protected]>

* update comment

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
* generate shims

* Generate Scala 2.13 poms

Signed-off-by: Raza Jafri <[email protected]>

* undo bad change to the supportedExprs.csv

* Fixed copyrights and removed snapshot

* Update copyrights on SparkShimsSuite

---------

Signed-off-by: Raza Jafri <[email protected]>
* set a fixed seed for some randomly failing tests

Signed-off-by: Haoyang Li <[email protected]>

* add import

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
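
As a rough illustration of why a fixed seed stabilizes such tests, here is a sketch that is not taken from the test suite. `SeedSketch.sample` is a hypothetical helper; it only assumes that test data is drawn from `scala.util.Random`.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical helper: generates n pseudo-random ints in [0, 100).
  // The same seed always yields the same sequence, so a failure can be replayed.
  def sample(seed: Long, n: Int): Seq[Int] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextInt(100))
  }
}
```

Two calls with the same seed return identical data, which makes an intermittent failure reproducible.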
ttnghia and others added 26 commits May 16, 2024 00:09
* Refactor Parquet reader

Signed-off-by: Nghia Truong <[email protected]>

* WIP for ORC chunked reader

* Update config

* Add back the deprecated config

Signed-off-by: Nghia Truong <[email protected]>

* Fix config

Signed-off-by: Nghia Truong <[email protected]>

* Change message for the deprecated config

Signed-off-by: Nghia Truong <[email protected]>

* Add OrcChunkedReader

Signed-off-by: Nghia Truong <[email protected]>

* Cleanup

Signed-off-by: Nghia Truong <[email protected]>

* Fix `MultiFileCloudOrcPartitionReader`

Signed-off-by: Nghia Truong <[email protected]>

* Fix `readBufferToTablesAndClose`

Signed-off-by: Nghia Truong <[email protected]>

* Fix comment

Signed-off-by: Nghia Truong <[email protected]>

* Add chunked reader to tests

Signed-off-by: Nghia Truong <[email protected]>

* Fix table schema

Signed-off-by: Nghia Truong <[email protected]>

---------

Signed-off-by: Nghia Truong <[email protected]>
* Fix NPE in GpuParseUrl for null keys.

Fixes NVIDIA#10810.

This commit fixes an NPE that occurred when `ParseUrl` was called to
extract a specific key from the `QUERY` portion of the URL and the
specified key was `null`.

The NPE would manifest as follows:
```
24/05/13 14:28:35.379 Executor task launch worker for task 1.0 in stage 746.0 (TID 1493) ERROR Executor: Exception in task 1.0 in stage 746.0 (TID 1493)
java.lang.NullPointerException: null
	at org.apache.spark.sql.rapids.GpuParseUrl.doColumnar(GpuParseUrl.scala:86) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
	at org.apache.spark.sql.rapids.GpuParseUrl.$anonfun$columnarEval$5(GpuParseUrl.scala:123) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
	at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
	at org.apache.spark.sql.rapids.GpuParseUrl.$anonfun$columnarEval$4(GpuParseUrl.scala:120) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
	at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
	at org.apache.spark.sql.rapids.GpuParseUrl.$anonfun$columnarEval$3(GpuParseUrl.scala:119) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
	at com.nvidia.spark.rapids.Arm$.withResourceIfAllowed(Arm.scala:84) ~[rapids-4-spark-aggregator_2.12-24.06.0-SNAPSHOT-spark330.jar:?]
...
```
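
The null-key case described above can be sketched on the CPU as follows. This is a minimal illustration, not the actual `GpuParseUrl` change: `NullKeySketch.extractQueryValue` is a hypothetical helper, and it assumes that a `null` key should simply produce no result.

```scala
object NullKeySketch {
  // Hypothetical helper: looks up `key` in a query string such as "a=1&b=2".
  // Option(...) turns a null reference into None, so nothing is ever
  // dereferenced on a null query or a null key.
  def extractQueryValue(query: String, key: String): Option[String] =
    for {
      q <- Option(query)
      k <- Option(key)
      v <- q.split("&").iterator
        .map(_.split("=", 2))
        .collectFirst { case Array(name, value) if name == k => value }
    } yield v
}
```

With this shape, a `null` key yields `None` instead of throwing.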

Signed-off-by: MithunR <[email protected]>

* Reword validity check.

Signed-off-by: MithunR <[email protected]>

---------

Signed-off-by: MithunR <[email protected]>
…s] (NVIDIA#10829)

* Scala 2.13: Inheritance shadowing

Signed-off-by: Raza Jafri <[email protected]>

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

---------

Signed-off-by: Raza Jafri <[email protected]>
* Added DateTimeUtilsShims

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* added the missing DateTimeUtilsShims for 343

---------

Signed-off-by: Raza Jafri <[email protected]>
This PR adds the ZSTD codec for GPU shuffle compression.

---------

Signed-off-by: Firestarman <[email protected]>
* Add NVTX ranges to identify Spark stages and tasks

Signed-off-by: Jason Lowe <[email protected]>

* scalastyle

---------

Signed-off-by: Jason Lowe <[email protected]>
…-10845

Fix auto merge conflict 10845 [[skip ci]]
…VIDIA#10860)

Fixes NVIDIA#10606.

This commit accounts for the change in the signature of `PartitionedFileUtil.getPartitionedFile()`
in Apache Spark 4.0. (See [SPARK-46473](apache/spark#44437).)

Signed-off-by: MithunR <[email protected]>
…DIA#10839)

Demo PR contributing to NVIDIA#10838

It showcases a coding convention to follow, using the SortOrder and FilterExec replacements as an example.

```scala
scala>  spark.range(100).where($"id" <= 10).collect()

java.lang.RuntimeException: convertToGpu failed
  at scala.sys.package$.error(package.scala:30)
  at com.nvidia.spark.rapids.GpuFilterExecMeta.convertToGpu(basicPhysicalOperators.scala:790)
  at com.nvidia.spark.rapids.GpuFilterExecMeta.convertToGpu(basicPhysicalOperators.scala:783)
  at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:838)
  at com.nvidia.spark.rapids.GpuOverrides$.com$nvidia$spark$rapids$GpuOverrides$$doConvertPlan(GpuOverrides.scala:4383)
  at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4728)
  at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4588)
  at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:455)
  at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4585)
  at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4551)
  at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4605)
  at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4578)
  at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4574)
  at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:532)
```

Signed-off-by: Gera Shegalov <[email protected]>
NVIDIA#10872)

* add plugin link to ignore pattern

Signed-off-by: liyuan <[email protected]>

* add plugin link to ignore pattern

Signed-off-by: liyuan <[email protected]>

---------

Signed-off-by: liyuan <[email protected]>
* refine UT framework to promote GPU evaluation

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* enable some exprs for json

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* exclude flaky tests

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* fix review comments

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* use vectorized parameter where possible

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

* add todo for utc issue

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>

---------

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
…A#10858)

* Add support for multiple filtering keys for subquery broadcast

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

* Fixed test compilation

---------

Signed-off-by: Raza Jafri <[email protected]>
* Disabling the cuDF default pinned pool for 24.06

Signed-off-by: Alessandro Bellina <[email protected]>

* Add a warning in case we can't configure the cuDF default pool

---------

Signed-off-by: Alessandro Bellina <[email protected]>
* Add support for self-contained profiling

Signed-off-by: Jason Lowe <[email protected]>

* Use Scala regex, add executor-side logging on profile startup/shutdown

* Use reflection to handle potentially missing Hadoop CallerContext

* scala 2.13 fix

---------

Signed-off-by: Jason Lowe <[email protected]>
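
One common way to tolerate a class that may be absent at runtime is to look it up by name through reflection. The sketch below shows that general pattern only and is not the code from this change. `ReflectionSketch.loadOptionalClass` is a hypothetical helper, and the class names in the usage note are placeholders.

```scala
import scala.util.Try

object ReflectionSketch {
  // Hypothetical helper: returns the class if it is on the classpath, else None.
  // Class.forName throws ClassNotFoundException for a missing class, which
  // Try captures and toOption converts to None.
  def loadOptionalClass(name: String): Option[Class[_]] =
    Try(Class.forName(name)).toOption
}
```

For example, `loadOptionalClass("java.lang.String")` is defined, while a made-up name such as `"no.such.Clazz"` gives `None`, so callers can skip the optional feature instead of failing.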
…t " (NVIDIA#10934)

* Revert "Add Support for Multiple Filtering Keys for Subquery Broadcast  (NVIDIA#10858)"

This reverts commit 3001852.

* Signing off

Signed-off-by: Raza Jafri <[email protected]>

---------

Signed-off-by: Raza Jafri <[email protected]>
NVIDIA#10947)

Prevent '^[0-9]{n}' from being processed as `spark_rapids_jni::literal_range_pattern`, which currently supports only "contains", not "starts with".

Fixes NVIDIA#10928
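
To make the "contains" versus "starts with" distinction concrete, here is a small CPU-side sketch using `java.util.regex`. It is not the plugin's transpiler logic. `AnchorSketch` is a hypothetical name, and `{3}` stands in for `{n}`.

```scala
import java.util.regex.Pattern

object AnchorSketch {
  // Anchored pattern: the digits must appear at the very start of the input.
  private val anchored = Pattern.compile("^[0-9]{3}")
  // Unanchored pattern: the digits may appear anywhere in the input.
  private val unanchored = Pattern.compile("[0-9]{3}")

  def startsWithDigits(s: String): Boolean = anchored.matcher(s).find()
  def containsDigits(s: String): Boolean = unanchored.matcher(s).find()
}
```

For the input `"abc123"` the two disagree: `containsDigits` matches and `startsWithDigits` does not, which is why treating the anchored pattern as a plain "contains" search would return wrong results.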

Also adds missing `@tailrec` annotations to recursive parser methods.
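
As a reminder of what the annotation buys, here is a minimal sketch that is unrelated to the actual parser methods. `TailrecSketch.countLeadingDigits` is a hypothetical example. `@tailrec` makes the compiler reject the method unless the recursive call is in tail position, so the recursion is guaranteed to compile to a loop.

```scala
import scala.annotation.tailrec

object TailrecSketch {
  // Counts the digits at the start of `s`. The recursive call is the last
  // action on its branch, so @tailrec compiles and the stack does not grow.
  @tailrec
  def countLeadingDigits(s: String, pos: Int = 0): Int =
    if (pos < s.length && s.charAt(pos).isDigit) countLeadingDigits(s, pos + 1)
    else pos
}
```

Because the call is optimized into a loop, even an input with a very long run of digits does not overflow the stack.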

Signed-off-by: Gera Shegalov <[email protected]>
Signed-off-by: jenkins <jenkins@localhost>
@nvauto nvauto requested a review from NvTimLiu as a code owner June 3, 2024 02:16
@NvTimLiu NvTimLiu deleted the branch main June 4, 2024 05:39
@NvTimLiu NvTimLiu closed this Jun 4, 2024
@NvTimLiu NvTimLiu deleted the merge-branch-24.06-to-main branch June 4, 2024 05:40