Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HiveHash support for nested types #11660

Merged
merged 11 commits into from
Dec 14, 2024

Conversation

ustcfy
Copy link
Collaborator

@ustcfy ustcfy commented Oct 25, 2024

closes #11674

This PR adds support for Hive hash with struct and list types, and it depends on NVIDIA/spark-rapids-jni#2534

@ustcfy ustcfy changed the title Add HiveHash support for nested types Add HiveHash support for nested types Nov 6, 2024
@firestarman firestarman marked this pull request as ready for review November 25, 2024 07:03
@firestarman
Copy link
Collaborator

Looks good to me, but better have more looks at this.

@ustcfy ustcfy changed the base branch from branch-24.12 to branch-25.02 November 25, 2024 09:56
@ustcfy ustcfy self-assigned this Nov 26, 2024
@firestarman
Copy link
Collaborator

build

}
val maxDepth = a.children.map(c => getMaxNestedDepth(c.dataType)).max
if (maxDepth > 8) {
willNotWorkOnGpu(s"GPU HiveHash supports 8 levels at most for " +
Copy link
Collaborator

@firestarman firestarman Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this fallback ?

Copy link
Collaborator

@ttnghia ttnghia Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should better retrieve this maxDepth from JNI instead, so if we change the max depth support in the future, we won't have to modify this code.

Check out the example this approach here:

def fallbackCheck(instructions: List[PathInstruction]): Boolean =
instructions.length > JSONUtils.MAX_PATH_DEPTH

Comment on lines 3324 to 3325
TypeSig.psNote(TypeEnum.ARRAY, "Nested levels exceeding 8 layers are not supported") +
TypeSig.psNote(TypeEnum.STRUCT, "Nested levels exceeding 8 layers are not supported"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't explicitly say "8 layers" as this can be changed at anytime.

@ustcfy ustcfy requested a review from ttnghia December 13, 2024 02:18
@ustcfy
Copy link
Collaborator Author

ustcfy commented Dec 13, 2024

build

@ustcfy ustcfy merged commit ab3111a into NVIDIA:branch-25.02 Dec 14, 2024
50 checks passed
@ustcfy ustcfy deleted the hivehash-nested-support branch December 14, 2024 04:20
@sameerz sameerz added the feature request New feature or request label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] HiveHash supports nested types
4 participants