Execute from_json with struct schema using JSONUtils.fromJSONToStructs #11618

Merged
28 commits merged on Nov 23, 2024

@ttnghia (Collaborator) commented Oct 17, 2024

This PR adopts the newly implemented JNI function JSONUtils.fromJSONToStructs() to parse an input strings column into a structs column, which is what happens when the from_json SQL function is called with a struct schema. Replacing the Scala code entirely with native code avoids a lot of overhead and improves runtime performance.
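
For readers unfamiliar with what from_json does with a struct schema, here is a minimal pure-Python sketch of the row-wise semantics only: each input string yields one struct holding the schema's fields. It is not the plugin's GPU code and does not call JSONUtils.fromJSONToStructs(). The function name `from_json_to_structs`, the dict-based schema, and the choice to return `None` for unparseable rows are all assumptions made for illustration; Spark's actual handling of malformed rows depends on its version and parse mode.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical schema representation: field name -> converter for that field.
Schema = Dict[str, Callable[[Any], Any]]


def from_json_to_structs(
    rows: List[Optional[str]], schema: Schema
) -> List[Optional[Dict[str, Any]]]:
    """Turn a column of JSON strings into a column of structs (dicts)."""
    structs: List[Optional[Dict[str, Any]]] = []
    for text in rows:
        if text is None:
            # A null input row stays null.
            structs.append(None)
            continue
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if not isinstance(parsed, dict):
            # Simplifying assumption: anything that is not a JSON object
            # becomes a null struct.
            structs.append(None)
            continue
        struct: Dict[str, Any] = {}
        for name, convert in schema.items():
            value = parsed.get(name)
            try:
                # Missing or null fields stay null; others are converted
                # to the type requested by the schema.
                struct[name] = None if value is None else convert(value)
            except (TypeError, ValueError):
                # A value that cannot be converted becomes null.
                struct[name] = None
        structs.append(struct)
    return structs
```

Calling `from_json_to_structs(['{"a": 1, "b": "x"}', '{"a": "7"}'], {"a": int, "b": str})` returns one dict per row, with fields absent from the input set to `None`. Fields present in the input but absent from the schema are dropped.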

Closes #11560.

This will also close the following issues:

Depends on:

@ttnghia added the feature request, SQL, performance, P0, and task labels Oct 17, 2024
@ttnghia requested a review from @revans2 October 17, 2024 03:14
@ttnghia self-assigned this Oct 17, 2024
@ttnghia force-pushed the from_json_post_processing branch from 8ec6474 to 692a0cb October 18, 2024 18:17
@revans2 (Collaborator) left a comment

I know it is still in draft, so the debugging comments and commented-out code are fine. I just thought I would track it anyway. It looks great.

Signed-off-by: Nghia Truong <[email protected]>
@ttnghia force-pushed the from_json_post_processing branch from cbf4499 to e2f1724 November 14, 2024 05:02
@ttnghia changed the title from "Perform conversion for the columns output from Table.readJSON to other data types using JSONUtils.convertDataTypes()" to "Execute from_json with struct schema using JSONUtils.fromJSONToStructs" Nov 14, 2024
@ttnghia marked this pull request as ready for review November 14, 2024 05:09
@revans2 previously approved these changes Nov 14, 2024
# Conflicts:
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonToStructs.scala
@ttnghia (Collaborator, Author) commented Nov 23, 2024

build

@ttnghia merged commit daaaf24 into NVIDIA:branch-24.12 Nov 23, 2024
48 of 49 checks passed
@ttnghia deleted the from_json_post_processing branch November 23, 2024 19:13
Labels
feature request: New feature or request
P0: Must have for release
performance: A performance related task/issue
SQL: part of the SQL/Dataframe plugin
task: Work required that improves the product but is not user facing
Development

Successfully merging this pull request may close these issues.

[FEA] Improve GpuJsonToStructs performance
2 participants