-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48898][SQL] Set nullability correctly in the Variant schema #49151
Conversation
…schema" This reverts commit b2c8b30.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM (Pending CIs).
Thank you, @cashmand .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you re-trigger the failed test pipeline, @cashmand ?
Could you re-trigger once more, @cashmand ? |
Hi @dongjoon-hyun, it looks like the tests are passing now. |
Actually @dongjoon-hyun, I just realized that ArrayType has the same nullability issue. Let me update the PR to also set the nullability correctly on ArrayType. |
Thank you! |
@dongjoon-hyun, the PR should be good to merge now. Thanks for your patience! |
thanks, merging to master! |
### What changes were proposed in this pull request? The variantShreddingSchema method converts a human-readable schema for Variant to one that's a valid shredding schema. According to the shredding schema in apache/parquet-format#461, each shredded field in an object should be a required group - i.e. a non-nullable struct. This PR fixes the variantShreddingSchema to mark that struct as non-nullable. ### Why are the changes needed? If we use variantShreddingSchema to construct a schema for Parquet, the schema would be technically non-conformant with the spec by setting the group as optional. I don't think this should really matter to readers, but it would waste a bit of space in the Parquet file by adding an extra definition level. ### Does this PR introduce _any_ user-facing change? No, this code is not used yet. ### How was this patch tested? Added a test to do some minimal validation of the variantShreddingSchema function. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49151 from cashmand/SPARK-48898-nullability-again. Authored-by: cashmand <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
The variantShreddingSchema method converts a human-readable schema for Variant to one that's a valid shredding schema. According to the shredding schema in apache/parquet-format#461, each shredded field in an object should be a required group - i.e. a non-nullable struct. This PR fixes the variantShreddingSchema to mark that struct as non-nullable.
Why are the changes needed?
If we use variantShreddingSchema to construct a schema for Parquet, the schema would be technically non-conformant with the spec by setting the group as optional. I don't think this should really matter to readers, but it would waste a bit of space in the Parquet file by adding an extra definition level.
Does this PR introduce any user-facing change?
No, this code is not used yet.
How was this patch tested?
Added a test to do some minimal validation of the variantShreddingSchema function.
Was this patch authored or co-authored using generative AI tooling?
No.