Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ParquetField public #6991

Closed
wants to merge 1 commit into from
Closed

Conversation

XiangpengHao
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

ParquetField (and its friends) are foundation types in the crate, without this being public, many of the functions can't be called, e.g., build_array_reader. If this is public, downstream users can easily hooking their own parquet reading logic while still reusing much of the existing code.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jan 17, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Copy link
Contributor

@etseidl etseidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me too.

@tustvold
Copy link
Contributor

tustvold commented Jan 17, 2025

ParquetField currently has public members, we might want to change this. As it stands even adding a field would be a breaking API change once made public

Taking a step back, why do you want this public, what public APIs make use of it? build_array_reader is not public, or at least shouldn't be...

@XiangpengHao
Copy link
Contributor Author

build_array_reader is not public, or at least shouldn't be...

build_array_reader is public under experimental feature.

why do you want this public

The current ParquetRecordBatchReader is a black box, my use case would want finer control over how data is read, e.g., decode timing.

As it stands even adding a field would be a breaking API change once made public

That's a good point, but this specific type has not changed for 3 years and probably won't change for a while.

@tustvold
Copy link
Contributor

build_array_reader is public under experimental feature.

Perhaps we could make this public under the same experimental feature gate? I'm being cautious as we've been stung in the past by exposing too much of this crates' internals

@XiangpengHao
Copy link
Contributor Author

Perhaps we could make this public under the same experimental feature gate?

Sounds great! Will do soon

@alamb alamb marked this pull request as draft January 17, 2025 21:54
@alamb
Copy link
Contributor

alamb commented Jan 17, 2025

Marking as draft -- please mark it as ready to review when it is

@XiangpengHao
Copy link
Contributor Author

I'll need to think more carefully on what should be public, let's close this until I have a better understanding of how to proceed. Thank you for the review! @alamb @etseidl @tustvold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants