Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide additional common builder interfaces with ValuesBuilder, a subtrait of ArrayBuilder. #6927

Closed
wants to merge 5 commits into from

Conversation

wiedld
Copy link
Contributor

@wiedld wiedld commented Dec 31, 2024

Which issue does this PR close?

There is no github issue. It was a gap noticed during a code review, resulting in the use of declarative macros in place of generic functions. The problem was that we lacked defined interfaces for common builder functionality.

Rationale for this change

The ArrayBuilder defines the minimum set of builder interfaces for runtime. Some common interfaces are not included, such as building lists of values or lists-of-lists. These interfaces are not included since the ArrayBuilder includes the minimum common set for generics, and handle very different bytes layouts (and runtime builder patterns) such as MapBuilder.

However, it may be worthwhile to define ArrayBuilder subtraits when its worthwhile for common runtime interactions -- such as building a list of values (the example done in this PR).

What changes are included in this PR?

  • commit 1 = define trait ValuesBuilder: ArrayBuilder, and implement for only those byte array builders
    • note: other builder could use this trait, I was doing a minimum starting change
  • commit 2 = bring trait into scope when used
  • commit 3 = replace macro process_regexp_match with generic function
  • commit 4 = replace macro process_regexp_array_match with generic function
  • commit 5 = add trait into doc rust code examples

Are there any user-facing changes?

A new subtrait is defined.

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels Dec 31, 2024
Comment on lines +275 to +276
/// Builds the [`GenericByteViewArray`] and reset this builder
pub fn finish(&mut self) -> GenericByteViewArray<T> {
Copy link
Contributor Author

@wiedld wiedld Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github diff is weird. These few functions were not changed -- only moved as the ValuesBuilder was implemented below.

Comment on lines +250 to +261
fn process_regexp_array_match<
'a,
T: ?Sized,
A: ArrayAccessor<Item = &'a str>,
O: OffsetSizeTrait,
B: ValuesBuilder<T, Value = str>,
>(
array: ArrayIter<A>,
regex_array: ArrayIter<A>,
flags_array: Option<ArrayIter<A>>,
list_builder: &mut GenericListBuilder<O, B>,
) -> Result<(), ArrowError> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic for this method (previous macro) is the same. It's only the generics which changed, such as providing generic ArrayIter is place of arrays which then were then converted into an ArrayIter with array.iter().

@wiedld wiedld marked this pull request as ready for review December 31, 2024 18:14
@tustvold
Copy link
Contributor

tustvold commented Jan 1, 2025

I've not had time to look in detail, but this sounds a lot like the Extend trait which we already support, and is more flexible as it allows appending iterators? Perhaps we just need to better document this?

@wiedld
Copy link
Contributor Author

wiedld commented Jan 2, 2025

I've not had time to look in detail, but this sounds a lot like the Extend trait which we already support, and is more flexible as it allows appending iterators? Perhaps we just need to better document this?

Ah yes, the initial use case (in the regex code) could have been refactored and then used the Extend trait for the generics. Thank you.

Do you think its still worthwhile to have this trait anyways? Since the append interfaces (for list building) are publicly exposed for other users. Or should we only add more documentation referring users to the Extend interface?

@tustvold
Copy link
Contributor

tustvold commented Jan 2, 2025

I think it would be good to better document the Extend trait.

I think if we do proceed with this, we'd need to keep the member functions as well, so people don't have to import the trait - as it stands this PR is a breaking change.

That being said, I'd personally be inclined to not add this, but it has been suggested before so people clearly have difficulty finding Extend. Perhaps documentation will fix that...

@wiedld
Copy link
Contributor Author

wiedld commented Jan 3, 2025

I think it would be good to better document the Extend trait.

@tustvold -- what do you think of these docs?
#6932

@wiedld
Copy link
Contributor Author

wiedld commented Jan 3, 2025

Closing as unneeded.

@wiedld wiedld closed this Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants