Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Binary arrays in starts_with, ends_with and contains #6926

Merged
merged 14 commits into from
Jan 22, 2025

Conversation

rluvaton
Copy link
Contributor

@rluvaton rluvaton commented Dec 31, 2024

(ignore branch name)

Which issue does this PR close?

Closes #6923
Closes #6924

What changes are included in this PR?

  1. Made PredicateImpl trait to work with the predicate regardless of string or binary
  2. move implementation to use the Predicate and make it more generic
  3. implement the PredicateImpl for the old Predicate and the new BinaryPredicate using macro (I don't really like this as it seem less maintainable, but not sure what's better, duplicating or macro, or another approach)

Are there any user-facing changes?

Yes, allow users to pass binary arrays to like/starts with/contains and more

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 31, 2024
@rluvaton rluvaton marked this pull request as ready for review December 31, 2024 19:39
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry in the delay reviewing this PR -- it is hard to find time reviewing such a large PR

I wonder what the usecase is for using LIKE on binary data? I as because it seems to me that LIKE is mostly useful for character strings.

I can see the usecase for starts_with / ends_with and contains for binary data,

Perhaps instead of trying to inject binary array into the code for handling strings, we could simply have simpler prefix/suffix matching for binary -- it might have some more repetition but would be simpler to understand any avoid any potential performance issues related to this code 🤔

@@ -59,6 +59,16 @@ pub struct FixedSizeBinaryArray {
}

impl FixedSizeBinaryArray {
/// Returns true if all data within this array is ASCII
pub fn is_ascii(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need to check a binary array for ASCII -- there shouldn't be any optimizations that rely on the data being ASCII

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I did not want to duplicate the whole like and predicate file I implemented like and this needed for the like impl.

But instead I only implemented the contains, start with, and ends with

@alamb
Copy link
Contributor

alamb commented Jan 17, 2025

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft January 17, 2025 22:02
@rluvaton rluvaton force-pushed the add-example-arrow-binary-crate branch from 5f9a6c8 to 30b432e Compare January 19, 2025 12:04
@rluvaton rluvaton marked this pull request as ready for review January 19, 2025 16:50
@rluvaton
Copy link
Contributor Author

@alamb this PR is now ready for review and is much smaller. thanks for the feedback

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rluvaton -- I looked through this and I think it looks quite nice and easy to follow now and I think the risk of regressions in performance for strings is low

Thank you for your patience

@alamb alamb changed the title add binary support in arrow-string Support Binary arrays in starts_with, ends_with and contains Jan 21, 2025
@alamb
Copy link
Contributor

alamb commented Jan 21, 2025

I plan to merge this tomorrow unless anyone else would like time to review

@alamb alamb merged commit 6fd4607 into apache:main Jan 22, 2025
28 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 22, 2025

Thanks again @rluvaton

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

arrow-string function should support binary input as well
2 participants