Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add required methods to access inner builder for NullBufferBuilder #7002

Open
Chen-Yuan-Lai opened this issue Jan 21, 2025 · 4 comments · May be fixed by #7013
Open

Add required methods to access inner builder for NullBufferBuilder #7002

Chen-Yuan-Lai opened this issue Jan 21, 2025 · 4 comments · May be fixed by #7013
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@Chen-Yuan-Lai
Copy link

Chen-Yuan-Lai commented Jan 21, 2025

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

To replace NullBufferBuilder with BooleanBufferBuilder for optimization in DataFusion , some public methods are needed to be implemented in NullBufferBuilder with BooleanBufferBuilder to access inner builder.

Describe the solution you'd like

I would like to add methods like BooleanBufferBuilder :

  • get_bit()
  • capacity
  • truncate()

Describe alternatives you've considered

Other strategies below also may access the inner builder:

  • make the inner builder public
  • add a method that can return the reference of the inner builder

Additional context

Related issue

Use NullBufferBuilder instead of BooleanBufferBuilder for creating Null masks #14115

@Chen-Yuan-Lai Chen-Yuan-Lai added the enhancement Any new improvement worthy of a entry in the changelog label Jan 21, 2025
@Chen-Yuan-Lai
Copy link
Author

@jayzhan211

@jayzhan211
Copy link
Contributor

@Chen-Yuan-Lai I think we need to add methods for NullBufferBuilder and not make the inner builder public.

NullBufferBuilder may not have bitmap_builder if all we only append true (materialize is not called), so we need if-else to handle non-materialized case

@tustvold
Copy link
Contributor

I'd recommend following the naming on NullBuffer, e.g. is_valid vs get_bit.

@alamb
Copy link
Contributor

alamb commented Jan 22, 2025

FWIW I think the list of functions listed above are good and straightforward to add to NullBufferBuilder.

There are some function in Datafusion that I think might be too special purpose (aka only DataFusion would use them) such as take_n and thus probably don't belong in arrow-rs unless we have evidence others would use them

https://github.com/apache/datafusion/blob/63b94c8f9e128b938e81b7e867ce6256a94d67e6/datafusion/physical-plan/src/aggregates/group_values/null_builder.rs#L110C1-L111C1

@Chen-Yuan-Lai Chen-Yuan-Lai linked a pull request Jan 24, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants