feat(array): less leaky string array #5483
Change the behaviour of the string array back to the old behaviour, where accessing the Value function returns a string that is backed by the arrow memory buffer. This avoids allocating data in memory outside of the memory allocator.

The implementation of array.String has been simplified somewhat as part of the new behaviour.

There are a number of places where correct behaviour relies on copies of the data being made. To avoid having to fix all of these in the same PR, a temporary ValueCopy function has been added to maintain the old semantics. It is used everywhere the Value function was used previously, except for cases where the value is obviously processed immediately and then discarded.
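To make the difference between the two access patterns concrete, here is a minimal sketch that does not use the real array.String type. The `stringArray` type, `newStringArray`, and `sharesBuffer` are hypothetical names invented for illustration, and the sketch assumes Go 1.20 or later for `unsafe.String` and `unsafe.StringData`. It only shows that a buffer-backed value points into the array's data while a copied value does not.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringArray is a hypothetical, simplified stand-in for a string column:
// one contiguous data buffer plus len+1 offsets into it. It is NOT the
// real array.String implementation.
type stringArray struct {
	data    []byte
	offsets []int32
}

// newStringArray packs the given values into a single data buffer.
func newStringArray(vals ...string) *stringArray {
	a := &stringArray{offsets: []int32{0}}
	for _, v := range vals {
		a.data = append(a.data, v...)
		a.offsets = append(a.offsets, int32(len(a.data)))
	}
	return a
}

// Value returns a string that shares memory with the data buffer.
// No allocation happens, but the result is only safe to use while
// the buffer is alive and unmodified.
func (a *stringArray) Value(i int) string {
	b := a.data[a.offsets[i]:a.offsets[i+1]]
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

// ValueCopy returns an independent copy of the value. The copy outlives
// the buffer, but its memory is invisible to any buffer-level accounting.
func (a *stringArray) ValueCopy(i int) string {
	return string(a.data[a.offsets[i]:a.offsets[i+1]])
}

// sharesBuffer reports whether s points into the array's data buffer.
func (a *stringArray) sharesBuffer(s string) bool {
	if len(s) == 0 || len(a.data) == 0 {
		return false
	}
	p := uintptr(unsafe.Pointer(unsafe.StringData(s)))
	start := uintptr(unsafe.Pointer(&a.data[0]))
	return p >= start && p < start+uintptr(len(a.data))
}

func main() {
	arr := newStringArray("flux", "arrow")

	ref := arr.Value(1)
	cp := arr.ValueCopy(1)

	fmt.Println(ref, arr.sharesBuffer(ref)) // arrow true
	fmt.Println(cp, arr.sharesBuffer(cp))   // arrow false
}
```

In this sketch the buffer-backed string costs nothing extra, which is the accounting benefit described above. The trade-off is that callers who need the value to survive the array's release have to copy it explicitly.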
It seems a lot of files got touched in this PR, but the main change was adding the interface `binaryArray` and adapting it to different types. Many of the file changes are just refactoring the method name. I left some non-blocking comments and questions ✅
I would prefer another pair of eyes for review!
// ValueCopy returns the value at the requested position copied into a
// new memory location. This value will remain valid after the array is
// released, but is not tracked by the memory allocator.
//
// This function is intended to be temporary while changes are being
// made to reduce the amount of unaccounted data memory.
func (a *String) ValueCopy(i int) string {
	return string(a.ValueRef(i).Bytes())
Thanks for the comments. It is helpful! 👍
// Buffer returns the memory buffer that contains the value.
func (r StringRef) Buffer() *arrowmem.Buffer {
	return r.buf
I can't find anywhere that uses the `Buffer()` function. Is it still needed?
It will be used by the follow-up PRs in this series.
Co-authored-by: Chunchun Ye <[email protected]>
The cases where the `VisitCopy` function is being used will be addressed one at a time until we can avoid significant levels of unaccounted memory.
Checklist
Dear Author 👋, the following checks should be completed (or explicitly dismissed) before merging.
experimental/
docs/Spec.md has been updated
Dear Reviewer(s) 👋, you are responsible (among others) for ensuring the completeness and quality of the above before approval.