Add example reading data from an mmaped IPC file #6986

Merged: @alamb merged 2 commits into apache:main from alamb/mmap_example on Jan 21, 2025

Contributor

@alamb alamb commented Jan 15, 2025

Which issue does this PR close?

Rationale for this change

Reading Arrow IPC files without copying is a key feature of the format, but it is currently hard to work out how to do so.

What changes are included in this PR?

  1. Add an example of how to use mmap with FileDecoder
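Before any decoder can read record batches from a mapped file, the footer has to be located inside the mapped bytes. The following is a minimal, std-only sketch of that first step, assuming the Arrow IPC file layout in which the file ends with the footer, a little-endian `i32` footer length, and the `ARROW1` magic. `footer_range` is a hypothetical helper written for illustration; it is not the code added by this PR and not part of the arrow crate.

```rust
/// Magic bytes that open and close an Arrow IPC file.
const ARROW_MAGIC: &[u8; 6] = b"ARROW1";

/// Locate the footer inside the bytes of an Arrow IPC file.
///
/// Hypothetical helper for illustration only. It returns the byte range of
/// the footer and borrows from `file`, so nothing is copied.
fn footer_range(file: &[u8]) -> Result<std::ops::Range<usize>, String> {
    // The file ends with: <footer> <i32 footer length, little endian> <"ARROW1">
    const TRAILER_LEN: usize = 4 + 6;
    if file.len() < TRAILER_LEN {
        return Err(format!("file too short: {} bytes", file.len()));
    }
    let trailer_start = file.len() - TRAILER_LEN;
    let trailer = &file[trailer_start..];
    if trailer[4..] != ARROW_MAGIC[..] {
        return Err("missing ARROW1 magic at end of file".to_string());
    }
    let len = i32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    let footer_len =
        usize::try_from(len).map_err(|_| format!("negative footer length: {len}"))?;
    if footer_len > trailer_start {
        return Err(format!("footer length {footer_len} exceeds file size"));
    }
    // The footer sits immediately before the trailer.
    Ok(trailer_start - footer_len..trailer_start)
}
```

Because the function only computes offsets into a borrowed slice, it works the same whether the bytes come from a `Vec<u8>` or from a memory-mapped region.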

Are there any user-facing changes?

Example

Potential follow ons (TODO):

  1. Create a Buffer::from_owner as suggested by @tustvold in MMap support for IPC files #6709 (comment)
  2. Maybe add IPCBufferDecoder or something similar into the actual crate API
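The `Buffer::from_owner` idea in item 1 can be sketched roughly as follows: a buffer type that keeps an arbitrary owner of bytes alive and hands out views into it without copying. This is a std-only illustration under stated assumptions. `SharedBytes` and its methods are hypothetical names, not arrow's `Buffer` API, and the owner is assumed to expose its bytes through `AsRef<[u8]>`.

```rust
use std::sync::Arc;

/// A cheaply cloneable view into bytes kept alive by an arbitrary owner.
/// Hypothetical type for illustration; it is not arrow's `Buffer`.
#[derive(Clone)]
struct SharedBytes {
    owner: Arc<dyn AsRef<[u8]> + Send + Sync>,
    offset: usize,
    len: usize,
}

impl SharedBytes {
    /// Wrap any owner of bytes (for example a `Vec<u8>`) without copying them.
    fn from_owner<O: AsRef<[u8]> + Send + Sync + 'static>(owner: O) -> Self {
        let len = owner.as_ref().len();
        Self { owner: Arc::new(owner), offset: 0, len }
    }

    /// A sub-view that shares the same owner; only the reference count changes.
    fn slice(&self, offset: usize, len: usize) -> Self {
        let in_bounds = offset.checked_add(len).map_or(false, |end| end <= self.len);
        assert!(in_bounds, "slice out of bounds");
        Self { owner: Arc::clone(&self.owner), offset: self.offset + offset, len }
    }

    /// Borrow the bytes covered by this view.
    fn as_slice(&self) -> &[u8] {
        // Deref the Arc first so `as_ref` is called on the owner, not on the Arc.
        &(*self.owner).as_ref()[self.offset..self.offset + self.len]
    }
}
```

With a type like this, a memory map could serve as the owner directly, provided it exposes its bytes the same way, and no intermediate wrapper type would be needed.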

@github-actions github-actions bot added the arrow (Changes to the arrow crate) label Jan 15, 2025
@alamb alamb force-pushed the alamb/mmap_example branch from b6df782 to 1c0b81e Compare January 15, 2025 13:01
@@ -87,6 +87,9 @@ criterion = { version = "0.5", default-features = false }
half = { version = "2.1", default-features = false }
rand = { version = "0.8", default-features = false, features = ["std", "std_rng"] }
serde = { version = "1.0", default-features = false, features = ["derive"] }
# used in examples
memmap2 = "0.9.3"
bytes = "1.9"
Contributor Author

We could avoid this `bytes` dependency if we added Buffer::from_owned.

@alamb alamb marked this pull request as ready for review January 15, 2025 13:03
use std::path::PathBuf;
use std::sync::Arc;

/// This example shows how to read data from an Arrow IPC file without copying
Contributor Author

I re-read the code in StreamDecoder and I am fairly sure array creation is also zero-copy when data is fed in via StreamDecoder::decode:

pub fn decode(&mut self, buffer: &mut Buffer) -> Result<Option<RecordBatch>, ArrowError> {

Perhaps I can extend this example to show that as well 🤔

Contributor

@tustvold tustvold left a comment

LGTM

@alamb alamb merged commit b8b2f21 into apache:main Jan 21, 2025
27 checks passed
Contributor Author

alamb commented Jan 21, 2025

Thank you for the review @tustvold

@alamb alamb deleted the alamb/mmap_example branch January 21, 2025 11:21
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MMap support for IPC files
2 participants