[WIP] Improve pyarrow-free remote-IO performance #16166
Let's target 24.10. My view is: in 24.08 we're deprecating but not removing NativeFile support. In 24.10 we're removing it, so we need to have some alternative in place, but it's OK if it's slower as long as we're working on a plan for improvement. Ideally, by 24.12 we'd have something merged that at least restores performance parity.
Okay - targeting 24.10 for a real "behavior" change makes perfect sense to me. In 24.08, we are technically deprecating the user's ability to control whether or not
This PR is already comparable in most cases, and faster in some (e.g. many small files). Therefore, I'm confident we will have parity for 24.10 and hopeful that we will have something "better" for 24.12 :)
…x-parquet-dispatch
Co-authored-by: Mads R. B. Kristensen <[email protected]>
Closing as stale.
Follow up to #16613
Supersedes #16166
Improves remote-IO read performance when multiple files are read at once. Also enables partial IO for remote Parquet files (previously removed in `24.10` by #16589).
Authors:
- Richard (Rick) Zamora (https://github.com/rjzamora)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Lawrence Mitchell (https://github.com/wence-)
URL: #16657
Description
Improves fsspec-only behavior for multi-file and partial reads from remote storage. Host-memory usage will be suboptimal compared to `NativeFile` (for now), but performance will be comparable in most cases.
This PR also includes deprecations for pyarrow-based IO. However, #16132 should take priority for the deprecations (and this PR should be modified accordingly).
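To make "multi-file and partial reads" concrete, here is a minimal sketch of the general idea, not the code in this PR: fetch only the byte ranges that are needed, and fetch them for many files concurrently. It uses local files and the standard library so it runs anywhere; `read_range`, `read_ranges`, `footer_lengths`, and `make_fake_parquet` are hypothetical helper names. The only Parquet detail relied on is that a file ends with a 4-byte little-endian footer length followed by the `PAR1` magic. With fsspec, a call such as `fs.cat_ranges(paths, starts, ends)` plays a similar role for remote filesystems.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

PARQUET_MAGIC = b"PAR1"


def read_range(path, start, end):
    """Read bytes [start, end) from one file without loading the rest."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def read_ranges(paths, starts, ends, max_workers=8):
    """Fetch one byte range per file concurrently (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `paths`.
        return list(pool.map(read_range, paths, starts, ends))


def footer_lengths(paths):
    """Return the footer length of each file by reading only its last 8 bytes."""
    sizes = [os.path.getsize(p) for p in paths]
    tails = read_ranges(paths, [s - 8 for s in sizes], sizes)
    lengths = []
    for path, tail in zip(paths, tails):
        if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
            raise ValueError(f"{path} does not end with the Parquet magic")
        lengths.append(struct.unpack("<I", tail[:4])[0])
    return lengths


def make_fake_parquet(path, footer):
    """Write a stand-in file for the demo.

    This is NOT a valid Parquet file; only the trailing 8 bytes follow the
    real layout (footer length + magic).
    """
    with open(path, "wb") as f:
        f.write(PARQUET_MAGIC)
        f.write(b"\x00" * 1024)  # placeholder for column data
        f.write(footer)
        f.write(struct.pack("<I", len(footer)))
        f.write(PARQUET_MAGIC)
```

Calling `footer_lengths(paths)` on many small files transfers 8 bytes per file instead of each whole file, which is the kind of saving partial IO is meant to provide. The many-small-files case benefits mostly from the concurrency.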
TODO
The `open_file_options` deprecation should probably be pulled out and merged before both #16132 and this PR.
Checklist