Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently we write data to disk in the (deprecated) Arrow file format, but when sending the data to workers, we decode it into RecordBatch and re-encode it in the IPC format.
The benefit of writing to the IPC format directly is that we can stream the data from disk and don't have to decode and re-encode it. We'll also only have to compress the data once (and benefit from reduced file sizes too).
Describe the solution you'd like
Use the Arrow IPC stream format rather than the (old) file format.
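To make the argument concrete, here is a rough sketch of why a stream-oriented layout avoids the decode / re-encode step. It is a toy, not the Arrow IPC wire format and not this project's code: the length-prefixed framing and the helper names (`write_stream`, `forward_stream`, `read_stream`) are assumptions made up for illustration, and the byte strings stand in for already-encoded (and already-compressed) record batches.

```python
import io
import struct

# Toy framing (an assumption for illustration, NOT the Arrow IPC format):
# each frame is a 4-byte little-endian length followed by the payload,
# and a zero length marks the end of the stream.

def write_stream(sink, payloads):
    """Write each already-encoded payload as one frame, then the end marker."""
    for payload in payloads:
        if not payload:
            raise ValueError("empty payloads would collide with the end marker")
        sink.write(struct.pack("<I", len(payload)))
        sink.write(payload)
    sink.write(struct.pack("<I", 0))

def forward_stream(source, sink, chunk_size=64 * 1024):
    """Copy raw bytes from source to sink without parsing any frame."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        copied += len(chunk)
    return copied

def read_stream(source):
    """Yield payloads one frame at a time until the end marker."""
    while True:
        header = source.read(4)
        if len(header) < 4:
            raise EOFError("stream ended before the end marker")
        (length,) = struct.unpack("<I", header)
        if length == 0:
            return
        payload = source.read(length)
        if len(payload) < length:
            raise EOFError("truncated frame")
        yield payload

# Stand-ins for encoded record batches.
batches = [b"batch-0", b"batch-1", b"batch-2"]

on_disk = io.BytesIO()            # plays the role of the file on disk
write_stream(on_disk, batches)    # encode (and compress) once, at write time
on_disk.seek(0)

wire = io.BytesIO()               # plays the role of the connection to a worker
copied = forward_stream(on_disk, wire)  # plain byte copy, no decoding

wire.seek(0)
received = list(read_stream(wire))      # only the worker decodes frames
```

The point of the sketch is that the sender never looks inside a frame: because the bytes on disk are already in the form the receiver expects, forwarding is a plain copy and any compression applied at write time is reused as-is.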
Describe alternatives you've considered
Additional context
Dandandan changed the title from "Use IPC StreamWriter / StreamReader rather than writing to Arrow files" to "Use IPC StreamWriter / StreamReader rather than writing to old Arrow file format" on Dec 22, 2022