What type of enhancement is this?
Performance
What does the enhancement do?
The file readers are designed around synchronous readers, which typically support a relatively cheap seek operation. However, if we seek on a byte stream (e.g., bytes read from S3), each seek can be costly, because it may require sending a new GET request to S3.
I checked the CSV and JSON formats, and their readers only require the `Read` trait to infer a schema. Therefore, we might only need to care about the Parquet and ORC formats.
See also: #3191
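To make the distinction concrete, here is a minimal sketch using only the Rust standard library. All names in it (`CountingReader`, `read_header_line`, `read_trailer`) are hypothetical and are not taken from this project or from any Parquet/ORC crate. It assumes a reader where every `seek` call stands in for one extra request to remote storage, and contrasts a front-to-back read that needs only `Read` with a read of trailing bytes, as formats that keep metadata at the end of the file need, which requires `Read + Seek`.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical wrapper that counts `seek` calls. It stands in for a remote
/// object (e.g. on S3) where each seek may translate into a new GET request.
struct CountingReader<R> {
    inner: R,
    seeks: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }
}

impl<R: Seek> Seek for CountingReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.seeks += 1; // one potential round trip per seek
        self.inner.seek(pos)
    }
}

/// CSV/JSON-style access: only `Read` is required, and bytes are consumed
/// strictly front to back, so no seek is ever issued.
fn read_header_line<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut line = Vec::new();
    let mut byte = [0u8; 1];
    // Read one byte at a time until a newline or end of input.
    while reader.read(&mut byte)? == 1 && byte[0] != b'\n' {
        line.push(byte[0]);
    }
    Ok(String::from_utf8_lossy(&line).into_owned())
}

/// Trailer-style access: the last `len` bytes are needed first, so the
/// reader must also implement `Seek`.
fn read_trailer<R: Read + Seek>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::End(-(len as i64)))?;
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

Wrapping an in-memory `std::io::Cursor` in `CountingReader` and calling both functions shows that `read_header_line` leaves the seek counter untouched, while `read_trailer` increments it once per call. On a remote stream that counter is roughly the number of extra requests an async-friendly reader would want to avoid or batch.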
Implementation challenges
No response