* rfc: Lazy IO
* rfc: Lazy Reader
* Assign number
* Fix docs
* Add bench
* Add token
* Fix link

Signed-off-by: Xuanwo <[email protected]>
Showing 3 changed files with 122 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
- Proposal Name: `lazy_reader` | ||
- Start Date: 2023-10-22 | ||
- RFC PR: [apache/incubator-opendal#3356](https://github.com/apache/incubator-opendal/pull/3356) | ||
- Tracking Issue: [apache/incubator-opendal#3359](https://github.com/apache/incubator-opendal/issues/3359) | ||
|
||
# Summary | ||
|
||
Perform read IO lazily: defer the actual IO request until the first `read` call. | ||
|
||
# Motivation | ||
|
||
The aim is to minimize IO cost. Currently, OpenDAL sends an actual IO request to the storage as soon as `Accessor::read()` is invoked. For storage services such as S3, this means an HTTP request. In practice, however, users typically create a reader and then call `seek` to move to the desired position before reading anything. | ||
|
||
Take [parquet2 read_metadata](https://docs.rs/parquet2/latest/src/parquet2/read/metadata.rs.html) as an example: | ||
|
||
```rust | ||
/// Reads a [`FileMetaData`] from the reader, located at the end of the file. | ||
pub fn read_metadata<R: Read + Seek>(reader: &mut R) -> Result<FileMetaData> { | ||
// check file is large enough to hold footer | ||
let file_size = stream_len(reader)?; | ||
if file_size < HEADER_SIZE + FOOTER_SIZE { | ||
return Err(Error::oos( | ||
"A parquet file must contain a header and footer with at least 12 bytes", | ||
)); | ||
} | ||
|
||
// read and cache up to DEFAULT_FOOTER_READ_SIZE bytes from the end and process the footer | ||
let default_end_len = min(DEFAULT_FOOTER_READ_SIZE, file_size) as usize; | ||
reader.seek(SeekFrom::End(-(default_end_len as i64)))?; | ||
|
||
... | ||
|
||
deserialize_metadata(reader, max_size) | ||
} | ||
``` | ||
|
||
In `read_metadata`, the caller seeks as soon as the reader is created, before reading any data. On non-seekable storage services such as S3, this results in an HTTP request that is issued and then immediately cancelled. By postponing the IO request until the first `read` call, we can significantly reduce the number of IO requests. | ||
|
||
The expense of initiating and immediately aborting an HTTP request is significant. Here are the benchmark results, using a stat call as our baseline: | ||
|
||
On a MinIO server set up locally: | ||
|
||
```text | ||
service_s3_read_stat/4.00 MiB | ||
time: [315.23 µs 328.23 µs 341.42 µs] | ||
|
||
service_s3_read_abort/4.00 MiB | ||
time: [961.69 µs 980.68 µs 999.50 µs] | ||
``` | ||
|
||
On remote storage services with high latency: | ||
|
||
```text | ||
service_s3_read_stat/4.00 MiB | ||
time: [407.85 ms 409.61 ms 411.39 ms] | ||
|
||
service_s3_read_abort/4.00 MiB | ||
time: [1.5282 s 1.5554 s 1.5828 s] | ||
|
||
``` | ||
|
||
# Guide-level explanation | ||
|
||
The API does not change. The only behavioral change is that the IO request is deferred until the first `read` call, so `op.reader()` itself no longer returns IO errors. For instance, users won't encounter a `file not found` error when invoking `op.reader()`; such an error surfaces on the first `read` instead. | ||
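
The deferred-error behavior can be illustrated with a minimal, self-contained sketch. It does not use OpenDAL's actual types: `MockStore`, `DeferredReader`, and `first_read_error_kind` are hypothetical names invented for illustration, and an in-memory map stands in for the storage service.

```rust
use std::collections::HashMap;
use std::io::{self, Read};

/// Hypothetical in-memory stand-in for a storage service (not an OpenDAL type).
struct MockStore {
    files: HashMap<String, Vec<u8>>,
}

/// Hypothetical lazy reader: it only remembers the path until the first read.
struct DeferredReader<'a> {
    store: &'a MockStore,
    path: String,
    stream: Option<io::Cursor<Vec<u8>>>,
}

impl MockStore {
    /// Analogue of `op.reader()`: performs no IO, so it cannot fail.
    fn reader(&self, path: &str) -> DeferredReader<'_> {
        DeferredReader {
            store: self,
            path: path.to_string(),
            stream: None,
        }
    }
}

impl Read for DeferredReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // The "IO request" happens here, on the first read, so this is
        // where a missing file is finally reported.
        if self.stream.is_none() {
            let data = self.store.files.get(&self.path).ok_or_else(|| {
                io::Error::new(io::ErrorKind::NotFound, "file not found")
            })?;
            self.stream = Some(io::Cursor::new(data.clone()));
        }
        self.stream
            .as_mut()
            .expect("stream was opened above")
            .read(buf)
    }
}

/// Creates a reader for a missing file and returns the error kind of the first read.
fn first_read_error_kind() -> Option<io::ErrorKind> {
    let store = MockStore { files: HashMap::new() };
    // Creating the reader succeeds even though the file does not exist.
    let mut reader = store.reader("missing.parquet");
    let mut buf = [0u8; 4];
    reader.read(&mut buf).err().map(|e| e.kind())
}
```

In this sketch, callers that relied on reader creation to detect a missing file would need to check the result of the first `read` (or issue an explicit stat) instead.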
|
||
# Reference-level explanation | ||
|
||
Most changes will happen inside `CompleteLayer`. Currently, `complete_reader` calls `Accessor::read()` directly: | ||
|
||
```rust | ||
async fn complete_reader( | ||
&self, | ||
path: &str, | ||
args: OpRead, | ||
) -> Result<(RpRead, CompleteReader<A, A::Reader>)> { | ||
.. | ||
|
||
let seekable = capability.read_can_seek; | ||
let streamable = capability.read_can_next; | ||
|
||
let range = args.range(); | ||
let (rp, r) = self.inner.read(path, args).await?; | ||
let content_length = rp.metadata().content_length(); | ||
|
||
... | ||
} | ||
``` | ||
|
||
After this change, the `Accessor::read()` request will be postponed until the first `read` call. | ||
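
One possible shape for this deferral is a small state machine: `seek` only records the target offset, and the request is sent when `read` is first called. The sketch below is an illustration under stated assumptions, not the actual `CompleteLayer` implementation. `MockStorage`, `LazyReader`, `State`, and `demo` are hypothetical names, the backend is an in-memory buffer that counts simulated requests, and the content length is assumed to be known up front (for example from an earlier stat call).

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical storage backend that counts how many "requests" it serves.
struct MockStorage {
    data: Vec<u8>,
    requests: usize,
}

impl MockStorage {
    /// Simulates a ranged read starting at `offset`: one request per call.
    fn open_at(&mut self, offset: u64) -> io::Result<io::Cursor<Vec<u8>>> {
        self.requests += 1;
        let start = (offset as usize).min(self.data.len());
        Ok(io::Cursor::new(self.data[start..].to_vec()))
    }
}

/// Reader state: no request in flight yet, or an open stream.
enum State {
    Idle,
    Open(io::Cursor<Vec<u8>>),
}

/// Hypothetical lazy reader; `size` is assumed to be known in advance.
struct LazyReader<'a> {
    storage: &'a mut MockStorage,
    size: u64,
    offset: u64,
    state: State,
}

impl<'a> LazyReader<'a> {
    fn new(storage: &'a mut MockStorage) -> Self {
        let size = storage.data.len() as u64;
        LazyReader { storage, size, offset: 0, state: State::Idle }
    }
}

impl Seek for LazyReader<'_> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let target: i128 = match pos {
            SeekFrom::Start(n) => n as i128,
            SeekFrom::End(d) => self.size as i128 + d as i128,
            SeekFrom::Current(d) => self.offset as i128 + d as i128,
        };
        if target < 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "seek before start"));
        }
        // Only bookkeeping: record the offset and drop any open stream.
        // No request is sent here.
        self.offset = target as u64;
        self.state = State::Idle;
        Ok(self.offset)
    }
}

impl Read for LazyReader<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // The request is sent lazily, on the first read after creation or seek.
        if matches!(self.state, State::Idle) {
            let stream = self.storage.open_at(self.offset)?;
            self.state = State::Open(stream);
        }
        let n = match &mut self.state {
            State::Open(stream) => stream.read(buf)?,
            State::Idle => unreachable!("state was set to Open above"),
        };
        self.offset += n as u64;
        Ok(n)
    }
}

/// Seeks to the last 8 bytes, then reads them.
/// Returns (requests after seek, requests after read, bytes read).
fn demo() -> (usize, usize, Vec<u8>) {
    let mut storage = MockStorage { data: (0u8..100).collect(), requests: 0 };
    let mut reader = LazyReader::new(&mut storage);
    reader.seek(SeekFrom::End(-8)).unwrap();
    let after_seek = reader.storage.requests;
    let mut tail = Vec::new();
    reader.read_to_end(&mut tail).unwrap();
    let after_read = reader.storage.requests;
    (after_seek, after_read, tail)
}
```

With this design, the seek-then-read pattern from `read_metadata` costs one simulated request in `demo` rather than one request at reader creation plus another after the seek.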
|
||
# Drawbacks | ||
|
||
None | ||
|
||
# Rationale and alternatives | ||
|
||
None | ||
|
||
# Prior art | ||
|
||
None | ||
|
||
# Unresolved questions | ||
|
||
None | ||
|
||
# Future possibilities | ||
|
||
## Add `read_at` for `oio::Reader` | ||
|
||
After `oio::Reader` becomes zero cost, we can add `read_at` to `oio::Reader` to support reading data by range. |