
fs: Data loss while retrying File::flush when disk is full #6325

Open
Xuanwo opened this issue Feb 4, 2024 · 3 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-fs Module: tokio/fs

Comments

@Xuanwo

Xuanwo commented Feb 4, 2024

Version

```
:) cargo tree | grep tokio
│   │   └── tokio v1.36.0
│   │       └── tokio-macros v2.2.0 (proc-macro)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio-util v0.7.10
│   │   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   ├── tokio v1.36.0 (*)
│   │   │   │   └── tokio-rustls v0.24.1
│   │   │   │       └── tokio v1.36.0 (*)
│   │   │   ├── tokio v1.36.0 (*)
│   │   │   ├── tokio-rustls v0.24.1 (*)
│   │   │   ├── tokio-util v0.7.10 (*)
│   ├── tokio v1.36.0 (*)
├── tokio v1.36.0 (*)
```

Platform

Linux xuanwo-work 6.7.3-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 01 Feb 2024 10:30:25 +0000 x86_64 GNU/Linux

Description

While addressing apache/opendal#4058, we discovered that retrying File::flush while the disk is full could result in data loss.

To reproduce:

  1. Set up a small filesystem:

```shell
fallocate -l 512K disk.img
mkfs disk.img

mkdir /tmp/test_dir
sudo mount -o loop disk.img /tmp/test_dir

sudo chmod a+wr /tmp/test_dir
```

Now we have a filesystem with only 512K of space.

  2. Run the following code:

```rust
use std::env;

use rand::prelude::*;
use tokio::io::AsyncWriteExt;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    tracing_subscriber::fmt::init();

    // Mount point of the 512K filesystem created above.
    let path = env::var("OPENDAL_FS_ROOT").expect("root must be set for this test");

    let mut f = tokio::fs::OpenOptions::new()
        .create(true)
        .write(true)
        .open(format!("{path}/test"))
        .await?;

    // More random bytes than the filesystem can hold.
    let size = thread_rng().gen_range(512 * 1024 + 1..4 * 1024 * 1024);
    let mut bs = vec![0u8; size];
    thread_rng().fill_bytes(&mut bs);

    // tokio copies the data into its internal buffer and returns Ok
    // before the disk write has completed.
    f.write(&bs).await?;

    let res = f.flush().await;
    dbg!(&res);

    // After some operations, we retry the file flush.
    let res = f.flush().await;
    dbg!(&res);

    Ok(())
}
```

The full example can be found at apache/opendal#4141. I removed the OpenDAL-related code to make this example more readable.

The output is:

```
    Finished dev [unoptimized + debuginfo] target(s) in 0.79s
     Running `/home/xuanwo/Code/apache/opendal/core/target/debug/edge_file_close_with_retry_on_full_disk`
[edge/file_close_with_retry_on_full_disk/src/main.rs:48] &res = Err(
    Os {
        code: 28,
        kind: StorageFull,
        message: "No space left on device",
    },
)
[edge/file_close_with_retry_on_full_disk/src/main.rs:52] &res = Ok(
    (),
)
```

The first flush returns StorageFull, which is expected. But the second, identical flush call returns Ok.

I expected to see a StorageFull error instead.

The key problem here is:

  • Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
  • Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.
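To make the second point concrete, here is a minimal caller-side sketch, not part of tokio or std: a hypothetical `PoisonOnError` wrapper remembers the first failed write or flush and keeps failing afterwards, so a retried flush can never report Ok for data that may already be lost. It uses the synchronous `std::io::Write` trait to stay dependency-free; `ForgetfulFlush` is a hypothetical writer that imitates the reported symptom (ENOSPC once, then Ok). The same idea should carry over to an `AsyncWrite` wrapper.

```rust
use std::io::{self, Write};

/// Hypothetical caller-side guard (not a tokio or std API): after the first
/// non-`Interrupted` error from `write` or `flush`, every later call fails too.
struct PoisonOnError<W> {
    inner: W,
    poisoned: bool,
}

impl<W: Write> PoisonOnError<W> {
    fn new(inner: W) -> Self {
        PoisonOnError { inner, poisoned: false }
    }

    /// Marks the writer as poisoned on failure (`Interrupted` is safe to retry).
    fn track<T>(&mut self, res: io::Result<T>) -> io::Result<T> {
        if let Err(e) = &res {
            if e.kind() != io::ErrorKind::Interrupted {
                self.poisoned = true;
            }
        }
        res
    }

    /// Refuses further work once an earlier call has failed.
    fn check(&self) -> io::Result<()> {
        if self.poisoned {
            Err(io::Error::new(
                io::ErrorKind::Other,
                "earlier write or flush failed; buffered data may be lost",
            ))
        } else {
            Ok(())
        }
    }
}

impl<W: Write> Write for PoisonOnError<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.check()?;
        let res = self.inner.write(buf);
        self.track(res)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.check()?;
        let res = self.inner.flush();
        self.track(res)
    }
}

/// Hypothetical writer imitating the report: the first flush fails with
/// ENOSPC (raw OS error 28 on Linux), a retried flush returns Ok.
struct ForgetfulFlush {
    failed_once: bool,
}

impl Write for ForgetfulFlush {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        if self.failed_once {
            Ok(())
        } else {
            self.failed_once = true;
            Err(io::Error::from_raw_os_error(28))
        }
    }
}

fn main() {
    let mut guarded = PoisonOnError::new(ForgetfulFlush { failed_once: false });
    guarded.write_all(b"data").unwrap();
    println!("first flush: {:?}", guarded.flush());
    // The guard keeps reporting an error instead of a misleading Ok.
    println!("retried flush: {:?}", guarded.flush());
}
```

The trade-off is that the guard never recovers on its own; the caller has to reopen the file and rewrite the data, which at least matches what the data on disk actually looks like.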

Based on the code in tokio/tokio/src/fs/file.rs (lines 887 to 906 at commit 63caced):

```rust
    fn poll_flush(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), io::Error>> {
        if let Some(e) = self.last_write_err.take() {
            return Poll::Ready(Err(e.into()));
        }

        let (op, buf) = match self.state {
            // A retried flush lands here and returns Ok.
            State::Idle(_) => return Poll::Ready(Ok(())),
            State::Busy(ref mut rx) => ready!(Pin::new(rx).poll(cx))?,
        };

        // The buffer is not used here
        self.state = State::Idle(Some(buf));

        match op {
            Operation::Read(_) => Poll::Ready(Ok(())),
            // The write error is returned once here and then forgotten.
            Operation::Write(res) => Poll::Ready(res),
            Operation::Seek(_) => Poll::Ready(Ok(())),
        }
    }
}
```
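A simplified, synchronous model of this flow reproduces the reported sequence. It is a sketch under assumptions: `ModelFile` and `TinyDisk` are hypothetical, the background blocking write is run inline, and only the most recent write's result is tracked. `write` reports the full length, the first flush returns ENOSPC, and the retried flush finds the state idle and returns Ok.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a nearly full disk: accepts at most `free` more
/// bytes, then fails with ENOSPC (raw OS error 28 on Linux).
struct TinyDisk {
    data: Vec<u8>,
    free: usize,
}

impl Write for TinyDisk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        if self.free == 0 {
            return Err(io::Error::from_raw_os_error(28));
        }
        let n = buf.len().min(self.free);
        self.data.extend_from_slice(&buf[..n]);
        self.free -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical model: `pending: Some(_)` plays the role of `State::Busy`,
/// `None` the role of `State::Idle`.
struct ModelFile {
    disk: TinyDisk,
    buf: Vec<u8>,
    pending: Option<io::Result<()>>,
}

impl ModelFile {
    fn new(free: usize) -> Self {
        ModelFile { disk: TinyDisk { data: Vec::new(), free }, buf: Vec::new(), pending: None }
    }

    /// Buffers `src` and reports the full length as written.
    fn write(&mut self, src: &[u8]) -> usize {
        self.buf.extend_from_slice(src);
        // The "background" write runs inline; like `Buf::write_to`, the buffer is
        // cleared whether or not `write_all` succeeded.
        let res = self.disk.write_all(&self.buf);
        self.buf.clear();
        // Simplification: only the most recent write's result is tracked.
        self.pending = Some(res);
        src.len()
    }

    /// Mirrors `poll_flush`: report the pending result once, then go idle.
    fn flush(&mut self) -> io::Result<()> {
        match self.pending.take() {
            None => Ok(()),
            Some(res) => res,
        }
    }
}

fn main() {
    let mut f = ModelFile::new(4);
    let n = f.write(b"hello world");
    let first = f.flush();
    let second = f.flush();
    println!("wrote {n}, first flush: {first:?}, second flush: {second:?}");
    println!("bytes on disk: {:?}", String::from_utf8_lossy(&f.disk.data));
}
```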

Maybe we should:

  • Perform the flush (write) again if we still have buf?
  • Update the last_write_err if the write operation failed?
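Both suggestions can be sketched in the same kind of simplified model. In this hypothetical `FixedModelFile`, the disk write is deferred to `flush` (rather than run in the background, an assumption for brevity) and only the bytes the disk accepted are removed from `buf`. A retried flush therefore keeps failing while the disk is full and succeeds once space has been freed.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a nearly full disk (ENOSPC = raw OS error 28 on Linux).
struct TinyDisk {
    data: Vec<u8>,
    free: usize,
}

impl Write for TinyDisk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        if self.free == 0 {
            return Err(io::Error::from_raw_os_error(28));
        }
        let n = buf.len().min(self.free);
        self.data.extend_from_slice(&buf[..n]);
        self.free -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical model of the proposals: unwritten bytes stay in `buf`
/// and `flush` retries them instead of reporting a stale result.
struct FixedModelFile {
    disk: TinyDisk,
    buf: Vec<u8>,
}

impl FixedModelFile {
    fn new(free: usize) -> Self {
        FixedModelFile { disk: TinyDisk { data: Vec::new(), free }, buf: Vec::new() }
    }

    fn write(&mut self, src: &[u8]) -> usize {
        self.buf.extend_from_slice(src);
        src.len()
    }

    fn flush(&mut self) -> io::Result<()> {
        while !self.buf.is_empty() {
            match self.disk.write(&self.buf) {
                Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
                // Drop only the bytes the disk actually accepted.
                Ok(n) => {
                    self.buf.drain(..n);
                }
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                // Keep the remaining bytes so a later flush can retry them.
                Err(e) => return Err(e),
            }
        }
        self.disk.flush()
    }
}

fn main() {
    let mut f = FixedModelFile::new(4);
    f.write(b"hello world");
    println!("first flush: {:?}", f.flush());
    println!("retry while still full: {:?}", f.flush());
    f.disk.free += 1024; // e.g. the user removed other files
    println!("retry after freeing space: {:?}", f.flush());
    println!("on disk: {:?}", String::from_utf8_lossy(&f.disk.data));
}
```

Because the error is derived from the bytes still held in `buf`, it cannot be consumed by one flush call and then silently turn into Ok.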

I'm willing to work on a fix.

Xuanwo added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels on Feb 4, 2024
Darksonn added the M-fs (Module: tokio/fs) label on Feb 4, 2024
@carllerche
Member

I'm a bit confused. I'm less familiar w/ the details of FS ops.

> The key problem here is:
>
>   • Users can't recover from this error, even if they try removing other files. The previous write operation returned Ok.
>   • Retrying the flush operation is permitted but risky. Data written may be lost forever once flush returns Ok.

Is the key problem with Tokio's impl, or with getting an error when calling flush in general?

Maybe you could explain how you would handle StorageFull and flush with blocking std calls, and where converting that blocking code to Tokio's fs API fails.

@Xuanwo
Author

Xuanwo commented Feb 7, 2024

I believe it's an issue with Tokio's implementation, which doesn't pass the write error on to the user.

> Maybe you could explain how to handle StorageFull and flush w/ blocking std calls and where converting that blocking code to Tokio's fs api fails.

There is no direct mapping from std::fs::File::flush to tokio::fs::File::flush. std::fs::File::flush on Linux is a no-op, while tokio::fs::File::flush involves its internal buffer logic.


I prepared a full repro here: https://github.com/Xuanwo/tokio-issue-6325-storage-full

```rust
let n = f.write(&bs).await?;
dbg!(&n);
assert_eq!(n, size, "tokio File always writes data into its buffer first");
```

When we call write on a file, tokio stores the data in its internal buf and returns immediately. By the time flush returns the write error, that buffer has already been cleared, so the data is never written again.

The same repro doesn't fail this way with std::fs, because std::fs returns the actual number of bytes written from f.write(). The user gets the error when trying to write more data.
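For contrast, a minimal sketch of the std-style semantics, using a hypothetical `TinyDisk` writer with 4 bytes of free space in place of a real full filesystem: `write` reports a short count, so the caller still owns the unwritten bytes, and the next write (or `write_all`) returns the ENOSPC error directly instead of deferring it to a later flush.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a nearly full disk (ENOSPC = raw OS error 28 on Linux).
struct TinyDisk {
    data: Vec<u8>,
    free: usize,
}

impl TinyDisk {
    fn new(free: usize) -> Self {
        TinyDisk { data: Vec::new(), free }
    }
}

impl Write for TinyDisk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        if self.free == 0 {
            return Err(io::Error::from_raw_os_error(28));
        }
        let n = buf.len().min(self.free);
        self.data.extend_from_slice(&buf[..n]);
        self.free -= n;
        Ok(n)
    }

    // Like std::fs::File on Linux, there is nothing left to flush.
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let data = b"hello world";

    // `write` reports how many bytes the device actually accepted (a short write)...
    let mut disk = TinyDisk::new(4);
    let n = disk.write(data).unwrap();
    // ...so the caller still holds the rest and sees the error on the next call.
    let err = disk.write(&data[n..]).unwrap_err();
    println!("accepted {n} bytes, then: {err}");

    // `write_all` surfaces the same error from the write call itself.
    let mut disk = TinyDisk::new(4);
    println!("write_all: {:?}", disk.write_all(data));
}
```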

Xuanwo added a commit to Xuanwo/tokio-tcpstream-debug that referenced this issue Feb 7, 2024
@Xuanwo
Author

Xuanwo commented Feb 7, 2024

While implementing #6330, I found that tokio clears the buffer when an error happens during the write:

```rust
pub(crate) fn write_to<T: Write>(&mut self, wr: &mut T) -> io::Result<()> {
    assert_eq!(self.pos, 0);

    // `write_all` already ignores interrupts
    let res = wr.write_all(&self.buf);
    // Runs even when `write_all` failed, so any unwritten bytes are dropped.
    self.buf.clear();
    res
}
```

I'm guessing we need to maintain the internal state here instead of dropping all the data?
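A sketch of what maintaining that state could look like, assuming a simplified buffer without the `pos` field. `write_to_clearing` and `write_to_keeping_tail` are hypothetical names, and `TinyDisk` is a stand-in writer. Simply skipping the `clear()` after `write_all` would probably not be enough: `write_all` does not report how many bytes were written before the error, so resending the whole buffer could write the already-accepted prefix a second time. Writing in a loop and draining only the accepted prefix avoids that.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a nearly full disk (ENOSPC = raw OS error 28 on Linux).
struct TinyDisk {
    data: Vec<u8>,
    free: usize,
}

impl TinyDisk {
    fn new(free: usize) -> Self {
        TinyDisk { data: Vec::new(), free }
    }
}

impl Write for TinyDisk {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        if self.free == 0 {
            return Err(io::Error::from_raw_os_error(28));
        }
        let n = buf.len().min(self.free);
        self.data.extend_from_slice(&buf[..n]);
        self.free -= n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Mirrors the quoted behaviour: `write_all` plus an unconditional clear.
fn write_to_clearing<W: Write>(buf: &mut Vec<u8>, wr: &mut W) -> io::Result<()> {
    let res = wr.write_all(buf.as_slice());
    buf.clear();
    res
}

/// Hypothetical alternative: on error, `buf` holds exactly the unwritten tail.
fn write_to_keeping_tail<W: Write>(buf: &mut Vec<u8>, wr: &mut W) -> io::Result<()> {
    while !buf.is_empty() {
        match wr.write(buf.as_slice()) {
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            Ok(n) => {
                buf.drain(..n);
            }
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() {
    // Quoted behaviour: nothing is left to retry after the first failure.
    let mut disk = TinyDisk::new(4);
    let mut buf = b"hello world".to_vec();
    let _ = write_to_clearing(&mut buf, &mut disk);
    println!("clearing: {} bytes left to retry", buf.len());

    // Alternative: the tail survives, and a retry after freeing space completes it.
    let mut disk = TinyDisk::new(4);
    let mut buf = b"hello world".to_vec();
    assert!(write_to_keeping_tail(&mut buf, &mut disk).is_err());
    disk.free += 1024;
    write_to_keeping_tail(&mut buf, &mut disk).unwrap();
    println!("keeping tail: {:?}", String::from_utf8_lossy(&disk.data));
}
```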
