Possible issue with retries on a put to S3 when using the iput -X flag. #2203

Open
JustinKyleJames opened this issue Jun 13, 2024 · 5 comments

@JustinKyleJames
Contributor

JustinKyleJames commented Jun 13, 2024

I saw that there were S3 errors when someone did an iput -r -X to an S3 resource.

I did a little testing where I created a directory of ten 100MiB files then recursively put that directory to an S3 resource. If I left it alone it passed.

However, if I did a control-c on the first iput, then restarted the iput (with the -f flag), I would sometimes get an S3 error. If I kept trying it would eventually pass.
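For anyone who wants to try the same thing, here is a rough sketch of the setup described above. It only builds the test directory and the command line; it does not run anything against iRODS. The resource name `s3_resc`, the restart file name, and the helper names are placeholders I made up for illustration, not values from the original test.

```python
import os


def make_test_dir(root, count=10, size_bytes=100 * 1024 * 1024):
    """Create `count` zero-filled files of `size_bytes` each under `root`."""
    os.makedirs(root, exist_ok=True)
    chunk = b"\0" * (1024 * 1024)  # write in 1 MiB pieces to keep memory use low
    paths = []
    for i in range(count):
        path = os.path.join(root, f"file_{i:02d}.bin")
        remaining = size_bytes
        with open(path, "wb") as f:
            while remaining > 0:
                n = min(remaining, len(chunk))
                f.write(chunk[:n])
                remaining -= n
        paths.append(path)
    return paths


def build_iput_command(local_dir, resource, restart_file, force=False):
    """Return the argument list for a recursive iput with a restart file.

    `resource` and `restart_file` are hypothetical values supplied by the caller.
    """
    cmd = ["iput", "-r", "-X", restart_file, "-R", resource]
    if force:
        cmd.append("-f")  # used on the retry after the interrupted first attempt
    cmd.append(local_dir)
    return cmd
```

Usage would be something like `make_test_dir("testdir")`, then running `build_iput_command("testdir", "s3_resc", "restart.txt")` via `subprocess`, interrupting it, and running it again with `force=True`.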

My working theory is that because the S3 resource must keep some state in shared memory, that shared memory didn't get cleaned up when the interrupted iput exited, so a retry would start from an inconsistent state.

The shared memory logic does have a timeout so once that timeout expires the memory will be flushed. That could be why it eventually passes.
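To make the theory concrete, here is a toy model of it. This is not the actual shared memory code, just a sketch under the assumption that state is keyed per object, carries a last-access time, and is discarded once it is older than a timeout. The class name, the field names, and the injectable clock are all made up for illustration.

```python
import time


class SharedUploadState:
    """Toy stand-in for per-object state that outlives a single process."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._entries = {}  # key -> (state dict, last access time)

    def open(self, key):
        """Return the state for `key`, resetting it first if it has gone stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] >= self.timeout_seconds:
            entry = None  # stale: nobody touched it within the timeout, so flush
        if entry is None:
            state = {"threads_open": 0, "upload_in_progress": False}
        else:
            state = entry[0]  # leftover state, possibly from an interrupted run
        self._entries[key] = (state, now)
        return state

    def close(self, key):
        """Clean exit: remove the state so the next open starts fresh."""
        self._entries.pop(key, None)
```

In this model, a run that is killed before calling `close` leaves `upload_in_progress` set. A retry inside the timeout window picks up that leftover state, while a retry after the window gets a fresh one, which would match the "fails sometimes, eventually passes" behavior.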

Some notes:

  1. This is just an educated guess at this point. I never actually reproduced the exact error (timeout reading circular buffer on part upload) that the user was seeing.
  2. I also saw failures when retrying after a control-C with iput -X to a unixfilesystem resource. One such failure was due to a file being left in an intermediate state.
@alanking
Contributor

On note 2, did the data object in question eventually come out of the intermediate state due to the agent timing out and tearing itself down?

@alanking alanking added the bug label Jun 13, 2024
@JustinKyleJames
Contributor Author

JustinKyleJames commented Jun 13, 2024

> On note 2, did the data object in question eventually come out of the intermediate state due to the agent timing out and tearing itself down?

It didn't but maybe I didn't wait long enough?

@alanking
Contributor

Okay, if possible, please confirm that the object is not stuck. If it is stuck, this is a situation we need to examine for the core code because that should not happen no matter what.

@JustinKyleJames
Contributor Author

> Okay, if possible, please confirm that the object is not stuck. If it is stuck, this is a situation we need to examine for the core code because that should not happen no matter what.

Well now I can't reproduce it. ;-)

@trel
Member

trel commented Jun 13, 2024

iput and ctrl-c should send the signal to the server, and it should 'finalize'... but if it didn't get a chance to send anything, the server will wait for the timeout before finalizing.

or we have a bug, like alan said.
