unify hummock s3 retry and timeout interface #13843

Closed
MrCroxx opened this issue Dec 7, 2023 · 9 comments

Comments

@MrCroxx
Contributor

MrCroxx commented Dec 7, 2023

Currently, the object store only retries on errors or after a long overall timeout. However, an object store request can also hang for other reasons; the operation then never finishes and blocks streaming. The object store needs a way to configure low-level timeouts and retries.

Because the operations differ widely, a single shared timeout does not fit all of them. Each operation should have its own timeout:

  • read
  • stream read
  • upload
  • stream upload
  • list
  • delete

In addition, lower-level timeout configurations are needed to guard against other failure modes:

  • socket connection timeout (for all operations)
  • time-to-first-byte (TTFB) timeout (for some operations, or perhaps not needed?)
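
As a rough illustration of the proposal above, the per-operation and low-level timeouts could be grouped into a single configuration type. This is only a sketch: the names `Operation`, `ObjectStoreTimeouts`, and `operation_timeout`, as well as every default value, are hypothetical and are not taken from the actual codebase.

```rust
use std::time::Duration;

/// The operation kinds listed in the issue (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Read,
    StreamingRead,
    Upload,
    StreamingUpload,
    List,
    Delete,
}

/// One timeout per operation, plus the low-level timeouts (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectStoreTimeouts {
    pub read: Duration,
    pub streaming_read: Duration,
    pub upload: Duration,
    pub streaming_upload: Duration,
    pub list: Duration,
    pub delete: Duration,
    /// Socket connection timeout, shared by all operations.
    pub connect: Duration,
    /// Optional TTFB timeout; `None` means "not enforced".
    pub time_to_first_byte: Option<Duration>,
}

impl Default for ObjectStoreTimeouts {
    /// Placeholder values chosen only for illustration.
    fn default() -> Self {
        Self {
            read: Duration::from_secs(8),
            streaming_read: Duration::from_secs(60),
            upload: Duration::from_secs(8),
            streaming_upload: Duration::from_secs(60),
            list: Duration::from_secs(30),
            delete: Duration::from_secs(5),
            connect: Duration::from_secs(1),
            time_to_first_byte: None,
        }
    }
}

impl ObjectStoreTimeouts {
    /// Look up the overall timeout for a given operation kind.
    pub fn operation_timeout(&self, op: Operation) -> Duration {
        match op {
            Operation::Read => self.read,
            Operation::StreamingRead => self.streaming_read,
            Operation::Upload => self.upload,
            Operation::StreamingUpload => self.streaming_upload,
            Operation::List => self.list,
            Operation::Delete => self.delete,
        }
    }
}
```

Keeping the TTFB timeout as an `Option` mirrors the open question in the list: it can stay unset for operations where it turns out not to be needed.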

MrCroxx self-assigned this Dec 7, 2023
github-actions bot added this to the release-1.6 milestone Dec 7, 2023

@hzxa21
Collaborator

hzxa21 commented Dec 19, 2023

Are socket connection time and TTFB time included in or excluded from the operation timeout? If included, are we expecting the socket connection timeout and TTFB timeout to be significantly smaller than the operation timeout? For simplicity, can we use only the operation timeout, without introducing socket connection and TTFB timeouts?

@MrCroxx
Contributor Author

MrCroxx commented Dec 19, 2023

> Are socket connection time and TTFB time included in or excluded from the operation timeout? If included, are we expecting the socket connection timeout and TTFB timeout to be significantly smaller than the operation timeout? For simplicity, can we use only the operation timeout, without introducing socket connection and TTFB timeouts?

They are included. But if we handle the socket connection and TTFB timeouts separately, we can retry a stuck request much earlier, before it reaches the operation timeout.
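
The idea of retrying earlier can be sketched as a retry loop in which every attempt gets a short budget while the whole operation still respects one overall deadline. This is a minimal sketch under stated assumptions, not the project's implementation: `retry_within_deadline` and `RetryError` are hypothetical names, the `attempt` closure is assumed to enforce the budget it is given (for example through socket-level timeouts), and the errors of individual attempts are discarded for brevity.

```rust
use std::time::{Duration, Instant};

/// Why the retry loop gave up (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RetryError {
    /// The overall operation timeout elapsed before another attempt could start.
    DeadlineExceeded { attempts: usize },
    /// Every allowed attempt failed.
    AttemptsExhausted { attempts: usize },
}

/// Run `attempt` up to `max_attempts` times. Each call receives a budget of at
/// most `attempt_timeout`, capped by the time left until the overall deadline.
pub fn retry_within_deadline<T, E>(
    operation_timeout: Duration,
    attempt_timeout: Duration,
    max_attempts: usize,
    mut attempt: impl FnMut(Duration) -> Result<T, E>,
) -> Result<T, RetryError> {
    let deadline = Instant::now() + operation_timeout;
    for attempts in 0..max_attempts {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Err(RetryError::DeadlineExceeded { attempts });
        }
        // A short per-attempt budget lets a hung request be abandoned early.
        let budget = attempt_timeout.min(remaining);
        if let Ok(value) = attempt(budget) {
            return Ok(value);
        }
    }
    Err(RetryError::AttemptsExhausted { attempts: max_attempts })
}
```

With an operation timeout of 8 seconds and an attempt budget of 1 second, a request that hangs on its first try can be retried after roughly 1 second instead of after the full 8 seconds.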

@MrCroxx
Contributor Author

MrCroxx commented Dec 19, 2023

IIUC, the socket connection timeout covers the time between sending the request and the server accepting the connection, and the TTFB timeout covers the time between sending the request and receiving the first byte of returned data, which includes the object store agent handling the request and serving the first chunk.
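
To make the two phases concrete, here is a raw-socket sketch using only the Rust standard library. It is an approximation and rests on assumptions: `read_first_byte` and `PhaseError` are hypothetical names, no request is actually sent, and the first-byte wait is therefore measured from the moment the connection is established rather than from the request, as a real HTTP client would measure it.

```rust
use std::io::{self, Read};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Which phase failed (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum PhaseError {
    ConnectTimeout,
    FirstByteTimeout,
    Other(io::ErrorKind),
}

/// Connect with a socket connection timeout, then wait for the first byte
/// with a separate, TTFB-style read timeout.
pub fn read_first_byte(
    addr: SocketAddr,
    connect_timeout: Duration,
    ttfb_timeout: Duration,
) -> Result<u8, PhaseError> {
    // Phase 1: the connection must be established within `connect_timeout`.
    let mut stream =
        TcpStream::connect_timeout(&addr, connect_timeout).map_err(|e| match e.kind() {
            io::ErrorKind::TimedOut => PhaseError::ConnectTimeout,
            kind => PhaseError::Other(kind),
        })?;

    // Phase 2: the first byte must arrive within `ttfb_timeout`.
    stream
        .set_read_timeout(Some(ttfb_timeout))
        .map_err(|e| PhaseError::Other(e.kind()))?;
    let mut first = [0u8; 1];
    match stream.read(&mut first) {
        Ok(0) => Err(PhaseError::Other(io::ErrorKind::UnexpectedEof)),
        Ok(_) => Ok(first[0]),
        // A read timeout surfaces as WouldBlock on Unix and TimedOut on Windows.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Err(PhaseError::FirstByteTimeout)
        }
        Err(e) => Err(PhaseError::Other(e.kind())),
    }
}
```

A server that accepts the connection but never responds fails in the second phase, which is the kind of hang a connection timeout alone would not catch.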

@MrCroxx
Contributor Author

MrCroxx commented Dec 19, 2023

IMO, the connection timeout and TTFB timeout may be useful for most reads, except for the compactor.

@hzxa21
Collaborator

hzxa21 commented Dec 19, 2023

> IIUC, the socket connection timeout covers the time between sending the request and the server accepting the connection, and the TTFB timeout covers the time between sending the request and receiving the first byte of returned data, which includes the object store agent handling the request and serving the first chunk.

True. IIUC, setting the socket connection timeout and TTFB timeout is simple with the AWS SDK, so I think there is no overhead in doing so. I am not sure whether OpenDAL provides interfaces to set them, though.

Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@hzxa21
Collaborator

hzxa21 commented Jun 13, 2024

@Li0k @MrCroxx Can we consider this issue done with #16231?

@Li0k
Contributor

Li0k commented Jun 13, 2024

> @Li0k @MrCroxx Can we consider this issue done with #16231?

Do we need to close the issue after switching the backend to OpenDAL?

@MrCroxx
Contributor Author

MrCroxx commented Jun 20, 2024

> Do we need to close the issue after switching the backend to OpenDAL?

Agreed. Since there is no plan to further refactor the AWS SDK, let me close the issue now.

3 participants