Tracking: Implementing S3 object store via OpenDAL #14321

Closed
6 of 7 tasks
wcy-fdu opened this issue Jan 3, 2024 · 3 comments

Comments

@wcy-fdu
Contributor

wcy-fdu commented Jan 3, 2024

For a long time, we have used the Rust AWS S3 SDK to implement our object store. However, the SDK's frequent breaking changes have caused some stability problems. Meanwhile, for other object storage services such as GCS and Azblob, we already use OpenDAL. In a recent PR, we introduced an S3 file source through OpenDAL's S3 service and found that its performance and stability are also good.

It is worth mentioning that we currently use madsim to wrap the AWS S3 SDK for deterministic testing. When switching to OpenDAL S3, implementing madsim support for OpenDAL will not be easy, at least in the short term. Instead, we can mock the implementation of the object store trait itself, rather than the AWS S3 client or the OpenDAL object, which will be much easier.
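To illustrate the idea of mocking at the trait level, here is a minimal sketch. It assumes a simplified, synchronous `ObjectStore` trait; the trait, its methods, `InMemObjectStore`, and `roundtrip` are all hypothetical names chosen for illustration, and the real trait in the codebase is likely async and larger. The point is only that code depending on the trait can be tested against an in-memory implementation without touching the AWS S3 client or an OpenDAL operator.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified object store trait (synchronous for brevity).
pub trait ObjectStore {
    fn upload(&self, path: &str, data: Vec<u8>) -> Result<(), String>;
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
    fn delete(&self, path: &str) -> Result<(), String>;
}

/// In-memory mock: no network and no SDK, so its behavior is deterministic.
#[derive(Default)]
pub struct InMemObjectStore {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl ObjectStore for InMemObjectStore {
    fn upload(&self, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.objects.lock().unwrap().insert(path.to_string(), data);
        Ok(())
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .lock()
            .unwrap()
            .get(path)
            .cloned()
            .ok_or_else(|| format!("object not found: {path}"))
    }

    fn delete(&self, path: &str) -> Result<(), String> {
        // Deleting a missing object is treated as a no-op in this sketch.
        self.objects.lock().unwrap().remove(path);
        Ok(())
    }
}

/// Code under test depends only on the trait, not on a concrete backend.
pub fn roundtrip(
    store: &dyn ObjectStore,
    path: &str,
    payload: &[u8],
) -> Result<Vec<u8>, String> {
    store.upload(path, payload.to_vec())?;
    store.read(path)
}
```

With this shape, a deterministic test can pass `&InMemObjectStore::default()` wherever a `&dyn ObjectStore` is expected, while production code passes the real S3-backed implementation.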

Based on the above, I think we can try to replace the existing S3 object store so that all object stores go through OpenDAL. Since S3 is our most commonly used object store and is used by many customers, this switch must be done with great caution, so I have listed a rough roadmap:

  • Implement a new S3 object store via OpenDAL.
  • Run performance and stability tests comparing the new S3 implementation with the original one.
  • Mock the object store trait via madsim to make deterministic tests happy.
  • Set up relatively smooth switching logic between the old and new S3 implementations.
  • Test the new S3 implementation in small clusters and with PoC customers, with customer consent.
  • Switch all clusters to OpenDAL S3 and keep them running stably for a while.
  • Remove the original S3 object store implementation.
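As a rough illustration of the "switching logic" step in the roadmap above, the following sketch selects a backend from a configuration flag. Everything here is an assumption made for the example: the `S3Backend` enum, the `ObjectStoreConfig` struct, its `use_opendal_for_s3` field, and the `select_s3_backend` function are hypothetical and do not describe the project's actual configuration.

```rust
/// Which implementation serves `s3://` URLs (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum S3Backend {
    /// The original implementation built on the AWS S3 SDK.
    AwsSdk,
    /// The new implementation built on OpenDAL's S3 service.
    OpenDal,
}

/// Hypothetical config: one flag lets a cluster opt in or roll back.
pub struct ObjectStoreConfig {
    pub use_opendal_for_s3: bool,
}

/// Parse an `s3://bucket` URL and pick the backend based on the flag.
/// Returns the chosen backend together with the bucket name.
pub fn select_s3_backend(
    url: &str,
    config: &ObjectStoreConfig,
) -> Result<(S3Backend, String), String> {
    let bucket = url
        .strip_prefix("s3://")
        .ok_or_else(|| format!("not an s3 url: {url}"))?;
    if bucket.is_empty() {
        return Err("missing bucket name".to_string());
    }
    let backend = if config.use_opendal_for_s3 {
        S3Backend::OpenDal
    } else {
        S3Backend::AwsSdk
    };
    Ok((backend, bucket.to_string()))
}
```

Keeping the choice behind a single flag means small clusters and PoC customers can opt in first, and a rollback only requires flipping the flag back until the original implementation is removed.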
@wcy-fdu wcy-fdu self-assigned this Jan 3, 2024
@github-actions github-actions bot added this to the release-1.6 milestone Jan 3, 2024
@wcy-fdu
Contributor Author

wcy-fdu commented Jan 3, 2024

If you have any reasons to retain the current S3 implementation, or any concerns about this switch, please discuss them here. cc @hzxa21 @fuyufjh

@wcy-fdu wcy-fdu changed the title Tracking: Implementing S3 object store using OpenDAL Tracking: Implementing S3 object store via OpenDAL Jan 3, 2024
@wcy-fdu wcy-fdu modified the milestones: release-1.6, release-1.8 Mar 6, 2024
@wcy-fdu wcy-fdu removed this from the release-1.8 milestone Apr 8, 2024
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@wcy-fdu
Contributor Author

wcy-fdu commented Oct 28, 2024

I think we can close this issue, as release-2.0 already uses OpenDAL to access S3, and many users are already using it.

@wcy-fdu wcy-fdu closed this as completed Oct 28, 2024