Sync down improvements #722

Open
ppolewicz opened this issue May 27, 2021 · 0 comments

Currently, large file downloads use 10 threads per file, while --threads tells the sync process how many files to process at the same time. With the default --threads value of 10, that is up to 100 download threads in total, which can cause some of them to time out if the source cluster has a high TTFB (time to first byte) and the local storage device is not fast enough. This has been observed in #720, and in other reports as well.

Configurability of download thread count should be improved.
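One possible shape for this, as a rough sketch only: treat the download thread count as a single global budget and derive the per-file count from it. The function name `per_file_threads` and its parameters are hypothetical and do not correspond to any existing option in the tool.

```python
def per_file_threads(total_budget: int, concurrent_files: int, max_per_file: int = 10) -> int:
    """Split a global download-thread budget evenly across files synced at once.

    All names here are hypothetical; this is not an existing API.
    The result is never below 1, so the budget can still be exceeded
    when there are more concurrent files than threads in the budget.
    """
    if total_budget < 1 or concurrent_files < 1:
        raise ValueError("total_budget and concurrent_files must be positive")
    return max(1, min(max_per_file, total_budget // concurrent_files))


# Behaviour described above: 10 files at once, 10 threads each.
current_total = 10 * 10  # 100 download threads

# With a hypothetical global budget of 20 threads and 10 concurrent files,
# each large file would be downloaded with 2 threads instead of 10.
threads_each = per_file_threads(total_budget=20, concurrent_files=10)
```

A budget like this keeps the total bounded when many big files are synced at once, while a single large download can still use the full per-file maximum.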

Resuming interrupted downloads should be implemented in some form. As of today, the native B2 server API does not allow checksums to be checked on the cloud side. Most likely this requires a separate "journal" file, flushed whenever we write a block, that indicates which parts of the file have been written correctly (when the parallel downloader is used) or up to which point the file is correct.
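A minimal sketch of what such a journal could look like, assuming a plain-text format with one `offset length` line per completed block. The format and the function names (`record_block`, `load_completed`, `missing_ranges`) are assumptions made for illustration, not an existing design.

```python
import os
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def record_block(journal, offset: int, length: int) -> None:
    """Append one completed block to an open text-mode journal and force it to disk.

    Assumes the block's data was already flushed to the target file,
    so the journal never claims more than what is actually written.
    """
    journal.write(f"{offset} {length}\n")
    journal.flush()
    os.fsync(journal.fileno())


def load_completed(journal_path: str) -> List[Range]:
    """Read completed blocks back, ignoring a torn (unterminated) last line."""
    completed: List[Range] = []
    if not os.path.exists(journal_path):
        return completed
    with open(journal_path, "r", encoding="ascii") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # the process died in the middle of writing this entry
            parts = line.split()
            if len(parts) == 2:
                completed.append((int(parts[0]), int(parts[1])))
    return completed


def missing_ranges(total_size: int, completed: List[Range]) -> List[Range]:
    """Return the byte ranges that still need to be downloaded."""
    missing: List[Range] = []
    cursor = 0
    for offset, length in sorted(completed):
        if offset > cursor:
            missing.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < total_size:
        missing.append((cursor, total_size - cursor))
    return missing
```

On restart, a downloader could call `load_completed` and request only the ranges returned by `missing_ranges`. For a sequential download the same journal degenerates into a single "correct up to this offset" marker.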

The B2 integration checklist says:

"Downloads over 200MB should be split into parts and downloaded simultaneously. Once all parts are downloaded, the large file should be stitched together."
This is presumably out of concern for performance, but if multiple download operations are already running simultaneously (due to a sync down of many big files), splitting each of them into smaller chunks may not help anyone.

Finally, if a download operation is forcibly interrupted, the client is left with a file that has the correct extension, maybe even the correct size, but not necessarily the correct data. To avoid this, Google Chrome adds an extra .crdownload extension to a file to indicate that it has not been finished yet.
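The same idea could be sketched as follows: write to a temporary name and rename to the final name only after the data is fully on disk. The `.b2download` suffix and the `download_atomically` helper are hypothetical names chosen for this example.

```python
import os
from typing import BinaryIO, Callable

PARTIAL_SUFFIX = ".b2download"  # hypothetical suffix, analogous to .crdownload


def download_atomically(final_path: str, write_content: Callable[[BinaryIO], None]) -> None:
    """Write to '<final_path>.b2download' and rename only after a complete write.

    If write_content raises or the process is killed, the final path is never
    created; only the clearly marked partial file is left behind.
    """
    partial_path = final_path + PARTIAL_SUFFIX
    with open(partial_path, "wb") as f:
        write_content(f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace overwrites an existing destination and is atomic when both
    # paths are on the same filesystem.
    os.replace(partial_path, final_path)
```

A leftover partial file is easy to recognize and could either be cleaned up or picked up again by a resume mechanism.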

Most of those changes are backwards incompatible and would require a major version increase, so they should probably be done together.
