Sync down improvements #722

ppolewicz · 2021-05-27T13:49:19Z

Currently, large file downloads use 10 threads per file, while --threads tells the sync process how many files to process at the same time. With the default of 10 for --threads, that adds up to 100 download threads in total, which can cause some of them to time out if the source cluster has a high TTFB and the local storage device is not fast enough. This has been observed in #720, and in other reports as well.

Configurability of download thread count should be improved.
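One possible shape for this, as a rough sketch only: derive the per-file thread count from a single global budget instead of multiplying two independent settings. The function name `plan_threads` and the `total_budget` parameter are hypothetical and not part of the current CLI or SDK; the per-file cap of 10 mirrors the current behaviour described above.

```python
def plan_threads(total_budget: int, concurrent_files: int, max_per_file: int = 10) -> int:
    """Return how many download threads each file may use.

    total_budget     -- hypothetical global cap on download threads
    concurrent_files -- number of files processed at once (what --threads controls today)
    max_per_file     -- upper bound per file (10 matches the current behaviour)
    """
    if total_budget < 1 or concurrent_files < 1 or max_per_file < 1:
        raise ValueError("all arguments must be positive")
    # Split the budget evenly, but never go below one thread per file.
    # Note: if concurrent_files > total_budget, the budget is exceeded,
    # because every file still needs at least one thread.
    return max(1, min(max_per_file, total_budget // concurrent_files))


if __name__ == "__main__":
    # Today's effective behaviour: 10 files x 10 threads = 100 threads.
    print(plan_threads(total_budget=100, concurrent_files=10))
    # With a budget of 20 threads, each of the 10 files gets 2 threads.
    print(plan_threads(total_budget=20, concurrent_files=10))
```

With a single budget, syncing one big file can still use the full per-file parallelism, while syncing many big files at once automatically falls back to fewer threads per file.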

Continuing broken downloads should be implemented somehow. As of today, the native B2 server API does not allow checksums of parts of a file to be checked on the cloud side. Most likely this requires a separate "journal" file that would be flushed whenever we write a block, to indicate which parts of the file have been written correctly (if we use the parallel downloader), or up to which point the file is correct.
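To make the journal idea concrete, here is a minimal sketch assuming a plain-text journal with one `offset length` line per block. The class name `DownloadJournal` and the file format are made up for illustration; the sketch also assumes the caller has already flushed the block to the data file before recording it.

```python
import os


class DownloadJournal:
    """Hypothetical append-only journal of blocks that were written to disk."""

    def __init__(self, path):
        self.path = path

    def record(self, offset, length):
        # Call this only after the block itself has been flushed to the data file.
        with open(self.path, "a", encoding="ascii") as f:
            f.write(f"{offset} {length}\n")
            f.flush()
            os.fsync(f.fileno())

    def completed_ranges(self):
        """Return merged (offset, length) ranges that are known to be written."""
        if not os.path.exists(self.path):
            return []
        ranges = []
        with open(self.path, encoding="ascii") as f:
            for line in f:
                parts = line.split()
                # A line without a trailing newline is a torn write: stop there.
                if not line.endswith("\n") or len(parts) != 2:
                    break
                if not (parts[0].isdigit() and parts[1].isdigit()):
                    break
                ranges.append((int(parts[0]), int(parts[1])))
        ranges.sort()
        merged = []
        for offset, length in ranges:
            if merged and offset <= merged[-1][0] + merged[-1][1]:
                prev_offset, prev_length = merged[-1]
                end = max(prev_offset + prev_length, offset + length)
                merged[-1] = (prev_offset, end - prev_offset)
            else:
                merged.append((offset, length))
        return merged
```

On restart, a downloader could request only the byte ranges that are missing from `completed_ranges()`. With a sequential downloader the journal degenerates to a single range starting at offset 0, which is the "up to which point the file is correct" case.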

The B2 integration checklist says:

> Downloads over 200MB should be split into parts and downloaded simultaneously. Once all parts are downloaded, the large file should be stitched together.

I think this is out of concern for performance, but if we have multiple download operations running simultaneously (due to sync down of many big files), then splitting them into smaller chunks is maybe not good for anyone.

Finally, if a download operation is forcibly interrupted, the client is left with a file that has the correct extension, maybe even the correct size, but not necessarily the correct data. To avoid this, Google Chrome appends an extra .crdownload extension to a file to indicate that it has not been finished yet.
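A similar approach could look roughly like the sketch below: write to a temporary name and rename to the final name only once all data is on disk. The helper name `write_atomically` and the `.b2download` suffix are placeholders chosen for this example, not an existing convention.

```python
import os


def write_atomically(final_path, chunks, suffix=".b2download"):
    """Write chunks to final_path + suffix, then rename to final_path.

    The suffix is a hypothetical marker for an unfinished download.
    If the process is interrupted, only the temporary file is left behind.
    """
    tmp_path = final_path + suffix
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    # os.replace overwrites an existing destination and is atomic
    # when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
```

Because the final name only appears after the rename, anything that finds a file with its proper name can assume the download finished, and a leftover temporary file is an obvious candidate for resuming or cleanup.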

Most of those changes are backwards incompatible and would require a major version increase, so they should probably be done together.

ppolewicz added the enhancement label May 27, 2021
