Title: Dsync: a Lightweight Delta Synchronization Approach for Cloud Storage Services
Source: MSST 2020
Authors: Yuan He, Lingfeng Xiang, Wen Xia, Hong Jiang, Zhenhua Li, Xuan Wang, and Xiangyu Zou
Summary
- Delta synchronization traditionally matches data chunks by sliding a search window over the file byte by byte.
- The content-defined chunking (CDC) technique solves the 'boundary-shift' problem but introduces extra computation overhead.
- A lightweight delta-sync approach is therefore needed.
- This paper proposes Dsync, which consists of FastFp (a fast weak hash) and a redesigned delta-sync protocol based on deduplication.
- OLD:
- rsync: the search window slides forward byte by byte; the processing happens mainly on the client.
- WebR2sync+: the client is a web browser; the processing happens mainly on the server.
- CDC: the matching window of the two methods above must slide from the beginning to the end of a file, comparing every byte in the worst case (see the sketch below). This motivates CDC, which remains attractive because it greatly simplifies the chunk fingerprinting and searching process; however, it may fail to eliminate redundancy among similar but non-duplicate chunks.
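As a rough illustration of the byte-by-byte sliding cost, here is a minimal rsync-style sketch of my own (assuming an Adler-32-like weak checksum and a set `known_weak` of server-side block checksums; none of these names come from the paper):

```python
# A minimal, assumed sketch (rsync-style, not the paper's code) of why
# fixed-size-block matching slides byte by byte: every weak-checksum miss
# advances the window by one byte, updating the checksum in O(1).
def adler_pair(block: bytes, mod: int = 65521):
    """Compute the (a, b) components of an Adler-32-like weak checksum."""
    a = sum(block) % mod
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % mod
    return a, b

def sliding_match(data: bytes, block_size: int, known_weak: set, mod: int = 65521):
    """Yield offsets whose weak checksum matches some known block."""
    i = 0
    a, b = adler_pair(data[:block_size])
    while i + block_size <= len(data):
        if (a, b) in known_weak:
            yield i                      # candidate hit: confirm with a strong hash
            i += block_size              # a hit lets the window jump a whole block
            if i + block_size <= len(data):
                a, b = adler_pair(data[i:i + block_size])
            continue
        if i + block_size == len(data):  # no byte left to roll in
            break
        out_b, in_b = data[i], data[i + block_size]
        a = (a - out_b + in_b) % mod     # miss: roll forward one byte in O(1)
        b = (b - block_size * out_b + a) % mod
        i += 1
```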
- NEW:
- FastCDC: this paper proposes to use FastCDC (ATC '16) for chunking, and replaces SipHash in WebR2sync+ with SHA1 because it is more reliable (collision-resistant).
- FastFp: the CDC technique incurs extra computation even though FastCDC is very fast. Since CDC already slides over the data using a rolling-hash algorithm, this paper derives the weak hash from the rolling hashes computed during CDC (see the first sketch after this list). These additional operations are lightweight enough not to affect FastCDC's overall performance.
- Client/Server Communication Protocol:
- Only the FastFp values (without strong hashes) are sent to the server at first; the client computes a strong hash only for chunks whose weak hash matches.
- They also merge consecutive matched chunks into a "big chunk" and compare only the big chunk's strong hash (see the second sketch after this list).
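A rough sketch of how a weak fingerprint can be piggybacked on Gear-based CDC at almost no extra cost. The `GEAR` table, `MASK`, and the add-based mixing step are illustrative assumptions, not the paper's exact FastFp design:

```python
# A minimal, assumed sketch of piggybacking a weak chunk fingerprint on
# Gear-based CDC (FastCDC-style). The GEAR table, MASK, and the fingerprint
# mixing step are illustrative choices, not the paper's exact FastFp design.
import random

random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]  # random 64-bit lookup table
MASK = (1 << 13) - 1          # ~8 KiB expected chunk size (assumed parameter)
U64 = (1 << 64) - 1

def chunks_with_weak_fp(data: bytes):
    """One pass: yield (chunk, weak_fp) using the rolling hash CDC computes anyway."""
    fp = 0        # Gear rolling hash, needed for boundary detection regardless
    weak = 0      # piggybacked weak fingerprint of the chunk being formed
    start = 0
    for i, byte in enumerate(data):
        fp = ((fp << 1) + GEAR[byte]) & U64   # one shift + one add per byte
        weak = (weak + fp) & U64              # the only extra work: one add
        if fp & MASK == 0:                    # content-defined cut point
            yield data[start:i + 1], weak
            fp, weak, start = 0, 0, i + 1
    if start < len(data):                     # tail chunk without a cut point
        yield data[start:], weak
```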
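And a rough sketch of the redesigned exchange with big-chunk merging. The function names, message shapes, and in-memory index are assumptions for illustration, not the paper's wire format:

```python
# A minimal, assumed sketch of the two-round exchange: round 1 sends only weak
# FastFp values; the server reports which matched, merging consecutive matches
# into "big chunks"; round 2 sends one SHA1 per big chunk.
import hashlib

def round1_client(chunks):
    """chunks: list of (data, weak_fp). Send only the weak fingerprints."""
    return [weak for _, weak in chunks]

def round1_server(client_weaks, server_weak_index):
    """Return [start, end) runs of consecutive weak-hash matches (big chunks)."""
    hits = [w in server_weak_index for w in client_weaks]
    runs, i = [], 0
    while i < len(hits):
        if hits[i]:
            j = i
            while j < len(hits) and hits[j]:
                j += 1                        # extend the run of matches
            runs.append((i, j))               # one big chunk per run
            i = j
        else:
            i += 1
    return runs

def round2_client(chunks, big_chunks):
    """Compute a single strong hash per big chunk instead of one per chunk."""
    reply = []
    for start, end in big_chunks:
        merged = b"".join(chunks[k][0] for k in range(start, end))
        reply.append(((start, end), hashlib.sha1(merged).hexdigest()))
    return reply   # server verifies; a mismatch falls back to per-chunk checks
```

The point of the big-chunk merge is that strong-hash work scales with the number of matched runs rather than the number of chunks.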
- FastFp reaches almost 400-460 MB/s, about 50% higher than Adler32.
- Dsync runs 1.5×-3.3× faster than the state-of-the-art rsync-based WebR2sync+ and the deduplication-based approach.
Strengths
- Parts of FastCDC (its rolling hashes) are reused to generate FastFp, which greatly saves hashing cost.
- Big chunks save a lot of strong-hash computation overhead.
Weaknesses
- Both deduplication and FastCDC are existing techniques.
- The main contributions of this paper are piecing them together and proposing the FastFp hash algorithm.
Comments
- The overall design is not complicated, the writing is clear and fluent, and the evaluation is fairly thorough.