Title: HashKV: Enabling Efficient Updates in KV Storage via Hashing

Source: ATC'18

Authors: Helen H. W. Chan, Yongkun Li, Patrick P. C. Lee, Yinlong Xu

Summary

The problem the paper aims to solve
- KV-separated LSM stores such as WiscKey incur extra query and insert overhead during garbage collection.
- Cold KV pairs are rewritten unnecessarily, which increases write amplification.
How does the paper address the problem? What is the main idea of this paper?
- Hash-based Data Grouping
  
  (1) Partition isolation: hash(Key) -> Partition.
  
  (2) KV pairs with the same key are always placed in the same segment group.
- Group-Based Garbage Collection
  
  (1) Select the segment group with the largest amount of writes.
  
  (2) Check the validity of KV pairs with a hash table, so no LSM-tree queries are required.
- Hotness Awareness
  
  (1) Tag KV pairs to separate hot values from cold values.
  
  (2) Cold values are separately stored.
- Selective KV Separation
  
  (1) Large values: KV separation.
  
  (2) Small values: stored entirely in LSM-tree.
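
The ideas above can be illustrated with a toy in-memory model. This is a minimal sketch, not the paper's implementation: `ToyStore`, `segment_of`, `placement`, `NUM_SEGMENTS`, and `SEPARATION_THRESHOLD` are hypothetical names and values, CRC32 is only a stand-in hash function, a plain dict stands in for the LSM-tree, and resetting the write counter to the live size after GC is an assumption. Hotness tagging and deletes are left out for brevity.

```python
import zlib
from collections import defaultdict
from typing import Optional

NUM_SEGMENTS = 4            # hypothetical number of segment groups
SEPARATION_THRESHOLD = 128  # hypothetical value-size cutoff, in bytes


def segment_of(key: bytes) -> int:
    """Hash a key to a segment group (CRC32 is an assumed stand-in)."""
    return zlib.crc32(key) % NUM_SEGMENTS


def placement(value: bytes) -> str:
    """Selective KV separation: small values stay in the LSM-tree."""
    return "lsm-tree" if len(value) < SEPARATION_THRESHOLD else "value store"


class ToyStore:
    """Toy value store covering only separated (large) values."""

    def __init__(self):
        self.index = {}                  # stand-in for the LSM-tree: key -> group
        self.groups = defaultdict(list)  # group id -> append-only (key, value) log
        self.written = defaultdict(int)  # group id -> bytes written

    def put(self, key: bytes, value: bytes) -> None:
        seg = segment_of(key)            # same key always maps to the same group
        self.groups[seg].append((key, value))
        self.written[seg] += len(key) + len(value)
        self.index[key] = seg

    def get(self, key: bytes) -> bytes:
        seg = self.index[key]            # raises KeyError for unknown keys
        for k, v in reversed(self.groups[seg]):
            if k == key:                 # newest version is nearest the tail
                return v
        raise KeyError(key)

    def collect_garbage(self) -> Optional[int]:
        """Reclaim the group with the most written bytes; return its id."""
        if not self.written:
            return None
        victim = max(self.written, key=self.written.get)
        latest = {}                      # temporary hash table: key -> newest value
        for k, v in self.groups[victim]:
            latest[k] = v                # later log entries overwrite earlier ones
        # Validity is decided from the group alone, without consulting the index.
        self.groups[victim] = list(latest.items())
        self.written[victim] = sum(len(k) + len(v) for k, v in latest.items())
        return victim
```

Because every version of a key lives in one group, scanning that group is enough to find the newest version, which is why the sketch never consults `index` during `collect_garbage`.
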
Validations

The experiments and analysis show that, by no longer mixing hot and cold data, HashKV achieves higher throughput and lower write amplification.

Strengths

The idea of splitting KV pairs into segments based on the hash of their keys is interesting.

Weaknesses

For hash-based data grouping, the choice of hash function is important, but it is not mentioned in the paper.
The overhead introduced by HashKV is not evaluated.

Comments
