diff --git a/README.md b/README.md
index fca67c8..ed84b06 100644
--- a/README.md
+++ b/README.md
@@ -31,14 +31,14 @@ as memory buffers, and finally to the combination of the twos to create insertio
 Sorted String Table (SSTable) is a collection of files modelling key-value pairs in sorted order by key.
 It is used as a persistent storage for the LSM tree.
 
-### Components
+#### Components
 
 - _Data_: key-value pairs in sorted order by key, stored in a file;
 - _Sparse index_: sparse index containing key and offset of the corresponding key-value pair in the data;
 - _Bloom filter_: a [probabilistic data structure](https://en.wikipedia.org/wiki/Bloom_filter) used to test whether a key
   is in the SSTable.
 
-### Key lookup
+#### Key lookup
 
 The basic idea is to use the sparse index to find the key-value pair in the data file.
 The steps are:
@@ -51,7 +51,7 @@ The steps are:
 The search is as lazy as possible, meaning that we read the minimum amount of data from disk,
 for instance, if the next key length is smaller than the one we are looking for, we can skip the whole key-value pair.
 
-### Persistence
+#### Persistence
 
 A table is persisted to disk when it is created. A base filename is defined, and three files are present:
 
@@ -92,7 +92,7 @@ insertion and deletion of elements in a sorted sequence.
 In the LSM tree, it is used as an in-memory data structure to store key-value pairs in sorted order by key.
 Once the skip-list reaches a certain size, it is flushed to disk as an SSTable.
 
-### Operations details
+#### Operations details
 
 The idea of a skip list is similar to a classic linked list.
 We have nodes with forward pointers, but also levels. We can think about a
@@ -114,7 +114,7 @@ Having defined SSTables and Skip Lists we can obtain the final structure as a co
 The main idea is to use the latter as an in-memory buffer,
 while the former efficiently stores flushed buffers.
 
-### Insertion
+#### Insertion
 
 Each insert goes directly to a Memtable, which is a Skip List under the hood, so the response time is quite fast.
 There exists a threshold, over which the mutable structure is made immutable by appending it to the _immmutable
@@ -123,7 +123,7 @@ memtables LIFO list_ and replaced with a new mutable list.
 The immutable memtable list is asynchronously consumed by a background thread, which takes the next available list and
 create a disk-resident SSTable with its content.
 
-### Lookup
+#### Lookup
 
 While looking for a key, we proceed as follows:
 
@@ -131,13 +131,13 @@ While looking for a key, we proceed as follows:
 2. Look into the immutable memtables list, iterating from the most recent to the oldest, if not present continue;
 3. Look into disk tables, iterating from the most recent one to the oldest, if not present return null.
 
-### Deletions
+#### Deletions
 
 To delete a key, we do not need to delete all its replicas, from the on-disk tables, we just need a special value called
 _tombstone_. Hence a deletion is the same as an insertion, but with a value set to null. While looking for a key, if we
 encounter a null value we simply return null as a result.
 
-### SSTable Compaction
+#### SSTable Compaction
 
 The most expensive operation while looking for a key is certainly the disk search,
 and this is why bloom filters are crucial for negative
@@ -158,7 +158,7 @@ the results are obtained on AMD Ryzen™ 5 4600H with 16GB of RAM and 512GB SSD.
 
 To run them use `./gradlew jmh`.
 
-### SSTable
+#### SSTable
 
 - Negative access: the key is not present in the table, hence the Bloom filter will likely stop the search;
 - Random access: the key is present in the table, the order of the keys is random.
@@ -183,7 +183,7 @@ c.t.l.bloom.BloomFilterBenchmark.contains thrpt 5 3567392.634 ± 220377 ``` -### Skip-List +#### Skip-List - Get: get keys from a 100k keys skip-list; - Add/Remove: add and remove keys from a 100k keys skip-list. @@ -196,7 +196,7 @@ c.t.l.memtable.SkipListBenchmark.get thrpt 5 487265.620 ± 8201 ``` -### Tree +#### Tree - Get: get elements from a tree with 1M keys; - Add: add 1M distinct elements to a tree with a memtable size of 2^18