Add java.util.Date support to the comparator and add test. Update README.
whilo committed Apr 21, 2020
1 parent aa7ab6e commit ed1df49
Showing 8 changed files with 269 additions and 93 deletions.
134 changes: 93 additions & 41 deletions README.md
@@ -1,21 +1,107 @@
# Hitchhiker Tree

Hitchhiker trees are a newly invented (by @dgrnbrg) datastructure, synthesizing fractal trees and functional data structures, to create fast, snapshottable, massively scalable databases.
Hitchhiker trees are a datastructure [invented by David Greenberg](https://github.com/datacrypt-project/hitchhiker-tree), synthesizing fractal trees and functional data structures, to create fast, snapshottable, massively scalable databases.

[Watch the talk from Strange Loop](https://www.youtube.com/watch?v=jdn617M3-P4) to learn more, especially about the concept!
We documented our extended design [here](https://blog.datopia.io/2018/11/03/hitchhiker-tree/). This repository currently reflects its development as a backend for [Datahike](https://github.com/replikativ/datahike). This repository adds ClojureScript support and provides a [core.async](https://github.com/clojure/core.async) backend for [konserve](https://github.com/replikativ/konserve) including a [Merkle](https://en.wikipedia.org/wiki/Merkle_tree) variant of the tree, effectively making the hitchhiker tree much more portable and apt for distributed infrastructure.

## What's in this Repository?

The hitchhiker namespaces contain a complete implementation of a persistent, serializable, lazily-loaded hitchhiker tree.
This is a sorted key-value datastructure, like a scalable `sorted-map`.
It can incrementally persist and automatically lazily load itself from any backing store which implements a simple protocol.
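
As a rough illustration of the `sorted-map` analogy, here is a minimal sketch that uses only `clojure.core` (it does not touch the hitchhiker tree API). The names `m` and `m2` are made up for this example; the operations mirror the lookups, forward iteration and delete that a sorted key-value structure offers.

```clojure
;; Plain clojure.core sorted-map, used only as an analogy;
;; this is NOT the hitchhiker tree API.
(def m (into (sorted-map) (map (fn [i] [i i]) (range 1 11))))

(get m 3)                    ;; => 3
(get m 100)                  ;; => nil

;; Forward iteration starting at key 4, in key order.
(map first (subseq m >= 4))  ;; => (4 5 6 7 8 9 10)

;; "Deleting" returns a new map; the old one is unchanged.
(def m2 (dissoc m 3))
(get m2 3)                   ;; => nil
(get m 3)                    ;; => 3
```

The hitchhiker tree aims to provide the same kind of ordered access while additionally persisting itself incrementally to a backing store.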

Outboard is a sample application for the hitchhiker tree.
It includes an implementation of the IO subsystem backed by Redis, and it manages all of the incremental serialization and flushing.
## Usage

The hitchhiker tree is designed very similarly to how Datomic's backing trees must work--I would love to see integration with [DataScript](https://github.com/tonsky/datascript) for a fully open source [Datomic](http://www.datomic.com).
Add this dependency to your project:

[![Clojars Project](http://clojars.org/io.replikativ/hitchhiker-tree/latest-version.svg)](http://clojars.org/io.replikativ/hitchhiker-tree)

## Current API

We use [tree.cljc](src/hitchhiker/tree.cljc) and [messaging.cljc](src/hitchhiker/tree/messaging.cljc) as the core API. The following snippet is extracted from the [konserve tests](test/hitchhiker/konserve_test.cljc).

```clojure
(ns hitchhiker-tree.sandbox
  (:require [hitchhiker.tree :as core]
            [hitchhiker.tree.messaging :as msg]
            [hitchhiker.tree.bootstrap.konserve :as kons]
            [hitchhiker.tree.utils.async :as ha]
            [konserve.cache :as kc]
            [konserve.filestore :refer [new-fs-store delete-store list-keys]]
            [clojure.core.async :as async]
            [clojure.test :refer [deftest testing run-tests is]]))


(let [folder "/tmp/async-hitchhiker-tree-test"
      _ (delete-store folder)
      store (kons/add-hitchhiker-tree-handlers
             (kc/ensure-cache (async/<!! (new-fs-store folder :config {:fsync false}))))
      backend (kons/->KonserveBackend store)
      config (core/->Config 1 3 (- 3 1))
      flushed (ha/<?? (core/flush-tree
                       (time (reduce (fn [t i]
                                       (ha/<?? (msg/insert t i i)))
                                     (ha/<?? (core/b-tree config))
                                     (range 1 11)))
                       backend))
      root-key (kons/get-root-key (:tree flushed))
      tree (ha/<?? (kons/create-tree-from-root-key store root-key))]
  (is (= (ha/<?? (msg/lookup tree -10)) nil))
  (is (= (ha/<?? (msg/lookup tree 100)) nil))
  (dotimes [i 10]
    (is (= (ha/<?? (msg/lookup tree (inc i))) (inc i))))
  (is (= (map first (msg/lookup-fwd-iter tree 4)) (range 4 11)))
  (is (= (map first (msg/lookup-fwd-iter tree 0)) (range 1 11)))
  (let [deleted (ha/<?? (core/flush-tree (ha/<?? (msg/delete tree 3)) backend))
        root-key (kons/get-root-key (:tree deleted))
        tree (ha/<?? (kons/create-tree-from-root-key store root-key))]
    (is (= (ha/<?? (msg/lookup tree 2)) 2))
    (is (= (ha/<?? (msg/lookup tree 3)) nil))
    (is (= (ha/<?? (msg/lookup tree 4)) 4)))
  (delete-store folder))
```


## Benchmarking

This library includes a detailed, instrumented benchmarking suite.
It's built to enable comparative benchmarks between different parameters or code changes, so that improvements to the structure can be correctly categorized as such, and bottlenecks can be reproduced and fixed.

To try it, just run

lein bench

The benchmark tool supports testing with different parameters, such as:

- The tree's branching factor
- Whether to enable fractal tree features, just use the B-tree features, or compare to a vanilla Clojure sorted map
- Reordering of delete operations (to stress certain workloads)
- Whether to use the in-memory or Redis-backed implementation

The benchmarking tool is designed to make it convenient to run several benchmarks;
each benchmark's parameters can be separated by a `--`.
This makes it easy to understand the characteristics of the hitchhiker tree over a variety of settings for a parameter.

You can run a more sophisticated experiment benchmark by doing

lein bench OUTPUT_DIR options -- options-for-2nd-experiment -- options-for-3rd-experiment

This generates an Excel workbook called "analysis.xlsx" with benchmark results.
For instance, if you'd like to run experiments to understand the performance difference between various values of B (the branching factor), you can do:

lein bench perf_diff_experiment -b 10 -- -b 20 -- -b 40 -- -b 80 -- -b 160 -- -b 320 -- -b 640

And it will generate lots of data and the Excel workbook for analysis.

If you'd like to see the options for the benchmarking tool, just run `lein bench`.



## Original Outboard Redis API

_Note that this API is not actively developed, but can still be useful if you are interested in Redis._

## Outboard

Outboard is a simple API for your Clojure applications that enables you to make use of tens of gigabytes of local memory, far beyond what the JVM can manage.
Outboard also allows you to restart your application and reuse all of that in-memory data, which dramatically reduces startup times due to data loading.
@@ -100,40 +186,6 @@ You'l need a local Redis instance running to run the tests. Once you have it, ju

lein test


## Benchmarking

This library includes a detailed, instrumented benchmarking suite.
It's built to enable comparative benchmarks between different parameters or code changes, so that improvements to the structure can be correctly categorized as such, and bottlenecks can be reproduced and fixed.

To try it, just run

lein bench

The benchmark tool supports testing with different parameters, such as:

- The tree's branching factor
- Whether to enable fractal tree features, just use the B-tree features, or compare to a vanilla Clojure sorted map
- Reordering of delete operations (to stress certain workloads)
- Whether to use the in-memory or Redis-backed implementation

The benchmarking tool is designed to make it convenient to run several benchmarks;
each benchmark's parameters can be separated by a `--`.
This makes it easy to understand the characteristics of the hitchhiker tree over a variety of settings for a parameter.

You can run a more sophisticated experiment benchmark by doing

lein bench OUTPUT_DIR options -- options-for-2nd-experiment -- options-for-3rd-experiment

This generates an Excel workbook called "analysis.xlsx" with benchmark results.
For instance, if you'd like to run experiments to understand the performance difference between various values of B (the branching factor), you can do:

lein bench perf_diff_experiment -b 10 -- -b 20 -- -b 40 -- -b 80 -- -b 160 -- -b 320 -- -b 640

And it will generate lots of data and the Excel workbook for analysis.

If you'd like to see the options for the benchmarking tool, just run `lein bench`.

## Technical details

See the `doc/` folder for technical details of the hitchhiker tree and Redis garbage collection system.
@@ -145,6 +197,6 @@ Also, thanks to Tom Faulhaber for making the Excel analysis awesome!

## License

Copyright © 2016 David Greenberg
Copyright © 2016 David Greenberg, 2017-2020 Christian Weilbach

Distributed under the Eclipse Public License version 1.0
2 changes: 1 addition & 1 deletion project.clj
@@ -1,4 +1,4 @@
(defproject io.replikativ/hitchhiker-tree "0.1.6"
(defproject io.replikativ/hitchhiker-tree "0.1.7-SNAPSHOT"
:description "A Hitchhiker Tree Library"
:url "https://github.com/replikativ/hitchhiker-tree"
:license {:name "Eclipse Public License"
58 changes: 44 additions & 14 deletions src/hitchhiker/tree/codec/nippy.cljc
@@ -1,33 +1,63 @@
(ns hitchhiker.tree.codec.nippy
(:require
[hitchhiker.tree :as tree]
[hitchhiker.tree.node :as n]
#?(:clj [taoensso.nippy :as nippy])))


;; TODO share with konserve
(declare encode)

(defn nilify
[m ks]
(reduce (fn [m k] (assoc m k nil))
m
ks))

(defn encode-index-node
[node]
(-> node
(nilify [:storage-addr :*last-key-cache])
(assoc :children (mapv encode (:children node)))))

(defn encode-data-node
[node]
(nilify node
[:storage-addr
:*last-key-cache]))

(defn encode-address
[node]
(nilify node
[:store
:storage-addr]))

(defn encode
[node]
(cond
(tree/index-node? node) (encode-index-node node)
(tree/data-node? node) (encode-data-node node)
(n/address? node) (encode-address node)
:else node))


(defonce install*
(delay
#?@(:clj [(nippy/extend-freeze hitchhiker.tree.IndexNode :b-tree/index-node
[{:keys [storage-addr cfg children op-buf]} data-output]
(nippy/freeze-to-out! data-output cfg)
(nippy/freeze-to-out! data-output children)
(nippy/freeze-to-out! data-output (into [] op-buf)))
[node data-output]
(nippy/freeze-to-out! data-output (into {} (encode node))))

(nippy/extend-thaw :b-tree/index-node
[data-input]
(let [cfg (nippy/thaw-from-in! data-input)
children (nippy/thaw-from-in! data-input)
op-buf (nippy/thaw-from-in! data-input)]
(tree/index-node children op-buf cfg)))
(tree/map->IndexNode (nippy/thaw-from-in! data-input)))

(nippy/extend-freeze hitchhiker.tree.DataNode :b-tree/data-node
[{:keys [cfg children]} data-output]
(nippy/freeze-to-out! data-output cfg)
(nippy/freeze-to-out! data-output children))
[node data-output]
(nippy/freeze-to-out! data-output (into {} (encode node))))

(nippy/extend-thaw :b-tree/data-node
[data-input]
(let [cfg (nippy/thaw-from-in! data-input)
children (nippy/thaw-from-in! data-input)]
(tree/data-node children cfg)))])))
(tree/map->DataNode (nippy/thaw-from-in! data-input)))])))

(defn ensure-installed!
[]
30 changes: 29 additions & 1 deletion src/hitchhiker/tree/key_compare.cljc
@@ -44,6 +44,9 @@
java.util.UUID
(-order-on-edn-types [_] 8)

java.util.Date
(-order-on-edn-types [_] 9)

nil
(-order-on-edn-types [_] 10000)

@@ -77,6 +80,9 @@
cljs.core/UUID
(-order-on-edn-types [_] 8)

js/Date
(-order-on-edn-types [_] 9)

nil
(-order-on-edn-types [_] 10000)

@@ -164,6 +170,23 @@
(catch ClassCastException e
(- (n/-order-on-edn-types key2)
(n/-order-on-edn-types key1))))))

java.util.Date
(-compare [^java.util.Date key1 key2]
(if (instance? java.util.Date key2)
(cond
(< (.getTime key1) (.getTime key2)) -1
(= (.getTime key1) (.getTime key2)) 0
:else 1)
(try
(compare key1 key2)
(catch ClassCastException e
(- (n/-order-on-edn-types key2)
(n/-order-on-edn-types key1))))))
nil
(-compare [^java.lang.Null key1 key2]
(- (n/-order-on-edn-types key2)
(n/-order-on-edn-types key1)))
]
:cljs
[number
@@ -189,4 +212,9 @@
(- (n/-order-on-edn-types key2)
(n/-order-on-edn-types key1))))
(- (n/-order-on-edn-types key2)
(n/-order-on-edn-types key1))))]))
(n/-order-on-edn-types key1))))
nil
(-compare [key1 key2]
(- (n/-order-on-edn-types key2)
(n/-order-on-edn-types key1)))
]))
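
The `java.util.Date` branch added in this file orders two dates by their epoch-millisecond timestamps. The three-way comparison can be tried in isolation with a minimal sketch like the one below. It assumes a standalone helper named `compare-dates`, which is hypothetical and not part of the library, and it leaves out the cross-type fallback on `-order-on-edn-types`.

```clojure
(import 'java.util.Date)

;; Hypothetical helper mirroring the Date/Date case of -compare:
;; order two dates by their epoch-millisecond timestamps.
(defn compare-dates
  [^Date d1 ^Date d2]
  (let [t1 (.getTime d1)
        t2 (.getTime d2)]
    (cond
      (< t1 t2) -1
      (= t1 t2) 0
      :else 1)))

(def earlier (Date. 0))     ; the epoch
(def later (Date. 1000))    ; one second after the epoch

(compare-dates earlier later)  ;; => -1
(compare-dates later later)    ;; => 0
(compare-dates later earlier)  ;; => 1

;; Used as a comparator, the helper sorts chronologically.
(def sorted-dates (sort compare-dates [later earlier]))
```

Type-hinting both arguments, as in this sketch, avoids reflection on the second `.getTime` call; the diff above hints only `key1`.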
2 changes: 2 additions & 0 deletions src/hitchhiker/tree/messaging.cljc
@@ -228,3 +228,5 @@
(drop-while (fn [[k v]]
(neg? (c/-compare k key)))
(forward-iterator path)))))))

