---
published: true
title: Data Sketches
layout: html_page
id: home
---
Sketches are fast. The sketch algorithms in this library process data in a single pass and are suitable for both real-time and batch processing. Sketches let unique identifiers be processed in an "additive" way, which streamlines system architecture and enables fast queries of heretofore difficult metrics such as unique user counts.
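
To make "single pass" and "additive" concrete, here is a minimal K-Minimum-Values (KMV) sketch written from scratch in Python. It is a simplified illustration of the idea, not this library's API: the class `KMVSketch`, its methods, the BLAKE2b-based `hash64` helper, and the default `k = 1024` are hypothetical choices made for this example. Each item is hashed once and only the `k` smallest distinct hash values are kept, so memory stays bounded however long the stream is.

```python
import hashlib
import heapq


def hash64(item: str) -> int:
    """Hash an item to a 64-bit unsigned integer (BLAKE2b, chosen here for illustration)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch: keeps the k smallest distinct hashes seen."""

    def __init__(self, k: int = 1024) -> None:
        self.k = k
        self._heap: list[int] = []       # negated hashes, so -heap[0] is the largest kept hash
        self._members: set[int] = set()  # the same hashes, for duplicate detection

    def _add_hash(self, h: int) -> None:
        if h in self._members:
            return  # duplicates never change a distinct count
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item: str) -> None:
        """Process one item; each item is looked at exactly once."""
        self._add_hash(hash64(item))

    def estimate(self) -> float:
        """Estimate the number of distinct items seen."""
        if len(self._heap) < self.k:
            return float(len(self._heap))  # nothing evicted yet, so the count is exact
        kth_smallest = -self._heap[0] / 2**64  # normalize to (0, 1)
        return (self.k - 1) / kth_smallest

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        """Combine two sketches into a sketch of the union of their streams."""
        result = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            result._add_hash(h)
        return result


# Usage: two partial sketches over overlapping users, merged afterwards.
a, b = KMVSketch(), KMVSketch()
for i in range(50_000):
    a.update(f"user-{i}")
for i in range(25_000, 100_000):
    b.update(f"user-{i}")
print(round(a.merge(b).estimate()))  # the true distinct count is 100,000
```

Because merging keeps the `k` smallest hashes of the two retained sets, the merged sketch holds exactly the same hashes as a sketch built over both streams at once. That property is what allows partial results from different machines or time windows to be added together instead of reprocessing raw identifiers.
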
<div class="col-md-4">
<a href="/docs/KeyFeatures.html">
<span class="fa fa-database fa-4x"></span>
<h2>Big Data</h2>
</a>
<p class="text-justify">This library has been specifically designed for Big Data systems. It provides Hadoop, Druid, and Hive* sketch adaptors; a Memory package for managing large off-heap data structures; additional protection of sensitive user identifiers through special handling of hash seeds; further reduction of memory consumption through front-end sampling; and compact binary storage.<br>* coming soon!</p>
</div>
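
The seed handling, front-end sampling, and compact storage mentioned above can be pictured with one more hypothetical sketch. This is a simplified Python illustration under our own assumptions, not how this library implements these features: `SampledSketch`, `seeded_hash64`, `seed_fingerprint`, the keyed-BLAKE2b hashing, and the byte layout are all invented for this example. A secret seed keys the hash, only a short fingerprint of the seed is stored, sampling with probability `p` keeps just the hashes below `p * 2**64`, and the retained hashes are serialized as a sorted array of 8-byte integers.

```python
import hashlib
import struct


def seeded_hash64(item: str, seed: int) -> int:
    """64-bit hash keyed by a secret seed (keyed BLAKE2b, an assumption for this sketch).

    Without the seed, nobody can recompute the hash of a guessed identifier
    and look for it among the retained values.
    """
    key = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big")


def seed_fingerprint(seed: int) -> int:
    """Short 32-bit fingerprint stored in place of the seed itself."""
    digest = hashlib.blake2b(seed.to_bytes(8, "big"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


class SampledSketch:
    """Hypothetical sketch with front-end sampling: keeps only hashes below p * 2**64."""

    # Compact layout: seed fingerprint (uint32), p (float64), count (uint32),
    # followed by the sorted hashes as big-endian uint64 values.
    HEADER = struct.Struct(">IdI")

    def __init__(self, seed: int, p: float = 1.0) -> None:
        self.seed = seed
        self.fingerprint = seed_fingerprint(seed)
        self.p = p
        self.threshold = int(p * 2**64)
        self.hashes: set[int] = set()

    def update(self, item: str) -> None:
        h = seeded_hash64(item, self.seed)
        if h < self.threshold:  # about a fraction 1 - p of items is never stored
            self.hashes.add(h)

    def estimate(self) -> float:
        return len(self.hashes) / self.p  # scale the sample back up

    def to_bytes(self) -> bytes:
        ordered = sorted(self.hashes)
        header = self.HEADER.pack(self.fingerprint, self.p, len(ordered))
        return header + struct.pack(f">{len(ordered)}Q", *ordered)

    @classmethod
    def from_bytes(cls, data: bytes, seed: int) -> "SampledSketch":
        fingerprint, p, count = cls.HEADER.unpack_from(data, 0)
        if fingerprint != seed_fingerprint(seed):
            raise ValueError("sketch was built with a different hash seed")
        sketch = cls(seed, p)
        sketch.hashes = set(struct.unpack_from(f">{count}Q", data, cls.HEADER.size))
        return sketch


sketch = SampledSketch(seed=9001, p=0.1)
for i in range(100_000):
    sketch.update(f"user-{i}")
blob = sketch.to_bytes()
print(len(sketch.hashes), round(sketch.estimate()), len(blob))  # roughly 10,000 hashes kept
```

Sampling trades accuracy for memory: with `p = 0.1` roughly one hash in ten is stored and the count is scaled back up by `1 / p`. Refusing to decode with a mismatched seed reflects why sketches built with different seeds must not be combined: their hash values are unrelated.
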
<div class="col-md-4">
<a href="/docs/KeyFeatures.html">
<span class="fa fa-bar-chart-o fa-4x"></span>
<h2>Analysis</h2>
</a>
<p class="text-justify">Built-in set operators (Union, Intersection, Difference) produce sketches as results (not just numbers), enabling full set expressions such as ((A ∪ B) ∩ (C ∪ D)) \ (E ∪ F). This capability, together with predictable accuracy that is superior to <i>Include/Exclude</i> approaches, enables unprecedented analysis through fast queries.</p>
</div>
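
The following from-scratch Python illustration shows how set operations can return sketches rather than numbers, using a simplified version of the theta-sketch idea: each sketch carries a threshold `theta` plus every distinct hash of its stream below `theta`. The names `ThetaSketch`, `build`, `union`, `intersection`, `difference`, and `users`, the BLAKE2b hash, and `k = 4096` are assumptions for this example, not this library's API. Each operation uses the smaller of the two thresholds, so its output is again a sketch and can feed the next operation.

```python
import hashlib
import heapq
from dataclasses import dataclass

MAX_HASH = 2**64


def hash64(item: str) -> int:
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


@dataclass(frozen=True)
class ThetaSketch:
    theta: int          # hashes >= theta were dropped; MAX_HASH means nothing was dropped
    hashes: frozenset   # every distinct hash of the stream that is < theta

    def estimate(self) -> float:
        return len(self.hashes) * MAX_HASH / self.theta


def build(items, k: int = 4096) -> ThetaSketch:
    """Single pass over items, keeping the k + 1 smallest distinct hashes."""
    heap: list[int] = []  # negated hashes: -heap[0] is the largest kept hash
    kept: set[int] = set()
    for item in items:
        h = hash64(item)
        if h in kept:
            continue
        if len(heap) <= k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            kept.discard(-heapq.heappushpop(heap, -h))
            kept.add(h)
    if len(heap) <= k:
        return ThetaSketch(MAX_HASH, frozenset(kept))  # exact: nothing dropped
    theta = -heap[0]  # the (k + 1)-th smallest hash becomes the threshold
    return ThetaSketch(theta, frozenset(h for h in kept if h < theta))


def union(a: ThetaSketch, b: ThetaSketch, k: int = 4096) -> ThetaSketch:
    theta = min(a.theta, b.theta)
    merged = sorted(h for h in a.hashes | b.hashes if h < theta)
    if len(merged) > k:  # keep the result bounded, as in build()
        theta, merged = merged[k], merged[:k]
    return ThetaSketch(theta, frozenset(merged))


def intersection(a: ThetaSketch, b: ThetaSketch) -> ThetaSketch:
    theta = min(a.theta, b.theta)
    return ThetaSketch(theta, frozenset(h for h in a.hashes & b.hashes if h < theta))


def difference(a: ThetaSketch, b: ThetaSketch) -> ThetaSketch:
    """Set difference A minus B: hashes of A below the shared threshold that B lacks."""
    theta = min(a.theta, b.theta)
    return ThetaSketch(theta, frozenset(h for h in a.hashes - b.hashes if h < theta))


def users(lo: int, hi: int):
    return (f"user-{i}" for i in range(lo, hi))


A, B = build(users(0, 30_000)), build(users(20_000, 50_000))
C, D = build(users(10_000, 40_000)), build(users(60_000, 70_000))
E, F = build(users(0, 15_000)), build(users(35_000, 90_000))

# ((A ∪ B) ∩ (C ∪ D)) \ (E ∪ F), where every intermediate result is a sketch
result = difference(intersection(union(A, B), union(C, D)), union(E, F))
print(round(result.estimate()))  # the exact answer is 20,000
```

Here the exact answer is 20,000 (users 15000 through 34999). With this style of estimator, the error is driven by how many hashes survive below the final threshold, so results that are a small fraction of the sets they were computed from are the hardest case.
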