---
published: true
title: Data Sketches
layout: html_page
id: home
---

Sketches Library from



Fast

Sketches are fast. The sketch algorithms in this library process data in a single pass and are suitable for both real-time and batch processing. Sketches enable processing unique identifiers in an "additive" way that streamlines system architecture and enables fast queries of previously difficult metrics such as unique user counts.
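
To make the "additive" property concrete, here is a minimal sketch of a K-Minimum-Values distinct counter. It is a toy illustration only, not this library's implementation or API: the class name `KMVSketch`, its methods, and the SHA-256-based hash are assumptions chosen for the example.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values distinct-count sketch (hypothetical, for illustration)."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []     # negated hashes, so the largest retained hash is at the root
        self._kept = set()  # the same retained hashes, for fast duplicate checks

    @staticmethod
    def _hash(item):
        # Map any item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        """Process one stream item; each item is examined exactly once."""
        self._offer(self._hash(item))

    def _offer(self, h):
        if h in self._kept:
            return  # duplicates never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def merge(self, other):
        """Return a new sketch equivalent to one built over both streams."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._kept | other._kept:
            merged._offer(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))     # fewer than k distinct items: exact count
        return (self.k - 1) / -self._heap[0]  # standard KMV estimator
```

Because `merge` yields the same retained hashes as a single sketch fed both streams, per-hour or per-partition sketches can be stored once and combined at query time instead of rescanning the raw identifiers.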

<div class="col-md-4">
  <a href="/docs/KeyFeatures.html">
    <span class="fa fa-database fa-4x"></span>
    <h2>Big Data</h2>
  </a>
  <p class="text-justify">This library has been specifically designed for Big Data systems. It provides sketch adaptors for Hadoop, Druid, and Hive*; a Memory package for managing large off-heap data structures; additional protection of sensitive user identifiers through special handling of hash seeds; further reduction of memory consumption through front-end sampling; and compact binary storage.<br>* coming soon!</p>
</div>
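
As a rough illustration of how a hash seed and compact binary storage can work together, the sketch below keys the hash with a secret seed and stores only a short fingerprint of that seed alongside the packed hash values. Everything here is an assumption made for the example (the function names, the BLAKE2b keyed hash, and the byte layout); it does not describe this library's actual serialization format.

```python
import hashlib
import struct


def seeded_hash(item, seed):
    """64-bit hash of `item`, keyed by a secret integer seed (hypothetical scheme)."""
    key = struct.pack(">Q", seed)
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big")


def seed_check(seed):
    """16-bit fingerprint of the seed, stored instead of the seed itself."""
    return seeded_hash("seed-check", seed) & 0xFFFF


def serialize(hashes, seed):
    """Pack a 16-bit seed fingerprint, a 32-bit count, then the sorted 64-bit hashes."""
    ordered = sorted(hashes)
    return struct.pack(f">HI{len(ordered)}Q", seed_check(seed), len(ordered), *ordered)


def deserialize(blob, seed):
    """Unpack the hashes, refusing data that was built with a different seed."""
    check, count = struct.unpack_from(">HI", blob, 0)
    if check != seed_check(seed):
        raise ValueError("sketch was built with a different hash seed")
    return set(struct.unpack_from(f">{count}Q", blob, 6))
```

Under this layout a sketch retaining `n` hashes occupies `6 + 8 * n` bytes, and only hashed values, never the original identifiers, are written out.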

<div class="col-md-4">
  <a href="/docs/KeyFeatures.html">
    <span class="fa fa-bar-chart-o fa-4x"></span>
    <h2>Analysis</h2>
  </a>
  <p class="text-justify">Built-in set operators (Union, Intersection, Difference) return sketches rather than just numbers, which enables full set expressions such as ((A &#8746; B) &#8745; (C &#8746; D)) \ (E &#8746; F). This capability, together with predictable accuracy that is superior to <i>Include/Exclude</i> approaches, enables unprecedented analysis capabilities for fast queries.</p>
</div>
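
The set expression above can be sketched in a few lines, assuming a simplified theta-style sketch in which every operator returns another sketch. The `ThetaSketch` class, the `build` helper, and the default `k` are hypothetical names for this illustration and are not the library's API; for brevity, `build` hashes a whole collection at once rather than streaming, and the union does not trim its result back to `k` entries.

```python
import hashlib
from dataclasses import dataclass


def _hash(item):
    # Map any item to a pseudo-uniform float in [0, 1).
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


@dataclass(frozen=True)
class ThetaSketch:
    theta: float        # sampling threshold in (0, 1]
    entries: frozenset  # every hash of the underlying set that falls below theta

    def estimate(self):
        return len(self.entries) / self.theta

    def __or__(self, other):   # union
        theta = min(self.theta, other.theta)
        kept = (h for h in self.entries | other.entries if h < theta)
        return ThetaSketch(theta, frozenset(kept))

    def __and__(self, other):  # intersection
        theta = min(self.theta, other.theta)
        kept = (h for h in self.entries & other.entries if h < theta)
        return ThetaSketch(theta, frozenset(kept))

    def __sub__(self, other):  # difference (A and not B)
        theta = min(self.theta, other.theta)
        kept = (h for h in self.entries - other.entries if h < theta)
        return ThetaSketch(theta, frozenset(kept))


def build(items, k=4096):
    """Build a sketch that retains at most k hashes (non-streaming, for brevity)."""
    hashes = sorted({_hash(x) for x in items})
    if len(hashes) <= k:
        return ThetaSketch(1.0, frozenset(hashes))
    # Keep the k smallest hashes; theta is the (k+1)-th smallest.
    return ThetaSketch(hashes[k], frozenset(hashes[:k]))


def evaluate(a, b, c, d, e, f):
    """((A ∪ B) ∩ (C ∪ D)) \\ (E ∪ F), where each input and the result is a sketch."""
    return ((a | b) & (c | d)) - (e | f)
```

Because the result of `evaluate` is itself a `ThetaSketch`, it can be fed into further expressions or stored, and `estimate()` is only called when a number is finally needed.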