0.8.0 Oct 5, 2016: Modular release structure, reservoir sampling, and more...

@leerho leerho released this 05 Oct 22:19

Modular Release Structure

Because the Memory package has many applications beyond the DataSketches library, it has been separated out into its own module. The jars for the Memory package will appear as
memory-X.Y.Z-<type>.jar.

The remainder of the library is its own module and will appear, as before, as
sketches-core-X.Y.Z-<type>.jar, but now with a dependency on the memory jar.

In addition to the usual jar types, there is a
sketches-core-X.Y.Z-with-shaded-memory.jar, which contains the
sketches-core-X.Y.Z.jar and a shaded, renamed memory-X.Y.Z.jar.

This shading protects against the "DLL Hell" situation, where a different version of the memory jar may already be registered in the same system.

New Sampling Package

We have added a sampling package to the library's suite of sketch algorithms.

The first entry in this area is an efficient implementation of the classical reservoir sampling algorithm, which is often used as an interview question. This implementation is considerably more sophisticated, however, in that it also solves the harder problem of merging sketches of different sizes. It includes a base implementation using longs (intended more as a tutorial example) as well as a Java generic version that can be extended to any type, including polymorphic types. As with all the other sketches in the library, the challenges of efficient serialization and deserialization have also been addressed.
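For readers who have not met the classical algorithm, the following is a minimal sketch of plain reservoir sampling (Algorithm R) over a generic item type. It is not the library's implementation: the class name SimpleReservoir and all of its methods are hypothetical, and it deliberately leaves out merging and serialization, which are the harder parts the sampling package addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of Algorithm R; not the library's API. */
final class SimpleReservoir<T> {
  private final int k;            // maximum number of retained samples
  private final List<T> samples;  // current reservoir contents
  private final Random rand;      // seeded so that runs are repeatable
  private long n;                 // total number of items presented so far

  SimpleReservoir(int k, long seed) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be at least 1");
    }
    this.k = k;
    this.samples = new ArrayList<>(k);
    this.rand = new Random(seed);
  }

  /** Presents one item; it is retained with probability k/n. */
  void update(T item) {
    n++;
    if (samples.size() < k) {
      samples.add(item);          // fill phase: keep everything
      return;
    }
    // Draw j approximately uniformly from [0, n).
    // The new item replaces slot j only when j falls inside the reservoir.
    long j = (long) (rand.nextDouble() * n);
    if (j < k) {
      samples.set((int) j, item);
    }
  }

  long getN() {
    return n;
  }

  /** Returns a copy so callers cannot modify the reservoir. */
  List<T> getSamples() {
    return new ArrayList<>(samples);
  }
}
```

After presenting 1000 items to a reservoir created with k = 10, getSamples() returns 10 items and getN() returns 1000. Each item of the stream had the same chance of being among the 10.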

There are a number of exciting ways this sampling package can grow.

Memory Package Enhancements

The Memory package is used extensively in the library for off-heap work, and a number of groups have shown interest in using it more broadly. In this release we have extended the API to include read-only variants of the Memory classes, in the same way that the ByteBuffer classes provide read-only variants. The API has also been extended to allow direct access to the Unsafe class in situations where the utmost performance is required. Examples of the emerging use of this capability can be found in the PreambleUtils class in the quantiles package. Caution is advised when using this package, as it is easy to "shoot yourself in the foot"! Caveat Emptor!
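The ByteBuffer behavior that the read-only variants are modeled on can be shown with the JDK alone. The sketch below uses only java.nio and does not touch the Memory API; the helper class ReadOnlyViewDemo and its method are hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

/** Hypothetical helper showing the JDK's read-only buffer pattern. */
final class ReadOnlyViewDemo {
  private ReadOnlyViewDemo() {}

  /**
   * Attempts an absolute write at offset 0.
   * Returns true if the buffer rejected the write as read-only.
   * Note: on a writable buffer the write succeeds and changes its content.
   */
  static boolean writeIsRejected(ByteBuffer buf) {
    try {
      buf.putLong(0, 1L);
      return false;
    } catch (ReadOnlyBufferException e) {
      return true;
    }
  }
}
```

A view obtained with asReadOnlyBuffer() shares content with the original buffer, so reads through the view see later writes made through the original, while any write through the view throws ReadOnlyBufferException.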

New PairwiseSetOperations

The new PairwiseSetOperations class fills the need for fast set operations on exactly two arguments. These operations are stateless and are specifically optimized for Theta Sketches that are already in ordered CompactSketch form.
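To see why ordered input makes two-argument operations cheap, consider two sorted arrays of unique hash values: each operation becomes a single linear merge pass with no hash table and no retained state. The sketch below illustrates only that idea. The class SortedPairOps and its methods are hypothetical, and it ignores everything else a real Theta Sketch carries (theta, seed hash, the empty flag).

```java
import java.util.Arrays;

/** Hypothetical stateless set operations on sorted arrays of unique longs. */
final class SortedPairOps {
  private SortedPairOps() {}

  /** Values present in both a and b. */
  static long[] intersect(long[] a, long[] b) {
    long[] out = new long[Math.min(a.length, b.length)];
    int i = 0, j = 0, n = 0;
    while (i < a.length && j < b.length) {
      if (a[i] < b[j]) {
        i++;
      } else if (a[i] > b[j]) {
        j++;
      } else {                    // match: emit once, advance both
        out[n++] = a[i];
        i++;
        j++;
      }
    }
    return Arrays.copyOf(out, n);
  }

  /** Values present in a or b, without duplicates. */
  static long[] union(long[] a, long[] b) {
    long[] out = new long[a.length + b.length];
    int i = 0, j = 0, n = 0;
    while (i < a.length && j < b.length) {
      if (a[i] < b[j]) {
        out[n++] = a[i++];
      } else if (a[i] > b[j]) {
        out[n++] = b[j++];
      } else {                    // same value in both: emit once
        out[n++] = a[i];
        i++;
        j++;
      }
    }
    while (i < a.length) { out[n++] = a[i++]; }  // copy the tail of a
    while (j < b.length) { out[n++] = b[j++]; }  // copy the tail of b
    return Arrays.copyOf(out, n);
  }

  /** Values present in a but not in b. */
  static long[] aNotB(long[] a, long[] b) {
    long[] out = new long[a.length];
    int i = 0, j = 0, n = 0;
    while (i < a.length) {
      if (j >= b.length || a[i] < b[j]) {
        out[n++] = a[i++];        // a[i] cannot occur later in b
      } else if (a[i] > b[j]) {
        j++;
      } else {                    // present in both: skip it
        i++;
        j++;
      }
    }
    return Arrays.copyOf(out, n);
  }
}
```

Each method runs in time proportional to the combined length of the two inputs and leaves both inputs unmodified, which is what makes a stateless, two-argument form practical.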