Releases: apache/datasketches-java
0.12.0 Aug 7, 2018: Update POM to Memory 0.12.0, improves performance.
- Updated to Memory 0.12.0, which improves performance
- Fixed handling of min and max values in KLL sketch merge
- Minor API changes
0.11.1 Apr 20, 2018: Quantiles, KLL, Tuple, Fixes & Improvements
- Quantiles sketch
  - fixed issue #195
  - added DoublesUnion.heapify() and DoublesUnion.wrap() methods
  - deprecated DoublesUnionBuilder.heapify() and DoublesUnionBuilder.wrap() methods
- KLL sketch
  - methods to obtain rank error for both single-sided and double-sided queries
  - methods to compute parameter k given a target rank error
  - Javadoc improvements
- Tuple sketch
  - added Filter
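The KLL items above mention computing k from a target rank error. Below is a minimal sketch of that inversion, assuming the normalized rank error follows a fitted power law, eps(k) ≈ A / k^B. The class name `KllKFromError` and the constants `A` and `B` are hypothetical placeholders chosen for illustration, not names or fitted values taken from the library; consult the KllFloatsSketch javadocs for the actual methods and constants.

```java
/**
 * Hypothetical illustration: choose k for a target normalized rank error,
 * assuming the error model eps(k) = A / k^B. Not the library's API.
 */
final class KllKFromError {
    // ASSUMPTION: placeholder constants for the power-law model, not the library's values.
    static final double A = 2.3;
    static final double B = 0.97;

    /** Modeled normalized rank error for a given k. */
    static double rankError(int k) {
        return A / Math.pow(k, B);
    }

    /** Smallest k whose modeled error does not exceed epsilon. */
    static int kFromError(double epsilon) {
        if (!(epsilon > 0.0 && epsilon < 1.0)) {
            throw new IllegalArgumentException("epsilon must be in (0, 1)");
        }
        // Closed-form inversion: k = (A / eps)^(1 / B), rounded up.
        double kd = Math.ceil(Math.pow(A / epsilon, 1.0 / B));
        if (kd > Integer.MAX_VALUE - 1) {
            throw new IllegalArgumentException("epsilon too small for this model");
        }
        int k = (int) kd;
        // Correct for floating-point round-off in either direction.
        while (rankError(k) > epsilon) {
            k++;
        }
        while (k > 1 && rankError(k - 1) <= epsilon) {
            k--;
        }
        return k;
    }

    public static void main(String[] args) {
        for (double eps : new double[] {0.05, 0.01, 0.005}) {
            int k = kFromError(eps);
            System.out.printf("eps=%.3f -> k=%d (modeled error %.5f)%n", eps, k, rankError(k));
        }
    }
}
```

Because the modeled error decreases monotonically in k, the two correction loops ensure the returned k is the smallest one meeting the target under the assumed model.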
0.11.0 Mar 15, 2018: KLL quantiles sketch, tuple sketch API change and more
- New KLL sketch: KllFloatsSketch
  - This is a new quantiles sketch with better accuracy per stored bit than the original quantiles DoublesSketch. If you choose K for the KLL sketch to match the accuracy of the DoublesSketch, K will be larger, but the space required will be much smaller. This sketch is specifically tuned to use as little space as possible (near the theoretical optimum) and uses floats rather than doubles. On update, the KLL sketch is a little faster than the original DoublesSketch, but it may be slower on merge. Also, the KLL sketch currently has neither a generic version (as does the DoublesSketch) nor off-heap capability like the DoublesSketch. Refer to the javadocs for a link to the KLL theoretical paper.
- Tuple:
  - generic sketch API change
    - removed the convention requiring static methods with a certain signature; these methods are now based on a more visible API
    - added SummaryDeserializer
    - the need to serialize factories has been removed
    - removed getSummaries() method; use the iterator instead
- Theta:
  - added new SingleItemSketch: a fast way to create sketches with a single input item
- Original quantiles sketch enhancements:
  - added getRank(), which is faster than getCDF() with one split point
  - empty sketch returns null from getQuantiles(), getPMF() and getCDF()
  - empty sketch returns NaN from getQuantile(), getMinValue() and getMaxValue()
  - Kolmogorov-Smirnov statistic between two quantiles sketches
  - fixed sorting using comparator in generic ItemsSketch
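To make the quantiles query semantics above concrete, here is a small exact (non-sketching) stand-in over a sorted array. It assumes the normalized rank of x is the fraction of items strictly less than x, mirrors the empty-input behavior listed above (null from the CDF query, NaN from the quantile query), and computes the two-sample Kolmogorov-Smirnov statistic D = max over x of |F1(x) - F2(x)|. The class and method names are hypothetical; returning NaN from `rank()` on empty input is an assumption, since the notes do not say what getRank() returns for an empty sketch.

```java
import java.util.Arrays;

/** Exact stand-ins for the quantiles queries described above; no sketching involved. */
final class QuantilesQueries {

    /** Normalized rank: fraction of values strictly less than x (NaN if empty: an assumption). */
    static double rank(double[] sorted, double x) {
        if (sorted.length == 0) return Double.NaN;
        return (double) countLess(sorted, x) / sorted.length;
    }

    /** CDF at each split point, followed by a final 1.0; null for empty input. */
    static double[] cdf(double[] sorted, double[] splits) {
        if (sorted.length == 0) return null;
        double[] out = new double[splits.length + 1];
        for (int i = 0; i < splits.length; i++) {
            out[i] = rank(sorted, splits[i]);
        }
        out[splits.length] = 1.0;
        return out;
    }

    /** Nearest-rank quantile for a fraction in [0, 1]; NaN for empty input. */
    static double quantile(double[] sorted, double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        if (sorted.length == 0) return Double.NaN;
        int idx = (int) Math.min(sorted.length - 1, Math.floor(fraction * sorted.length));
        return sorted[idx];
    }

    /** Two-sample Kolmogorov-Smirnov statistic D = max over x of |F1(x) - F2(x)|. */
    static double ksStatistic(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) return Double.NaN;
        double[] s1 = a.clone();
        double[] s2 = b.clone();
        Arrays.sort(s1);
        Arrays.sort(s2);
        int i = 0;
        int j = 0;
        double d = 0.0;
        // Step through the merged distinct values, where the empirical CDFs change.
        // Once one sample is exhausted its CDF is 1 and the gap can only shrink.
        while (i < s1.length && j < s2.length) {
            double x = Math.min(s1[i], s2[j]);
            while (i < s1.length && s1[i] <= x) i++;
            while (j < s2.length && s2[j] <= x) j++;
            d = Math.max(d, Math.abs((double) i / s1.length - (double) j / s2.length));
        }
        return d;
    }

    /** Binary search: number of values strictly less than x. */
    private static int countLess(double[] sorted, double x) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < x) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5};
        System.out.println(rank(data, 3.0));                                // 0.4
        System.out.println(Arrays.toString(cdf(data, new double[] {3.0}))); // [0.4, 1.0]
        System.out.println(cdf(new double[0], new double[] {3.0}));         // null
        System.out.println(quantile(new double[0], 0.5));                   // NaN
        System.out.println(ksStatistic(new double[] {1, 2, 3}, new double[] {4, 5, 6})); // 1.0
    }
}
```

A real sketch answers these queries approximately from a compact summary; the exact versions only pin down what each query means. Note that rank(x) equals the first entry of the CDF with the single split point x.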
0.10.3 Oct 26, 2017: Theta backward compatibility
Theta sketch: As part of the resize factor serialization fix in version 0.10.2, a validation check was added that made it impossible to deserialize an UpdateSketch or Union serialized with sketches-core-0.8.4 and above. This release addresses the issue.
0.10.2 Oct 20, 2017: Theta, HLL bug fixes
- Theta:
  - Fixed bug in HeapUpdateSketch.toByteArray() that didn't set the resize factor
  - Added getFamily() to all Set Operations. Any user-defined subclasses of SetOperations will need to implement this method.
- HLL:
  - Fixed bug in HLL Union conversion to HLL_4
  - Made isSameResource() public
0.10.1 Sep 7, 2017: HLL Sketch Extended for Off-Heap Operation
- This release extends the prior HLL release 0.10.0 so that the HLL sketch can also operate off-heap, leveraging the new Memory package (located in the DataSketches/Memory repository). This capability is critical for large systems that must manage millions of sketches as updatable fields located in off-heap (native) memory. Other sketches in the library that support off-heap operation include the Theta and Quantiles sketches.
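The off-heap idea can be illustrated with a toy HyperLogLog whose register array lives in a direct ByteBuffer, that is, in native memory outside the Java heap. This is a minimal sketch using the textbook HLL estimator with the linear-counting small-range correction; the class `OffHeapToyHll`, its one-byte-per-register layout, and the SplitMix64-style hash are assumptions for illustration, not the DataSketches HLL implementation, its estimators, or its Memory-based layout.

```java
import java.nio.ByteBuffer;

/**
 * Toy HyperLogLog whose registers live in a direct (off-heap) ByteBuffer.
 * Illustrative only: not the DataSketches HLL implementation or its memory layout.
 */
final class OffHeapToyHll {
    private final int p;           // log2 of the number of registers
    private final int m;           // number of registers
    private final ByteBuffer regs; // one byte per register, allocated outside the Java heap

    OffHeapToyHll(int p) {
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.m = 1 << p;
        this.regs = ByteBuffer.allocateDirect(m); // elements are initialized to zero
    }

    /** SplitMix64-style 64-bit mixer (an assumed hash; any well-mixed 64-bit hash works). */
    private static long hash(long item) {
        long z = (item + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void update(long item) {
        long h = hash(item);
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // remaining 64 - p bits, left-aligned
        int rho = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1; // position of first 1-bit
        if (rho > regs.get(idx)) regs.put(idx, (byte) rho);
    }

    /** Textbook HLL estimate with the linear-counting small-range correction. */
    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (int i = 0; i < m; i++) {
            int r = regs.get(i);
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant, valid for m >= 128
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }

    public static void main(String[] args) {
        OffHeapToyHll hll = new OffHeapToyHll(10); // 1024 registers
        for (long i = 0; i < 100_000; i++) hll.update(i);
        System.out.println("estimate = " + hll.estimate());
    }
}
```

With p = 10 (1024 registers) the textbook relative standard error is about 1.04 / sqrt(1024), roughly 3%.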
0.10.0 Jun 16, 2017: New Memory, new HLL, new weighted sampling
- The Memory package, which is used extensively throughout the DataSketches library, has been completely rewritten and moved to its own repository.
  - The new Memory package now leverages Closeable; when used with try-with-resources blocks, it eliminates the need to explicitly close() resources external to the JVM (e.g., memory-mapped files and off-heap memory allocations). This completely replaces the freeMemory() requirement of the prior Memory implementation.
  - The API has been streamlined to allow simpler creation of regions (like ByteBuffer slices), which are views of the same underlying resource.
  - The internal architecture has been redesigned to eliminate redundancy and to separate the management of resources (off-heap memory, memory-mapped files, wrapped ByteBuffers and wrapped primitive arrays) more cleanly from the specifics of the API implementation.
  - Currently there are two API implementations: Memory, which provides direct-addressed, primitive (and primitive array) access, and Buffer, which provides a relative positional interface for primitive (and primitive array) access.
  - This has required some API changes when using the Memory package. For example, instead of new NativeMemory(bytes), use Memory.wrap(bytes) or WritableMemory.wrap(bytes). Note the distinction between the read-only wrap methods, which take Memory, and the updatable wrap methods, which take WritableMemory. Attempts to modify read-only objects will throw SketchesReadOnlyException.
- Completely rewritten HLL sketches with improved speed and accuracy.
  - The prior version of HLL had some performance, usability and design issues that were problematic. In addition, our science team has developed more advanced estimators that dramatically improve the accuracy of the HLL sketches, especially in the low range. We decided that the best route was to redesign the HLL sketches from scratch.
- Added weighted sampling sketch
  - VarOptItemsSketch creates a random sample of weighted items from a stream, with the inclusion probability approximately a function of the item's weight. The sketch can additionally apply a predicate to the sampled items to compute sums of weights over the subset, along with error bounds.
- Added support for subset sums with error bounds to Reservoir sampling
  - Mirrors the (new) functionality for weighted sampling, back-ported to unweighted sampling.
- Some API changes in the Builder.build() methods:
  - Builder.build() methods no longer accept a sketch size and optionally accept only a Memory object. This change avoids a bug that is easy for users to introduce and difficult to find. The initMemory(Memory) function has moved to build(Memory), and the build(int k) function has moved to builder.setK(int k).
  - To improve consistency and clarity across the library, we have renamed factory methods from the generic getInstance() to newInstance() when a new, empty instance is created, and to heapify() or wrap() when the resulting instance already contains data.
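Subset sums over a reservoir sample can be sketched as follows: keep a uniform sample of k items with Algorithm R, then scale the fraction of sampled items matching a predicate by the stream length n. The class `ToyReservoir`, its method names, and the normal-approximation bounds with a finite-population correction are assumptions made for this illustration; the library's reservoir sketch API and its error bounds may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/** Minimal Algorithm R reservoir with a subset-count estimate (illustrative, not the library API). */
final class ToyReservoir<T> {
    private final int k;
    private final List<T> sample = new ArrayList<>();
    private final Random rng;
    private int n = 0; // items seen so far; int keeps the sketch simple (< 2^31 items)

    ToyReservoir(int k, long seed) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
        this.rng = new Random(seed);
    }

    /** Algorithm R: the n-th item replaces a random slot with probability k / n. */
    void update(T item) {
        n++;
        if (sample.size() < k) {
            sample.add(item);
        } else {
            int j = rng.nextInt(n); // uniform in [0, n)
            if (j < k) sample.set(j, item);
        }
    }

    /**
     * Returns {lower, estimate, upper} for the number of stream items matching pred.
     * ASSUMPTION: ~95% normal-approximation bounds with a finite-population correction.
     */
    double[] subsetCount(Predicate<? super T> pred) {
        int s = sample.size();
        int hits = 0;
        for (T t : sample) {
            if (pred.test(t)) hits++;
        }
        if (n <= k) {
            return new double[] {hits, hits, hits}; // the sample is the whole stream: exact
        }
        double p = (double) hits / s;
        double se = Math.sqrt(p * (1 - p) / s * (double) (n - s) / (n - 1));
        double est = n * p;
        double lower = Math.max(0.0, n * (p - 1.96 * se));
        double upper = Math.min(n, n * (p + 1.96 * se));
        return new double[] {lower, est, upper};
    }

    public static void main(String[] args) {
        ToyReservoir<Integer> r = new ToyReservoir<>(1000, 42L);
        for (int i = 0; i < 100_000; i++) r.update(i);
        double[] b = r.subsetCount(x -> x % 4 == 0); // true count is 25,000
        System.out.printf("lower=%.0f estimate=%.0f upper=%.0f%n", b[0], b[1], b[2]);
    }
}
```

Treating every item as weight 1 is what makes this the unweighted counterpart of the weighted subset sums described for VarOptItemsSketch.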
0.9.1 Apr 14, 2017: Sorted Quantiles CompactDoublesSketch, added reset methods
- Fixed issue with unsorted Quantiles CompactDoublesSketch
- Added reset methods to Sampling sketches
- Added reset methods to Tuple sketches
0.9.0 Mar 24, 2017: Quantiles DoublesSketch refactoring, Frequent Items merge fix, read-only memory fix
- Quantiles DoublesSketch refactoring with API change
  - New UpdateDoublesSketch and CompactDoublesSketch classes; can only call update() on the former
  - Default serialization retains update or compact structure, allows wrap() to work as expected
  - Create Union from any combination of update or compact, direct or heap sketches
- Fixed problem with merging Frequent Items sketches
- Fixed problem with read-only memory
0.8.4 Jan 18, 2017: Quantiles DirectDoublesSketch, Jaccard Similarity, Sampling improvements, bug fixes
- Quantiles DirectDoublesSketch
- Jaccard Similarity
- Quantiles forward compatibility from 0.3.0
- Sampling improvements
- additional getFrequentItems() method with threshold for convenience
- PairwiseSetOperations bug fixes and performance improvements
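Jaccard similarity J = |A ∩ B| / |A ∪ B| can be estimated from two KMV-style (bottom-k) sketches, the family Theta sketches belong to: merge the two sketches, keep the k smallest hash values of the union, and count the fraction of those present in both inputs. This is the textbook bottom-k estimator shown as a minimal sketch; the class name, the hash function, and the use of `TreeSet` are illustrative assumptions, and the library's Jaccard implementation may differ in detail.

```java
import java.util.TreeSet;

/** Bottom-k (KMV) sketches and the textbook Jaccard estimator; illustrative only. */
final class BottomKJaccard {

    /** SplitMix64-style mixer mapping an item to a non-negative 63-bit hash (assumed hash). */
    static long hash(long item) {
        long z = (item + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return (z ^ (z >>> 31)) >>> 1; // drop the sign bit so values compare in natural order
    }

    /** Keeps the k smallest distinct hash values of the input. */
    static TreeSet<Long> bottomK(long[] items, int k) {
        TreeSet<Long> s = new TreeSet<>();
        for (long item : items) {
            s.add(hash(item));
            if (s.size() > k) s.pollLast();
        }
        return s;
    }

    /** Fraction of the k smallest union hashes found in both sketches; NaN if both are empty. */
    static double jaccard(TreeSet<Long> a, TreeSet<Long> b, int k) {
        TreeSet<Long> union = new TreeSet<>(a);
        union.addAll(b);
        while (union.size() > k) union.pollLast();
        if (union.isEmpty()) return Double.NaN;
        int both = 0;
        for (long h : union) {
            if (a.contains(h) && b.contains(h)) both++;
        }
        return (double) both / union.size();
    }

    public static void main(String[] args) {
        int k = 4096;
        int n = 100_000;
        long[] a = new long[n];
        long[] b = new long[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = i + 50_000L; // 50,000 shared items out of 150,000 distinct: J = 1/3
        }
        double j = jaccard(bottomK(a, k), bottomK(b, k), k);
        System.out.println("estimated Jaccard = " + j + " (exact = 1/3)");
    }
}
```

Checking membership in each input's sketch is sufficient here: any hash among the k smallest of the union that belongs to A is necessarily among the k smallest of A, so it is retained in A's sketch.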