Commit

This aligns 6.1.X with master
leerho committed Dec 12, 2024
1 parent 6724a39 commit 9fa8799
Showing 48 changed files with 173 additions and 93 deletions.
9 changes: 7 additions & 2 deletions .github/workflows/check_cpp_files.yml
@@ -12,12 +12,17 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Checkout C++
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           repository: apache/datasketches-cpp
           path: cpp
+      - name: Setup Java
+        uses: actions/setup-java@v2
+        with:
+          java-version: '11'
+          distribution: 'temurin'
       - name: Configure C++ build
         run: cd cpp/build && cmake .. -DGENERATE=true
       - name: Build C++ unit tests
2 changes: 1 addition & 1 deletion pom.xml
@@ -33,7 +33,7 @@ under the License.

<groupId>org.apache.datasketches</groupId>
<artifactId>datasketches-java</artifactId>
- <version>6.1.1</version>
+ <version>6.2.0-SNAPSHOT</version>
<packaging>jar</packaging>

<name>${project.artifactId}</name>
@@ -33,8 +33,8 @@
import org.apache.datasketches.memory.XxHash;

/**
- * <p>A Bloom filter is a data structure that can be used for probabilistic
- * set membership.</p>
+ * A Bloom filter is a data structure that can be used for probabilistic
+ * set membership.
*
* <p>When querying a Bloom filter, there are no false negatives. Specifically:
* When querying an item that has already been inserted to the filter, the filter will
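To make the membership guarantee described in the Javadoc above concrete, here is a minimal toy sketch of a Bloom filter. It is not the DataSketches implementation: the class `ToyBloomFilter`, its sizing, and its hashing scheme (double hashing over `String.hashCode()`) are assumptions chosen for brevity. It illustrates that an inserted item is always reported as present, whereas an item that was never inserted may occasionally be reported as present too.

```java
import java.util.BitSet;

// Hypothetical toy filter for illustration only; NOT the DataSketches BloomFilter.
final class ToyBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  ToyBloomFilter(final int numBits, final int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // Derives the i-th bit index from two base hashes (double hashing).
  // String.hashCode() is an assumption made for brevity; real filters use stronger hashes.
  private int index(final String item, final int i) {
    final int h1 = item.hashCode();
    final int h2 = (h1 >>> 16) | 1; // odd step, so it is nonzero modulo any even numBits
    return Math.floorMod(h1 + i * h2, numBits);
  }

  // Sets every bit selected by the item's hashes.
  void update(final String item) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(item, i));
    }
  }

  // Returns false only if the item was definitely never inserted.
  boolean query(final String item) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(item, i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(final String[] args) {
    final ToyBloomFilter bf = new ToyBloomFilter(1024, 3);
    bf.update("apple");
    bf.update("banana");
    System.out.println(bf.query("apple"));  // inserted items are always found
    System.out.println(bf.query("cherry")); // usually false, but may be a false positive
  }
}
```

Because `update` only ever sets bits, a later `query` for the same item finds all of its bits set, which is why a "present" answer should be read as "might have seen" and an "absent" answer as definite.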
@@ -25,8 +25,8 @@
import org.apache.datasketches.memory.WritableMemory;

/**
- * <p>This class provides methods to help estimate the correct parameters when
- * creating a Bloom filter, and methods to create the filter using those values.</p>
+ * This class provides methods to help estimate the correct parameters when
+ * creating a Bloom filter, and methods to create the filter using those values.
*
* <p>The underlying math is described in the
* <a href='https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions'>
@@ -55,10 +55,10 @@
import org.apache.datasketches.memory.WritableMemory;

/**
- * <p>This sketch is useful for tracking approximate frequencies of items of type <i>&lt;T&gt;</i>
+ * This sketch is useful for tracking approximate frequencies of items of type <i>&lt;T&gt;</i>
* with optional associated counts (<i>&lt;T&gt;</i> item, <i>long</i> count) that are members of a
* multiset of such items. The true frequency of an item is defined to be the sum of associated
- * counts.</p>
+ * counts.
*
* <p>This implementation provides the following capabilities:</p>
* <ul>
@@ -54,9 +54,9 @@
import org.apache.datasketches.memory.WritableMemory;

/**
- * <p>This sketch is useful for tracking approximate frequencies of <i>long</i> items with optional
+ * This sketch is useful for tracking approximate frequencies of <i>long</i> items with optional
* associated counts (<i>long</i> item, <i>long</i> count) that are members of a multiset of
- * such items. The true frequency of an item is defined to be the sum of associated counts.</p>
+ * such items. The true frequency of an item is defined to be the sum of associated counts.
*
* <p>This implementation provides the following capabilities:</p>
* <ul>
@@ -31,12 +31,11 @@
/**
* This class defines the preamble data structure and provides basic utilities for some of the key
* fields.
- * <p>
- * The intent of the design of this class was to isolate the detailed knowledge of the bit and byte
+ *
+ * <p>The intent of the design of this class was to isolate the detailed knowledge of the bit and byte
* layout of the serialized form of the sketches derived from the Sketch class into one place. This
* allows the possibility of the introduction of different serialization schemes with minimal impact
- * on the rest of the library.
- * </p>
+ * on the rest of the library.</p>
*
* <p>
* MAP: Low significance bytes of this <i>long</i> data structure are on the right. However, the
2 changes: 0 additions & 2 deletions src/main/java/org/apache/datasketches/hash/MurmurHash3.java
@@ -29,10 +29,8 @@
import org.apache.datasketches.memory.Memory;

/**
- * <p>
* The MurmurHash3 is a fast, non-cryptographic, 128-bit hash function that has
* excellent avalanche and 2-way bit independence properties.
- * </p>
*
* <p>
* Austin Appleby's C++
3 changes: 1 addition & 2 deletions src/main/java/org/apache/datasketches/hash/package-info.java
@@ -18,12 +18,11 @@
*/

/**
- * <p>The hash package contains a high-performing and extended Java implementations
+ * The hash package contains a high-performing and extended Java implementations
* of Austin Appleby's 128-bit MurmurHash3 hash function originally coded in C.
* This core MurmurHash3.java class is used throughout many of the sketch classes for consistency
* and as long as the user specifies the same seed will result in coordinated hash operations.
* This package also contains an adaptor class that extends the basic class with more functions
* commonly associated with hashing.
- * </p>
*/
package org.apache.datasketches.hash;
@@ -312,6 +312,7 @@ private static void randomlyHalveUpDoubles(final double[] buf, final int start,

/**
* Compression algorithm used to merge higher levels.
+ *
* <p>Here is what we do for each level:</p>
* <ul><li>If it does not need to be compacted, then simply copy it over.</li>
* <li>Otherwise, it does need to be compacted, so...
@@ -278,6 +278,7 @@ public final void merge(final KllSketch other) {

/**
* {@inheritDoc}
+ *
* <p>The parameter <i>k</i> will not change.</p>
*/
@Override
@@ -312,6 +312,7 @@ private static void randomlyHalveUpFloats(final float[] buf, final int start, fi

/**
* Compression algorithm used to merge higher levels.
+ *
* <p>Here is what we do for each level:</p>
* <ul><li>If it does not need to be compacted, then simply copy it over.</li>
* <li>Otherwise, it does need to be compacted, so...
@@ -278,6 +278,7 @@ public final void merge(final KllSketch other) {

/**
* {@inheritDoc}
+ *
* <p>The parameter <i>k</i> will not change.</p>
*/
@Override
@@ -346,6 +346,7 @@ static <T> void updateItem(final KllItemsSketch<T> itmSk, final T item, final lo

/**
* Compression algorithm used to merge higher levels.
+ *
* <p>Here is what we do for each level:</p>
* <ul><li>If it does not need to be compacted, then simply copy it over.</li>
* <li>Otherwise, it does need to be compacted, so...
@@ -312,6 +312,7 @@ private static void randomlyHalveUpLongs(final long[] buf, final int start, fina

/**
* Compression algorithm used to merge higher levels.
+ *
* <p>Here is what we do for each level:</p>
* <ul><li>If it does not need to be compacted, then simply copy it over.</li>
* <li>Otherwise, it does need to be compacted, so...
@@ -278,6 +278,7 @@ public final void merge(final KllSketch other) {

/**
* {@inheritDoc}
+ *
* <p>The parameter <i>k</i> will not change.</p>
*/
@Override
@@ -506,6 +506,7 @@ public QuantilesDoublesSketchIterator iterator() {

/**
* {@inheritDoc}
+ *
* <p>The parameter <i>k</i> will not change.</p>
*/
@Override
@@ -18,9 +18,8 @@
*/

/**
- * <p>The quantiles package contains stochastic streaming algorithms that enable single-pass
+ * The quantiles package contains stochastic streaming algorithms that enable single-pass
* analysis of the distribution of a stream of quantiles.
- * </p>
*
* @see org.apache.datasketches.quantiles.DoublesSketch
* @see org.apache.datasketches.quantiles.ItemsSketch
@@ -38,7 +38,7 @@ public interface DoublesSortedView extends SortedView {
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
* <p>The start of each interval is below the lowest item retained by the sketch
* corresponding to a zero rank or zero probability, and the end of the interval
* is the rank or cumulative probability corresponding to the split point.</p>
@@ -55,7 +55,7 @@ public interface DoublesSortedView extends SortedView {
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -100,7 +100,7 @@ default double[] getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
* <p>Each interval except for the end intervals starts with a split point and ends with the next split
* point in sequence.</p>
*
@@ -124,7 +124,7 @@ default double[] getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
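The split-point semantics described in the `getCDF` and `getPMF` Javadoc above can be illustrated with exact arithmetic on a small sorted array. The sketch below does not call the DataSketches API: the class `SplitPointDemo` and its methods are hypothetical, and it assumes an inclusive criterion (an item equal to a split point counts toward that split point's rank). With <i>m</i> split points, both arrays have <i>m+1</i> entries, and each PMF entry is the difference between consecutive CDF entries.

```java
import java.util.Arrays;

// Hypothetical illustration of split-point semantics; NOT part of the DataSketches API.
final class SplitPointDemo {

  // Exact CDF: entry i is the fraction of items <= splitPoints[i] (inclusive criterion,
  // an assumption of this sketch). The final entry is always 1.0.
  static double[] cdf(final double[] data, final double[] splitPoints) {
    final int m = splitPoints.length;
    final double[] out = new double[m + 1];
    for (int i = 0; i < m; i++) {
      int count = 0;
      for (final double v : data) {
        if (v <= splitPoints[i]) {
          count++;
        }
      }
      out[i] = (double) count / data.length;
    }
    out[m] = 1.0;
    return out;
  }

  // Exact PMF: the mass of each of the m+1 consecutive, non-overlapping intervals,
  // obtained by differencing the CDF.
  static double[] pmf(final double[] data, final double[] splitPoints) {
    final double[] c = cdf(data, splitPoints);
    final double[] out = new double[c.length];
    out[0] = c[0];
    for (int i = 1; i < c.length; i++) {
      out[i] = c[i] - c[i - 1];
    }
    return out;
  }

  public static void main(final String[] args) {
    final double[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    final double[] splits = {2.5, 7.5}; // m = 2 split points give m + 1 = 3 entries
    System.out.println(Arrays.toString(cdf(data, splits)));
    System.out.println(Arrays.toString(pmf(data, splits)));
  }
}
```

A sketch-based `getCDF` or `getPMF` would return approximations of these values rather than exact fractions, since a sketch retains only a subset of the input stream.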
@@ -38,7 +38,7 @@ public interface FloatsSortedView extends SortedView {
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
* <p>The start of each interval is below the lowest item retained by the sketch
* corresponding to a zero rank or zero probability, and the end of the interval
* is the rank or cumulative probability corresponding to the split point.</p>
@@ -55,7 +55,7 @@ public interface FloatsSortedView extends SortedView {
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -100,7 +100,7 @@ default double[] getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
* <p>Each interval except for the end intervals starts with a split point and ends with the next split
* point in sequence.</p>
*
@@ -124,7 +124,7 @@ default double[] getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -47,7 +47,7 @@ public interface GenericSortedView<T> extends PartitioningFeature<T>, SketchPar
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
* <p>The start of each interval is below the lowest item retained by the sketch
* corresponding to a zero rank or zero probability, and the end of the interval
* is the rank or cumulative probability corresponding to the split point.</p>
@@ -64,7 +64,7 @@ public interface GenericSortedView<T> extends PartitioningFeature<T>, SketchPar
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -116,7 +116,7 @@ default double[] getCDF(final T[] splitPoints, final QuantileSearchCriteria sear
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
* <p>Each interval except for the end intervals starts with a split point and ends with the next split
* point in sequence.</p>
*
@@ -140,7 +140,7 @@ default double[] getCDF(final T[] splitPoints, final QuantileSearchCriteria sear
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -38,7 +38,7 @@ public interface LongsSortedView extends SortedView {
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
* <p>The start of each interval is below the lowest item retained by the sketch
* corresponding to a zero rank or zero probability, and the end of the interval
* is the rank or cumulative probability corresponding to the split point.</p>
@@ -55,7 +55,7 @@ public interface LongsSortedView extends SortedView {
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -100,7 +100,7 @@ default double[] getCDF(long[] splitPoints, QuantileSearchCriteria searchCrit) {
* @param splitPoints an array of <i>m</i> unique, monotonically increasing items
* (of the same type as the input items)
* that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
* <p>Each interval except for the end intervals starts with a split point and ends with the next split
* point in sequence.</p>
*
@@ -124,7 +124,7 @@ default double[] getCDF(long[] splitPoints, QuantileSearchCriteria searchCrit) {
* </ul>
*
* <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
* @param searchCrit the desired search criteria.
* @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
* @throws IllegalArgumentException if sketch is empty.
@@ -20,12 +20,12 @@
package org.apache.datasketches.quantilescommon;

/**
- * <p>This is a stochastic streaming sketch that enables near-real time analysis of the
+ * This is a stochastic streaming sketch that enables near-real time analysis of the
* approximate distribution of items from a very large stream in a single pass, requiring only
* that the items are comparable.
* The analysis is obtained using the <i>getQuantile()</i> function or the
* inverse functions getRank(), getPMF() (the Probability Mass Function), and getCDF()
- * (the Cumulative Distribution Function).</p>
+ * (the Cumulative Distribution Function).
*
* <p>Given an input stream of <i>N</i> items, the <i>natural rank</i> of any specific
* item is defined as its index <i>(1 to N)</i> in the hypothetical sorted stream of all