Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First implementation of weighted update for KllDoubles. #487

Merged
merged 4 commits into from
Jan 3, 2024

Conversation

leerho
Copy link
Contributor

@leerho leerho commented Dec 24, 2023

Not fully tested for off heap yet.
Not implemented for floats or items yet.

Not fully tested for off heap yet.
Not implemented for floats or items yet.
@leerho leerho requested a review from jmalkin January 3, 2024 03:28
Copy link
Contributor

@jmalkin jmalkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments but they're just remarks, don't need changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than createItemsArray() I don't know enough of what goes on in KLL to really get the reason for the other changes and if they're necessary for this PR. Consider those basically unreviewed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I generally prefer comments about what happens inside a method to be above the method, rather than mixed in with the variables for the method. Is this inline-with-variables style a common java approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will detail the changes by file:

  • Util.java:

    • Added bitAt(...) function used by KllHelper.createLevelsArray(), part of weighted update.
  • KllDoublesHelper.java

    • Added createItemsArray(..), part of weighted update
    • The mergeDoubleImpl(..) is complex and difficult to understand what is going on. I added more comments to help clarify its operation. In addition, I found that it needed a subtle change in its algorithm that would have created a problem with our new weighted updates, because weighted updates can create new levels at higher indices than would normally occur with the original update call. I had to improve the calculation of of the upper bound "ub" of "worklevels" to include this possibility. Otherwise, the merge could crash due to inadequate size of the "worklevels" buffer. This also meant calculating the "provisionalNumLevels" before "ub" and not the other way around.
    • The changes to populateDoubleWorkArrays(..) include a comment for additional clarification of what the method is doing and changing some of the variable names to be consistent with CamelBack notation, which is standard in java.
  • KllDoublesSketch.java

    • Added the "public void weightedUpdate(..)" function.
  • KllHeapDoublesSketch.java

    • Added specialized constructor for the temporary sketch created to hold the specially created levels array and items array prior to its merge into the target sketch.
  • KllHelper.java

    • 3 methods had to be converted from "private" to "package-private" to allow for testing: "intCapAux(..)", "intCapAuxAux(..)", and "powersOfThree(..)".
    • The method "getLevelCapacityItems(..)" was redundant and removed.
    • Added "createLevelsArray(..)" which creates the special levels array for weighted updates.
  • KllSketch.java

    • Corrected one comment.
  • QuantilesDoublesAPI.java

    • Corrected one javadoc
  • KllMiscDoublesTest.java

    • Added tests specific to weighted updates
    • Added tests specific to the critical calculations of the merge and compaction processes.
    • For a number of these tests I added print methods, that when printing is enabled, reveal detailed information about the construction of the new weighted update methods as well as more complete information about other critical methods used in merging and compaction processes. For anyone attempting to understand the KLL code, these optional printing methods, that also serve as tests, are critical to understanding the KLL algorithm and how we implemented it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is to have code comments as close as possible to the relevant code. This is especially important in large, complex methods. These comments are distinctly different from Javadocs which document what a method does at a high level. Code comments are addressed to the developer, who needs to understand the detailed implementation and might need to modify or translate the code. Javadocs are addressed to the user of the method and generally is not concerned with the details of its implementation, except when it impacts its use.

@leerho leerho merged commit 1d5ea1c into master Jan 3, 2024
4 checks passed
@leerho leerho deleted the kll_weighted_doubles branch January 3, 2024 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants