-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First implementation of weighted update for KllDoubles. #487
Conversation
Not fully tested for off heap yet. Not implemented for floats or items yet.
src/main/java/org/apache/datasketches/kll/KllDoublesSketch.java
Outdated
Show resolved
Hide resolved
Change enablePrinting to false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments but they're just remarks, don't need changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than createItemsArray()
I don't know enough of what goes on in KLL to really get the reason for the other changes and if they're necessary for this PR. Consider those basically unreviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I generally prefer comments about what happens inside a method to be above the method, rather than mixed in with the variables for the method. Is this inline-with-variables style a common java approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will detail the changes by file:
-
Util.java:
- Added bitAt(...) function used by KllHelper.createLevelsArray(), part of weighted update.
-
KllDoublesHelper.java
- Added createItemsArray(..), part of weighted update
- The mergeDoubleImpl(..) is complex and difficult to understand what is going on. I added more comments to help clarify its operation. In addition, I found that it needed a subtle change in its algorithm that would have created a problem with our new weighted updates, because weighted updates can create new levels at higher indices than would normally occur with the original update call. I had to improve the calculation of of the upper bound "ub" of "worklevels" to include this possibility. Otherwise, the merge could crash due to inadequate size of the "worklevels" buffer. This also meant calculating the "provisionalNumLevels" before "ub" and not the other way around.
- The changes to populateDoubleWorkArrays(..) include a comment for additional clarification of what the method is doing and changing some of the variable names to be consistent with CamelBack notation, which is standard in java.
-
KllDoublesSketch.java
- Added the "public void weightedUpdate(..)" function.
-
KllHeapDoublesSketch.java
- Added specialized constructor for the temporary sketch created to hold the specially created levels array and items array prior to its merge into the target sketch.
-
KllHelper.java
- 3 methods had to be converted from "private" to "package-private" to allow for testing: "intCapAux(..)", "intCapAuxAux(..)", and "powersOfThree(..)".
- The method "getLevelCapacityItems(..)" was redundant and removed.
- Added "createLevelsArray(..)" which creates the special levels array for weighted updates.
-
KllSketch.java
- Corrected one comment.
-
QuantilesDoublesAPI.java
- Corrected one javadoc
-
KllMiscDoublesTest.java
- Added tests specific to weighted updates
- Added tests specific to the critical calculations of the merge and compaction processes.
- For a number of these tests I added print methods, that when printing is enabled, reveal detailed information about the construction of the new weighted update methods as well as more complete information about other critical methods used in merging and compaction processes. For anyone attempting to understand the KLL code, these optional printing methods, that also serve as tests, are critical to understanding the KLL algorithm and how we implemented it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to have code comments as close as possible to the relevant code. This is especially important in large, complex methods. These comments are distinctly different from Javadocs which document what a method does at a high level. Code comments are addressed to the developer, who needs to understand the detailed implementation and might need to modify or translate the code. Javadocs are addressed to the user of the method and generally is not concerned with the details of its implementation, except when it impacts its use.
Not fully tested for off heap yet.
Not implemented for floats or items yet.