Datehistogram improvement #170
Conversation
Signed-off-by: bowenlan-amzn <[email protected]>
while (i < targetBuckets) {
    // Calculate the lower bucket bound
    final byte[] lower = new byte[8];
    NumericUtils.longToSortableBytes(Math.max(roundedLow, low), lower, 0);
    // Calculate the upper bucket bound
    final byte[] upper = new byte[8];
    roundedLow = preparedRounding.round(roundedLow + interval);
    // Subtract 1, since roundedLow itself
    // is included in the next bucket
    NumericUtils.longToSortableBytes(Math.min(roundedLow - 1, high), upper, 0);

    filters[i++] = context.searcher().createWeight(new PointRangeQuery(field, lower, upper, 1) {
This logic has a problem: it focuses on creating the required number of target buckets, when it should instead limit bucket creation to the upper and lower bounds of the data (see the sketch after the response below). For example, I see the following response:
% curl -s -X GET "localhost:9200/nyc_taxis/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 01:04:06","lt": "2016-01-01 00:00:00"}}},"aggs": {"dropoffs_over_time": {"auto_date_histogram": {"field": "dropoff_datetime","buckets": "4"}}}}'
{
  "took" : 4295,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 100,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "dropoffs_over_time" : {
      "buckets" : [
        {
          "key_as_string" : "2015-01-01 00:00:00",
          "key" : 1420070400000,
          "doc_count" : 100
        },
        {
          "key_as_string" : "2016-01-01 00:00:00",
          "key" : 1451606400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2017-01-01 00:00:00",
          "key" : 1483228800000,
          "doc_count" : 0
        }
      ],
      "interval" : "1y"
    }
  }
}
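A minimal sketch of the bound-limited loop I mean, assuming high holds the (inclusive) maximum of the matching values as in the existing Math.min call; createFilter is a hypothetical stand-in for the createWeight(new PointRangeQuery(...)) call above, not the final patch:

while (i < targetBuckets && roundedLow <= high) { // stop once we pass the data's upper bound
    final byte[] lower = new byte[8];
    NumericUtils.longToSortableBytes(Math.max(roundedLow, low), lower, 0);
    final byte[] upper = new byte[8];
    roundedLow = preparedRounding.round(roundedLow + interval);
    // roundedLow itself belongs to the next bucket, hence the -1
    NumericUtils.longToSortableBytes(Math.min(roundedLow - 1, high), upper, 0);
    filters[i++] = createFilter(lower, upper); // hypothetical helper, see above
}

With a guard like this, the empty 2016 and 2017 buckets in the response above would never be created; the filters array may then contain fewer than targetBuckets populated entries, which the collection loop would need to tolerate.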
roundingInfosLoop: do {
    RoundingInfo curRoundingInfo = roundingInfos[roundingIdx];
    for (int curInnerInterval : curRoundingInfo.innerIntervals) {
        if (bestDuration <= curInnerInterval * curRoundingInfo.roughEstimateDurationMillis) {
            interval = curInnerInterval * curRoundingInfo.roughEstimateDurationMillis;
            break roundingInfosLoop;
        }
    }
    roundingIdx++;
We probably need to move the preparedRounding to the next index as well? Otherwise it always stays at 0 (the second level). Something like the sketch below:
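Reusing the prepareRounding helper that appears in the merge snippet further down; the exact placement here is just a sketch:

roundingIdx++;
// Keep the prepared rounding in sync with the index we advanced to,
// so later round() calls use the rounding that was actually selected
preparedRounding = prepareRounding(roundingIdx);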
for (i = 0; i < filters.length; i++) {
    long bucketOrd = bucketOrds.add(
        owningBucketOrd,
        preparedRounding.round(NumericUtils.sortableBytesToLong(((PointRangeQuery) filters[i].getQuery()).getLowerPoint(), 0))
    );
    if (bucketOrd < 0) { // already seen
        bucketOrd = -1 - bucketOrd;
    }
    incrementBucketDocCount(bucketOrd, counts[i]);
}
// All doc counts came from the precomputed filters, so per-document collection can stop here
throw new CollectionTerminatedException();
We can be aggressive during bucket creation and invoke the logic to merge buckets whenever the count exceeds the target bucket count:
do {
    try (LongKeyedBucketOrds oldOrds = bucketOrds) {
        // Promote to the next, coarser rounding
        preparedRounding = prepareRounding(++roundingIdx);
        long[] mergeMap = new long[Math.toIntExact(oldOrds.size())];
        bucketOrds = new LongKeyedBucketOrds.FromSingle(context.bigArrays());
        LongKeyedBucketOrds.BucketOrdsEnum ordsEnum = oldOrds.ordsEnum(0);
        while (ordsEnum.next()) {
            long oldKey = ordsEnum.value();
            long newKey = preparedRounding.round(oldKey);
            // Map each old ordinal to its ordinal under the coarser key
            long newBucketOrd = bucketOrds.add(0, newKey);
            mergeMap[(int) ordsEnum.ord()] = newBucketOrd >= 0 ? newBucketOrd : -1 - newBucketOrd;
        }
        merge(mergeMap, bucketOrds.size());
    }
} while (roundingIdx < roundingInfos.length - 1
    && (bucketOrds.size() > targetBuckets * roundingInfos[roundingIdx].getMaximumInnerInterval()
        || max - min > targetBuckets * roundingInfos[roundingIdx].getMaximumRoughEstimateDurationMillis()));
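For context, a minimal sketch of what applying such a merge map to per-bucket doc counts could look like; the real merge also has to rewire sub-aggregation state, and counts/merged are hypothetical names, not identifiers from this PR:

// Fold the old per-ordinal doc counts into the coarser ordinals
long[] merged = new long[Math.toIntExact(bucketOrds.size())];
for (int oldOrd = 0; oldOrd < mergeMap.length; oldOrd++) {
    merged[(int) mergeMap[oldOrd]] += counts[oldOrd];
}
counts = merged;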
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.