WIP: fix counter resets when merging batches #9909
Draft
+584
−63
What this PR does
When merging samples from different iterators, histogram counter reset hints need to be recalculated. The `mergeIterator` currently does not do so. This is currently a WIP to test out a possible fix.
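To make the problem concrete, here is a minimal single-pass sketch in Go. It is not the Mimir code: the `sample` struct, `mergeSamples`, and the local `CounterResetHint` constants are hypothetical stand-ins (the constant names mirror the values of Prometheus's `histogram.CounterResetHint`). It assumes each input slice is sorted by timestamp and that no two inputs share a timestamp. A stored hint only describes a sample relative to its predecessor in the same chunk, so after interleaving, the hint is kept only when the previous merged sample came from the same iterator; otherwise it is reset to unknown.

```go
package main

import (
	"fmt"
	"sort"
)

// CounterResetHint is a local stand-in for Prometheus's histogram.CounterResetHint.
type CounterResetHint int

const (
	UnknownCounterReset CounterResetHint = iota
	CounterReset
	NotCounterReset
)

// sample is a hypothetical, simplified histogram sample: only the fields
// needed to illustrate hint handling are kept.
type sample struct {
	ts     int64
	hint   CounterResetHint
	iterID int // which input iterator the sample came from
}

// mergeSamples merges per-iterator inputs (each sorted by timestamp) and
// resets the hint of any sample whose predecessor in the merged output
// came from a different iterator, because the stored hint was computed
// against a different previous sample.
func mergeSamples(inputs [][]sample) []sample {
	var out []sample
	for _, in := range inputs {
		out = append(out, in...)
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].ts < out[j].ts })

	for i := 1; i < len(out); i++ {
		if out[i].iterID != out[i-1].iterID {
			out[i].hint = UnknownCounterReset
		}
	}
	return out
}

func main() {
	merged := mergeSamples([][]sample{
		{{ts: 1, hint: UnknownCounterReset, iterID: 0}, {ts: 3, hint: NotCounterReset, iterID: 0}},
		{{ts: 2, hint: NotCounterReset, iterID: 1}},
	})
	for _, s := range merged {
		fmt.Println(s.ts, s.iterID, s.hint)
	}
}
```

In this example both the sample at ts 2 and the sample at ts 3 end up with an unknown hint, since each directly follows a sample from the other iterator.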
We know the following:
The fix has two parts:
`merge()`. However, a batch contains some consecutive samples from a chunk. A later batch might come from the same chunk, and its samples could be consecutive to the previous batch's samples. Therefore, in `batchStream`, we keep a map of iterator ID -> timestamp of the last histogram from that iterator that was written to the `batchStream`. When we write the first sample of a new batch, we check whether the previous sample in the `batchStream` has the same timestamp as the last histogram timestamp recorded for the same iterator. If so, we can trust the counter reset hint of the sample in the new batch rather than resetting it. The idea of using the previous timestamp to detect consecutive samples across batches comes from @krajorama and their PR fix(histograms): inflated counter resets on merge #9823.

Note that we could still be over-detecting unknown counter resets: chunks in different newNonOverlappingIterators may actually be consecutive, but for now, if samples come from different iterators, we set their hints to unknown. This is because at this point in the Mimir code we have no way to tell whether chunks are consecutive to each other (see prometheus/prometheus#15346 for related discussion).
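The batch-boundary check described above can be sketched as follows. This is a hypothetical, simplified model rather than the actual `batchStream` code: `hintTracker`, `write`, `writeBatch`, and the `sample` fields are illustrative names, and timestamps are assumed to be unique across iterators. The tracker remembers the timestamp of the last sample it wrote and, per iterator ID, the timestamp of the last sample written from that iterator; a sample keeps its hint only when those two timestamps match.

```go
package main

import "fmt"

// CounterResetHint is a local stand-in for Prometheus's histogram.CounterResetHint.
type CounterResetHint int

const (
	UnknownCounterReset CounterResetHint = iota
	CounterReset
	NotCounterReset
)

// sample is a hypothetical, simplified histogram sample.
type sample struct {
	ts     int64
	hint   CounterResetHint
	iterID int
}

// hintTracker models the state the PR describes adding to batchStream.
type hintTracker struct {
	hasPrev      bool
	prevTs       int64         // timestamp of the last sample written to the stream
	lastTsByIter map[int]int64 // iterator ID -> timestamp of its last written sample
	out          []sample
}

func newHintTracker() *hintTracker {
	return &hintTracker{lastTsByIter: map[int]int64{}}
}

// write appends one sample. Its hint is trusted only if the previous sample
// in the stream is the last sample written from the same iterator, i.e. the
// two samples are consecutive in that iterator's chunk.
func (h *hintTracker) write(s sample) {
	if h.hasPrev {
		last, ok := h.lastTsByIter[s.iterID]
		if !ok || last != h.prevTs {
			s.hint = UnknownCounterReset
		}
	}
	h.out = append(h.out, s)
	h.hasPrev = true
	h.prevTs = s.ts
	h.lastTsByIter[s.iterID] = s.ts
}

// writeBatch writes a batch of consecutive samples from a single iterator.
func (h *hintTracker) writeBatch(batch []sample) {
	for _, s := range batch {
		h.write(s)
	}
}

func main() {
	tr := newHintTracker()
	tr.writeBatch([]sample{{ts: 1, hint: CounterReset, iterID: 0}, {ts: 2, hint: NotCounterReset, iterID: 0}})
	tr.writeBatch([]sample{{ts: 3, hint: NotCounterReset, iterID: 0}}) // continues iterator 0
	tr.writeBatch([]sample{{ts: 4, hint: NotCounterReset, iterID: 1}}) // different iterator
	tr.writeBatch([]sample{{ts: 5, hint: NotCounterReset, iterID: 0}}) // ts 4 came from iterator 1
	for _, s := range tr.out {
		fmt.Println(s.ts, s.iterID, s.hint)
	}
}
```

Here the second batch from iterator 0 (ts 3) keeps its hint because it directly continues that iterator's previous batch, while the samples at ts 4 and ts 5 are set to unknown.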
Current TODOs:
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
`CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
`about-versioning.md` updated with experimental features.