Histograms
A Histogram measures the distribution of values in a stream of data. From the Java library documentation:
Histogram metrics allow you to measure not just easy things like the min, mean, max, and standard deviation of values, but also quantiles like the median or 95th percentile.
Traditionally, the way the median (or any other quantile) is calculated is to take the entire data set, sort it, and take the value in the middle (or 1% from the end, for the 99th percentile). This works for small data sets, or batch processing systems, but not for high-throughput, low-latency services.
The solution for this is to sample the data as it goes through. By maintaining a small, manageable reservoir which is statistically representative of the data stream as a whole, we can quickly and easily calculate quantiles which are valid approximations of the actual quantiles. This technique is called reservoir sampling.
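To make the reservoir idea concrete, here is a minimal sketch of a uniform reservoir using Vitter's Algorithm R: the first `size` values fill the reservoir, and afterwards each new value replaces a random slot with probability `size / count`. The class name `UniformReservoirSketch`, its fixed seed, and the nearest-rank `Quantile` method are illustrative assumptions, not Metrics.NET's actual implementation.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of a uniform reservoir (Vitter's Algorithm R).
// Not the Metrics.NET implementation; names and defaults are illustrative.
public sealed class UniformReservoirSketch
{
    private readonly long[] values;
    private readonly Random random;
    private long count; // total number of values seen so far

    // A fixed default seed keeps the sketch reproducible.
    public UniformReservoirSketch(int size, int seed = 42)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        values = new long[size];
        random = new Random(seed);
    }

    public long Count => count;

    public void Update(long value)
    {
        count++;
        if (count <= values.Length)
        {
            // Reservoir not full yet: store every value.
            values[(int)(count - 1)] = value;
        }
        else
        {
            // Pick a random position in [0, count); if it falls inside the
            // reservoir, the new value replaces that slot. This keeps each
            // value seen so far with probability size / count.
            long r = NextLong(count);
            if (r < values.Length)
            {
                values[(int)r] = value;
            }
        }
    }

    // Uniform random long in [0, bound).
    private long NextLong(long bound) => (long)(random.NextDouble() * bound);

    // Nearest-rank quantile over the sample: the same sort-and-index approach
    // as the exact method, but applied only to the small reservoir.
    public double Quantile(double q)
    {
        int n = (int)Math.Min(count, values.Length);
        if (n == 0) return 0.0;
        long[] snapshot = values.Take(n).OrderBy(v => v).ToArray();
        int index = (int)Math.Ceiling(q * n) - 1;
        index = Math.Max(0, Math.Min(index, n - 1));
        return snapshot[index];
    }
}
```

Memory stays fixed at `size` slots regardless of how many values arrive. For example, after feeding 1 through 100,000 into a 1,024-slot reservoir, `Quantile(0.5)` lands near the true median of 50,000 without ever storing the full stream.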
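```csharp
private readonly Histogram histogram = Metric.Histogram("Search Results", Unit.Items);

public void Search(string keyword)
{
    var results = ActualSearch(keyword); // ActualSearch: placeholder for your own search logic
    histogram.Update(results.Length);    // record how many items this search returned
}
```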
Out of the box, three sampling types are provided:
- Exponentially Decaying Reservoir - produces quantiles which are representative of (roughly) the last five minutes of data
- Uniform Reservoir - produces quantiles which are valid for the entirety of the histogram’s lifetime
- Sliding Window Reservoir - produces quantiles which are representative of the past N measurements
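The Sliding Window Reservoir is the easiest of the three to picture: a fixed-size ring buffer in which each new measurement overwrites the oldest one. The class below is a minimal, hypothetical sketch of that idea (the name `SlidingWindowReservoirSketch` is ours, not part of Metrics.NET); the uniform and exponentially decaying reservoirs use different replacement rules.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of a sliding-window reservoir: it keeps exactly the
// last N measurements in a ring buffer. Not the Metrics.NET implementation.
public sealed class SlidingWindowReservoirSketch
{
    private readonly long[] window;
    private long count; // total number of updates ever made

    public SlidingWindowReservoirSketch(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        window = new long[size];
    }

    public void Update(long value)
    {
        // Once the buffer has wrapped around, this overwrites the oldest value.
        window[(int)(count % window.Length)] = value;
        count++;
    }

    // Copy of the values currently in the window (at most N of them).
    // Order is by slot, not by arrival time, which is fine for quantiles.
    public long[] Snapshot()
    {
        int n = (int)Math.Min(count, window.Length);
        return window.Take(n).ToArray();
    }
}
```

For example, with a window of 3 and updates 1 through 5, the snapshot holds only 3, 4 and 5, so any quantile computed from it reflects just the most recent three measurements.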
More information about the reservoir types can be found in the Java library documentation.
The histogram can also track which user value was associated with the Min, Max and Last values it recorded. The user value can be any string (a documentId, an operationId, etc.).
For example, the histogram below records which documentId produced the largest number of results:
```csharp
public class UserValueHistogramSample
{
    private readonly Histogram histogram =
        Metric.Histogram("Results", Unit.Items);

    public void Process(string documentId)
    {
        var results = GetResultsForDocument(documentId); // placeholder for your own lookup
        // The second argument is the user value attached to this measurement.
        this.histogram.Update(results.Length, documentId);
    }
}
```
After running a few requests, the output of the histogram in text format looks like this:
```
Results
Count = 90 Items
Last = 46.00 Items
Last User Value = document-3
Min = 2.00 Items
Min User Value = document-7
Max = 98.00 Items
Max User Value = document-4
Mean = 51.52 Items
StdDev = 30.55 Items
Median = 50.00 Items
75% <= 80.00 Items
95% <= 97.00 Items
98% <= 98.00 Items
99% <= 98.00 Items
99.9% <= 98.00 Items
```
As you can see, the histogram recorded that the min value (2 items) was returned for document-7, the max value (98 items) for document-4, and the last value (46 items) for document-3.
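The bookkeeping behind `Min User Value`, `Max User Value` and `Last User Value` can be sketched as a small tracker that stores the user value next to each extreme. This is a hypothetical illustration (the class `UserValueTrackerSketch` and its tie-breaking rule are our assumptions), not how Metrics.NET is actually implemented.

```csharp
using System;

// Hypothetical sketch: remembers which user value produced the
// min, max and last measurements. Not the Metrics.NET implementation.
public sealed class UserValueTrackerSketch
{
    public long Count { get; private set; }
    public long Min { get; private set; } = long.MaxValue;
    public long Max { get; private set; } = long.MinValue;
    public long Last { get; private set; }
    public string MinUserValue { get; private set; }
    public string MaxUserValue { get; private set; }
    public string LastUserValue { get; private set; }

    public void Update(long value, string userValue = null)
    {
        Count++;

        // The last value and its user value are always overwritten.
        Last = value;
        LastUserValue = userValue;

        // Strict comparisons: in this sketch, on ties the first user value
        // that reached the extreme is kept.
        if (value < Min) { Min = value; MinUserValue = userValue; }
        if (value > Max) { Max = value; MaxUserValue = userValue; }
    }
}
```

Calling `Update(2, "document-7")`, then `Update(98, "document-4")`, and finally `Update(46, "document-3")` leaves the tracker with the same Min, Max and Last user values as the sample output.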
For any issues, please use GitHub issues. For other questions and ideas, feel free to ping us: @PaulParau, @HinteaDan, @BogdanGaliceanu.