opentelemetry-sdk: speed up exemplars a bit #4260

xrmx · 2024-11-08T16:27:16Z

Description

Use a sparse dict to allocate ExemplarsBucket on demand instead of preallocating all of them in FixedSizeExemplarReservoirABC.

Make the following return around 2X more loops for both trace_based and always_off exemplars filter:

.tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

I get 8% less rounds than with a8aacb0c6f2f06bf19b501d98e62f7c0e667fa4c that is the commit before the introductions of exemplars. Without this we are doing 60% less rounds.

This also adds a new benchmark file that instead of recording random numbers pick the very same bucket bounds entries.

Refs #4243

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

.tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

Does This PR Require a Contrib Repo Change?

Yes. - Link to PR:
No.

Checklist:

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

aabmass · 2024-11-08T19:01:08Z

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

fcollonval

Thanks @xrmx
This looks good to me.

fcollonval · 2024-11-09T15:02:08Z

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

Without any proof, I would say accessing a list item by index is faster than a dict value by key. So if there are lots of buckets this may be less optimized; but the number of buckets should never be very high.

xrmx · 2024-11-11T08:32:54Z

Does the dict have any downside vs list in the steady state (measuring the same attribute set repeatedly)?

Without any proof, I would say accessing a list item by index is faster than a dict value by key. So if there are lots of buckets this may be less optimized; but the number of buckets should never be very high.

With small data sets a list may be faster because it's simpler than the dict but with more items the O(n) list access time would lose against dict O(1) access time.

aabmass · 2024-11-11T18:47:15Z

but the number of buckets should never be very high.

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add. Maybe with the default buckets and one for an ExponentialHistogram, which defaults to maximum of 160 buckets

fcollonval · 2024-11-12T07:58:24Z

One important note about the exponential histogram case is that the number of buckets for the exemplar has a lower upper bound for buckets than the histogram:

Base2 Exponential Histogram Aggregation SHOULD use a SimpleFixedSizeExemplarReservoir with a reservoir equal to the smaller of the maximum number of buckets configured on the aggregation or twenty (e.g. min(20, max_buckets)).
Xref: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplar-defaults

fcollonval · 2024-11-12T08:04:10Z

the O(n) list access time would lose against dict O(1) access time.

Is it that complexity for search rather than for accessing? If you know the index like in this case, I doubt the complexity of the list is higher.

xrmx · 2024-11-12T10:39:22Z

the O(n) list access time would lose against dict O(1) access time.

Is it that complexity for search rather than for accessing? If you know the index like in this case, I doubt the complexity of the list is higher.

Got nerdsniped in this and yes the getting a list item looks O(1) too:

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (!valid_index(i, Py_SIZE(op))) {
        _Py_DECLARE_STR(list_err, "list index out of range");
        PyErr_SetObject(PyExc_IndexError, &_Py_STR(list_err));
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];
}

So I guess the difference in performance comes just from the lazyness of the ExemplarBucket() instantiations.

aabmass · 2024-11-14T21:59:07Z

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add.

@xrmx would you be open to adding a benchmark for this to make sure the dict doesn't make this much slower? That seems like the more common case to optimize for vs churning attributes.

xrmx · 2024-11-18T13:13:00Z

If we don't already have a benchmark for taking a measurement with a pre-existing attribute set, it would be great to add.

@xrmx would you be open to adding a benchmark for this to make sure the dict doesn't make this much slower? That seems like the more common case to optimize for vs churning attributes.

Need to sort out how to write such benchmark 😅

xrmx · 2024-11-26T09:13:48Z

@aabmass is something like this what you had in mind?

@pytest.mark.parametrize("num_labels", [0, 1, 3, 5, 7])
def test_histogram_record_1000(benchmark, num_labels):
    values = itertools.cycle(range(1, 10))
    def benchmark_histogram_record_1000():
        hist1000.record(next(values))

    benchmark(benchmark_histogram_record_1000)

Before

---------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------
Name (time in us)                    Min                Max              Mean            StdDev            Median               IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_histogram_record_1000[0]     2.9318 (1.15)     13.4856 (1.32)     3.1635 (1.11)     0.3693 (2.12)     3.1190 (1.11)     0.1350 (1.20)         22;69      316.1018 (0.90)       1022           1
test_histogram_record_1000[1]     2.5900 (1.01)     14.5938 (1.43)     2.8457 (1.00)     0.1850 (1.06)     2.8145 (1.00)     0.1146 (1.02)     3821;2833      351.4110 (1.00)      52373           1
test_histogram_record_1000[3]     2.5583 (1.00)     13.5032 (1.33)     2.8566 (1.01)     0.1873 (1.07)     2.8266 (1.01)     0.1201 (1.07)     3910;2536      350.0655 (0.99)      52902           1
test_histogram_record_1000[5]     2.5555 (1.0)      14.0648 (1.38)     2.8462 (1.00)     0.1965 (1.13)     2.8135 (1.00)     0.1127 (1.0)      3217;2935      351.3447 (1.00)      53338           1
test_histogram_record_1000[7]     2.5723 (1.01)     10.1784 (1.0)      2.8408 (1.0)      0.1742 (1.0)      2.8098 (1.0)      0.1136 (1.01)     4774;3252      352.0196 (1.0)       55710           1

After

---------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------
Name (time in us)                    Min                Max              Mean            StdDev            Median               IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_histogram_record_1000[0]     2.6487 (1.02)     13.6066 (1.0)      3.0223 (1.06)     0.3527 (1.82)     2.9970 (1.06)     0.2645 (2.56)        100;41      330.8757 (0.95)       2367           1
test_histogram_record_1000[1]     2.6105 (1.00)     14.9934 (1.10)     2.8952 (1.01)     0.2371 (1.22)     2.8554 (1.01)     0.1248 (1.21)     3357;3879      345.4039 (0.99)      65935           1
test_histogram_record_1000[3]     2.6114 (1.00)     17.8963 (1.32)     2.8626 (1.0)      0.1938 (1.0)      2.8303 (1.0)      0.1034 (1.0)      3534;3742      349.3333 (1.0)       53803           1
test_histogram_record_1000[5]     2.6058 (1.00)     24.4612 (1.80)     2.8805 (1.01)     0.2325 (1.20)     2.8433 (1.00)     0.1183 (1.14)     3105;3773      347.1662 (0.99)      55704           1
test_histogram_record_1000[7]     2.5993 (1.0)      18.7624 (1.38)     2.9343 (1.03)     0.2049 (1.06)     2.8927 (1.02)     0.1760 (1.70)     5919;1259      340.7929 (0.98)      59382           1

aabmass · 2024-11-27T16:05:57Z

@aabmass is something like this what you had in mind?

Yes LGTM. If I'm reading those results right, it looks like pretty similar performance in the steady state?

Are you planning to push that new case to this PR? I think it would be good to keep.

xrmx · 2024-11-27T16:22:10Z

@aabmass is something like this what you had in mind?

Yes LGTM. If I'm reading those results right, it looks like pretty similar performance in the steady state?

That's my understanding as well.

Are you planning to push that new case to this PR? I think it would be good to keep.

Yes, will add new benchmarks functions to the PR.

aabmass

Approving in advance 😃

Use a sparse dict to allocate ExemplarsBucket on demand instead of preallocating all of them in FixedSizeExemplarReservoirABC. Make the following return around 2X more loops for both trace_based and always_off exemplars filter: .tox/benchmark-opentelemetry-sdk/bin/pytest opentelemetry-sdk/benchmarks/metrics/ -k 'test_histogram_record_1000[7]'

emdneto

Thanks! Do we need to changelog this?
Noticed that we don't have benchmark tests for exponential histograms, but we can leave for another PR.

xrmx requested a review from a team as a code owner November 8, 2024 16:27

xrmx added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Nov 8, 2024

xrmx mentioned this pull request Nov 8, 2024

sdk/metrics: performance impact of exemplars support even when configured as always_off #4243

Open

fcollonval approved these changes Nov 9, 2024

View reviewed changes

aabmass approved these changes Nov 27, 2024

View reviewed changes

xrmx force-pushed the cheaper-exemplars-reservoir branch from c3cd760 to 8f02f78 Compare November 28, 2024 09:53

xrmx added 2 commits November 28, 2024 11:25

Add benchmarks for steady state

0070080

xrmx force-pushed the cheaper-exemplars-reservoir branch from 8f02f78 to 0070080 Compare November 28, 2024 10:25

Ignore benchmarks dirs for exported public symbols

a1ddce1

emdneto approved these changes Nov 28, 2024

View reviewed changes

Add changelog

a0cc788

xrmx removed the Skip Changelog PRs that do not require a CHANGELOG.md entry label Nov 28, 2024

Merge branch 'main' into cheaper-exemplars-reservoir

2ae91f3

xrmx enabled auto-merge (squash) December 2, 2024 08:46

xrmx merged commit 72fad82 into open-telemetry:main Dec 2, 2024
354 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opentelemetry-sdk: speed up exemplars a bit #4260

opentelemetry-sdk: speed up exemplars a bit #4260

xrmx commented Nov 8, 2024 •

edited

Loading

aabmass commented Nov 8, 2024

fcollonval left a comment

fcollonval commented Nov 9, 2024

xrmx commented Nov 11, 2024 •

edited

Loading

aabmass commented Nov 11, 2024

fcollonval commented Nov 12, 2024

fcollonval commented Nov 12, 2024

xrmx commented Nov 12, 2024 •

edited

Loading

aabmass commented Nov 14, 2024

xrmx commented Nov 18, 2024 •

edited

Loading

xrmx commented Nov 26, 2024 •

edited

Loading

aabmass commented Nov 27, 2024

xrmx commented Nov 27, 2024

aabmass left a comment

emdneto left a comment •

edited

Loading

opentelemetry-sdk: speed up exemplars a bit #4260

opentelemetry-sdk: speed up exemplars a bit #4260

Conversation

xrmx commented Nov 8, 2024 • edited Loading

Description

Type of change

How Has This Been Tested?

Does This PR Require a Contrib Repo Change?

Checklist:

aabmass commented Nov 8, 2024

fcollonval left a comment

Choose a reason for hiding this comment

fcollonval commented Nov 9, 2024

xrmx commented Nov 11, 2024 • edited Loading

aabmass commented Nov 11, 2024

fcollonval commented Nov 12, 2024

fcollonval commented Nov 12, 2024

xrmx commented Nov 12, 2024 • edited Loading

aabmass commented Nov 14, 2024

xrmx commented Nov 18, 2024 • edited Loading

xrmx commented Nov 26, 2024 • edited Loading

aabmass commented Nov 27, 2024

xrmx commented Nov 27, 2024

aabmass left a comment

Choose a reason for hiding this comment

emdneto left a comment • edited Loading

Choose a reason for hiding this comment

xrmx commented Nov 8, 2024 •

edited

Loading

xrmx commented Nov 11, 2024 •

edited

Loading

xrmx commented Nov 12, 2024 •

edited

Loading

xrmx commented Nov 18, 2024 •

edited

Loading

xrmx commented Nov 26, 2024 •

edited

Loading

emdneto left a comment •

edited

Loading