Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug-1898345: fix metrics key prefix #1020

Merged
merged 1 commit into from
May 24, 2024
Merged

bug-1898345: fix metrics key prefix #1020

merged 1 commit into from
May 24, 2024

Conversation

willkg
Copy link
Contributor

@willkg willkg commented May 24, 2024

In bec86e3, I removed setting the statsd_prefix because I thought we were adding "socorro.collector" in telegraf--not in DatadogMetrics markus backend. That was wrong and I broke stats in Socorro stage AWS.

This fixes that by setting "socorro" in the MetricsInterface per our convention and then fixing the keys to include "collector".

Then I discovered one of the tests was busted, so I fixed that. I also added metrics assertions to test_submit_crash_report.

The host value can change, so I wrote an AnyTagValue class that'll let us assert a tag was emitted with whatever value it has. I'll move that to Markus in willkg/markus#141 at some point.

In bec86e3, I removed setting the statsd_prefix because I thought we
were adding "socorro.collector" in telegraf--not in DatadogMetrics
markus backend. That was wrong and I broke stats in Socorro stage AWS.

This fixes that by setting "socorro" in the MetricsInterface per our
convention and then fixing the keys to include "collector".

Then I discovered one of the tests was busted, so I fixed that. I also
added metrics assertions to test_submit_crash_report.

The host value can change, so I wrote an AnyTagValue class that'll let
us assert a tag was emitted with whatever value it has.
@willkg willkg requested a review from a team as a code owner May 24, 2024 13:14
@willkg willkg requested a review from relud May 24, 2024 13:18
@@ -59,7 +59,7 @@


def count_sentry_scrub_error(msg):
METRICS.incr("app.sentry_scrub_error", value=1)
METRICS.incr("collector.sentry_scrub_error", value=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the key to socorro.collector.sentry_scrub_error which is more like the other things we're doing. I'll update the dashboard once this lands.

@@ -261,7 +261,7 @@ def extract_payload(self, req):

if has_json and has_kvpairs:
# If the crash payload has both kvpairs and a JSON blob, then it's malformed
# so we add a note and log it.
# so we add a note and log it, but we don't reject it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified this here because it used to be that we rejected these crash reports, then we changed the behavior and current me wants future me to be less confused if I have to look at this again.

@@ -19,6 +19,34 @@
from testlib.mini_poster import compress, multipart_encode


class AnyTagValue:
Copy link
Contributor Author

@willkg willkg May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other PR, I did a ridiculous thing to solve the same problem this solves because I thought I couldn't solve it this way on account of how Markus filter_records works. But then I decided to try it out and it works super. I'll fix Markus and then we can remove this and the ridiculous thing I did in that other PR.

The thing this does is two fold:

  1. AnyTagValue("host") == "host:somevalue"
  2. AnyTagValue("host") will sort correctly with other strings; for example these will both sort such that the "host" tag ends up in the same place:
    foo = ["host:12345", "environment:stage"]
    foo = [AnyTagValue("host"), "environment:stage"]
    

def text_extract_payload_kvpairs_and_json(self, request_generator, metricsmock):
# If there's a JSON blob and also kv pairs, then that's a malformed
# crash
def test_extract_payload_kvpairs_and_json(self, request_generator, metricsmock):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test never ran because it started with text. This tests what happens if a crash report includes annotations in multipart format and annotations in the extra field in a JSON-encoded value.

Once I fixed the test method name, it ran and failed because it didn't account for the change in how we handle this situation. I updated this test.

@@ -446,6 +487,22 @@ def test_submit_crash_report(self, client):
"version": 2,
}

mm.assert_histogram("socorro.collector.breakpad_resource.crash_size", value=632)
Copy link
Contributor Author

@willkg willkg May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the tests assert anything about metrics which is bad. I added this section here to this test to assert what metrics should get emitted.

@willkg willkg merged commit 739b4cc into main May 24, 2024
2 checks passed
@willkg
Copy link
Contributor Author

willkg commented May 24, 2024

Thank you!

I'll update the sentry_scrub_error in the Socorro stage AWS dashboard and all the collector metrics in the Socorro stage GCP dashboard once this deploys.

@willkg willkg deleted the willkg-bug-1898345-prefix branch May 24, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants