bug-1898345: fix metrics key prefix #1020

willkg · 2024-05-24T13:14:24Z

In bec86e3, I removed setting the statsd_prefix because I thought we were adding "socorro.collector" in telegraf--not in DatadogMetrics markus backend. That was wrong and I broke stats in Socorro stage AWS.

This fixes that by setting "socorro" in the MetricsInterface per our convention and then fixing the keys to include "collector".

Then I discovered one of the tests was busted, so I fixed that. I also added metrics assertions to test_submit_crash_report.

The host value can change, so I wrote an AnyTagValue class that'll let us assert a tag was emitted with whatever value it has. I'll move that to Markus in willkg/markus#141 at some point.

In bec86e3, I removed setting the statsd_prefix because I thought we were adding "socorro.collector" in telegraf--not in DatadogMetrics markus backend. That was wrong and I broke stats in Socorro stage AWS. This fixes that by setting "socorro" in the MetricsInterface per our convention and then fixing the keys to include "collector". Then I discovered one of the tests was busted, so I fixed that. I also added metrics assertions to test_submit_crash_report. The host value can change, so I wrote an AnyTagValue class that'll let us assert a tag was emitted with whatever value it has.

willkg · 2024-05-24T13:54:38Z

antenna/app.py

@@ -59,7 +59,7 @@


 def count_sentry_scrub_error(msg):
-    METRICS.incr("app.sentry_scrub_error", value=1)
+    METRICS.incr("collector.sentry_scrub_error", value=1)


This changes the key to socorro.collector.sentry_scrub_error which is more like the other things we're doing. I'll update the dashboard once this lands.

willkg · 2024-05-24T13:57:10Z

antenna/breakpad_resource.py

@@ -261,7 +261,7 @@ def extract_payload(self, req):

        if has_json and has_kvpairs:
            # If the crash payload has both kvpairs and a JSON blob, then it's malformed
-            # so we add a note and log it.
+            # so we add a note and log it, but we don't reject it


I clarified this here because it used to be that we rejected these crash reports, then we changed the behavior and current me wants future me to be less confused if I have to look at this again.

willkg · 2024-05-24T14:00:04Z

tests/unittest/test_breakpad_resource.py

@@ -19,6 +19,34 @@
 from testlib.mini_poster import compress, multipart_encode


+class AnyTagValue:


In the other PR, I did a ridiculous thing to solve the same problem this solves because I thought I couldn't solve it this way on account of how Markus filter_records works. But then I decided to try it out and it works super. I'll fix Markus and then we can remove this and the ridiculous thing I did in that other PR.

The thing this does is two fold:

AnyTagValue("host") == "host:somevalue"

AnyTagValue("host") will sort correctly with other strings; for example these will both sort such that the "host" tag ends up in the same place:
foo = ["host:12345", "environment:stage"] foo = [AnyTagValue("host"), "environment:stage"]

willkg · 2024-05-24T14:02:03Z

tests/unittest/test_breakpad_resource.py

-    def text_extract_payload_kvpairs_and_json(self, request_generator, metricsmock):
-        # If there's a JSON blob and also kv pairs, then that's a malformed
-        # crash
+    def test_extract_payload_kvpairs_and_json(self, request_generator, metricsmock):


This test never ran because it started with text. This tests what happens if a crash report includes annotations in multipart format and annotations in the extra field in a JSON-encoded value.

Once I fixed the test method name, it ran and failed because it didn't account for the change in how we handle this situation. I updated this test.

willkg · 2024-05-24T14:02:44Z

tests/unittest/test_breakpad_resource.py

@@ -446,6 +487,22 @@ def test_submit_crash_report(self, client):
            "version": 2,
        }

+        mm.assert_histogram("socorro.collector.breakpad_resource.crash_size", value=632)


None of the tests assert anything about metrics which is bad. I added this section here to this test to assert what metrics should get emitted.

willkg · 2024-05-24T14:43:12Z

Thank you!

I'll update the sentry_scrub_error in the Socorro stage AWS dashboard and all the collector metrics in the Socorro stage GCP dashboard once this deploys.

willkg requested a review from a team as a code owner May 24, 2024 13:14

willkg requested a review from relud May 24, 2024 13:18

willkg commented May 24, 2024

View reviewed changes

relud approved these changes May 24, 2024

View reviewed changes

willkg merged commit 739b4cc into main May 24, 2024
2 checks passed

willkg deleted the willkg-bug-1898345-prefix branch May 24, 2024 16:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug-1898345: fix metrics key prefix #1020

bug-1898345: fix metrics key prefix #1020

willkg commented May 24, 2024

willkg May 24, 2024

willkg May 24, 2024

willkg May 24, 2024 •

edited

Loading

willkg May 24, 2024

willkg May 24, 2024 •

edited

Loading

willkg commented May 24, 2024

		@@ -19,6 +19,34 @@
		from testlib.mini_poster import compress, multipart_encode


		class AnyTagValue:

bug-1898345: fix metrics key prefix #1020

bug-1898345: fix metrics key prefix #1020

Conversation

willkg commented May 24, 2024

willkg May 24, 2024

Choose a reason for hiding this comment

willkg May 24, 2024

Choose a reason for hiding this comment

willkg May 24, 2024 • edited Loading

Choose a reason for hiding this comment

willkg May 24, 2024

Choose a reason for hiding this comment

willkg May 24, 2024 • edited Loading

Choose a reason for hiding this comment

willkg commented May 24, 2024

willkg May 24, 2024 •

edited

Loading

willkg May 24, 2024 •

edited

Loading