-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug-1898345: fix metrics key prefix #1020
Conversation
In bec86e3, I removed setting the statsd_prefix because I thought we were adding "socorro.collector" in telegraf--not in DatadogMetrics markus backend. That was wrong and I broke stats in Socorro stage AWS. This fixes that by setting "socorro" in the MetricsInterface per our convention and then fixing the keys to include "collector". Then I discovered one of the tests was busted, so I fixed that. I also added metrics assertions to test_submit_crash_report. The host value can change, so I wrote an AnyTagValue class that'll let us assert a tag was emitted with whatever value it has.
@@ -59,7 +59,7 @@ | |||
|
|||
|
|||
def count_sentry_scrub_error(msg): | |||
METRICS.incr("app.sentry_scrub_error", value=1) | |||
METRICS.incr("collector.sentry_scrub_error", value=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the key to socorro.collector.sentry_scrub_error
which is more like the other things we're doing. I'll update the dashboard once this lands.
@@ -261,7 +261,7 @@ def extract_payload(self, req): | |||
|
|||
if has_json and has_kvpairs: | |||
# If the crash payload has both kvpairs and a JSON blob, then it's malformed | |||
# so we add a note and log it. | |||
# so we add a note and log it, but we don't reject it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I clarified this here because it used to be that we rejected these crash reports, then we changed the behavior and current me wants future me to be less confused if I have to look at this again.
@@ -19,6 +19,34 @@ | |||
from testlib.mini_poster import compress, multipart_encode | |||
|
|||
|
|||
class AnyTagValue: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other PR, I did a ridiculous thing to solve the same problem this solves because I thought I couldn't solve it this way on account of how Markus filter_records
works. But then I decided to try it out and it works super. I'll fix Markus and then we can remove this and the ridiculous thing I did in that other PR.
The thing this does is two fold:
AnyTagValue("host") == "host:somevalue"
AnyTagValue("host")
will sort correctly with other strings; for example these will both sort such that the "host" tag ends up in the same place:foo = ["host:12345", "environment:stage"] foo = [AnyTagValue("host"), "environment:stage"]
def text_extract_payload_kvpairs_and_json(self, request_generator, metricsmock): | ||
# If there's a JSON blob and also kv pairs, then that's a malformed | ||
# crash | ||
def test_extract_payload_kvpairs_and_json(self, request_generator, metricsmock): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test never ran because it started with text
. This tests what happens if a crash report includes annotations in multipart format and annotations in the extra
field in a JSON-encoded value.
Once I fixed the test method name, it ran and failed because it didn't account for the change in how we handle this situation. I updated this test.
@@ -446,6 +487,22 @@ def test_submit_crash_report(self, client): | |||
"version": 2, | |||
} | |||
|
|||
mm.assert_histogram("socorro.collector.breakpad_resource.crash_size", value=632) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the tests assert anything about metrics which is bad. I added this section here to this test to assert what metrics should get emitted.
Thank you! I'll update the sentry_scrub_error in the Socorro stage AWS dashboard and all the collector metrics in the Socorro stage GCP dashboard once this deploys. |
In bec86e3, I removed setting the
statsd_prefix
because I thought we were adding"socorro.collector"
in telegraf--not in DatadogMetrics markus backend. That was wrong and I broke stats in Socorro stage AWS.This fixes that by setting
"socorro"
in theMetricsInterface
per our convention and then fixing the keys to include"collector"
.Then I discovered one of the tests was busted, so I fixed that. I also added metrics assertions to
test_submit_crash_report
.The host value can change, so I wrote an
AnyTagValue
class that'll let us assert a tag was emitted with whatever value it has. I'll move that to Markus in willkg/markus#141 at some point.