Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe logging of problems in incoming events [was: Malformed name tag - Log level should be higher] #552

Open
Cruuncher opened this issue Apr 1, 2024 · 2 comments
Labels
enhancement help wanted PRs for this issue are especially welcome

Comments

@Cruuncher
Copy link

The following log is currently set at debug level:

level.Debug(logger).Log("msg", "Malformed name tag", "tag", tag, "component", component)

This log is very useful for debugging missing tags in metrics. However to get it we have to set statsd-exporter at debug level logging, which also brings in the "incoming line" log which logs way too much causing issues for us.

Can we increase this log to at least info level? Debug doesn't make sense to me for such important info

@Cruuncher
Copy link
Author

I can open the PR if the change is approved

@matthiasr
Copy link
Contributor

The reason we set it at debug level is because it might emit a log line for more or less every incoming event. As a general rule, we've kept everything O(incoming) out of info-level logs, to avoid logging volume (= cost) surprises in normal operation.

Some people run this exporter with a lot of events coming in, logging a significant fraction of these for even a short period across the whole fleet may be wildly expensive. On the other hand, you're right that we are not giving them a lot of options to find out what's wrong without turning on the firehose on at least a subset.

If you're up for it, I would love to have rate limited logging of problematic event lines. What I roughly imagine is that we identify the non-varying parts of the log line, and only emit one line per some time period for each of these. We'll probably want some settings to tune that, since what's acceptable may vary a lot, but as a default I would say one log line per metric cardinality per minute would be a good starting point. Ideally we'd have some indication how many lines we've rate limited away when we do finally flush them.

In general, this would allow us to elevate many of the "something is wrong with this event" logs to Warning, so they're shown and stand out by default.

On this log line in particular:

  • how useful is it in isolation? How much do you need to know about the full incoming event to make sense of it?
  • In this function, there's two uses of the "malformed name tag" message, making it indistinguishable which error path you hit. When touching this, please give them distinct messages.

@matthiasr matthiasr added enhancement help wanted PRs for this issue are especially welcome labels Aug 18, 2024
@matthiasr matthiasr changed the title Malformed name tag - Log level should be higher Safe logging of problems in incoming events [was: Malformed name tag - Log level should be higher] Aug 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted PRs for this issue are especially welcome
Projects
None yet
Development

No branches or pull requests

2 participants