database_observability: report health of component and collectors #2392

Merged: cristiangreco merged 2 commits into main from cristian/dbo11y-components-health on Jan 14, 2025

Conversation

cristiangreco
Collaborator

@cristiangreco cristiangreco commented Jan 13, 2025

PR Description

Report unhealthy in case of errors when starting up the collectors, or if any collector is stopped during operation.
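
As a rough illustration of the behaviour described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `Health` struct, the `Name` method, `newComponent` and `fakeCollector` are hypothetical stand-ins, and only `Start`, `Stopped`, `healthErr` and the component-level `CurrentHealth` are names taken from this conversation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Health is a local stand-in for the component health object (assumption).
type Health struct {
	Healthy bool
	Message string
}

// Collector has the Start and Stopped methods quoted in the review thread.
// Name is a hypothetical addition, used here only to build messages.
type Collector interface {
	Name() string
	Start(context.Context) error
	Stopped() bool
}

// Component owns the collectors and remembers any startup failure.
type Component struct {
	collectors []Collector
	healthErr  error
}

// newComponent (hypothetical helper) starts every collector and records
// the joined startup errors, if any, in healthErr.
func newComponent(collectors ...Collector) *Component {
	c := &Component{}
	var errs []error
	for _, col := range collectors {
		if err := col.Start(context.Background()); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", col.Name(), err))
			continue
		}
		c.collectors = append(c.collectors, col)
	}
	c.healthErr = errors.Join(errs...) // nil when errs is empty
	return c
}

// CurrentHealth reports unhealthy if startup failed or if any collector
// that started successfully has since stopped.
func (c *Component) CurrentHealth() Health {
	if c.healthErr != nil {
		return Health{Healthy: false, Message: c.healthErr.Error()}
	}
	var stopped []string
	for _, col := range c.collectors {
		if col.Stopped() {
			stopped = append(stopped, col.Name())
		}
	}
	if len(stopped) > 0 {
		return Health{Healthy: false, Message: "collectors stopped: " + strings.Join(stopped, ", ")}
	}
	return Health{Healthy: true, Message: "all collectors running"}
}

// fakeCollector is a test double whose behaviour is set through its fields.
type fakeCollector struct {
	name      string
	failStart bool
	stopped   bool
}

func (f *fakeCollector) Name() string { return f.name }

func (f *fakeCollector) Start(context.Context) error {
	if f.failStart {
		return errors.New("start failed")
	}
	return nil
}

func (f *fakeCollector) Stopped() bool { return f.stopped }
```

To try it, build a component with `newComponent(&fakeCollector{name: "query_sample"})` from a `main` function or a test, flip the collector's `stopped` field, and call `CurrentHealth()` again to see the status change.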

Which issue(s) this PR fixes

n.a.

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@cristiangreco cristiangreco force-pushed the cristian/dbo11y-components-health branch from 6e00e77 to b941a40 Compare January 13, 2025 11:31
@@ -127,7 +132,7 @@ func (c *QuerySample) fetchQuerySamples(ctx context.Context) error {
 }

 if strings.HasSuffix(sampleText, "...") {
-	level.Info(c.logger).Log("msg", "skipping parsing truncated query", "digest", digest)
+	level.Debug(c.logger).Log("msg", "skipping parsing truncated query", "digest", digest)
Collaborator Author

Drive-by remove noisy info log

Comment on lines +174 to +178
if len(schemas) == 0 {
	level.Info(c.logger).Log("msg", "no schema detected from information_schema.schemata")
	return nil
}

Collaborator Author

Drive-by: log if no schema is detected

@cristiangreco cristiangreco force-pushed the cristian/dbo11y-components-health branch from b941a40 to dc57779 Compare January 13, 2025 11:36
Report unhealthy in case of errors when starting up the collectors, or
if any collector is stopped during operation.
@cristiangreco cristiangreco force-pushed the cristian/dbo11y-components-health branch from dc57779 to 2d5c5de Compare January 13, 2025 11:39
@cristiangreco cristiangreco marked this pull request as ready for review January 13, 2025 11:49
@cristiangreco cristiangreco requested review from matthewnolf and a team as code owners January 13, 2025 11:49
Contributor

@wildum wildum left a comment

LGTM. I just suggested a different approach to give the collectors more flexibility over their health status, but feel free to ignore it.

Start(context.Context) error
Stopped() bool
Contributor

In the future, you might have collectors that can be considered unhealthy but are still running. A different approach to support this would be to have a CurrentHealth function in the collector interface that returns the health object. Then you would not need the healthErr attribute anymore, you would just call CurrentHealth on all the collectors in the CurrentHealth function of the component.
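
A minimal sketch of what that suggestion could look like, assuming each collector exposes its own `CurrentHealth` and the component only aggregates. The `Health` and `HealthType` types, the `Name` method and `staticCollector` are hypothetical stand-ins for illustration, not the project's actual API.

```go
package main

import "strings"

// HealthType and Health are local stand-ins for the health object (assumption).
type HealthType int

const (
	Healthy HealthType = iota
	Unhealthy
)

type Health struct {
	Type    HealthType
	Message string
}

// Collector reports its own health, so it can be unhealthy while still running.
// Name is a hypothetical addition, used here only to build messages.
type Collector interface {
	Name() string
	CurrentHealth() Health
}

// Component no longer needs a healthErr field: it asks each collector.
type Component struct {
	collectors []Collector
}

// CurrentHealth is unhealthy if any collector is, and lists the reasons.
func (c *Component) CurrentHealth() Health {
	var problems []string
	for _, col := range c.collectors {
		if h := col.CurrentHealth(); h.Type != Healthy {
			problems = append(problems, col.Name()+": "+h.Message)
		}
	}
	if len(problems) > 0 {
		return Health{Type: Unhealthy, Message: strings.Join(problems, "; ")}
	}
	return Health{Type: Healthy, Message: "all collectors healthy"}
}

// staticCollector is a test double that returns a fixed health value.
type staticCollector struct {
	name   string
	health Health
}

func (s staticCollector) Name() string          { return s.name }
func (s staticCollector) CurrentHealth() Health { return s.health }
```

With this shape, a collector that keeps running but whose last scrape failed could return an unhealthy status with its own message, and the component would surface it without tracking any state itself.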

Collaborator Author

Yes, that's a great point. I wanted to start simple for now, as the collectors are not resilient at all anyway (they stop as soon as any error is hit). I agree that in the future we might want to delegate the logic to the collectors themselves.

@cristiangreco cristiangreco enabled auto-merge (squash) January 14, 2025 08:30
@cristiangreco cristiangreco merged commit 55d952e into main Jan 14, 2025
18 checks passed
@cristiangreco cristiangreco deleted the cristian/dbo11y-components-health branch January 14, 2025 08:36
4 participants