RemoteConfigStatus cannot return status per configuration passed in AgentRemoteConfig AgentConfigMap #144
It was mentioned in this discussion that it would be useful to be able to associate the status in RemoteConfigStatus with a particular configuration supplied in AgentRemoteConfig. Currently there is only one top-level status and error_message.

One solution would be to add a corresponding map to RemoteConfigStatus where each individual status could be reported.
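To make the proposal concrete, here is a minimal protobuf sketch of such an additive change. The existing fields and the RemoteConfigStatuses enum are taken from the current spec; the PerConfigStatus message, the per_config_status field name, and its field number are hypothetical and for illustration only.

```protobuf
syntax = "proto3";

// Existing enum, as defined in the spec.
enum RemoteConfigStatuses {
    RemoteConfigStatuses_UNSET = 0;
    RemoteConfigStatuses_APPLIED = 1;
    RemoteConfigStatuses_APPLYING = 2;
    RemoteConfigStatuses_FAILED = 3;
}

message RemoteConfigStatus {
    // Existing fields, abbreviated from the spec.
    bytes last_remote_config_hash = 1;
    RemoteConfigStatuses status = 2;  // single top-level status
    string error_message = 3;         // single top-level error message

    // Hypothetical additive field: one status per entry in the
    // AgentConfigMap.config_map of the applied AgentRemoteConfig,
    // keyed by the same map keys. Name and number are illustrative.
    map<string, PerConfigStatus> per_config_status = 4;
}

// Hypothetical sub-message mirroring the existing top-level pair.
message PerConfigStatus {
    RemoteConfigStatuses status = 1;
    string error_message = 2;
}
```

Because protobuf decoders ignore unknown fields, a field added this way would be wire-compatible with existing agents and servers, which is why it could land as an additive change.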
The original discussion was about agent health, and the request was about being able to report more details about that health. This issue seems to address a different problem: the ability to report more fine-grained status/errors about the configs received via the AgentRemoteConfig message. I am confused about why we think this is the same problem.
Discussed this in the workgroup meeting today and decided that we would like to see more specific examples and use cases for this capability. We will likely be able to add this additional per-config status before or after the 1.0 release in a backward-compatible way, as an optional field. But before we do that we need to understand the use cases better.
Agreed that the problem that was originally discussed isn't necessarily coupled to AgentConfigMap, though depending on the agent, its health statuses may correlate well with separate config files. We should probably decouple the initial conversation from this. In Elastic Agent, we currently report health at a few different levels of granularity. Regardless of the level of granularity, we report the same two fields:
This is pretty similar to the existing top-level status and error_message pair. Here are the levels of granularity we report these on:
This allows us to answer questions like:
The reasons we prefer to include this information in the agent management protocol, instead of shipping it directly to the telemetry backend, are threefold, though they are related:
All of that said, there does need to be some "line in the sand" here, and I think we should try to discuss where that should be. For instance, there are all kinds of metrics that a user may want to filter agents on, such as write throughput ("give me all agents writing more than 10k EPS"). I don't think it's feasible to report all of these on the management protocol. From my perspective, we could draw the line at health status, since I think this is the most useful thing to aggregate on in the UI / management layer. Aggregations are less useful for metrics, and filtering may be good enough there. As I mentioned above, aggregating across two different datastores is more difficult, and health status is the field most likely to be aggregated on.
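As a hedged illustration of the per-component health reporting described above, the sketch below models the same two fields at every level of granularity via a recursive map. All names and field numbers are assumptions for illustration; this is neither the Elastic Agent implementation nor the spec.

```protobuf
syntax = "proto3";

// Hypothetical health message reporting the same two fields
// (a status and a human-readable message) at every level of
// granularity. All names and field numbers are illustrative.
message ComponentHealth {
    bool healthy = 1;           // or an enum, depending on design
    string status_message = 2;  // human-readable detail

    // Nested statuses for finer-grained units (e.g. per input or
    // per config file), keyed by a component identifier.
    map<string, ComponentHealth> components = 3;
}
```

A server could then aggregate or filter on the top-level status across the fleet while still drilling into individual components when diagnosing a specific agent.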
We are preparing to declare stability of the spec soon. I would prefer not to make big changes to the spec before stability is declared. Additive changes will of course be allowed after the spec is declared stable. What I think is important to do now is to understand whether the extra details about per-component status can be added in a non-breaking manner, as an additive change. From my cursory reading and incomplete understanding of the use case it appears to be possible. If anyone thinks that this is not the case, please speak up; otherwise we will postpone the resolution of this issue until OpAMP spec 1.0 is released.
Discussed in the workgroup meeting today and decided to postpone this unless we hear new arguments about why this is needed before the 1.0 release.