-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider the interplay between metrics expiration and handles. #314
Comments
Possible answer/solution: warn people about it.It could be sufficient to warn people that when using metrics handles, they cannot or should not drop idle metrics. For use cases where the metric cardinality doesn't grow over time, and using the "idle timeout" feature of This is only sufficient in those use cases, though. |
Possible answer/solution: provide a way to register handles in a strong or weak fashion.As a play on the strong versus weak pointers of The registry would expose this, allowing exporters to understand what metrics they should or shouldn't remove. This is still susceptible to problems with unbounded cardinality, and requires knowing upfront that your strong handles -- |
Possible answer/solution: allow handles to re-register.Given that the recorder is exposed such that there's always a When trying to update a handle, we would attempt to upgrade the weak reference in order to actually access the storage. If that failed, we would re-register ourselves with the recorder, acquiring a new weak reference. Biggest downsides here that I can think of are:
Other thoughts:
|
It's been an incredibly long time such I've touched this, but recently I've poked around at ways to try and solve this. What hasn't worked (so far)First up, I tried to approach this the most complex angle, which is essentially writing our own While it's certainly possible to do this in some way, the biggest issue is that we would need to have a deeply specialized implementation here, and we would likely have to do some very unergonomic things, like exposing virtual tables for each metric type, and writing a lot of specialized code to handle both the happy path of loading the smart pointer, as well as correctly replacing it while under concurrent accesses. Most of this is hinged on the bit about virtual tables: we can't carry generic types in the metric handles, so we need What could workWhat would likely work just fine is double indirection in the form of something like I've been avoiding this because it just feels icky and slow, but I'm probably prematurely optimizing. Practically, it handles the two biggest constraints: atomic access to the underlying data in a way that allows for expiration (strong vs weak references) and the ability to atomically update the storage pointer held by metric handles when they need to re-register since we can handle a thin pointer atomically without issue. The biggest (relatively speaking) downside to this approach is that it doesn't have a trivial way to mark the storage pointer as "closed". Essentially, we want the registry to be able to mark the storage as closed when it decides that a metric is idle, or if it decides to forcefully remove the metric, so that a constant stream of activity can't keep it active in a way that prevents dropping the That scenario is only relevant during expiration (where, you know, if it's gotten to that point, the metric obviously isn't going to be highly contended) and forceful removal. Forceful removal isn't currently a thing, but has been requested. Despite these potential issues, it's likely the simplest approach to take in the meantime because it actively solves a real problem and in a way that doesn't require us to compromise in terms of MSRV or additional dark magic atomics/unsafe code, etc. |
Chatting with a user about the idle timeout support in
metrics-exporter-prometheus
made me realize that there's one particular -- maybe fatal? -- flaw with the handle-based design: orphaned metrics.Consider a normal invocation of
counter!("key", 1)
:As is, with the handle-based design, that handle is wrapping an
Arc<T>
, but we query the handle right before modifying it. Either the metric already exists, or it doesn't -- either because it's brand-new or because it was previously expired and removed from the registry -- but we always get a valid handle as of the time of the call to the registry.Now, consider when we take a handle to a metric via
register_counter!(...)
. The steps to do so are nearly identical to before, specifically in that we query the registry, and the handle is either found if it exists, or is created if it doesn't... but then we never query for it again. If the metric is removed from the registry, the underlying storage may live on (Arc<T>
) but it is disconnected from the registry where it originated.The text was updated successfully, but these errors were encountered: