Commit
Add support for publishing kernel statistics to oximeter
- Add a `kstat` module in `oximeter_instruments`. This includes a trait for describing how to map from one or more kstats into oximeter targets and metrics. It can be used to convert name-value kernel statistics in a pretty straightforward way into oximeter samples.
- Add a `KstatSampler`, which is used to register kstat targets, and will periodically poll them to generate samples. It is an `oximeter::Producer`, so that it can be easily hooked up to produce data for an `oximeter` collector.
- Add targets for tracking physical, virtual, and guest datalinks.
- Add metrics for bytes in/out, packets in/out, and errors in/out for the above.
- Use the `KstatSampler` in a new `MetricsManager` type in the sled agent, and track the physical (underlay) data links on a system. Does not yet track any virtual or guest links. The manager can be used to also collect other statistics, such as HTTP request latencies similar to nexus, or any kstats through the sampler.

Remove local config, some cross-OS cfg directives

Update sled-agent OpenAPI for metrics reporting

Review feedback

- Make a queue per tracked kstat target, so that noisy producers don't impact samples from the other targets.
- Add overview doc to `oximeter_instruments::kstat`, and make most docstrings public for visibility.
- Delegate future impl to `Sleep` in `YieldIdAfter`, rather than spawning a new task.
- Don't fork to get hostname on every call, only when needed.
- Fixup to naming for test Etherstubs, to avoid invalid names.

What is time anyway?

- Improve tests by manually manipulating time to trigger expected events.
- Switch out `tokio::time::Instant` and `chrono::DateTime<Utc>` depending on test configuration, so that we can move time around during tests.
- Add a queue to which test code can subscribe, onto which the sampler task places the actual number of samples it collects. Use this to test the actual number of samples / error samples we get.
- Test expiration behavior by manually stepping time.

Add a time reference for converting hrtime to UTC

- Adds a `TimeReference` type for consistently converting from hrtime to UTC, using the mapping from a single point in time. This is stored in each `SampledKstat` now, and passed to `KstatTarget::to_samples()`. The implementer can use this to consistently derive a start time for their kstats now, though this isn't (can't be?) consistent if you remove and re-add a target.
- Add some quick sanity tests for `TimeReference`, more tests verifying start time consistency.
- Cleanup unused code.

Further improvements around kstat creation times

- Update the kstat chain more frequently, including when targets are added and sampled.
- Handle situation when a target signals interest in zero kstats, and clarify documentation. This is actually what happens when a kstat itself disappears and we then update the chain (previously, it was an error because the sampling method did _not_ update the chain). Treat this like an error, and increment the expiration counters. Make clear in the documentation that this situation is included in those counters.
- Add a per-kstat (not target) mapping that stores the creation times. These are included whenever we add a target, and also at sampling time, if needed. This lets us track the creation time in UTC reliably, while also providing it accurately to the `KstatTarget::to_samples()` method. These are removed only when the kstat itself goes away, which we check for periodically in the main `run()` loop, to avoid keeping them around forever.

Rebase fixup

Reword expiration after first failure
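
For illustration, the `TimeReference` idea (one hrtime/UTC pair captured at a single instant, with every later conversion computed as an offset from it) might look roughly like the sketch below. This is not the code in this commit: the field names, the `new` and `to_utc` methods, and the use of `std::time::SystemTime` in place of `chrono::DateTime<Utc>` are all assumptions made to keep the example self-contained.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for the `TimeReference` described above: a single
/// (hrtime, wall-clock) pair captured at the same instant.
#[derive(Clone, Copy, Debug)]
pub struct TimeReference {
    /// High-resolution timestamp, in nanoseconds, at the reference point.
    hrtime_ns: i64,
    /// Wall-clock (UTC) time at the same reference point.
    utc: SystemTime,
}

impl TimeReference {
    /// Build a reference from an hrtime and the UTC time observed with it.
    /// (Assumed constructor; the real type may capture both internally.)
    pub fn new(hrtime_ns: i64, utc: SystemTime) -> Self {
        Self { hrtime_ns, utc }
    }

    /// Convert any hrtime to UTC by offsetting from the reference point.
    /// Because every conversion uses the same pair, two conversions of the
    /// same hrtime always agree. Returns `None` on arithmetic overflow.
    pub fn to_utc(&self, hrtime_ns: i64) -> Option<SystemTime> {
        let delta = hrtime_ns.checked_sub(self.hrtime_ns)?;
        let magnitude = Duration::from_nanos(delta.unsigned_abs());
        if delta >= 0 {
            self.utc.checked_add(magnitude)
        } else {
            // hrtimes before the reference (e.g. a kstat creation time)
            // map to wall-clock times earlier than the reference.
            self.utc.checked_sub(magnitude)
        }
    }
}
```

Under these assumptions, a kstat creation time expressed as an hrtime can be turned into a stable start time by calling `to_utc` on the stored reference, rather than re-reading the wall clock at each sample.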