Add support for publishing kernel statistics to oximeter · oxidecomputer/omicron@8b79a19


Add support for publishing kernel statistics to oximeter

- Add a `kstat` module in `oximeter_instruments`. This includes a trait
  describing how to map from one or more kstats into oximeter targets
  and metrics, which makes it straightforward to convert name-value
  kernel statistics into oximeter samples.
- Add a `KstatSampler`, which is used to register kstat targets, and
  will periodically poll them to generate samples. It is an
  `oximeter::Producer`, so that it can be easily hooked up to produce
  data for an `oximeter` collector.
- Add targets for tracking physical, virtual, and guest datalinks.
- Add metrics for bytes in/out, packets in/out, and errors in/out for
  the above.
- Use the `KstatSampler` in a new `MetricsManager` type in the sled
  agent, and track the physical (underlay) datalinks on a system. This
  does not yet track any virtual or guest links. The manager can also be
  used to collect other statistics, such as HTTP request latencies
  (similar to Nexus), or any kstats through the sampler.
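As a rough illustration of the mapping idea in the first bullet, here is a std-only sketch. The `ToSamples` trait, `PhysicalLink`, `NamedData`, and `Sample` types are hypothetical simplifications, not the commit's `KstatTarget` API, and the statistic names (`rbytes64`, `obytes64`, and so on) are assumed to be those exposed by illumos `link` kstats.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a kstat's named data: statistic name -> value.
type NamedData = BTreeMap<String, u64>;

/// Hypothetical, simplified sample: target, metric, and cumulative value.
#[derive(Debug, Clone, PartialEq)]
struct Sample {
    target: String,
    metric: &'static str,
    value: u64,
}

/// Hypothetical trait describing how one target maps kstats into samples.
/// This is NOT the signature of the commit's `KstatTarget` trait.
trait ToSamples {
    /// Does this target care about the kstat with this module and name?
    fn interested(&self, module: &str, name: &str) -> bool;
    /// Convert the named data of the selected kstats into samples.
    fn to_samples(&self, data: &[NamedData]) -> Vec<Sample>;
}

/// Hypothetical physical datalink target.
struct PhysicalLink {
    link_name: String,
}

impl ToSamples for PhysicalLink {
    fn interested(&self, module: &str, name: &str) -> bool {
        module == "link" && name == self.link_name
    }

    fn to_samples(&self, data: &[NamedData]) -> Vec<Sample> {
        // Assumed mapping from kstat statistic names to metric names
        // (bytes, packets, and errors, each in and out).
        const MAP: [(&str, &str); 6] = [
            ("rbytes64", "bytes_received"),
            ("obytes64", "bytes_sent"),
            ("ipackets64", "packets_received"),
            ("opackets64", "packets_sent"),
            ("ierrors", "errors_received"),
            ("oerrors", "errors_sent"),
        ];
        let mut out = Vec::new();
        for named in data {
            for (stat, metric) in MAP {
                // Statistics missing from this kstat are simply skipped.
                if let Some(&value) = named.get(stat) {
                    out.push(Sample { target: self.link_name.clone(), metric, value });
                }
            }
        }
        out
    }
}
```

Keeping the name-to-metric table in one place means each new target type only has to say which kstats it wants and how their fields map onto metrics.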

Remove local config, some cross-OS cfg directives

Update sled-agent OpenAPI for metrics reporting

Review feedback

- Make a queue per tracked kstat target, so that noisy producers don't
  impact samples from the other targets.
- Add overview doc to `oximeter_instruments::kstat`, and make most
  docstrings public for visibility.
- Delegate future impl to `Sleep` in `YieldIdAfter`, rather than
  spawning a new task.
- Don't fork to get hostname on every call, only when needed.
- Fix naming of test Etherstubs to avoid invalid names.
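The per-target queues from the first bullet can be sketched with one bounded `VecDeque` per target. This is a hypothetical std-only sketch: `PerTargetQueues`, the numeric target IDs, and the drop-oldest policy for a full queue are all assumptions, since the commit message does not say how a full queue behaves.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-target sample queues with a fixed capacity each.
/// A target that overflows its queue only evicts its own oldest samples.
struct PerTargetQueues<T> {
    capacity: usize,
    queues: HashMap<u64, VecDeque<T>>,
    dropped: HashMap<u64, usize>,
}

impl<T> PerTargetQueues<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be nonzero");
        Self { capacity, queues: HashMap::new(), dropped: HashMap::new() }
    }

    /// Push a sample for one target, evicting that target's oldest sample
    /// (and counting the drop) if its queue is already full.
    fn push(&mut self, target_id: u64, sample: T) {
        let q = self.queues.entry(target_id).or_default();
        if q.len() == self.capacity {
            q.pop_front();
            *self.dropped.entry(target_id).or_insert(0) += 1;
        }
        q.push_back(sample);
    }

    /// Drain queued samples from every target (order across targets is unspecified).
    fn drain_all(&mut self) -> Vec<T> {
        let mut out = Vec::new();
        for q in self.queues.values_mut() {
            out.extend(q.drain(..));
        }
        out
    }

    /// Number of samples dropped so far for one target.
    fn dropped(&self, target_id: u64) -> usize {
        self.dropped.get(&target_id).copied().unwrap_or(0)
    }
}
```

With a single shared queue, a noisy target could push out everyone else's samples; here the quiet target's data survives untouched.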

What is time anyway?

- Improve tests by manually advancing time to the expected events.
- Switch out `tokio::time::Instant` and `chrono::DateTime<Utc>` depending
  on test configuration, so that we can move time around during tests.
- Add a queue that test code can subscribe to, onto which the sampler
  task pushes the number of samples it actually collects. Use this to
  test the actual number of samples and error samples we get.
- Test expiration behavior by manually stepping time.
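The commit makes time controllable in tests by swapping the `Instant` and `DateTime` types per configuration. A related but different technique, shown here as a hypothetical std-only sketch, is to pass a clock abstraction that tests step by hand. `Clock`, `ManualClock`, and `Expiration` are illustrative names, and the rule that a target expires after failing continuously for a fixed duration is an assumption, not the sampler's documented policy.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Hypothetical clock abstraction; production code would read a real
/// monotonic clock, while tests use `ManualClock`.
trait Clock {
    /// Time elapsed since some fixed, arbitrary origin.
    fn now(&self) -> Duration;
}

/// A clock that only moves when the test explicitly advances it.
struct ManualClock {
    now: Cell<Duration>,
}

impl ManualClock {
    fn new() -> Self {
        Self { now: Cell::new(Duration::ZERO) }
    }

    fn advance(&self, by: Duration) {
        self.now.set(self.now.get() + by);
    }
}

impl Clock for ManualClock {
    fn now(&self) -> Duration {
        self.now.get()
    }
}

/// Hypothetical expiration tracker: a target expires once it has been
/// failing continuously for at least `limit`, measured from the first failure.
struct Expiration {
    limit: Duration,
    first_failure: Option<Duration>,
}

impl Expiration {
    fn new(limit: Duration) -> Self {
        Self { limit, first_failure: None }
    }

    /// Record one sampling attempt; returns true if the target has now expired.
    fn record<C: Clock>(&mut self, clock: &C, ok: bool) -> bool {
        if ok {
            // Any success resets the failure window.
            self.first_failure = None;
            return false;
        }
        let first = *self.first_failure.get_or_insert(clock.now());
        clock.now() - first >= self.limit
    }
}
```

Because the test owns the clock, it can jump straight to the instant an expiration should fire instead of sleeping in real time.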

Add a time reference for converting hrtime to UTC

- Add a `TimeReference` type for consistently converting from hrtime to
  UTC, using the mapping from a single point in time. This is now stored
  in each `SampledKstat` and passed to `KstatTarget::to_samples()`.
  Implementers can use this to derive a consistent start time for their
  kstats, though this isn't (can't be?) consistent if you remove and
  re-add a target.
- Add some quick sanity tests for `TimeReference`, and more tests
  verifying start-time consistency.
- Clean up unused code.
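The conversion a time reference performs can be sketched as: capture one (hrtime, UTC) pair, then map any later hrtime by its offset from that pair. The struct below is a hypothetical sketch, not the commit's `TimeReference`; it assumes hrtime is a signed nanosecond count, as illumos `hrtime_t` is.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical reference point pairing a monotonic hrtime (nanoseconds)
/// with the wall-clock time captured at approximately the same instant.
#[derive(Clone, Copy, Debug)]
struct TimeReference {
    hrtime_ns: i64,
    utc: SystemTime,
}

impl TimeReference {
    fn new(hrtime_ns: i64, utc: SystemTime) -> Self {
        Self { hrtime_ns, utc }
    }

    /// Convert an hrtime to wall-clock time by applying its offset from the
    /// reference hrtime to the reference wall-clock time. Works for hrtimes
    /// both before and after the reference point.
    fn to_utc(&self, hrtime_ns: i64) -> SystemTime {
        let delta = hrtime_ns - self.hrtime_ns;
        let magnitude = Duration::from_nanos(delta.unsigned_abs());
        if delta >= 0 {
            self.utc + magnitude
        } else {
            self.utc - magnitude
        }
    }
}
```

Since every conversion uses the same reference pair, converting one hrtime twice always gives the same UTC time, which is the consistency the bullet describes. Drift between the monotonic and wall clocks after the reference point is not corrected in this sketch.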

Further improvements around kstat creation times

- Update the kstat chain more frequently, including when targets are
  added and sampled.
- Handle the situation where a target signals interest in zero kstats,
  and clarify the documentation. This is actually what happens when a
  kstat itself disappears and we then update the chain (previously, it
  was an error because the sampling method did _not_ update the chain).
  Treat this like an error, and increment the expiration counters. Make
  clear in the documentation that this situation is included in those
  counters.
- Add a per-kstat (not per-target) mapping that stores creation times.
  These are recorded whenever we add a target, and also at sampling
  time if needed. This lets us reliably track the creation time in UTC,
  while also providing it accurately to the `KstatTarget::to_samples()`
  method. Entries are removed only when the kstat itself goes away,
  which we check for periodically in the main `run()` loop, to avoid
  keeping them around forever.
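A per-kstat creation-time cache with periodic pruning might look like the following hypothetical sketch. `CreationTimes`, the `(module, instance, name)` key, and the method names are assumptions for illustration; the commit message only says the times are stored per kstat, recorded when targets are added or sampled, and removed once the kstat disappears.

```rust
use std::collections::{HashMap, HashSet};
use std::time::SystemTime;

/// Hypothetical key identifying one kstat: (module, instance, name).
type KstatKey = (String, i32, String);

/// Hypothetical per-kstat (not per-target) cache of creation times in UTC.
#[derive(Default)]
struct CreationTimes {
    times: HashMap<KstatKey, SystemTime>,
}

impl CreationTimes {
    /// Return the cached creation time, storing `computed` only the first
    /// time this kstat is seen. Later calls keep the original value, so all
    /// samples for the kstat report the same start time.
    fn get_or_record(&mut self, key: KstatKey, computed: SystemTime) -> SystemTime {
        *self.times.entry(key).or_insert(computed)
    }

    /// Drop entries for kstats no longer present on the system, e.g. from a
    /// periodic check in a main loop.
    fn prune(&mut self, present: &HashSet<KstatKey>) {
        self.times.retain(|key, _| present.contains(key));
    }

    fn len(&self) -> usize {
        self.times.len()
    }
}
```

Keying by kstat rather than by target means two targets reading the same kstat agree on its start time, and pruning keeps the map from growing without bound.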

Rebase fixup

Reword expiration after first failure


bnaecker committed Nov 8, 2023

1 parent 03c7f12 commit 8b79a19

Cargo.lock


Cargo.toml

@@ -224,6 +224,7 @@ ipcc-key-value = { path = "ipcc-key-value" }
     ipnetwork = { version = "0.20", features = ["schemars"] }
     itertools = "0.11.0"
     key-manager = { path = "key-manager" }
+    kstat-rs = "0.2.3"
     lazy_static = "1.4.0"
     libc = "0.2.150"
     linear-map = "1.2.0"
