Add virtual disk timeseries schema #6420

bnaecker · 2024-08-23T17:12:32Z

This adds a new set of timeseries that track block operations on virtual disks. This builds on and replaces the pre-existing Crucible data, adding more information about the disk and instance it's attached to. It also tracks I/O latencies and sizes in histograms.

bnaecker · 2024-08-23T18:38:36Z

I've used this while testing oxidecomputer/propolis#746, which has examples of the data this publishes. But to summarize, we have:

- Cumulative counters for the number of reads, writes and flushes
- Cumulative counters for the number of bytes read and written
- Cumulative counters for the number of failed reads, writes and flushes. These are additionally broken out by a string indicating the failure reason.
- Histograms of I/O latencies for reads, writes and flushes. The bins are actually defined in oxidecomputer/propolis#746, but they are log-linear: there are 10 bins in each power of 10, from 1 microsecond up to 10 seconds.
- Histograms of I/O sizes for reads and writes. Again, the bins are defined in Propolis, but they are straight power-of-two bins from 4KiB to 1GiB. There is an additional bin on either side capturing anything outside that range.
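
The bin layouts described above can be sketched roughly as follows. This is an illustrative reconstruction in Python, not the Propolis code; the real bins are defined in oxidecomputer/propolis#746. It assumes latencies are recorded in nanoseconds, that "10 bins in each power of 10" means ten equal-width bins per decade, and that 1GiB is the upper edge of the last in-range size bin. All function names here are hypothetical.

```python
import bisect


def log_linear_edges(lo_exp, hi_exp, bins_per_decade=10):
    """Bin edges with equal-width bins inside each decade [10^e, 10^(e+1))."""
    edges = []
    for exp in range(lo_exp, hi_exp):
        lo = 10 ** exp
        # Width of one bin in this decade; exact in integers when exp >= 1.
        width = (10 ** (exp + 1) - lo) // bins_per_decade
        edges.extend(lo + i * width for i in range(bins_per_decade))
    edges.append(10 ** hi_exp)  # right edge of the last in-range bin
    return edges


def power_of_two_edges(lo_exp, hi_exp):
    """Bin edges at 2^lo_exp, 2^(lo_exp + 1), ..., 2^hi_exp."""
    return [2 ** e for e in range(lo_exp, hi_exp + 1)]


def bin_index(edges, value):
    """Index 0 is the underflow bin, len(edges) is the overflow bin."""
    return bisect.bisect_right(edges, value)


# Assumption: nanosecond units, so 1 microsecond = 10^3 and 10 seconds = 10^10.
latency_edges = log_linear_edges(3, 10)
# 4KiB = 2^12 bytes, 1GiB = 2^30 bytes.
size_edges = power_of_two_edges(12, 30)
```

Under these assumptions there are 70 in-range latency bins and 18 in-range size bins, each histogram with one extra bin on either side. A 512-byte operation falls into the underflow bin, since `bin_index(size_edges, 512)` is 0.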

The thing I'd like some input on is related to #5267. I had originally put the sled identifiers on this timeseries. Adam raised some concerns about actionability of those fields; how they might confuse developers, who don't really know about and can't see the sleds; and that it will complicate the rough idea for an authz model. I agree with those points, but would appreciate some other perspectives too.

bnaecker · 2024-08-23T18:42:20Z

It occurs to me we should reduce the lowest I/O size histogram bin to 512 bytes. That's the minimum block size the control plane allows, so it should be the smallest actual block operation size.
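
To illustrate the effect of that change, here is a small Python sketch comparing the two layouts. It is not the Propolis implementation; it assumes power-of-two edges up to 1GiB with an underflow bin at index 0, and the helper names are hypothetical.

```python
import bisect

KIB = 1024
GIB = 1024 ** 3


def power_of_two_size_edges(lowest):
    """Power-of-two bin edges from `lowest` bytes up to and including 1GiB."""
    edges = []
    edge = lowest
    while edge <= GIB:
        edges.append(edge)
        edge *= 2
    return edges


def bin_index(edges, value):
    """Index 0 is the underflow bin, len(edges) is the overflow bin."""
    return bisect.bisect_right(edges, value)


old_edges = power_of_two_size_edges(4 * KIB)  # lowest edge at 4KiB
new_edges = power_of_two_size_edges(512)      # lowest edge at 512 bytes

# With the 4KiB layout, every operation smaller than 4KiB shares one underflow bin.
old_bins = [bin_index(old_edges, size) for size in (512, 1024, 2048)]
# With the 512-byte layout, the same sizes land in distinct in-range bins.
new_bins = [bin_index(new_edges, size) for size in (512, 1024, 2048)]
```

In this sketch `old_bins` is `[0, 0, 0]` while `new_bins` is `[1, 2, 3]`, so sub-4KiB operations become distinguishable and the underflow bin should stay empty if 512 bytes really is the smallest operation.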

oximeter/oximeter/schema/virtual-disk.toml

bnaecker · 2024-08-26T23:31:53Z

Holding off on this while we sort out how to cut the R10 release branch / commit with the new process.

bnaecker · 2024-08-27T04:31:30Z

Looks like we ran afoul of #6300 here. Going to rerun that workflow.

bnaecker requested review from rmustacc, ahl and leftwo August 23, 2024 18:32

This was referenced Aug 23, 2024

Update oximeter dependency oxidecomputer/crucible#1429

Merged

Publish richer virtual disk statistics to oximeter oxidecomputer/propolis#746

Merged

[network metrics] instance network interface schema #6414

Merged

leftwo approved these changes Aug 26, 2024

bnaecker commented Aug 26, 2024

oximeter/oximeter/schema/virtual-disk.toml

Slightly better descriptions

9806070

bnaecker enabled auto-merge (squash) August 26, 2024 22:56

bnaecker disabled auto-merge August 26, 2024 23:22

bnaecker merged commit 6207e19 into main Aug 27, 2024
22 checks passed

bnaecker deleted the new-virtual-disk-timeseries-definition branch August 27, 2024 15:27
