Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add virtual disk timeseries schema #6420

Merged
merged 2 commits into from
Aug 27, 2024

Conversation

bnaecker
Copy link
Collaborator

This adds a new set of timeseries that track block operations on virtual disks. This builds on and replaces the pre-existing Crucible data, adding more information about the disk and instance it's attached to. It also tracks I/O latencies and sizes in histograms.

This adds a new set of timeseries that track block operations on virtual
disks. This builds on and replaces the pre-existing Crucible data,
adding more information about the disk and instance it's attached to. It
also tracks I/O latencies and sizes in histograms.
@bnaecker bnaecker requested review from rmustacc, ahl and leftwo August 23, 2024 18:32
@bnaecker
Copy link
Collaborator Author

I've used this while testing oxidecomputer/propolis#746, which has examples of the data this publishes. But to summarize, we have

  • Cumulative counters for the number of reads, writes and flushes
  • Cumulative counters for the number of bytes read and written
  • Cumulative counters for the number of failed reads, writes and flushes. These are additionally broken out by a string indicating the failure reason.
  • Histogram of I/O latencies for reads, writes and flushes. The bins are actually defined in Publish richer virtual disk statistics to oximeter propolis#746, but they are log-linear. There are 10 bins in each power of 10, from 1 microsecond up to 10 seconds.
  • Histogram of I/O sizes for reads and writes. Again the bins are in Propolis, but there are straight power-of-two bins from 4KiB to 1GiB. The is an additional bin on either side capturing anything outside that range.

The thing I'd like some input on is related to #5267. I had originally put the sled identifiers on this timeseries. Adam raised some concerns about actionability of those fields; how they might confuse developers, who don't really know about and can't see the sleds; and that it will complicate the rough idea for an authz model. I agree with those points, but would appreicate some other perspectives too.

@bnaecker
Copy link
Collaborator Author

It occurs to me we should reduce the lowest I/O size histogram bin to 512 bytes. That's the minimum block size the control plane allows, so should be the smallest actual block operation size.

@bnaecker bnaecker enabled auto-merge (squash) August 26, 2024 22:56
@bnaecker bnaecker disabled auto-merge August 26, 2024 23:22
@bnaecker
Copy link
Collaborator Author

Holding off on this while we sort out how to cut the R10 release branch / commit with the new process.

@bnaecker
Copy link
Collaborator Author

Looks like we ran afoul #6300 of here. Going to rerun that workflow.

@bnaecker bnaecker merged commit 6207e19 into main Aug 27, 2024
22 checks passed
@bnaecker bnaecker deleted the new-virtual-disk-timeseries-definition branch August 27, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants