Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define timeseries schema in TOML #5889

Merged
merged 1 commit into from
Jun 25, 2024
Merged

Conversation

bnaecker
Copy link
Collaborator

This is the first commit in a thread of work to support updates to timeseries schema. In our first step, we're moving the definition of timeseries from the Rust structs we use today, to TOML text files. Each file describes one target and all the metrics associated with it, from which we generate exactly the same Rust code that it replaces.

This opens the door to a lot of downstream work, including well-defined updates to schema; detection of conflicting schema; improved metadata for timeseries; and generation of documentation. As an example, this moves exactly one (1) timeseries into the new format, the physical datalink statistics tracked through illumos kstats.

  • Move the oximeter crate into a private implementation crate, and re-export it at its original path
  • Add intermediate representation of timeseries schema, and code for deserializing TOML timeseries definitions, checking them for conflicts, and handling updates. Note that no actual updates are currently supported. The ingesting code includes a check that version numbers are exactly 1, except in tests. We include tests that check updates in a number of ways, but until the remainder of the work is done, we limit timeseries in the wild to their current version of 1.
  • Add a macro for consuming timeseries definitions in TOML, and generating the equivalent Rust code. Developers should use this new oximeter::use_timeseries!() proc macro to create new timeseries.
  • Add an integration test in Nexus that pulls in timeseries schema ingested with with oximeter::use_timeseries!(), and checks them all for conflicts. As developers add new timeseries in TOML format, they must be added here as well to ensure we detect problems at CI time.
  • Updates kstat-rs dep to simplify platform-specific code. This is part of this commit because we need the definitions of the physical-data-link timeseries for our integration test, but those were previously only compiled on illumos platforms. They're now included everywhere, while the tests remain illumos-only.

@bnaecker bnaecker marked this pull request as draft June 12, 2024 21:03
@bnaecker bnaecker force-pushed the a-shell-of-your-former-self branch 4 times, most recently from eff0a37 to 13796f2 Compare June 12, 2024 22:10
@bnaecker
Copy link
Collaborator Author

Note to self: make issues for sequence of work, and refer to them in this PR instead of TODO(ben).

@bnaecker bnaecker force-pushed the a-shell-of-your-former-self branch 2 times, most recently from 0933ece to 3199953 Compare June 13, 2024 15:36
@bnaecker bnaecker marked this pull request as ready for review June 14, 2024 14:59
@bnaecker bnaecker requested review from ahl, jgallagher and smklein June 14, 2024 14:59
Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the many nits and questions / thanks for your patience! I haven't done a lot with oximeter so I hope these aren't way off base.

schema/rss-sled-plan.json Show resolved Hide resolved
openapi/bootstrap-agent.json Show resolved Hide resolved
///
/// As developers define new timeseries, those should be added to this test for
/// checking. To define metrics in their own crates, developers use the
/// proc-macro `oximeter::use_timeseries!()`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted about this quite a lot offline. Trying to summarize the highlights: I'd vote to centralize the TOML files and definitions of all the timeseries types, either into its own crate in omicron or its own crate in a separate repo altogether. Some pros:

  • Needing to add newly added timeseries types to this test seems really easy to forget to do, especially if the timeseries is defined outside omicron entirely. If all of them are centralized, presumably that crate can ensure that any tests it wants to run includes all of them.
  • We don't yet know exactly how we're going to support live upgrades where we might have different versions of Nexus and/or different versions of metrics producers concurrently operating within one rack, but we've talked about some ideas for how to handle that that involve tooling to check for compatibility across versions on the OpenAPI side. If we go that route and want to write similar tooling for Oximeter, my gut feeling is that will be easier to do if all the TOMLs and types are in one place.
  • Similarly, if we decide to make changes to the TOML format itself, that's much easier if they're all living in a centralized place.

The biggest downside (we think?) to centralizing them is for timeseries that are defined outside omicron; updating or adding such will require

  1. A PR to omicron to update the TOML and generate new types
  2. A PR to the outside-omicron repo to pull in the new omicron
  3. Another PR to omicron to pull in the new outside-omicron bits from 2

(This is a really unfortunate cross-repo cycle already, but I think that's the world we're living in now regardless.) If this is a major pain point but we prefer the centralized timeseries/TOMLs otherwise, that might be an argument for moving them to their own repo that both omicron and other timeseries-users could depend on.

openapi/nexus.json Show resolved Hide resolved
oximeter/impl/src/schema/codegen.rs Outdated Show resolved Hide resolved
oximeter/impl/src/schema/mod.rs Outdated Show resolved Hide resolved
oximeter/instruments/schema/physical-data-link.toml Outdated Show resolved Hide resolved
oximeter/instruments/src/kstat/mod.rs Outdated Show resolved Hide resolved
oximeter/oximeter/src/bin/oximeter-schema.rs Show resolved Hide resolved
oximeter/oximeter/src/lib.rs Show resolved Hide resolved
@bnaecker
Copy link
Collaborator Author

Ok, I believe I've addressed most of @jgallagher's comments. The big outstanding chunk I'd like to do is bite the bullet and centralize everything in one crate, probably oximeter itself. That will change the use_timeseries!() macro, since it'll just refer to the target and possibly metric, rather than the full file path. It will also obviate the integration test I have in Nexus, which I'll replace with one or more similar tests in oximeter itself.

@bnaecker
Copy link
Collaborator Author

Ok @jgallagher I've also centralized the timeseries definitions into the oximeter/oximeter/schema subdirectory. The use_timeseries!() proc-macro now takes the name of a file only, inside that subdirectory, and generates the Rust code from its contained timeseries. We can now also easily run a test in oximeter that checks the consistency across all timeseries defined in that directory. That should make it easy to detect conflicts, without developers needing to remember to add files anywhere.

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one note / question about how strict we want to be with schema location.

oximeter/timeseries-macro/src/lib.rs Outdated Show resolved Hide resolved
@bnaecker
Copy link
Collaborator Author

Thanks for the review @jgallagher! All the comments were really helpful. I'm going to give Adam a chance to take a look, and then I'll fix up the conflicts on the way in.

Copy link
Contributor

@ahl ahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I found a bug... or I'm confused

oximeter/impl/src/schema/codegen.rs Outdated Show resolved Hide resolved
@bnaecker
Copy link
Collaborator Author

Aight, I'm going to start rebasing on main to resolve the conflicts before smashing that merge button.

@bnaecker bnaecker force-pushed the a-shell-of-your-former-self branch from 3dddfc3 to a675cf4 Compare June 25, 2024 00:59
@bnaecker bnaecker enabled auto-merge (squash) June 25, 2024 00:59
@bnaecker bnaecker force-pushed the a-shell-of-your-former-self branch 2 times, most recently from b1be993 to 1fa8020 Compare June 25, 2024 03:18
This is the first commit in a thread of work to support updates to
timeseries schema. In our first step, we're moving the definition of
timeseries from the Rust structs we use today, to TOML text files. Each
file describes one target and all the metrics associated with it, from
which we generate exactly the same Rust code that it replaces.

This opens the door to a lot of downstream work, including well-defined
updates to schema; detection of conflicting schema; improved metadata
for timeseries; and generation of documentation. As an example, this
moves exactly one (1) timeseries into the new format, the physical
datalink statistics tracked through illumos kstats.

- Move the `oximeter` crate into a private implementation crate, and
  re-export it at its original path
- Add intermediate representation of timeseries schema, and code for
  deserializing TOML timeseries definitions, checking them for
  conflicts, and handling updates. Note that **no actual updates** are
  currently supported. The ingesting code includes a check that version
  numbers are exactly 1, except in tests. We include tests that check
  updates in a number of ways, but until the remainder of the work is
  done, we limit timeseries in the wild to their current version of 1.
  The actual file format also has version number associated with it,
  currently asserted to be 1, so that we can more easily update the
  entire format if needed.
- Add a macro for consuming timeseries definitions in TOML, and
  generating the equivalent Rust code. Developers should use this new
  `oximeter::use_timeseries!()` proc macro to create new timeseries.
- Add an integration test in `oximeter` that pulls in all the
  centrally-defined timeseries schema, and checks them all for
  conflicts. As developers add new timeseries in TOML format, they will
  be picked up automatically by this test, to detect problems at merge
  time.
- Updates `kstat-rs` dep to simplify platform-specific code. This is
  part of this commit because we need the definitions of the
  physical-data-link timeseries for our integration test, but those were
  previously only compiled on illumos platforms. They're now included
  everywhere, while the tests remain illumos-only.
@bnaecker bnaecker force-pushed the a-shell-of-your-former-self branch from 1fa8020 to 72a1a6c Compare June 25, 2024 03:34
@bnaecker bnaecker merged commit fd5f1f3 into main Jun 25, 2024
22 checks passed
@bnaecker bnaecker deleted the a-shell-of-your-former-self branch June 25, 2024 04:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants