Tracking issue for updating timeseries schema #5941

Open
1 of 5 tasks
bnaecker opened this issue Jun 24, 2024 · 0 comments
bnaecker commented Jun 24, 2024

This issue tracks the work required to support updating timeseries schema.

Background

The current implementation of oximeter makes it impossible to update timeseries schema. As oximeter collects samples from producers, it derives a schema for them. It then checks for an existing schema in the ClickHouse database (based on the timeseries name only). If one does not exist, the schema is inserted. If one does exist, every sample thereafter must match that inserted schema.
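The check described above can be sketched roughly as follows. This is a simplified, in-memory illustration, not oximeter's actual code: `Schema`, `SchemaStore`, `CheckError`, `check_or_insert`, and the `schema` helper are all hypothetical names, and a `HashMap` stands in for the schema table in ClickHouse.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified schema: field names mapped to a type name.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Schema {
    timeseries_name: String,
    fields: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
enum CheckError {
    /// The derived schema differs from the one already stored under this name.
    SchemaMismatch { expected: Schema, actual: Schema },
}

/// In-memory stand-in (an assumption) for the schema table in the database.
#[derive(Default)]
struct SchemaStore {
    by_name: HashMap<String, Schema>,
}

impl SchemaStore {
    /// Look up by timeseries name only. Insert if absent (Ok(true)),
    /// accept if identical (Ok(false)), and reject anything else.
    fn check_or_insert(&mut self, derived: Schema) -> Result<bool, CheckError> {
        if let Some(existing) = self.by_name.get(&derived.timeseries_name) {
            if *existing == derived {
                return Ok(false);
            }
            return Err(CheckError::SchemaMismatch {
                expected: existing.clone(),
                actual: derived,
            });
        }
        self.by_name.insert(derived.timeseries_name.clone(), derived);
        Ok(true)
    }
}

/// Convenience constructor used only for illustration.
fn schema(name: &str, fields: &[(&str, &str)]) -> Schema {
    Schema {
        timeseries_name: name.to_string(),
        fields: fields
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}
```

Because the lookup uses the name alone, a sample whose derived schema adds or removes a field is rejected rather than treated as a new version, which is why updates are impossible today.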

RFD 467 (internal-only) discusses two options for supporting schema updates. A few outstanding questions remain, but much of the work can be done now. Here are the main pieces:

  • Describe timeseries schema in text files, rather than code. This lets us describe updates and attach much more metadata to the schema, independent of the stream of samples oximeter collects. The most important metadata is the version of each timeseries, though more will be included.
  • Track timeseries schema in CockroachDB. They will likely remain in ClickHouse to make querying via oxql easier, but the schema will be sent to oximeter from nexus, rather than derived from each sample. The simplest approach here would be to load these when nexus starts up, though we may also want individual producers to send them at registration time.
  • Update the ClickHouse database to include timeseries version information. This essentially means attaching a version column to every table, in addition to the current timeseries_name and timeseries_key columns. This will require changing the sorting key and timeseries keys for all samples, which unfortunately means we need to drop all the historical data.
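To illustrate why adding a version column changes the sorting key, here is a minimal sketch. The `SampleKey` type, the `timeseries_version` field name, its `u32` type, and the position of the version within the key are all assumptions for illustration; the real table definitions may differ.

```rust
/// Hypothetical per-sample key. Derived `Ord` compares fields in
/// declaration order, mimicking a sorting key of
/// (timeseries_name, timeseries_version, timeseries_key).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct SampleKey {
    timeseries_name: String,
    timeseries_version: u32, // assumed column name and type
    timeseries_key: u64,
}

/// Sort keys the way rows would be ordered under the assumed sorting key.
fn sort_keys(mut keys: Vec<SampleKey>) -> Vec<SampleKey> {
    keys.sort();
    keys
}

/// Convenience constructor used only for illustration.
fn key(name: &str, version: u32, ts_key: u64) -> SampleKey {
    SampleKey {
        timeseries_name: name.to_string(),
        timeseries_version: version,
        timeseries_key: ts_key,
    }
}
```

With the version in the key, rows for version 1 of a timeseries sort ahead of rows for version 2, regardless of their timeseries_key values. Existing rows were written without this component, which is consistent with the note above that historical data would need to be dropped.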

This is broken down into individual work items below.

Tickets

There are a few things that will need to be fleshed out as we implement this. These are left as open questions on RFD 467. They include:

  • Do we have exactly one current version of a timeseries schema at any one time? The alternative would be supporting more than one, such as by returning data from schema versions consistent with a query; or allowing / requiring that users query a specific version of a timeseries. If we do have one version, we need to do some kind of data migration in ClickHouse, such as adding or removing fields on old samples.
  • How do we support backwards-incompatible changes?
  • How do schema get into CockroachDB? One approach would be to load them on startup from Nexus, similar to how we load other fixed data like VPC Firewall rules. An alternative is having producers include them at registration time. I'm not sure what the right approach is.
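The first question above contrasts two lookup styles: resolving to a single current version, or asking for a specific version. A minimal sketch of both, assuming a hypothetical in-memory `VersionedSchemas` registry that treats the highest version number as current (none of these names come from RFD 467 or the codebase):

```rust
use std::collections::BTreeMap;

/// Hypothetical registry: timeseries name -> version -> field names.
#[derive(Default)]
struct VersionedSchemas {
    by_name: BTreeMap<String, BTreeMap<u32, Vec<String>>>,
}

impl VersionedSchemas {
    /// Record the field names for one version of a timeseries.
    fn insert(&mut self, name: &str, version: u32, fields: &[&str]) {
        self.by_name
            .entry(name.to_string())
            .or_default()
            .insert(version, fields.iter().map(|f| f.to_string()).collect());
    }

    /// "Exactly one current version": resolve to the highest version number.
    fn current(&self, name: &str) -> Option<(u32, &[String])> {
        let versions = self.by_name.get(name)?;
        versions
            .iter()
            .next_back()
            .map(|(v, fields)| (*v, fields.as_slice()))
    }

    /// "Query a specific version": the caller names the version explicitly.
    fn at_version(&self, name: &str, version: u32) -> Option<&[String]> {
        self.by_name
            .get(name)?
            .get(&version)
            .map(|fields| fields.as_slice())
    }
}
```

Under the single-current-version model, only `current` would be exposed to queries and older samples would need migrating to match it. Under the multi-version model, something like `at_version` would be part of the query path.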

More details

In addition to RFD 467, I wrote up this draft issue (internal-only).
