This issue tracks the work required to support updating timeseries schema.
Background
The current implementation of oximeter makes it impossible to update timeseries schema. As oximeter collects samples from producers, it derives a schema for them. It then checks for an existing schema in the ClickHouse database (based on the timeseries name only). If one does not exist, the schema is inserted. If one does exist, every sample thereafter must match that inserted schema.
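The check described above can be sketched roughly as follows. This is a simplified illustration, not the actual oximeter code: the `Schema`, `SchemaStore`, and `check_or_insert` names are hypothetical, the in-memory map stands in for the schema table in ClickHouse, and field types are plain strings.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a derived timeseries schema.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Schema {
    timeseries_name: String,
    /// Field name -> field type, as derived from a sample.
    fields: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
enum SchemaError {
    /// A schema with this name exists, and the derived one differs from it.
    Mismatch { timeseries_name: String },
}

/// In-memory stand-in for the schema table, keyed by timeseries name only.
#[derive(Default)]
struct SchemaStore {
    by_name: BTreeMap<String, Schema>,
}

impl SchemaStore {
    /// Insert the derived schema if the name is new; otherwise require an
    /// exact match with the schema that was inserted first.
    fn check_or_insert(&mut self, derived: Schema) -> Result<(), SchemaError> {
        if let Some(existing) = self.by_name.get(&derived.timeseries_name) {
            if *existing == derived {
                return Ok(());
            }
            return Err(SchemaError::Mismatch {
                timeseries_name: derived.timeseries_name,
            });
        }
        self.by_name.insert(derived.timeseries_name.clone(), derived);
        Ok(())
    }
}

/// Helper for building example schemas.
fn schema(name: &str, fields: &[(&str, &str)]) -> Schema {
    Schema {
        timeseries_name: name.to_string(),
        fields: fields
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

fn main() {
    let mut store = SchemaStore::default();
    // The first sample defines the schema for this (made-up) timeseries name.
    let first = schema("my_target:my_metric", &[("id", "uuid")]);
    println!("{:?}", store.check_or_insert(first));
    // A later sample with an extra field is rejected, since lookup is by name only.
    let updated = schema("my_target:my_metric", &[("id", "uuid"), ("rack", "uuid")]);
    println!("{:?}", store.check_or_insert(updated));
}
```

Because the lookup key is the name alone, there is no way to express "this is a newer version of the same timeseries": any change to the fields looks like a mismatch.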
RFD 467 (internal-only) discusses two options for making schema updates possible. A few outstanding questions remain, but a lot of the work can be done now. Here are the main pieces:
Describe timeseries schema in text files, rather than code. This lets us describe updates and attach much more metadata to the schema, independent of the stream of samples oximeter collects. The most important metadata is the version of each timeseries, though more will be included.
Track timeseries schema in CockroachDB. They will likely remain in ClickHouse to make querying via oxql easier, but the schema will be sent to oximeter from nexus, rather than derived from each sample. The simplest approach here would be to load these when nexus starts up, though we may also want individual producers to send them at registration time.
Update the ClickHouse database to include timeseries version information. This is essentially attaching a version column to every table, in addition to the current timeseries_name and timeseries_key columns. This will require changing the sorting key and timeseries keys for all samples, meaning we unfortunately need to drop all the historical data.
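To make the role of the version concrete, here is a minimal sketch of schema that are loaded up front (as they might be when sent from nexus) and looked up by name and version, rather than derived from samples and looked up by name alone. All type and function names are hypothetical, field types are plain strings, and the in-memory map only stands in for whatever storage is eventually used.

```rust
use std::collections::BTreeMap;

/// Hypothetical schema record that carries an explicit version.
#[derive(Clone, Debug, PartialEq, Eq)]
struct VersionedSchema {
    timeseries_name: String,
    version: u32,
    /// Field name -> field type.
    fields: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    /// No schema was loaded for this name and version.
    UnknownSchema { timeseries_name: String, version: u32 },
    /// The sample's fields differ from the loaded schema.
    FieldMismatch { timeseries_name: String, version: u32 },
}

/// Schema known ahead of time, keyed by (name, version).
struct SchemaRegistry {
    by_name_and_version: BTreeMap<(String, u32), VersionedSchema>,
}

impl SchemaRegistry {
    /// Build the registry from a fixed list of schema, e.g. loaded at startup.
    fn load(schemas: Vec<VersionedSchema>) -> Self {
        let by_name_and_version = schemas
            .into_iter()
            .map(|s| ((s.timeseries_name.clone(), s.version), s))
            .collect();
        Self { by_name_and_version }
    }

    /// Check a sample's fields against the schema for its name and version.
    fn validate(
        &self,
        name: &str,
        version: u32,
        sample_fields: &BTreeMap<String, String>,
    ) -> Result<(), ValidationError> {
        let key = (name.to_string(), version);
        match self.by_name_and_version.get(&key) {
            None => Err(ValidationError::UnknownSchema {
                timeseries_name: name.to_string(),
                version,
            }),
            Some(schema) if &schema.fields == sample_fields => Ok(()),
            Some(_) => Err(ValidationError::FieldMismatch {
                timeseries_name: name.to_string(),
                version,
            }),
        }
    }
}

/// Helper for building example field maps.
fn fields(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let name = "my_target:my_metric"; // made-up timeseries name
    let registry = SchemaRegistry::load(vec![
        VersionedSchema {
            timeseries_name: name.to_string(),
            version: 1,
            fields: fields(&[("id", "uuid")]),
        },
        VersionedSchema {
            timeseries_name: name.to_string(),
            version: 2,
            fields: fields(&[("id", "uuid"), ("rack", "uuid")]),
        },
    ]);
    // Both versions can coexist because the version is part of the lookup key.
    println!("{:?}", registry.validate(name, 1, &fields(&[("id", "uuid")])));
    println!("{:?}", registry.validate(name, 2, &fields(&[("id", "uuid"), ("rack", "uuid")])));
}
```

This sketch deliberately leaves open whether more than one version is ever current at once; it only shows that including the version in the key is what lets two shapes of the same timeseries be told apart.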
This is broken down into individual work items below.
Tickets
Define timeseries schema in TOML #5889. This defines timeseries schema in text files, rather than code. The equivalent code is instead generated from the text files, which also contain a good deal more metadata than we currently maintain about the schema.
Populate ClickHouse timeseries schema from TOML definitions #5942. The schema need to be in ClickHouse to support oxql queries and, more generally, to make the data understandable in a self-contained way when one has only the ClickHouse database. As part of this, we'll update the oximeter database schema to include the new metadata. This will unfortunately require a full drop of the DB and all historical data, since we're changing the sorting key of the tables and the timeseries keys on all the samples.
Move existing timeseries schema into the central library. This is really a bunch of smaller issues in each producer repo.
There are a few things that will need to be fleshed out as we implement this. These are left as open questions on RFD 467. They include:
Do we have exactly one current version of a timeseries schema at any one time? The alternative would be supporting more than one, such as by returning data from schema versions consistent with a query; or allowing / requiring that users query a specific version of a timeseries. If we do have one version, we need to do some kind of data migration in ClickHouse, such as adding or removing fields on old samples.
How do we support backwards-incompatible changes?
How do schema get into CockroachDB? One approach would be to load them on startup from Nexus, similar to how we load other fixed data like VPC Firewall rules. An alternative is having producers include them at registration time. I'm not sure what the right approach is.
More details
In addition to RFD 467, I wrote up this draft issue (internal-only).