Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] Project-scoped OxQL endpoint #6873

Merged
merged 9 commits into from
Dec 3, 2024
Merged

Conversation

david-crespo
Copy link
Contributor

@david-crespo david-crespo commented Oct 15, 2024

#7047 moves the existing /v1/timeseries/query endpoint (which requires the fleet viewer role) to /v1/system/timeseries/query. This PR is on top of #7047 and adds /v1/timeseries/query?project=my-project, using a sneaky trick to let us ensure that the user has read permissions on a project before we let them see metrics for that project. See #5298 (comment) for a discussion of the OxQL authz problem more broadly.

  • Add /v1/timeseries/query with required project query param
  • Integration tests showing the authz works in an expected scenarios
  • Add a list schemas endpoint that only lists schemas with a project_id field
  • Move schema filtering logic inside oximeter client
  • Fully test schema list endpoint

The trick

  1. Require the user to say (in a query param) what project they're interested in
  2. Look up that project and make sure they can read it
  3. Jam | filter silo_id == "<silo_id>" && project_id == "<project_id>" on the end of whatever query they passed in

It sounds silly, but I talked it over with @bnaecker and we couldn't find any holes. If the user tries to fetch metrics from another project inside their query, the query will end up with the filter project_id == "def" && project_id == "xyz" and the result set will always be empty. If they try to query a metric that doesn't have a project_id on it, it will error out. It works!

API design

Initially I had the endpoint as /v1/timeseries/query/project/{project}. This is really horrible. It should be /projects/, but that doesn't feel any better. I also considered /v1/projects/{project}/timeseries/query, which has some precedent:

instance_reboot POST /v1/instances/{instance}/reboot
instance_serial_console GET /v1/instances/{instance}/serial-console
instance_serial_console_stream GET /v1/instances/{instance}/serial-console/stream
instance_ssh_public_key_list GET /v1/instances/{instance}/ssh-public-keys
instance_start POST /v1/instances/{instance}/start
instance_stop POST /v1/instances/{instance}/stop

But it also feels awful.

I like the query param approach. Right now, there are only fleet-scoped metrics and project-scoped metrics, so requiring the project ID makes sense, but the query params are nicely flexible in that we could make project optional or add other optional fields and just do different auth checks based on what's passed in. Neither path nor operation ID mention projects.

Examples of fleet- and project-scoped metrics
[target]
name = "virtual_machine"
description = "A guest virtual machine instance"
authz_scope = "project"
versions = [
{ version = 1, fields = [ "instance_id", "project_id", "silo_id", "sled_id", "sled_model", "sled_revision", "sled_serial" ] },
]
[[metrics]]
name = "vcpu_usage"
description = "Cumulative time each vCPU has spent in a state"
units = "nanoseconds"
datum_type = "cumulative_u64"
versions = [
{ added_in = 1, fields = [ "state", "vcpu_id" ] }
]

[target]
name = "switch_data_link"
description = "A network data link on an Oxide switch"
authz_scope = "fleet"
versions = [
{ version = 1, fields = [ "rack_id", "sled_id", "sled_model", "sled_revision", "sled_serial", "switch_id", "switch_fab", "switch_lot", "switch_wafer", "switch_wafer_loc_x", "switch_wafer_loc_y", "switch_model", "switch_revision", "switch_serial", "switch_slot" ] },
]
[[metrics]]
name = "bytes_sent"
description = "Total number of bytes sent on the data link"
units = "bytes"
datum_type = "cumulative_u64"
versions = [
{ added_in = 1, fields = [ "port_id", "link_id" ] }
]

No list endpoint yet

I backed out the schema list endpoint. We can't list project-scoped schemas because authz_scope is not in the database! See #5942. Currently the schema list endpoint hard-codes authz_scope: Fleet.

authz_scope: AuthzScope::Fleet,

I am using the tag hidden on the query endpoint so that it goes in the OpenAPI definition (so I can use it in the console) but it will not show up on the docs site.

@david-crespo david-crespo added this to the 12 milestone Nov 4, 2024
@david-crespo david-crespo force-pushed the project-scoped-oxql branch 2 times, most recently from 02d0476 to 75b035f Compare November 6, 2024 16:49
@david-crespo david-crespo changed the title Project-scoped OxQL endpoint [nexus] Project-scoped OxQL endpoint Nov 12, 2024
@david-crespo david-crespo changed the base branch from main to system-oxql-endpoints November 12, 2024 21:27
@david-crespo david-crespo force-pushed the project-scoped-oxql branch 2 times, most recently from 8952807 to 5d69621 Compare November 13, 2024 00:18
query.as_ref(),
authz_silo.id(),
authz_project.id()
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the crux of the PR.

@david-crespo david-crespo marked this pull request as ready for review November 13, 2024 17:46
@david-crespo david-crespo force-pushed the project-scoped-oxql branch 2 times, most recently from 6d9d1fa to f3e8e0c Compare November 13, 2024 18:58
david-crespo added a commit that referenced this pull request Nov 13, 2024
Discussed this with @bnaecker a couple of weeks ago. Was going to do
this in #6873 but I decided to split it out because it's just noise. The
idea here is that (for now) rather than getting elaborate with
permissions on the OxQL endpoint a la #5298, we just keep the existing
endpoint fleet-viewer scoped and move it to
`/v1/system/timeseries/query` with all the other operator endpoints.
#6873 goes on top of this one and adds a project-scoped endpoint at
`/v1/timeseries/query?project=my-proj`.
Base automatically changed from system-oxql-endpoints to main November 13, 2024 21:00
Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for doing this! One small comment, but otherwise ready to go I think.

.await
.map(|result| result.tables)
.map_err(|e| match e {
oximeter_db::Error::DatabaseUnavailable(_) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to put the Connection error variant in here too. We get that when qorb fails to loan out a connection from its pool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll rebase on main and do that.

@david-crespo david-crespo enabled auto-merge (squash) December 3, 2024 22:46
@david-crespo david-crespo merged commit 289146b into main Dec 3, 2024
17 checks passed
@david-crespo david-crespo deleted the project-scoped-oxql branch December 3, 2024 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants