Adds functionality to run oximeter standalone #4117

Merged
merged 2 commits into from
Oct 4, 2023

Conversation

bnaecker
Collaborator

  • Adds a "standalone" mode for the oximeter-collector crate, including the binary and main inner types. This runs in a slightly different mode, in which the ClickHouse database itself isn't strictly required. In this case, a task to simply print the results will be spawned in place of the normal results-sink task which inserts records into the database.
  • Creates a tiny fake Nexus server, which includes only the API needed to register collectors and producers. This is started automatically when running oximeter standalone, and used to assign producers / collectors as the real Nexus does, but without a database. The assignments are only in memory.
  • Adds internal oximeter API for listing / deleting a producer for each oximeter collector, and an omdb subcommand which exercises the listing.
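
The sink swap described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Sample`, `SinkMode`, `run_results_sink`, and `collect_with` are hypothetical names, the "database" is simulated with a `Vec`, and plain threads stand in for the async tasks the collector actually uses.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

// Hypothetical stand-in for an oximeter sample; the real type carries
// targets, metrics, and timestamps.
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub timeseries_name: String,
    pub value: f64,
}

// Where collected samples end up (illustrative names only).
#[derive(Debug, Clone, Copy)]
pub enum SinkMode {
    // Normal operation: insert records into the database.
    Database,
    // Standalone operation: just print each sample.
    Print,
}

// Drain the channel until every sender is dropped. Returns the records that
// were "inserted"; a Vec simulates the database in this sketch.
pub fn run_results_sink(mode: SinkMode, rx: Receiver<Sample>) -> Vec<Sample> {
    let mut inserted = Vec::new();
    for sample in rx {
        match mode {
            SinkMode::Database => inserted.push(sample),
            SinkMode::Print => println!("sample: {:?}", sample),
        }
    }
    inserted
}

// Spawn the sink, feed it the samples, and wait for it to finish.
pub fn collect_with(mode: SinkMode, samples: Vec<Sample>) -> Vec<Sample> {
    let (tx, rx) = channel();
    let sink = thread::spawn(move || run_results_sink(mode, rx));
    for sample in samples {
        tx.send(sample).expect("sink task exited early");
    }
    drop(tx); // closing the channel lets the sink loop end
    sink.join().expect("sink task panicked")
}

fn main() {
    let samples = vec![
        Sample { timeseries_name: "virtual_machine:cpu_busy".to_string(), value: 10.0 },
        Sample { timeseries_name: "virtual_machine:cpu_busy".to_string(), value: 20.0 },
    ];
    // In print mode nothing is stored; the samples only show up in the output.
    let stored = collect_with(SinkMode::Print, samples);
    println!("records stored: {}", stored.len());
}
```

The point of the sketch is that the collection path is identical in both modes; only the task at the receiving end of the channel differs.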

@bnaecker
Collaborator Author

This should resolve #4063 and #3956. I added the ability to build a producer server with an existing ProducerRegistry, which means an application could start generating and tracking metrics while the producer server attempts to register with Nexus in another task. When it succeeds, it'll start using the registry to produce data.
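
As a rough sketch of that pattern, assuming nothing about the real `ProducerRegistry` API: the `Registry` type and `register_with_retry` function below are hypothetical stand-ins, and a thread plays the role of the registration task. The application keeps recording into a shared registry while registration is retried in the background.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for a shared metric registry. Clones share the
// same underlying storage.
#[derive(Clone, Default)]
pub struct Registry {
    samples: Arc<Mutex<Vec<f64>>>,
}

impl Registry {
    // Record a value; usable before the server has registered.
    pub fn record(&self, value: f64) {
        self.samples.lock().unwrap().push(value);
    }

    // Snapshot everything recorded so far.
    pub fn collect(&self) -> Vec<f64> {
        self.samples.lock().unwrap().clone()
    }
}

// Retry registration until it succeeds, sleeping between attempts.
// Returns the number of attempts that were needed.
pub fn register_with_retry<F>(mut try_register: F, backoff: Duration) -> u32
where
    F: FnMut() -> Result<(), String>,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        if try_register().is_ok() {
            return attempts;
        }
        thread::sleep(backoff);
    }
}

fn main() {
    let registry = Registry::default();
    let for_server = registry.clone();

    // The "producer server" registers in the background. Registration is
    // simulated here and fails twice before succeeding.
    let server = thread::spawn(move || {
        let mut remaining_failures = 2;
        let attempts = register_with_retry(
            || {
                if remaining_failures > 0 {
                    remaining_failures -= 1;
                    Err("nexus unavailable".to_string())
                } else {
                    Ok(())
                }
            },
            Duration::from_millis(10),
        );
        // Once registered, the server serves whatever the registry holds.
        (attempts, for_server.collect().len())
    });

    // Meanwhile the application is already tracking metrics.
    registry.record(1.0);
    registry.record(2.0);

    let (attempts, served) = server.join().unwrap();
    println!("registered after {attempts} attempts; {served} samples visible to the server");
}
```

How many samples the server sees in `main` depends on thread timing, which is why the sketch only prints that count.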

@bnaecker bnaecker force-pushed the oximeter-standalone branch 3 times, most recently from f0a8e5c to 1c23ff9 (September 21, 2023 17:30)
Contributor

@jordanhendricks jordanhendricks left a comment

Neat! Do you have examples of running in standalone mode? Seems worth putting in the PR or a README somewhere.

omdb/src/bin/omdb/main.rs (outdated, resolved)
oximeter/collector/src/bin/oximeter.rs (resolved)
oximeter/collector/src/bin/oximeter.rs (resolved)
oximeter/collector/src/lib.rs (resolved)
oximeter/collector/src/standalone.rs (resolved)
@bnaecker
Collaborator Author

Thanks for the review @jordanhendricks. Here's an example of running things in standalone mode, using the example in the oximeter-producer crate. I'll also add a few notes about this to one of the how-to-run docs.

bnaecker@shale : ~/omicron/oximeter/collector $ cargo r -- standalone
    Finished dev [unoptimized + debuginfo] target(s) in 0.38s
     Running `/home/bnaecker/omicron/target/debug/oximeter standalone`
Sep 26 17:39:24.130 INFO listening, local_addr: [::1]:12221, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:195
Sep 26 17:39:24.131 INFO created standalone nexus server for metric collections, address: [::1]:12221, file: oximeter/collector/src/standalone.rs:248
Sep 26 17:39:24.131 INFO listening, local_addr: [::1]:12223, component: dropshot, component: nexus-standalone, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:195
Sep 26 17:39:24.132 INFO started oximeter standalone server, component: nexus-standalone, file: oximeter/collector/src/lib.rs:846
Sep 26 17:39:24.134 INFO accepted connection, remote_addr: [::1]:59649, local_addr: [::1]:12221, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:769
Sep 26 17:39:24.135 INFO request completed, latency_us: 499, response_code: 204, uri: /metrics/collectors, method: POST, req_id: 7dec887a-6925-4e97-b15e-13e3d852305f, remote_addr: [::1]:59649, local_addr: [::1]:12221, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:853
Sep 26 17:39:25.700 INFO accepted connection, remote_addr: [::1]:39103, local_addr: [::1]:12221, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:769
Sep 26 17:39:25.701 INFO accepted connection, remote_addr: [::1]:41212, local_addr: [::1]:12223, component: dropshot, component: nexus-standalone, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:769
Sep 26 17:39:25.702 INFO request completed, latency_us: 225, response_code: 204, uri: /producers, method: POST, req_id: 1145ea45-6753-4d49-8801-81608ef3e45f, remote_addr: [::1]:41212, local_addr: [::1]:12223, component: dropshot, component: nexus-standalone, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:853
Sep 26 17:39:25.702 INFO request completed, latency_us: 1652, response_code: 204, uri: /metrics/producers, method: POST, req_id: 81b6c2e4-bef9-4fb4-9db8-35029b3a3ffb, remote_addr: [::1]:39103, local_addr: [::1]:12221, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:853
Sep 26 17:39:35.705 INFO sample: Sample { measurement: Measurement { timestamp: 2023-09-26T17:39:35.703982016Z, datum: CumulativeF64(Cumulative { start_time: 2023-09-26T17:39:25.697506800Z, value: 10.006469535 }) }, timeseries_name: "virtual_machine:cpu_busy", target: FieldSet { name: "virtual_machine", fields: {"instance_id": Field { name: "instance_id", value: Uuid(8e45e16c-cdfb-48bf-b59e-568156cf8667) }, "project_id": Field { name: "project_id", value: Uuid(e317ac46-b7d5-43b1-a3e7-f00d8730f15f) }} }, metric: FieldSet { name: "cpu_busy", fields: {"cpu_id": Field { name: "cpu_id", value: I64(0) }} } }, component: results-sink, collector_id: 580b9c7d-caf8-4bf5-a7e3-f0b1f4aca13c, component: oximeter-standalone, component: nexus-standalone, file: oximeter/collector/src/lib.rs:280
Sep 26 17:39:35.706 INFO sample: Sample { measurement: Measurement { timestamp: 2023-09-26T17:39:35.704119617Z, datum: CumulativeF64(Cumulative { start_time: 2023-09-26T17:39:25.697507160Z, value: 10.006469535 }) }, timeseries_name: "virtual_machine:cpu_busy", target: FieldSet { name: "virtual_machine", fields: {"instance_id": Field { name: "instance_id", value: Uuid(8e45e16c-cdfb-48bf-b59e-568156cf8667) }, "project_id": Field { name: "project_id", value: Uuid(e317ac46-b7d5-43b1-a3e7-f00d8730f15f) }} }, metric: FieldSet { name: "cpu_busy", fields: {"cpu_id": Field { name: "cpu_id", value: I64(1) }} } }, component: results-sink, collector_id: 580b9c7d-caf8-4bf5-a7e3-f0b1f4aca13c, component: oximeter-standalone, component: nexus-standalone, file: oximeter/collector/src/lib.rs:280
Sep 26 17:39:35.706 INFO sample: Sample { measurement: Measurement { timestamp: 2023-09-26T17:39:35.704131117Z, datum: CumulativeF64(Cumulative { start_time: 2023-09-26T17:39:25.697507340Z, value: 10.006469535 }) }, timeseries_name: "virtual_machine:cpu_busy", target: FieldSet { name: "virtual_machine", fields: {"instance_id": Field { name: "instance_id", value: Uuid(8e45e16c-cdfb-48bf-b59e-568156cf8667) }, "project_id": Field { name: "project_id", value: Uuid(e317ac46-b7d5-43b1-a3e7-f00d8730f15f) }} }, metric: FieldSet { name: "cpu_busy", fields: {"cpu_id": Field { name: "cpu_id", value: I64(2) }} } }, component: results-sink, collector_id: 580b9c7d-caf8-4bf5-a7e3-f0b1f4aca13c, component: oximeter-standalone, component: nexus-standalone, file: oximeter/collector/src/lib.rs:280
Sep 26 17:39:35.706 INFO sample: Sample { measurement: Measurement { timestamp: 2023-09-26T17:39:35.704145727Z, datum: CumulativeF64(Cumulative { start_time: 2023-09-26T17:39:25.697507470Z, value: 10.006469535 }) }, timeseries_name: "virtual_machine:cpu_busy", target: FieldSet { name: "virtual_machine", fields: {"instance_id": Field { name: "instance_id", value: Uuid(8e45e16c-cdfb-48bf-b59e-568156cf8667) }, "project_id": Field { name: "project_id", value: Uuid(e317ac46-b7d5-43b1-a3e7-f00d8730f15f) }} }, metric: FieldSet { name: "cpu_busy", fields: {"cpu_id": Field { name: "cpu_id", value: I64(3) }} } }, component: results-sink, collector_id: 580b9c7d-caf8-4bf5-a7e3-f0b1f4aca13c, component: oximeter-standalone, component: nexus-standalone, file: oximeter/collector/src/lib.rs:280

After starting this up, I ran the producer example from the oximeter-producer crate with:

bnaecker@shale : ~/omicron/oximeter/producer $ cargo r --example producer
    Finished dev [unoptimized + debuginfo] target(s) in 0.34s
     Running `/home/bnaecker/omicron/target/debug/examples/producer`
Sep 26 17:39:25.698 DEBG registered DTrace probes
Sep 26 17:39:25.699 DEBG registered endpoint, path: /collect/{producer_id}, method: GET, local_addr: [::1]:42530, component: dropshot
Sep 26 17:39:25.699 INFO listening, local_addr: [::1]:42530, component: dropshot, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:195
Sep 26 17:39:25.699 DEBG successfully registered DTrace USDT probes, local_addr: [::1]:42530, component: dropshot
Sep 26 17:39:25.699 DEBG Requested any available port, Dropshot server has been bound to [::1]:42530
Sep 26 17:39:25.699 DEBG registering metric server as a producer
Sep 26 17:39:25.699 DEBG client request, body: Some(Body), uri: http://[::1]:12221/metrics/producers, method: POST
Sep 26 17:39:25.702 DEBG client response, result: Ok(Response { url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv6(::1)), port: Some(12221), path: "/metrics/producers", query: None, fragment: None }, status: 204, headers: {"x-request-id": "81b6c2e4-bef9-4fb4-9db8-35029b3a3ffb", "date": "Tue, 26 Sep 2023 17:39:25 GMT"} })
Sep 26 17:39:25.702 INFO starting oximeter metric producer server, interval: 10s, address: [::1]:42530, producer_id: 2c8b220b-d124-4046-b0e8-aced9c8ca7ca, route: /collect/2c8b220b-d124-4046-b0e8-aced9c8ca7ca, file: oximeter/producer/src/lib.rs:203
Sep 26 17:39:35.703 INFO accepted connection, remote_addr: [::1]:63822, local_addr: [::1]:42530, component: dropshot, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:769
Sep 26 17:39:35.704 INFO request completed, latency_us: 733, response_code: 200, uri: /collect/2c8b220b-d124-4046-b0e8-aced9c8ca7ca, method: GET, req_id: 2e2cc50d-8451-4b5c-8a18-9195ff83c545, remote_addr: [::1]:63822, local_addr: [::1]:42530, component: dropshot, file: /home/bnaecker/.cargo/git/checkouts/dropshot-a4a923d29dccc492/99cea06/dropshot/src/server.rs:853

The important part is that the producer was started as normal. We still point it at nexus, which in this case is the mock inside the oximeter binary.
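
For readers curious what "assignments only in memory" might look like, here is a minimal sketch. It is not the code from `standalone.rs`: `StandaloneNexus`, its methods, the integer IDs (the real ones are UUIDs), and the fewest-producers assignment policy are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

// Illustrative ID type; the real services use UUIDs.
pub type Id = u32;

#[derive(Debug, PartialEq)]
pub enum Error {
    // A producer tried to register before any collector had.
    NoCollectors,
}

// Hypothetical in-memory replacement for the Nexus database tables:
// each collector ID maps to the producers assigned to it.
#[derive(Default)]
pub struct StandaloneNexus {
    collectors: BTreeMap<Id, Vec<Id>>,
}

impl StandaloneNexus {
    // Roughly what a POST to /metrics/collectors would record.
    pub fn register_collector(&mut self, collector: Id) {
        self.collectors.entry(collector).or_default();
    }

    // Roughly what a POST to /metrics/producers would record. Returns the
    // collector the producer is assigned to.
    pub fn register_producer(&mut self, producer: Id) -> Result<Id, Error> {
        // Re-registration is idempotent: keep the existing assignment.
        if let Some((id, _)) = self
            .collectors
            .iter()
            .find(|(_, producers)| producers.contains(&producer))
        {
            return Ok(*id);
        }
        // Assumed policy: pick the collector with the fewest producers.
        let (id, producers) = self
            .collectors
            .iter_mut()
            .min_by_key(|(_, producers)| producers.len())
            .ok_or(Error::NoCollectors)?;
        producers.push(producer);
        Ok(*id)
    }

    // List the producers currently assigned to a collector.
    pub fn producers_for(&self, collector: Id) -> Vec<Id> {
        self.collectors.get(&collector).cloned().unwrap_or_default()
    }
}

fn main() {
    let mut nexus = StandaloneNexus::default();
    nexus.register_collector(1);
    let assigned = nexus.register_producer(42).expect("a collector is registered");
    println!("producer 42 assigned to collector {assigned}");
    println!("collector 1 producers: {:?}", nexus.producers_for(1));
}
```

Because the map lives only in process memory, restarting the standalone binary forgets every assignment, which matches the "without a database" behaviour described in the PR.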

@bnaecker
Collaborator Author

I'll rebase to resolve the conflicts once the review is done.

@bnaecker
Collaborator Author

@jordanhendricks Let me know if I've addressed your comments, or if you've any other questions!

Contributor

@jordanhendricks jordanhendricks left a comment

thanks!

- Adds a "standalone" mode for the `oximeter-collector` crate, including
  the binary and main inner types. This runs in a slightly different
  mode, in which the ClickHouse database itself isn't strictly required.
  In this case, a task to simply print the results will be spawned in
  place of the normal results-sink task which inserts records into the
  database.
- Creates a tiny fake Nexus server, which includes only the API needed
  to register collectors and producers. This is started automatically
  when running `oximeter standalone`, and used to assign producers /
  collectors as the real Nexus does, but without a database. The
  assignments are only in memory.
- Adds internal `oximeter` API for listing / deleting a producer for
  each oximeter collector, and an `omdb` subcommand which exercises the
  listing.
- Clarify language around mock `nexus`
- Add example to `how-to-run.adoc`