Skip to content

Latest commit

 

History

History
195 lines (163 loc) · 7.88 KB

demo-saga.adoc

File metadata and controls

195 lines (163 loc) · 7.88 KB

Demo saga

Nexus ships with a "demo" saga that can be used to interactively experiment with sagas, saga recovery, and saga transfer (after Nexus zone expungement). The demo saga consists of a single action that blocks until it’s instructed to proceed. You instruct it to proceed using a request to the Nexus internal API.

In the example below, we’ll:

  1. Use omicron-dev run-all to run a simulated control plane stack

  2. Start a second Nexus whose execution we can control precisely

  3. Use the omdb nexus sagas demo-create command to kick off a demo saga

  4. Use the omdb nexus sagas demo-complete command to instruct that saga to finish

For steps 1-2, we’re just following the docs for running a simulated stack and a second Nexus. First, run omicron-dev run-all:

$ cargo xtask omicron-dev run-all
...
omicron-dev: setting up all services ...
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.7162.0.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.7162.0.log"
DB URL: postgresql://root@[::1]:43428/omicron?sslmode=disable
DB address: [::1]:43428
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.7162.2.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.7162.2.log"
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.7162.3.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.7162.3.log"
omicron-dev: services are running.
omicron-dev: nexus external API:    127.0.0.1:12220
omicron-dev: nexus internal API:    [::1]:12221
omicron-dev: cockroachdb pid:       7166
omicron-dev: cockroachdb URL:       postgresql://root@[::1]:43428/omicron?sslmode=disable
omicron-dev: cockroachdb directory: /dangerzone/omicron_tmp/.tmpkzPi6h
omicron-dev: internal DNS HTTP:     http://[::1]:55952
omicron-dev: internal DNS:          [::1]:36474
omicron-dev: external DNS name:     oxide-dev.test
omicron-dev: external DNS HTTP:     http://[::1]:64396
omicron-dev: external DNS:          [::1]:35977
omicron-dev:   e.g. `dig @::1 -p 35977 test-suite-silo.sys.oxide-dev.test`
omicron-dev: management gateway:    http://[::1]:33325 (switch0)
omicron-dev: management gateway:    http://[::1]:61144 (switch1)
omicron-dev: silo name:             test-suite-silo
omicron-dev: privileged user name:  test-privileged

Then follow those docs to configure and start a second Nexus:

$ cargo run --bin=nexus -- config-second.toml
...
Aug 12 20:16:25.405 INFO listening, local_addr: [::1]:12223, component: dropshot_internal, name: a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f, file: /home/dap/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:205
...

The rest of these instructions will use omdb pointed at the second Nexus instance, so we’ll set OMDB_NEXUS_URL in the environment:

$ export OMDB_NEXUS_URL=http://[::1]:12223

Now we can use omdb nexus sagas list to list the sagas that have run in that second Nexus process only:

$ cargo run --bin=omdb -- nexus sagas list
...
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID STATE

Now we can create a demo saga:

$ cargo run --bin=omdb -- --destructive nexus sagas demo-create
...
note: using Nexus URL http://[::1]:12223
saga id:      f7765d6a-6e45-4c13-8904-2677b79a97eb
demo saga id: 88eddf09-dda3-4d70-8d99-1d3b441c57da (use this with `demo-complete`)

We have to use the --destructive option because this command by nature changes state in Nexus and omdb won’t allow commands that change state by default.

We can see the new saga in the list of sagas now. It’s running:

$ cargo run --bin=omdb -- nexus sagas list
...
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE
f7765d6a-6e45-4c13-8904-2677b79a97eb running

and it will stay running indefinitely until we run demo-complete. Let’s do that:

$ cargo run --bin=omdb -- --destructive nexus sagas demo-complete 88eddf09-dda3-4d70-8d99-1d3b441c57da
...
note: using Nexus URL http://[::1]:12223

and then list sagas again:

$ cargo run --bin=omdb -- nexus sagas list
...
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE
f7765d6a-6e45-4c13-8904-2677b79a97eb succeeded

It works across recovery, too. You can go through the same loop again, but this time kill Nexus and start it again:

$ cargo run --bin=omdb -- --destructive nexus sagas demo-create
...
note: using Nexus URL http://[::1]:12223
saga id:      65253cb6-4428-4aa7-9afc-bf9b42166cb5
demo saga id: 208ebc89-acc6-42d3-9f40-7f5567c8a39b (use this with `demo-complete`)

Now restart Nexus (^C the second invocation and run it again). Now if we use omdb we don’t see the earlier saga because it was finished when this new Nexus process started. But we see the one we created later because it was recovered:

$ cargo run --bin=omdb -- nexus sagas list
...
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE
65253cb6-4428-4aa7-9afc-bf9b42166cb5 running

Side note: we can see it was recovered:

$ cargo run --bin=omdb -- nexus background-tasks show
...
task: "saga_recovery"
  configured period: every 10m
  currently executing: no
  last completed activation: iter 1, triggered by a periodic timer firing
    started at 2024-08-12T20:20:41.714Z (44s ago) and ran for 79ms
    since Nexus started:
        sagas recovered:           1
        sagas recovery errors:     0
        sagas observed started:    0
        sagas inferred finished:   0
        missing from SEC:          0
        bad state in SEC:          0
    last pass:
        found sagas:   1 (in-progress, assigned to this Nexus)
        recovered:     1 (successfully)
        failed:        0
        skipped:       0 (already running)
        removed:       0 (newly finished)
    recently recovered sagas (1):
        TIME                 SAGA_ID
        2024-08-12T20:20:41Z 65253cb6-4428-4aa7-9afc-bf9b42166cb5
    no saga recovery failures
...

Now we can complete that saga:

$ cargo run --bin=omdb -- --destructive nexus sagas demo-complete 208ebc89-acc6-42d3-9f40-7f5567c8a39b
...
note: using Nexus URL http://[::1]:12223

and see it finish:

$ cargo run --bin=omdb -- nexus sagas list
...
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE
65253cb6-4428-4aa7-9afc-bf9b42166cb5 succeeded

Note too that the completion is not synchronous with the demo-complete command, though it usually is pretty quick. It’s possible you’ll catch it running if you run nexus sagas list right after running nexus sagas demo-complete, but you should quickly see it succeeded if you keep running nexus sagas list.