Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework background task initialization #5962

Merged
merged 36 commits into from
Jun 29, 2024
Merged

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Jun 27, 2024

The goal here is to be able to initialize enough of the background task subsystem to stuff it into Nexus before actually starting the tasks. That will allow the tasks themselves to be able to receive an Arc<Nexus>, which in turn would allow them to do things like kick off sagas directly and do saga recovery.

There are two main motivations:

  • The current "saga recovery" process could be folded into the background task subsystem. This would be nice because in order to implement nexus expungement: assign an old Nexus instance's sagas to another Nexus instance #5136 we'll want to re-run saga recovery after Nexus may have been up for a while and the background tasks subsystem would let us just re-activate the task and not otherwise have to worry about stacked activations, etc. But in order to do this, the background task itself needs access to the Arc<Nexus> to construct a SagaContext in order to resume sagas. (This is the same problem as allowing background tasks to run sagas directly. We solved that differently using a channel. We could do something similar here, but the solution in this PR is much more general -- see below.)
  • nexus expungement: assign an old Nexus instance's sagas to another Nexus instance #5136 will also require that we activate that new saga recovery background task from the blueprint executor, which is currently not easy to do conditionally.

More generally, this approach should make it possible for:

  • background tasks to activate other background tasks
  • background tasks to directly use other Nexus subsystems (like sagas) that themselves may want to activate background tasks

This design came out of a discussion Friday with several folks about how to better structure Nexus.

I hope to do some follow-on work in subsequent PRs: seeing if we can remove that saga channel (that would prove that this does what we think it should do), seeing if we can remove the special v2p channel, and eventually move saga recovery into a proper background task, and move on to #5136.

@davepacheco davepacheco marked this pull request as ready for review June 27, 2024 17:22
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I left a lot of suggestions, but they're all quite minor.

dev-tools/omdb/tests/successes.out Show resolved Hide resolved
nexus/src/app/background/driver.rs Show resolved Hide resolved
nexus/src/app/background/driver.rs Outdated Show resolved Hide resolved
nexus/src/app/background/driver.rs Outdated Show resolved Hide resolved
nexus/src/app/background/driver.rs Show resolved Hide resolved
task_internal_dns_propagation: Activator,
task_external_dns_propagation: Activator,

// Data exposed by various background tasks to the rest of Nexus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a typo:

Suggested change
// Data exposed by various background tasks to the rest of Nexus
/// Data exposed by various background tasks to the rest of Nexus

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, it's not: there are three categories of fields in this struct and non-doc comments describe each section. Right now there's only one thing in this bucket (but I expect we'll have more).

nexus/src/app/background/init.rs Show resolved Hide resolved

// The following fields can be safely ignored here because they're
// already wired up as needed.
external_endpoints: _external_endpoints,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: i think this can just be an _ since we don't need the binding to live until the function returns?

Suggested change
external_endpoints: _external_endpoints,
external_endpoints: _,

Comment on lines +664 to +666
task_config: &Activator,
task_servers: &Activator,
task_propagation: &Activator,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always going to call out cases where we take multiple positional arguments of the same type, since it's easy to swap them by accident. Since this isn't an API that's called externally to this module, it's not as big a concern: this is only called once, and the arguments are passed in the right order, and we don't really expect new calls to this function will be added. Adding an args struct so that we can name these arguments just for one call seems kinda like overkill.

However, I can think of a pretty simple way to change this so we don't have to pass these as positionals that can be swapped, and also make the clippy::too_many_arguments go away: why not just change this function to a method on &BackgroundTasks (or pass a &BackgroundTasks argument if we don't want to re-indent the entire function in order to make it a method?). That way, this code can refer to the various activators by name, and the caller can't mix them up. I don't think this function also being able to touch the other activators is a problem, since it can just...not do that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that this function is called twice with two different sets of activators (internal DNS and then external DNS). The first time, the caller uses BackgroundTasks.task_internal_dns_config, while the second time it uses BackgroundTasks.task_external_dns_config. Same for the other two Activator arguments. So I think these have to be parameters.

Comment on lines 518 to 525
if let Err(_) =
task_nexus.background_tasks_driver.set(driver)
{
panic!(
"concurrent initialization of \
background_tasks_driver?!"
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was briefly unsure about why this doesn't just use expect(...) but i realized that it requires the error value to be Debug, which i assume this isn't...

producer_registry: &ProducerRegistry,
) -> BackgroundTasks {
let mut driver = Driver::new();
v2p_watcher: (watch::Sender<()>, watch::Receiver<()>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the changes in this PR should, hopefully, allow us to get rid of this (yay!). I think there's kind of a tradeoff between doing that in this PR, which gives us a nice worked example of how to use the new interface, but also makes this diff touch a bunch more files, and doing it a follow-up, which results in a much smaller diff. I don't have a particularly strong opinion here, but wanted to ask about your thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this one offline as well, and right now, my preference is probably to get rid of it separately in the interest of merging this PR sooner and reducing merge conflict opportunities, but 🤷‍♀️ it's probably not a huge deal either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do plan to look at this in a follow-up.

@davepacheco davepacheco merged commit f275827 into main Jun 29, 2024
19 checks passed
@davepacheco davepacheco deleted the dap/bgtask-init-rework branch June 29, 2024 01:13
hawkw added a commit that referenced this pull request Jul 1, 2024
Currently, a `tokio::sync::watch` channel is explicitly passed around to
allow both Nexus and other background tasks to notify the `v2p_manager`
background task. This must be passed explicitly through most of Nexus,
which is a bit awkward.

The new `background::Activator` API introduced in PR #5962 makes it a
bit easier to nicely activate background tasks. Now, we have an
`Activator` type which represents the `tokio::sync::Notify` used to wake
the background task, and this can be passed into the various functions
which must activate it. Because `Activator`s are constructed _before_
background tasks are started, the `Activator` can be passed to the
background tasks that must activate the `v2p_manager` task (in this
case, the `instance_watcher` task).

This branch removes the `tokio::sync::watch` channel used to activate
the `v2p_manager`, and replaces it with a use of the `Activator` API.
Because the `Activator` type is not currently `Clone`, I refactored it
slightly so that both the wired-up flag _and_ the `Notify` live within
an `Arc`, allowing a clone of the `Activator` for the `v2p_manager` task
to be passed into the `instance_watcher` task when it's constructed. I
don't think this really introduces any new opportunities for accidental
`Activator` misuse, as the assertion that an activator is not wired up
twice still stands.
hawkw added a commit that referenced this pull request Jul 1, 2024
Currently, a `tokio::sync::watch` channel is explicitly passed around to
allow both Nexus and other background tasks to notify the `v2p_manager`
background task. This must be passed explicitly through most of Nexus,
which is a bit awkward.

The new `background::Activator` API introduced in PR #5962 makes it a
bit easier to nicely activate background tasks. Now, we have an
`Activator` type which represents the `tokio::sync::Notify` used to wake
the background task, and this can be passed into the various functions
which must activate it. Because `Activator`s are constructed _before_
background tasks are started, the `Activator` can be passed to the
background tasks that must activate the `v2p_manager` task (in this
case, the `instance_watcher` task).

This branch removes the `tokio::sync::watch` channel used to activate
the `v2p_manager`, and replaces it with a use of the `Activator` API.
Because the `Activator` type is not currently `Clone`, I refactored it
slightly so that both the wired-up flag _and_ the `Notify` live within
an `Arc`, allowing a clone of the `Activator` for the `v2p_manager` task
to be passed into the `instance_watcher` task when it's constructed. I
don't think this really introduces any new opportunities for accidental
`Activator` misuse, as the assertion that an activator is not wired up
twice still stands.
@@ -2,7 +2,90 @@
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Specific background task initialization
//! Initialize Nexus background tasks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is excellent @davepacheco! Thanks for writing it up so thoroughly.

hawkw added a commit that referenced this pull request Jul 1, 2024
Currently, a `tokio::sync::watch` channel is explicitly passed around to
allow both Nexus and other background tasks to notify the `v2p_manager`
background task. This must be passed explicitly through most of Nexus,
which is a bit awkward.

The new `background::Activator` API introduced in PR #5962 makes it a
bit easier to nicely activate background tasks. Now, we have an
`Activator` type which represents the `tokio::sync::Notify` used to wake
the background task, and this can be passed into the various functions
which must activate it. Because `Activator`s are constructed _before_
background tasks are started, the `Activator` can be passed to the
background tasks that must activate the `v2p_manager` task (in this
case, the `instance_watcher` task).

This branch removes the `tokio::sync::watch` channel used to activate
the `v2p_manager`, and replaces it with a use of the `Activator` API.
Because the `Activator` type is not currently `Clone`, I refactored it
slightly so that both the wired-up flag _and_ the `Notify` live within
an `Arc`, allowing a clone of the `Activator` for the `v2p_manager` task
to be passed into the `instance_watcher` task when it's constructed. I
don't think this really introduces any new opportunities for accidental
`Activator` misuse, as the assertion that an activator is not wired up
twice still stands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants