Add task/scheduler cancellation API #557

Merged
jpsamaroo merged 3 commits into master from jps/cancellation
Aug 15, 2024

Conversation

jpsamaroo
Member

Tasks frequently run for a very long time or simply hang (perhaps due to a bug, perhaps intentionally), and we just want to stop the task and move on with life. You might think that Ctrl+C is the right way to do this, but in Julia (and many other languages) it frequently does not do what you want, and is just as likely to hang or crash your Julia process. This is because the request to "cancel" running code isn't targeted: Julia just interrupts whichever task is currently running, which is frequently not the task that you actually wanted to cancel.

This PR adds a new function, Dagger.cancel!, which allows cancelling Dagger DTasks in a safe way. Unlike Ctrl+C, it doesn't force the underlying task to stop (forced interruption is generally considered unsafe, and cannot always be done safely or in a timely manner); instead it just "abandons" the task and lets Dagger's runtime and scheduler move on to other queued tasks. This releases any calls to wait or fetch that were blocked on the cancelled DTask, and unblocks the processor queues so that other tasks may run.
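
A minimal usage sketch of the behavior described above, assuming Dagger is installed and that Dagger.cancel! accepts a single DTask as its argument. The sleep durations are arbitrary, and the idea that a released fetch surfaces the cancellation by throwing is an assumption here, not something this description confirms.

```julia
using Dagger

# Spawn a DTask that would otherwise block for a very long time.
t = Dagger.@spawn sleep(1_000)

# Give the scheduler a moment to pick the task up (arbitrary delay).
sleep(0.5)

# Ask Dagger to abandon the task. The underlying sleep is not forcibly
# interrupted; the runtime simply stops waiting on it.
Dagger.cancel!(t)

# Anything blocked on the DTask should now be released.
# Assumption: the release shows up as an exception thrown from fetch.
released = try
    fetch(t)
    false
catch err
    @info "DTask was cancelled" err
    true
end
```

The try/catch is there only because this sketch does not assume which exception type is thrown. Code that depends on a specific type should check Dagger's documentation first.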

It also provides a way to halt the scheduler and allow it to restart automatically, which can prove useful for automated testing and when certain kinds of hangs occur within the scheduler.

It's expected that this functionality will eventually be wired up to a smarter Ctrl+C, so that users can regain control of a seemingly unresponsive system, or prototype algorithms in the REPL which may run for a really long time.

@jpsamaroo jpsamaroo merged commit 557269d into master Aug 15, 2024
7 of 11 checks passed
@jpsamaroo jpsamaroo deleted the jps/cancellation branch August 15, 2024 20:24