Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: future::with_plan() and/or future::local_plan() #263

Open
wlandau opened this issue Nov 12, 2018 · 10 comments
Open

Suggestion: future::with_plan() and/or future::local_plan() #263

wlandau opened this issue Nov 12, 2018 · 10 comments

Comments

@wlandau
Copy link

wlandau commented Nov 12, 2018

I am picturing something similar to the with_*() and local_*() functions from withr. Example use case: ropensci/drake#561. When a drake user selects a distributed parallel backend, the imported functions and files are still processed locally. (If we are just hashing an input data file, it does not make sense to send the job to a compute node on a cluster.) In drake, it might be nice to write something like this:

make <- function(...){
  future::with_plan(multiprocess, {
    process_imports(...)
  }
  build_targets(...) # Uses whatever future::plan() the user selects before make().
}

Or alternatively:

make <- function(...){
  process_imports(...)
  build_targets(...) # Uses whatever future::plan() the user selects before make().
}

where

process_imports <- function(...){
  if (jobs > 1L) {
    future::local_plan(multiprocess)
    launch_futures_for_imports(...)
  } else {
    just_loop_through_imports(...)
  }
}
@HenrikBengtsson
Copy link
Collaborator

HenrikBengtsson commented Dec 6, 2018

As you've might have noticed (and for "new comers"), I'm really hesitant suggesting that developers could/should set plan locally in their code. I really want to avoid design patterns such as:

foo <- function(..., cores = 1) {
  oplan <- plan("list")
  on.exit(plan(oplan))  # <= super important!
  plan(multiprocess, workers = cores)
  ...
}

that may appear out of habit and will just lock the users into a specific parallel backend although some users may have way more powerful alternatives at hand, e.g. 1,000's of cores on a compute cluster.
Then, it is much better to leave it to the end user to control how to parallelize.

Having said this, I do understand that there are use cases where you need to temporarily use another future backend than what is already set (by the user) with plan(). Then, yes, something like:

withPlan <- function(expr, ..., envir = parent.frame()) {
   expr <- substitute(expr)
   oplan <- plan("list")
   on.exit(plan(oplan))
   plan(...)
   eval(expr, envir = envir)
}

is handy. I prefer to hold back on adding such a function to the future package because of the above concerns. On the other hand, the advantage of providing such a function would be lowered risk for code/packages messing with the user's plan() settings. I need to digest this feature request more before make a decision. In the meanwhile, cut'n'paste the above withPlan() but please consider if it is really needed.

Long-term thoughts: Related to our discussion in Issue #181, it would be better if the developer needs could be handled by only specifying what type of resources they need, e.g. "give me a local parallel backend - I don't care exactly which". Now, such a framework will not be in place anytime soon (it's only being processed on idle speed far back somewhere in my head).

EDIT 2019-06-26: Previous version claimed that oplan <- plan(...) was enough, but that will only return the future strategy on top of the stack, not all of the stack in case there are nested strategies set. Only order to get ahold of the stack, one need to call plan("list").

@wlandau
Copy link
Author

wlandau commented Dec 7, 2018

That makes sense, and I respect your decision to separate the choice of backend from the hard-coded workflow. Because of ropensci/drake#598 (comment), my original point in this thread might actually become moot.

For the original motivating use case of #181, we might actually be able to stick with a single user-defined future::plan(). Among other things, a couple users and I were hoping that different remote transient workers could have different walltimes, memory limits, etc.

@wlandau
Copy link
Author

wlandau commented Dec 7, 2018

On further reflection, this feels like something I could implement in drake using the resources argument to future().

@HenrikBengtsson
Copy link
Collaborator

Note that the resources argument is only acknowledged by future.batchtools and is not (yet) part of the Future API per because I don't what the exact content/format of resources should be. I'm open to proposal.

@wlandau
Copy link
Author

wlandau commented Dec 7, 2018

I think that it still suits the use case I was picturing.

@HenrikBengtsson
Copy link
Collaborator

UPDATE 2019-06-26: There was a mistake in my example code of comment #263 (comment). I've updated it accordingly. Please see the 'EDIT' note at the very end.

@HenrikBengtsson
Copy link
Collaborator

HenrikBengtsson commented Jun 26, 2019

... and, before you guys rush and update your code, I'll see if I can change the default value of plan() to return all of the stack rather than just the strategy on top of the stack. That's certainly a nice behavior. If that's doable, you do not have to change anything.

HenrikBengtsson added a commit that referenced this issue Jun 26, 2019
@HenrikBengtsson
Copy link
Collaborator

FYI, in the next release, it'll be slightly easier to undo the future plan, e.g.

my_fcn <- function(...) {
   oplan <- plan("multicore")
   on.exit(plan(oplan))
   ...
}

vs

my_fcn <- function(...) {
  oplan <- plan("list")
  on.exit(plan(oplan))
  plan("multicore")
  ...
}

In other words, when setting a new set of strategies, plan(...) will return the full list of previous strategies (== plan("list")) and not just the next strategy on the stack (== plan("next")). This means that the following will work:

plan(list(A = multicore, B = batchtools_local))

# temporary use sequential processing
oplan <- plan(sequential)
y <- future.apply::future_lapply(1:10, FUN = identical)
# undo strategy stack
plan(oplan)

Previously, the above code would effectively do:

plan(oplan[[1]])

that is, plan(multicore) not plan(list(A = multicore, B = batchtools_local)).

@HenrikBengtsson
Copy link
Collaborator

I overlooked your proposal of local_plan(); it turns out that I re-invented this idea myself last week for progressr::progressor() to auto-close itself "on.exit". I then realized that one could do the same for future::plan() where it auto-undo itself "on.exit". That would protect us from cases where it's forgotten - I keep spotting a few packages here and there that sets plan() internally without undoing it afterward.

@wlandau
Copy link
Author

wlandau commented Nov 7, 2020

Sounds like a great safeguard. Will it be optional like in progressr::progressor()? The reason I ask is targets has the capability to temporarily switch plans and I would like to still be able to faithfully restore the default plan after launching a future().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants