Would be nice to have a "make everything use future" function #162

Open
DarwinAwardWinner opened this issue Aug 28, 2017 · 10 comments

@DarwinAwardWinner

Using load hooks, it should be possible to implement a single function that tells every parallel package that can do so (BiocParallel, foreach, ...?) to use the future backend, even packages that haven't been loaded yet. This would replace the several lines of boilerplate required to do the same thing manually and hopefully make it much friendlier to new users of the package.

See https://stat.ethz.ch/R-manual/R-devel/library/base/html/userhooks.html

I might take a stab at implementing this.

@HenrikBengtsson
Collaborator

Sorry, I'm a bit slow. You're talking about packageEvent(..., event = "onLoad"), correct? Where should the hook function live, when should it be set up/added, in what cases should it be called, and what should it do?

@DarwinAwardWinner
Author

Well, first consider the simple case, where every parallel package that can possibly use future as a backend has already been loaded. In that case, it's easy to write a function that configures all of them with their respective future backends. Let's call that function register_all_future_backends for argument's sake.

But if some of those packages aren't loaded, you don't want to force-load them when they may never be used. You do, however, want to ensure that if they are loaded later, they will use the future backend. So for the packages that aren't loaded, we can use hooks to run the future backend setup as soon as that package is loaded. I think we would need to use setHook, since packageEvent is used by packages to add a hook to their own loading, not to other packages' load events.

@DarwinAwardWinner
Author

Wait, sorry, I think I've misunderstood the documentation at that page. It looks like packageEvent(PKGNAME, event="onLoad") is used to get the name of the hook, and that hook name is then passed to setHook.
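
A minimal sketch of that two-step pattern, for readers following along. The package name `somePkg` and the function `on_load` are hypothetical placeholders; nothing is installed or loaded here. `packageEvent()` only builds the hook name, and `setHook()` stores a function under that name.

```r
# "somePkg" is a hypothetical package name used only for illustration.
hook_name <- packageEvent("somePkg", event = "onLoad")
print(hook_name)  # a plain character string naming the hook

# Hook functions are called with the package name and its install path.
on_load <- function(pkgname, pkgpath) {
  message("Loaded ", pkgname, " from ", pkgpath)
}

# Register the function under that name; the default action appends it.
setHook(hook_name, on_load)

# getHook() returns the list of functions currently registered for the hook.
hooks <- getHook(hook_name)
print(length(hooks))
```

The hook only runs when the namespace of the named package is actually loaded, so registering it for a package that is never loaded has no further effect.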

@DarwinAwardWinner
Author

I think this is the general idea, in pseudocode:

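# "pkgA" is a placeholder package name
if (isNamespaceLoaded("pkgA")) {
    register_pkgA_future_backend()
} else {
    # Hook functions are called with (pkgname, pkgpath), hence the wrapper
    setHook(packageEvent("pkgA", "onLoad"),
            function(...) register_pkgA_future_backend())
}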

and so on for each package with a future backend.

@HenrikBengtsson
Collaborator

Are you saying that when future is loaded, it should override whatever backend is already set for (frontend) foreach with doFuture::registerDoFuture(), and same for BiocParallel, and so on? (I understand that the hook handles when those packages are loaded after the future package).

@DarwinAwardWinner
Author

I'm not saying that simply loading future should do this. I'm saying future should provide a function that does it, so that overriding all the backends becomes one line of code, and hooks mean you don't have to wait until every package is loaded before running that line.

@HenrikBengtsson
Collaborator

I see, so a use_futures() function? Can you give some mockup code with and without such a function, because I'm still not 100% sure what the advantage would be compared to, say:

library("foreach")  # Not really needed
library("doFuture")
registerDoFuture()
plan(multiprocess)

and

library("BiocParallel")  # Not really needed
library("BiocParallel.FutureParam")
register(FutureParam())
plan(multiprocess)

DarwinAwardWinner added a commit to DarwinAwardWinner/future that referenced this issue Aug 30, 2017
@DarwinAwardWinner
Author

I think this is basically what I'm proposing: https://github.com/DarwinAwardWinner/future/blob/6a000af1e9ea41674c85a5476cf5e8c6c9e75d80/R/use_futures.R

I've probably got some of the namespace stuff wrong in that commit, but hopefully this gives an idea of what I'm going for. I haven't tested this yet, but the idea is that you can do:

library(future)
plan(multiprocess)
use_futures()

function_that_loads_and_uses_BiocParallel()
function_that_loads_and_uses_foreach()

And all functions will use multiprocess futures to implement parallelism, and you don't even need to know that those functions are using BiocParallel or foreach internally. You just know that use_futures() will ensure that all parallel packages with configurable backends use futures.

@DarwinAwardWinner
Author

As a side note, I wonder if R.utils could use a function that abstracts the logic of "run this after PKG is loaded, or right now if PKG is already loaded", like eval-after-load in Emacs Lisp.
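
A rough sketch of what such a helper might look like, under stated assumptions: `run_after_load` is a hypothetical name (it is not an existing R.utils or future function), and `notInstalledPkg` is a made-up package name used to show the deferred branch.

```r
# Hypothetical helper (not part of R.utils or future): run `fun` now if the
# namespace of `pkg` is already loaded, otherwise defer it to the onLoad hook.
run_after_load <- function(pkg, fun) {
  stopifnot(is.character(pkg), length(pkg) == 1L, is.function(fun))
  if (isNamespaceLoaded(pkg)) {
    fun()
    return(invisible(TRUE))   # ran immediately
  }
  # Hooks receive (pkgname, pkgpath); the wrapper discards both.
  setHook(packageEvent(pkg, "onLoad"), function(...) fun())
  invisible(FALSE)            # deferred until `pkg` is loaded
}

# The base namespace is always loaded, so this call runs right away.
ran_now <- run_after_load("base", function() message("base is ready"))

# "notInstalledPkg" is a made-up name, so the call is only queued as a hook.
queued <- run_after_load("notInstalledPkg", function() message("never shown"))
```

Returning a flag that says whether the function ran immediately is a design choice of this sketch; a real helper could just as well return nothing.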

@DarwinAwardWinner
Author

DarwinAwardWinner commented May 4, 2018

After some further testing, it doesn't seem that setHook can do what is needed. I'm not sure exactly when the hook runs, but it seems like it's possible to run functions from e.g. BiocParallel without triggering the hook first.
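
For anyone wanting to investigate the timing, here is one possible way to observe when an onLoad hook fires. It does not reproduce the BiocParallel behaviour described above; it uses a base-R package purely as a stand-in, and the candidate list and the `fired` flag are assumptions of this sketch. Note that a hook registered after a namespace is already loaded will not run for that load.

```r
# Pick a base-R package whose namespace is not loaded yet in this session.
candidates <- c("splines", "stats4", "grid")
pkg <- Filter(function(p) !isNamespaceLoaded(p), candidates)[1]

fired <- FALSE
if (!is.na(pkg)) {
  # Register a tracing hook before the namespace is loaded.
  setHook(packageEvent(pkg, "onLoad"), function(pkgname, pkgpath) {
    fired <<- TRUE
    message("onLoad hook ran for ", pkgname)
  })
  # pkg::fun also loads the namespace when needed; load it directly here.
  loadNamespace(pkg)
}

# Report whether the tracing hook was triggered by the load.
print(fired)
```

Running the same check against the package of interest, in a fresh session and before anything else has loaded it, may help narrow down whether the hook is skipped or simply registered too late.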
