
parabar with foreach()? #53

Open
jmbh opened this issue May 17, 2024 · 6 comments
Labels
feature New feature or request

Comments

@jmbh

jmbh commented May 17, 2024

Hi Mihai,

Thanks a lot for creating a package solution for this annoying problem. However, in the examples on your website I only see examples which work with parSapply() from (I guess) the parallel package.

Any chance your package also works with foreach() from the foreach package?

Best wishes,
Jonas

@mihaiconstantin
Owner

mihaiconstantin commented May 18, 2024

Hi Jonas!

I took a look at the foreach package and I see it's possible.

Looking at the code for the doParallel package and the scarce documentation for foreach::foreach-ext, I think the main ideas are:

  1. Provide an implementation for the foreach::%dopar% operator using a parabar backend.
  2. Register that implementation with foreach via the foreach::setDoPar function.
  3. Provide a wrapper for registering a parabar backend with the foreach::%dopar%-compatible implementation.

Here is such an implementation, starting with (1):

# Implementation for the `%dopar%` operator.
doParabar <- function(obj, expr, envir, data) {
    # Extract the `backend` from the data argument.
    backend <- data$backend

    # Create an iterator object from the input object.
    iterator <- iterators::iter(obj)

    # Create an accumulator function for the iterator.
    accumulator <- foreach::makeAccum(iterator)

    # Prepare the items to be processed.
    items <- as.list(iterator)

    # Define the task to be evaluated for each item.
    task <- function(arguments) {
        # Evaluate the task for the current item.
        eval(expr, envir = arguments, enclos = envir)
    }

    # Export any objects to the cluster.
    parabar::export(backend, variables = ls(envir), environment = envir)

    # Apply the task function to each item using `parabar::par_lapply`.
    results <- parabar::par_lapply(backend, items, task)

    # Accumulate the results.
    accumulator(results, seq_along(results))

    # Return the results.
    return(foreach::getResult(iterator))
}

Then, for (2) and (3) things are much simpler:

# The user function for registering the `parabar`-compatible `%dopar%` implementation.
registerDoParabar <- function(backend) {
    # Register the `%dopar%` operator implementation.
    foreach::setDoPar(
        # The implementation.
        fun = doParabar,

        # Information to be passed to the registered implementation.
        data = list(backend = backend),

        # Information about the implementation.
        info = function(data, item) NULL
    )
}

Finally, you can use parabar with foreach as follows (i.e., assuming you have doParabar and registerDoParabar in your R session):

# Load packages.
library(parabar)
library(foreach)
library(iterators)

# Start an asynchronous `parabar` backend as usual.
backend <- parabar::start_backend(cores = 4, cluster_type = "psock", backend_type = "async")

# Register it with the `foreach` package.
registerDoParabar(backend)

# Use the `foreach` package as usual.
results <- foreach(i = 1:1000, .combine = c) %dopar% {
    # Sleep a bit.
    Sys.sleep(0.01)

    # Compute and return.
    i + 1
}

# Stop the backend.
parabar::stop_backend(backend)

Note that the doParabar function above implements only the .combine argument of foreach. Also, I see that doParallel does a lot more error handling around the task execution and accumulation of results. Nevertheless, that's the main idea.

If you think it makes sense, we could consider placing the implementation for the foreach::%dopar% operator in a package of its own called doParabar. I am hesitant to add it to parabar directly because my intention is to stay as close as possible to parallel.

I hope this helps!

@jmbh
Author

jmbh commented May 30, 2024

Hi Mihai,

Thanks a million for this!

The technical details are a bit over my head, but your code works like a charm.

One question I still have is how I can make the progress bar optional. In my earlier solution with doSNOW (which, as you know, I can't use because it is superseded in R), I simply put an if-statement before setTxtProgressBar(), but since I don't fully see through your solution, this is less clear to me here. Does configure_bar() allow me to switch it off? If there is no obvious way that I am missing, it would be nice to add another option "none" next to "modern" and "basic" to this function for this purpose.

Thanks again & all the best,
Jonas

@mihaiconstantin
Owner

Hi Jonas,

Super happy to hear it works!

One question I still have would be how I can make the progress bar optional.

Not sure if this is what you mean, but you can disable the progress tracking by running:

# Set the progress tracking option to `FALSE`.
set_option("progress_track", FALSE)

Functions like par_sapply and the like always check if the user wants progress tracking:

parabar/R/UserApiConsumer.R

Lines 103 to 104 in 52234ba

# Whether to track progress or not.
progress <- get_option("progress_track")

And the progress is tracked only if the conditions are right:

parabar/R/UserApiConsumer.R

Lines 115 to 116 in 52234ba

# If progress is requested and the conditions are right.
if (progress && backend$supports_progress && interactive()) {

You can further control progress tracking options (e.g., the overhead) via the package options documented here, and restore things to default using the set_default_options function.
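Putting the pieces together, one way to make the bar optional in your own code is to expose it as a function argument and set the package option accordingly. This is just a sketch; my_analysis and its arguments are hypothetical:

```r
# Load the package.
library(parabar)

# Hypothetical wrapper that makes progress tracking optional.
my_analysis <- function(data, cores = 2, show_progress = TRUE) {
    # Enable or disable progress tracking via the package option.
    set_option("progress_track", show_progress)

    # Start the backend (progress tracking requires an asynchronous backend).
    backend <- start_backend(cores = cores, cluster_type = "psock", backend_type = "async")

    # Ensure the backend is stopped when the function exits.
    on.exit(stop_backend(backend))

    # Run the task; the bar is shown only if `show_progress` is `TRUE`.
    par_sapply(backend, data, function(x) x + 1)
}
```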

I hope this helps!

@jmbh
Author

jmbh commented Jun 3, 2024

That's it, thank you!

@jmbh
Author

jmbh commented Nov 11, 2024

Hi Mihai,

One more question on this one. I used your solution which works beautifully, but it seems like it forces the use of at least two cores:

n_cores <- 1
backend <- parabar::start_backend(cores = n_cores, cluster_type = "psock", backend_type = "async")
Warning message:
Argument cores must be greater than 1. Setting to 2.

Is there a way to also use only a single core? Of course, in almost all cases at least two cores are available, but if possible we'd also like to allow for n_cores=1.

Let me know in case you'd like me to open a separate issue on this.

Thanks,
Jonas

@mihaiconstantin
Owner

Hi Jonas,

Thanks for the kind words!

Regarding cores = 1, I see no reason why not to allow for that. I opened #71 and will write more about it there. Thanks for mentioning it!
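In the meantime, a possible workaround is to fall back to a plain lapply() when a single core is requested. A sketch; run_in_parallel is a hypothetical helper, not part of parabar:

```r
# Load the package.
library(parabar)

# Hypothetical helper: run in parallel, or sequentially for a single core.
run_in_parallel <- function(items, task, n_cores = 1) {
    # With one core, avoid starting a backend altogether.
    if (n_cores < 2) {
        return(lapply(items, task))
    }

    # Otherwise, start a `parabar` backend as usual.
    backend <- start_backend(cores = n_cores, cluster_type = "psock", backend_type = "async")

    # Ensure the backend is stopped when the function exits.
    on.exit(stop_backend(backend))

    # Run the task in parallel.
    par_lapply(backend, items, task)
}
```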
