
parabar with foreach()? #53

Open
jmbh opened this issue May 17, 2024 · 6 comments
Labels
feature New feature or request

Comments

@jmbh

jmbh commented May 17, 2024

Hi Mihai,

Thanks a lot for creating a package solution for this annoying problem. However, in the examples on your website I only see examples which work with parSapply() from (I guess) the parallel package.

Any chance your package also works with foreach() from the foreach package?

Best wishes,
Jonas

@mihaiconstantin
Owner

mihaiconstantin commented May 18, 2024

Hi Jonas!

I took a look at the foreach package and I see it's possible.

Looking at the code for the doParallel package and the scarce documentation for foreach::foreach-ext, I think the main ideas are:

  1. Provide an implementation for the foreach::%dopar% operator using a parabar backend.
  2. Register that implementation with foreach via the foreach::setDoPar function.
  3. Provide a wrapper for registering a parabar backend with the foreach::%dopar%-compatible implementation.

Here is such an implementation, starting with (1):

# Implementation for the `%dopar%` operator.
doParabar <- function(obj, expr, envir, data) {
    # Extract the `backend` from the data argument.
    backend <- data$backend

    # Create an iterator object from the input object.
    iterator <- iterators::iter(obj)

    # Create an accumulator function for the iterator.
    accumulator <- foreach::makeAccum(iterator)

    # Prepare the items to be processed.
    items <- as.list(iterator)

    # Define the task to be evaluated for each item.
    task <- function(arguments) {
        # Evaluate the task for the current item.
        eval(expr, envir = arguments, enclos = envir)
    }

    # Export any objects to the cluster.
    parabar::export(backend, variables = ls(envir), environment = envir)

    # Apply the task function to each item using `parabar::par_lapply`.
    results <- parabar::par_lapply(backend, items, task)

    # Accumulate the results.
    accumulator(results, seq_along(results))

    # Return the results.
    return(foreach::getResult(iterator))
}

Then, for (2) and (3) things are much simpler:

# The user function for registering the `parabar`-compatible `%dopar%` implementation.
registerDoParabar <- function(backend) {
    # Register the `%dopar%` operator implementation.
    foreach::setDoPar(
        # The implementation.
        fun = doParabar,

        # Information to be passed to the registered implementation.
        data = list(backend = backend),

        # Information about the implementation.
        info = function(data, item) NULL
    )
}

Finally, you can use parabar with foreach as follows (i.e., assuming you have doParabar and registerDoParabar in your R session):

# Load packages.
library(parabar)
library(foreach)
library(iterators)

# Start an asynchronous `parabar` backend as usual.
backend <- parabar::start_backend(cores = 4, cluster_type = "psock", backend_type = "async")

# Register it with the `foreach` package.
registerDoParabar(backend)

# Use the `foreach` package as usual.
results <- foreach(i = 1:1000, .combine = c) %dopar% {
    # Sleep a bit.
    Sys.sleep(0.01)

    # Compute and return.
    i + 1
}

# Stop the backend.
parabar::stop_backend(backend)

Note that the doParabar function above implements only the .combine argument of foreach. Also, I see that doParallel does a lot more error handling around the task execution and accumulation of results. Nevertheless, that's the main idea.

If you think it makes sense, we could consider placing the implementation for the foreach::%dopar% operator in a package of its own called doParabar. I am hesitant to add it to parabar directly because my intention is to stay as close as possible to parallel.

I hope this helps!

@jmbh
Author

jmbh commented May 30, 2024

Hi Mihai,

Thanks a million for this!

The technical details are a bit over my head, but your code works like a charm.

One question I still have is how I can make the progress bar optional. In my earlier solution with doSNOW (which, as you know, I can't use because it is superseded in R), I simply put an if-statement before setTxtProgressBar(), but since I don't fully see through your solution, this is less clear to me here. Does configure_bar() allow me to switch it off? If there is no obvious way that I am missing, it would be nice to add another option "none" next to "modern" and "basic" to this function for this purpose.

Thanks again & all the best,
Jonas

@mihaiconstantin
Owner

Hi Jonas,

Super happy to hear it works!

One question I still have would be how I can make the progress bar optional.

Not sure if this is what you mean, but you can disable the progress tracking by running:

# Set the progress tracking option to `FALSE`.
set_option("progress_track", FALSE)

Functions like par_sapply and the like always check if the user wants progress tracking:

parabar/R/UserApiConsumer.R

Lines 103 to 104 in 52234ba

# Whether to track progress or not.
progress <- get_option("progress_track")

And the progress is tracked only if the conditions are right:

parabar/R/UserApiConsumer.R

Lines 115 to 116 in 52234ba

# If progress is requested and the conditions are right.
if (progress && backend$supports_progress && interactive()) {

You can further control progress tracking options (e.g., the overhead) via the package options documented here, and restore things to default using the set_default_options function.
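Putting the pieces together, one way to make the bar optional in your own code is to expose it as a function argument and set the package option accordingly. This is just a sketch; my_analysis and its arguments are hypothetical:

```r
# Load the package.
library(parabar)

# Hypothetical wrapper that makes progress tracking optional.
my_analysis <- function(data, cores = 2, show_progress = TRUE) {
    # Enable or disable progress tracking via the package option.
    set_option("progress_track", show_progress)

    # Start the backend (progress tracking requires an asynchronous backend).
    backend <- start_backend(cores = cores, cluster_type = "psock", backend_type = "async")

    # Ensure the backend is stopped when the function exits.
    on.exit(stop_backend(backend))

    # Run the task; the bar is shown only if `show_progress` is `TRUE`.
    par_sapply(backend, data, function(x) x + 1)
}
```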

I hope this helps!

@jmbh
Author

jmbh commented Jun 3, 2024

That's it, thank you!

@jmbh
Author

jmbh commented Nov 11, 2024

Hi Mihai,

One more question on this one. I used your solution which works beautifully, but it seems like it forces the use of at least two cores:

n_cores <- 1
backend <- parabar::start_backend(cores = n_cores, cluster_type = "psock", backend_type = "async")
Warning message:
Argument cores must be greater than 1. Setting to 2.

Is there a way to also use only a single core? Of course, in almost all cases at least two cores are available, but if possible we'd also like to allow for n_cores=1.

Let me know in case you'd like me to open a separate issue on this.

Thanks,
Jonas

@mihaiconstantin
Owner

Hi Jonas,

Thanks for the kind words!

Regarding cores = 1, I see no reason why not to allow for that. I opened #71 and will write more about it there. Thanks for mentioning it!
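In the meantime, a possible workaround is to fall back to a plain lapply() when a single core is requested. A sketch; run_in_parallel is a hypothetical helper, not part of parabar:

```r
# Load the package.
library(parabar)

# Hypothetical helper: run in parallel, or sequentially for a single core.
run_in_parallel <- function(items, task, n_cores = 1) {
    # With one core, avoid starting a backend altogether.
    if (n_cores < 2) {
        return(lapply(items, task))
    }

    # Otherwise, start a `parabar` backend as usual.
    backend <- start_backend(cores = n_cores, cluster_type = "psock", backend_type = "async")

    # Ensure the backend is stopped when the function exits.
    on.exit(stop_backend(backend))

    # Run the task in parallel.
    par_lapply(backend, items, task)
}
```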
