Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a progress bar by default in a package #78

Open
Bisaloo opened this issue May 2, 2020 · 10 comments
Open

Adding a progress bar by default in a package #78

Bisaloo opened this issue May 2, 2020 · 10 comments
Labels
question Further information is requested

Comments

@Bisaloo
Copy link
Contributor

Bisaloo commented May 2, 2020

Hi,

I'm slightly confused by this part of the documentation about with_progress():

IMPORTANT: This function is meant for end users only. It should not be used by R packages, which only task is to signal progress updates, not to decide if, when, and how progress should be reported.

I don't know whether it's because I don't fully understand everything about progressr or if it's a design choice where we may disagree.

Here is my use case:

I have a function that may take some time to run (couple dozens of minutes in most cases) and I would like to signal progress by default to users. Previously we were using the pbmcapply package but we moved away from it to get the flexibility from future (especially parallel processing on windows machines). In order to achieve this, we wrapped our future_lapply() loop with with_progress() in the function provided by our package.

Now my understanding is that if users don't like the progress bar, they can disable it by using handlers("void"). We even added a note in the function documentation to inform them of this option. (This is also a nice improvement compared to the previous pbmcapply solution.)

My issue with the approach you describe for progressr (the user should be in charge of using with_progress()) is that most users won't look into the docs and probably never bother adding the progress bar, even though they may enjoy it if they knew.

TL;DR: does it make sense to use with_progress() in a package to choose sensible defaults for the user?

@HenrikBengtsson
Copy link
Collaborator

Hi. I should start out to make it clear that this package is "experimental", that is, I consider it in a fairly early stage in its lifespan. This means that it's overall design and API is yet to be settled. Also, I'm very conservative when it comes to adding new features if there is the slightest risk that they might bring us down a path preventing us from adding other features in the future, including features we don't yet know about. Hopefully this brings some clarify of where we are today and where we might be heading and that nothing is set in stone at this point.

So, about with_progress():

One important question is, who should own that function: the developer or the end-user? I've taken the conservative approach of letting the end-user to own that one. It might be that it'll grow into a different role later where it is very clear that it needs to be owned by both parties.

Another thing that is on the radar is the support for globalCallingHandlers() we've got in R (>= 4.0.0). I'm not sure how it can be implemented right now, but that opens up for the possibility of registering a global calling handler for progress updates without having to use with_progress() at all. In an ideal world, this could mean that a call to handlers(...) by the end user will automatically enable rendering of progress updates. Knowing more about this and whether it would be possible or not will help clarify the role of with_progress().

A third thing is what should happen if one uses nested progress updates? Say that I have a package that report on progress when it calls one of the functions in your package, say, 5 times. My life's been good and I've got progress updates as wanted, 0%, 20%, ..., 100%. Now, you also discover the progressr package inside your functions and one of them is the one I already use in my package.
Would should happen when we do that? Currently, with only one with_progress() at the top level, we get reports from the first layer of progression while the second layer, the one produced by your packages, is masked by the first layer. However, you could imagine that the information from the second layer is used to increase the granuality of progression updates in the first layer, e.g. 0%, 2%, ..., 98%, 100%. One can also imagine fancy nested progress reporting, e.g.

|===========>--------------------------|  39%
 (1) |=================================| 100%
 (2) |========>------------------------| 100%
 (3) |---------------------------------| 100%
 (4) |---------------------------------| 100%
 (5) |---------------------------------| 100%

or something like:

|=====|==   |     |      |      |      |  39%

Imagine what this would/could do if we run in parallel.

Now, I don't know what the implications would be if you start using with_progress() in your package. Will that prevent the development of the above or not? This is one reason why I don't think it's a good idea at this point to start having with_progress() scattered in different packages. Hopefully, I/we know more after a few more major releases of the progressr package.

Some related issues: #56, #55, #36, #13

@Bisaloo
Copy link
Contributor Author

Bisaloo commented May 4, 2020

Thank you for the detailed and insightful answer!

with_progress() is already in 2 packages I'm involved in. So I'll keep it for now but I will closely follow the development of progressr and quickly react to possible API changes so this does not make your life difficult.

Of course, if you think of a better way to enable progress report by default (and then letting the user decide how they want this progress to be reported, of if it should be disabled), I'm totally open to suggestions. I'd like progress report to be opt-out, no opt-in.

@thierrygosselin
Copy link

I'm analyzing genomic data
I want a progress bar for some of my functions,
so that the end user can go do something else, like have a coffee and discuss science or do something else more meaningful than look and wait at the screen...

95% of the users of my packages are R beginners, if on top of learning R and genomics, my packages requires reading and more coding to enable a progress bar !!! I'll have no other option than to drop that feature because it will be asking too much, most people won't read it or will snooze on the doc.

If there is a vote on this: developper!

Cheers,
Thierry

@HenrikBengtsson
Copy link
Collaborator

See Issue #95 (BETA: Global progress handler). It removes the need for the user having to use with_progress() - all they need to do is to specify once whether or not they want to get progress updates. As package developers, we don't have to change anything. Hopefully simplifies how we think about progress updates.

@HenrikBengtsson
Copy link
Collaborator

HenrikBengtsson commented Dec 16, 2020

So, now we have global progress handling (progressr 0.7.0), where all the user need to do is to confirm they want to "subscribe" to all progress information signaled by calling:

progressr::handlers(global=TRUE)

and then progress is reported without the need for them to use with_progress().

Does this solve what you're asking for?

@thierrygosselin
Copy link

For me it's perfect
Much appreciated Henrik
Best
Thierry

@Bisaloo
Copy link
Contributor Author

Bisaloo commented Jan 29, 2021

So, now we have global progress handling (progressr 0.7.0), where all the user need to do is to confirm they want to "subscribe" to all progress information signaled by calling:

progressr::handlers(global=TRUE)

and then progress is reported without the need for them to use with_progress().

Does this solve what you're asking for?

Partially.

Thanks to this, I will not add with_progress() in my package code, which is very good news, and more in line with the philosophy you're defending for the progressr package.

But I might have mis-formulated my initial request: this still requires an action from the user part. And I'm believe most users will not read the documentation well enough to learn about this feature. In other words, this change (as far as I understand it!) doesn't allow the addition of a progress bar by default by package developers.

Please let me know if I misunderstood something and if this request goes against what you want for the progressr package.

@HenrikBengtsson
Copy link
Collaborator

Unfortunately, R will most likely (= I'm 99.99999% sure) never allow an R package to register a global calling handler during startup. For example, if you try to add:

.onLoad <- function(libname, pkgname) {
  globalCallingHandlers(foo = function(c) NULL)
}

then you'll get:

> loadNamespace("teeny")
Error in globalCallingHandlers(foo = function(c) NULL) :
  should not be called with handlers on the stack
> traceback()
8: globalCallingHandlers(foo = function(c) NULL)
7: fun(libname, pkgname)
6: doTryCatch(return(expr), name, parentenv, handler)
5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
4: tryCatchList(expr, classes, parentenv, handlers)
3: tryCatch(fun(libname, pkgname), error = identity)
2: runHook(".onLoad", env, package.lib, package)
1: loadNamespace("teeny")

There's no workaround or hack around this. I tend to agree with this conservative approach of R. If allowed/possible, it would open up Pandora's Box, e.g. competing handlers breaking other packages, introduce side-effects in the R evaluation, etc. It's still early days and we might find safe cases. Time will tell. (See also https://twitter.com/henrikbengtsson/status/1337233005446742016)

What is possible though is to set it during the R startup process, i.e. a user can set it in ~/.Rprofile. This opens up for some semi-automated configuration. But again, for now, I'd recommend this to be a deliberate action of the end-user.

@gitdemont
Copy link

As a follow-up of the issue from {future} futureverse/future#639

Below is an illustration of an attempt to add a default progress bar in a package see here for original function https://github.com/gitdemont/IFCip/blob/master/R/ExtractFeatures.R

#' @param ... will be passed to \link[future]{plan} with the exception of 'strategy' and '...'.
#' @param main_lab,sub_lab main and sub labels of the progress bar.
#' @param num number of iterations
#' @param progress whether to display a progress bar. Default is TRUE.\cr
#' When NULL, execution will not be wrapped inside \link[progressr]{with_progress}. This allows user to use global handler see \link[progressr]{handlers}.\cr
#' When FALSE, execution will be performed inside \link[progressr]{without_progress}.\cr
#' When TRUE, \link[progressr]{handlers} will be automatically selected and called inside inside \link[progressr]{with_progress}.\cr
#' @param parallel whether to use parallelization. Default is NULL.\cr
#' When NULL, current \pkg{future}'s plan 'strategy' will be used.\cr
#' When FALSE, \link[future]{plan} will be called with "sequential" 'strategy'.
#' When TRUE, \link[future]{plan} will be called with either "multisession" 'strategy' on Windows or "multicore" otherwise.
myfun <- function(...,
                  main_lab = "",
                  sub_lab = "",
                  num = 10L,
                  progress = TRUE,
                  parallel = NULL)  {
  stopifnot(typeof(main_lab) == "character", length(main_lab) == 1)
  stopifnot(typeof(sub_lab) == "character", length(sub_lab) == 1)
  stopifnot(typeof(num) == "integer", num > 0)
  dots=list(...)
  
  # define handler used to monitor progress
  p = progressr::progressor
  fun = progressr::with_progress
  hand = progressr::handler_txtprogressbar(title = main_lab)
  
  if(is.null(progress)) {
    fun = function(expr, ...) { expr }
  } else {
    stopifnot(length(progress) == 1, progress %in% c(TRUE, FALSE))
    old_hand_h <- getOption("progress.handlers", list())
    on.exit(progressr::handlers(old_hand_h, append = FALSE), add = TRUE)
    progressr::handlers(progressr::handler_void, append = FALSE)
    if(!progress) {
      fun = function(expr, ...) { progressr::without_progress(expr) }
      hand = progressr::handler_void
      p = function(...) { return(p) }
    } else {
      fun = progressr::with_progress
    }
  }
  
  # define future plan
  if(missing(parallel) || is.null(parallel)) {
    strategy = NULL
  } else {
    stopifnot(length(parallel) == 1, parallel %in% c(TRUE, FALSE))
    if(parallel) {
      if(.Platform$OS.type == "windows") {
        strategy = future::multisession
      } else {
        strategy = future::multicore
      }
    } else {
      strategy = future::sequential
    }
  }
  future_args = list(strategy = strategy)
  dots=dots[!(names(dots) %in% names(future_args))]
  if(!is.null(strategy)) dots=dots[names(dots) %in% setdiff(names(formals(strategy, envir = asNamespace("future"))), "...")]
  oplan=do.call(what = future::plan, args = c(future_args, dots))
  on.exit(future::plan(oplan), add = TRUE)
  
  # make the long computation
  fun(handlers = hand,
      interrupts = TRUE,
      enable = !is.null(progress) || progress,
      cleanup = TRUE,
      expr = {
    p <- p(steps = num, on_exit = FALSE, auto_finish = FALSE)
    on.exit(p("\n", amount = 0, type = "finish"), add = TRUE)
    p(main_lab, class = "sticky", amount = 0)
    # show future plan that will be used
    p(paste0("initialising [workers=", future::nbrOfWorkers(),"] ",
             paste0(setdiff(class(future::plan()),
                            c("FutureStrategy", "uniprocess", "future", "function")),
                    collapse = "|")),
      class = ifelse(sub_lab == "" || is.null(progress), "sticky", "non_sticky"), amount = 0)
    Sys.sleep(0.5)
    ans <- future.apply::future_lapply(
      future.globals = FALSE,
      future.lazy = FALSE,
      future.scheduling = +Inf,
      future.chunk.size = NULL,
      # other extra future args
      # future.packages = ,
      # future.seed = ,
      # future.envir = ,
      # future.globals,
      X = 1:num,
      FUN = function(iter) { 
        Sys.sleep(1)        # place your own computation expensive function instead of this line
        p(sprintf("%s %i%%", sub_lab, round(100*iter/num)))
        iter
      })
  })
  return(invisible(ans))
}

# reset handlers and plan
future::plan(future::sequential)
progressr::handlers(progressr::handler_void)
progressr::handlers(global = FALSE)

# several examples
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", parallel = TRUE)
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", parallel = FALSE)

# using custom handler
if(!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
if(!requireNamespace("progress", quietly = TRUE)) remotes::install_github("r-lib/progress")
library(progress)
hand = list(
  progressr::handler_progress(
    format   = ":spin :current/:total (:message) [:bar] :percent in :elapsed ETA: :eta",
    width    = 60,
    complete = "+"
  )
)

# using global handler
progressr::handlers(progressr::handler_void)
progressr::handlers(global = TRUE)
progressr::handlers(hand, append = FALSE)
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = NULL, parallel = FALSE)  # use hand                (i.e. user one)
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = TRUE, parallel = TRUE)   # function defined handler (i.e. dev one)
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = FALSE, parallel = FALSE) # nothing

# inside with_progress
progressr::with_progress(myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = NULL, parallel = FALSE), handlers = hand) # use hand                         (i.e. user one)
progressr::with_progress(myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = TRUE, parallel = TRUE), handlers = hand)  # add hand to already existing bar (i.e. user and dev ones)
progressr::with_progress(myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = FALSE, parallel = TRUE), handlers = hand) # nothing

# in rstudio user + dev reporters
if(.Platform$GUI == "RStudio") { 
  progressr::with_progress(myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = TRUE, parallel = FALSE), handlers =  progressr::handler_rstudio)
}

# using user plan and with user reporter
if(!requireNamespace("future.callr", quietly = TRUE)) remotes::install_github("HenrikBengtsson/future.callr")
future::plan(future.callr::callr(workers = 5))
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = NULL, parallel = NULL)

# using dev plan but with user defined reporter and number of workers passed by user
myfun(num = 10L, main_lab = "foo", sub_lab = "bar", progress = NULL, parallel = TRUE, workers = 3)

@tripartio
Copy link

tripartio commented Jan 31, 2024

I'd like to pitch in by first saying a big thanks for this fantastic package with such a carefully designed infrastructure for highly customizable progress bars. Yet, this issue is my one big qualm with the package design--that it makes it very hard for package developers to control the appearance of progress bars.

On one hand, I appreciate the package philosophy of giving users full control. On the other hand, the basic reality is that the vast majority of users (who are not package developers) simply don't care enough to want to take the trouble of configuring their progress bars, at least not unless they use our package enough to want to tweak details like that. Users do not install packages for their progress bars. They install packages for some other core functionality. Progress bars are great, but users would like them to just work out of the box without having to configure them just to use the package's core functionality.

So, after a few false starts, I have successfully written some simple code to automatically activate {progressr} in my package on behalf of users.

core_function1_with_progress_bar <- function(..., silent = FALSE) {
  validate_silent(silent)
  
  # Core function code
  # ...
}

core_function2_with_progress_bar <- function(..., silent = FALSE) {
  validate_silent(silent)
  
  # Core function code
  # ...
}

validate_silent <- function(silent) {
  # Validate the silent argument
  assertthat::assert_that(assertthat::is.flag(silent))
  
  if (!silent) {
    if (!progressr::handlers(global = NA)) {
      # If no progressr bar settings are configured, then set cli as the default.

      if (interactive()) {
        progressr::handlers(global = TRUE)
        progressr::handlers('cli')
        message(
          'Info: No global progress bars were found; the cli handler has been enabled. ',
          'This activation only lasts for one R session; ',
          'see help(ale) for how to permanently configure the progress bar settings.'
        )
      }
    }
  }
}

In all my functions with progress bars, I have a silent argument: TRUE is progress bars and other non-essential messages should be disabled (warnings and errors are still displayed) and FALSE (default) if progress bars should be displayed. I validate all the inputs of my exported functions. In this case, I use a dedicated function validate_silent(). That is where I place the activation code to enable progress bars. Crucially, it is not placed on the package .onLoad() function (which would fail).

Note the if (interactive()) condition: this is specifically to avoid cases where the code runs in non-interactive settings like RMarkdown files (such as package vignettes) that do not support global handlers. In that case, the progress bars are simply disabled, which is fine for non-interactive use.

Here is the documentation I provide users with more detail on how to configure things for themselves:

#' @section Progress bars:
#' Progress bars are implemented with the `{progressr}` package, which lets
#' the user fully control progress bars. **To disable progress bars, set `silent = TRUE`.**
#' The first time a function is called in
#' the `{ale}` package that requires progress bars, it checks if the user has
#' activated the necessary `{progressr}` settings. If not, the `{ale}` package
#' automatically enables `{progressr}` progress bars with the `cli` handler and
#' prints a message notifying the user.
#'
#' If you like the default progress bars and you want to make them permanent, then you
#' can [add the following lines of code to your .Rprofile configuration file](https://support.posit.co/hc/en-us/articles/360047157094-Managing-R-with-Rprofile-Renviron-Rprofile-site-Renviron-site-rsession-conf-and-repos-conf)
#' and they will become your defaults for every R session; you will not see the
#' message again:
#' ```R
#' progressr::handlers(global = TRUE)
#' progressr::handlers('cli')
#' ```
#' For more details on formatting progress bars to your liking, see the introduction
#' to the [`{progressr}` package](https://progressr.futureverse.org/articles/progressr-intro.html).

I have tried to strike a balance between respecting the package philosophy of giving users full control and yet making things easier for them with sensible defaults.

I would appreciate feedback on this approach, especially if there are any downsides that I have not anticipated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants