Any way to inform ETA estimate with prior info? #142

mike-lawrence · 2022-08-25T12:02:16Z

I'm working on a pipeline-prototyping package that uses progressr to track progress as many elements of a list are processed and where efficient re-processing is enabled by skipping over already processed elements. This results in a scenario where I gather the current ETA calculations of progressr become inaccurate as early elements that have already been successfully processed return from the processing function quickly while later elements can be expected to take much longer. I save history data associated with each element, including processing time duration, and I'm wondering if it'd be possible to provide that duration to progressr somehow to leave it with more accurate ETA estimates. A minimal example of my case is:

options(
	progressr.handlers = progressr::handler_progress(
		format   = "[:bar] :spin :current/:total :percent in :elapsed ETA: :eta (:message)"
		, clear = FALSE
	)
)
options(progressr.clear=FALSE)
f = function(.x,pb){
    x_hash = digest::digest(.x)
    if(file.exists(x_hash)){
        t = readRDS(x_hash)
    }else{
        t = runif(1,10,20)
        Sys.sleep(t)
        if(.x%%2){  # odd elements get saved, triggering skip next time they're processed
            saveRDS(t,x_hash)
        }
    }
    pb(duration = t) #imagined API for supplying the duration manually
    
}

#on first run, ETA is accurate
progressr::with_progress({
    .x = 1:10
    pb <- progressr::progressor(along = .x)
    y = furrr::future_map(.x=.x,.f=f,pb=pb,.progress=F)
})

#on second run, some elements are skipped internally by f()
progressr::with_progress({
    .x = 1:10
    pb <- progressr::progressor(along = .x)
    y = furrr::future_map(.x=.x,.f=f,pb=pb,.progress=F)
})

HenrikBengtsson · 2022-08-25T15:49:18Z

The progressr package is not aware of time. The task for estimating ETA is passed on to the underlying progress handler, e.g. the progress package (as here). This was a deliberate choice to keep the progressr API as simple as possible, especially in its early stage of life.

One idea I had in the past related to this was to signal that some progress steps are skipped whereas others are not (the default), e.g.

pb(skip = TRUE)

or possibly

pb(skip = 0.9)

where skip is a fraction in [0,1] where skip=1.0 is full skip ("zero cost"), and skip=0.9 (10% cost), etc. (Maybe, cost is a better name, where cost = 1.0 is the default)

Then the progress handler can do whatever they'd like with that info, e.g. adjust their ETA estimates, or completely ignore it. See also r-lib/progress#120 for my proposal to the progress package. This is an idea that needs to maturity and be explored, e.g. is skip the most generic approach (where ETA is just a special case), is it sufficient, or do we need more.

tripartio · 2024-01-30T16:40:46Z

progressr::handlers('cli') supports very accurate ETA estimates by default with no further code needed. I am especially surprised about how good the estimates are even with parallel processing with the {future} framework.

HenrikBengtsson · 2024-01-30T16:53:02Z

progressr::handlers('cli') supports very accurate ETA estimates by default with no further code needed. I am especially surprised about how good the estimates are even with parallel processing with the {future} framework.

Thanks for sharing your experience and observations.

FWIW, it makes sense that the ETA works equally well in parallel. cli cannot even tell if we're running sequentially or parallelly; all it sees is an incoming stream of progress updates. The only difference, when running in parallel, is that they'll arrive more frequently. So, cli just concludes that things are running faster and adjusts its ETA accordingly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Any way to inform ETA estimate with prior info? #142

Any way to inform ETA estimate with prior info? #142

mike-lawrence commented Aug 25, 2022 •

edited

Loading

HenrikBengtsson commented Aug 25, 2022 •

edited

Loading

tripartio commented Jan 30, 2024 •

edited

Loading

HenrikBengtsson commented Jan 30, 2024

Any way to inform ETA estimate with prior info? #142

Any way to inform ETA estimate with prior info? #142

Comments

mike-lawrence commented Aug 25, 2022 • edited Loading

HenrikBengtsson commented Aug 25, 2022 • edited Loading

tripartio commented Jan 30, 2024 • edited Loading

HenrikBengtsson commented Jan 30, 2024

mike-lawrence commented Aug 25, 2022 •

edited

Loading

HenrikBengtsson commented Aug 25, 2022 •

edited

Loading

tripartio commented Jan 30, 2024 •

edited

Loading