Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method to create inits from fit or draws objects #876

Closed
avehtari opened this issue Nov 10, 2023 · 11 comments
Closed

Add method to create inits from fit or draws objects #876

avehtari opened this issue Nov 10, 2023 · 11 comments
Assignees
Labels
feature New feature or request

Comments

@avehtari
Copy link
Contributor

avehtari commented Nov 10, 2023

Especially useful now that Pathfinder is useful for initializing MCMC. To show what I mean I show below what I came up quickly for testing purposes

pth5 <- model5$pathfinder(data = standata5, init=0.1, num_paths=10, single_path_draws=40,
                          history_size=50, max_lbfgs_iters=100)

# create a fit object (should now work also with pathfinder object 
fit5 <- model5$sample(data=standata5, chains=1, iter_sampling=1, fixed_param=TRUE, refresh=0)
# get parameter names so that we don't need to include thousands of generated quantities variables
# it would sufficient to get just the parameter names without dimensions, so in that sense the instantiated fit would 
# not be needed, but I guess this is now the only way? The downside is that it requires compiling model methods
params <- names(fit5$variable_skeleton(transformed_parameters = FALSE, generated_quantities = FALSE))
# see the function definition below, argument `variables` follows CmdStanR argument naming
init5 <- create_inits(pth5, variables=params)

#' Function to form a list of list of initial values from a draws object.
#' Arguments `variable` and  `ndraws` follow posterior package argument naming
as_inits <- function(draws, variable=NULL, ndraws=4) {
  ndraws <- min(ndraws(draws),ndraws)
  if (is.null(variable)) {variable = variables(draws)}
  draws <- draws |> as_draws_rvars()
  inits <- lapply(1:ndraws,
                  function(drawid) {
                    sapply(variable,
                           function(var) {
                             drop(draws_of(subset_draws(draws, variable=var, draw=drawid)[[1]]))
                           })
                  })
  inits
}

#' Function to form a list of list of initial values from a Pathfinder object.
#' Something like this will be eventually available in `cmdstanr` package.
create_inits <- function(pthf, variables=NULL, ndraws=4) {
g  if (is.null(variables)) {
    # set variable names to be list of parameter names
    variables <- names(pthf$variable_skeleton(transformed_parameters = FALSE,
                                              generated_quantities = FALSE))
  }
  draws <- pthf$draws(format="df")
  draws <- draws |>
    mutate_variables(lw = lp__ - lp_approx__)
  ndist <- n_distinct(extract_variable(draws,"lw"))
  if (ndist < ndraws) {
    stop(paste0("Not enough distinct draws (", ndist, ") to create inits."))
  }
  if (ndist < 0.95*ndraws(draws)) {
    # Resampling has been done in Stan, compute weights for distinct draws
    # (these are now non Pareto smoothed as we have lost the original information)
    draws <- draws |>
      group_by(lw) |>
      summarise(.draw=min(.draw)) |>
      left_join(draws, by = ".draw") |>
      as_draws_df() |>
      mutate_variables(lw = lw.x,
                       w = exp(lw-max(lw)))
  } else {
    # Resampling was not done in Stan, compute Pareto smoothed weights
    draws <- draws |>
      mutate_variables(w=pareto_smooth(exp(lw-max(lw)), tail="right"))
  }
  draws |>
    weight_draws(weights=extract_variable(draws,"w"), log=FALSE) |>
    resample_draws(ndraws=ndraws, method = "simple_no_replace") |>
    as_inits(variable=variables, ndraws=ndraws)
}

There is a corresponding CmdStanPy method `create_inits`

EDIT: updated the code to support both `psis_resample=TRUE` and `psis_resample=FALSE`, and support matrix and array type parameters.
@avehtari avehtari added the feature New feature or request label Nov 10, 2023
@andrjohns andrjohns assigned andrjohns and unassigned jgabry Dec 13, 2023
@andrjohns
Copy link
Collaborator

@jgabry I'll tackle this as well

@jgabry
Copy link
Member

jgabry commented Dec 13, 2023

Awesome, thanks @andrjohns!

@jgabry
Copy link
Member

jgabry commented Dec 13, 2023

Just did the 0.7.0 release. Happy to another one whenever these additional features are ready. Thanks for working on this!

@reuning
Copy link

reuning commented Jan 10, 2024

fwiw, I'd love this feature (currently working on using pathfinder to initialize a large model). If there is anything I can do to help let me know.

@andrjohns
Copy link
Collaborator

@avehtari The cmdstanpy implementation of create_inits just returns a random draw from the existing pathfinder estimates, but your example above is performing an additional PSIS step before selecting the first nchain draws as inits.

Ideally both of the interfaces should be consistent here. Doesn't the multi-pathfinder call already perform PSIS resampling on the results? Is there a benefit to repeating it here?

@avehtari
Copy link
Contributor Author

When I made the issue, 1) PSIS resampling was often failing in CmdStan, 2) there was no option for sample without replacement which would be better for getting unique initial values. Now 1 has been fixed, and maybe 2, too?

@avehtari
Copy link
Contributor Author

Birthdays case study illustrates the use of the above functions
https://users.aalto.fi/~ave/casestudies/Birthdays/birthdays.html

@jgabry
Copy link
Member

jgabry commented Mar 13, 2024

@andrjohns Just checking in on this. Are you still planning on working on it? Let me know if it would be helpful to chat about the implementation.

@SteveBronder
Copy link
Collaborator

I'm finishing this up hopefully this week

@jgabry
Copy link
Member

jgabry commented Mar 13, 2024

Awesome thanks Steve!

@avehtari
Copy link
Contributor Author

Closed by #937

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants