Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow algorithms to use draws of model fits for inits #936

Closed
wants to merge 11 commits into from

Conversation

SteveBronder
Copy link
Collaborator

Submission Checklist

  • Run unit tests
  • Declare copyright holder and agree to license (see below)

Summary

This PR does two things

  1. It allows all algorithms to use inits that are draws object from posterior or fit objects from cmdstanr
  2. Instead of dispatching one process per chain for sampling it instead uses cmdstan's internal parallel chains. This is nice because it means that the data is only copied over for cmdstan once. So if a user wants to run 100 chains with 10GB of data they would have needed 1000GB of memory for the data where now they need only 10GB for the data. Reducing copies of the data also cuts down on memory bloat that can thrash the cpu caches.

Removing the need for multiple procs was done to support multiple inits for pathfinder. Previously, pathfinder was the only algorithm that used cmdstan's internal parallelism for multiple paths. While I was coding up support for pathfinder to accept a list of inits I ran into some odd problems in the lower level parts of cmdstanr that would require a bunch of if statements just for pathfinder during execution and creation/reading/writing files. imo I thought it was best to push sampling to use multiple chains as that helped keep the code a little cleaner.

Using Stan's internal parallel chains has one major downside which is that cmdstan is all or nothing for chain initialization failure. If one chain fails to initialize then the whole program fails. I have an issue (stan-dev/stan#3275) to address this, but I think personally the benefits outweigh the costs.

One other issue is that I couldn't completely remove proc since cmdstan does not use the generate_quantities that uses Stan's internal parallelism. I have a PR (stan-dev/cmdstan#1256) to fix this so the next release of cmdstan we can use that and only need to ever spin up 1 process for many chain algorithms.

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Simons Foundation

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

@SteveBronder
Copy link
Collaborator Author

Pinging @avehtari if you have time to take this for a spin!

@SteveBronder
Copy link
Collaborator Author

After some comments from @bob-carpenter and @WardBrian in stan-dev/stan#3275 I think the default behavior should be that if any chains fail to init then all chains fail to init. This lines up with cmdstan and cmdstanpy's behavior and is usually caused by something being wrong with the model

@SteveBronder SteveBronder requested a review from jgabry March 20, 2024 14:27
@avehtari
Copy link
Contributor

It seems to work. I don't see documentation on how does the init work if given a Pathfinder object with many draws. Can you clarify?

@SteveBronder
Copy link
Collaborator Author

@avehtari it uses the same logic you had in your original code. For anything that does an approximation we cut the draws down to uniques if less than 95 percent of the samples are unique.

  if (unique_draws < (0.95 * nrow(draws_df))) {
    temp_df = aggregate(.draw ~ lw, data = draws_df, FUN = min)
    draws_df = posterior::as_draws_df(merge(temp_df, draws_df, by = 'lw'))
    draws_df$pareto_weight = exp(draws_df$lw - max(draws_df$lw))
  } else {
    draws_df$pareto_weight = posterior::pareto_smooth(
      exp(draws_df$lw - max(draws_df$lw)), tail = "right")[["x"]]
  }

@SteveBronder SteveBronder removed the request for review from jgabry March 20, 2024 18:45
@SteveBronder
Copy link
Collaborator Author

@jgabry I need to minute to fix a few little things. I forgot about the performance but mingw has when using thread_local variables that makes stan very slow when using threading. I'm changing things so that if the user has wsl, mac, or linux then threading is enabled by default and we do threading in cmdstan, but if the user uses mingw then we just do several processes still

@avehtari
Copy link
Contributor

@avehtari it uses the same logic you had in your original code. For anything that does an approximation we cut the draws down to uniques if less than 95 percent of the samples are unique.

And if, e.g., 4 inits are needed, samples 4 inits from the weights using the weights and without replacement?

@jgabry
Copy link
Member

jgabry commented Mar 20, 2024

@jgabry I need to minute to fix a few little things.

No problem. I can wait to take a look (I have a bunch of other things to work on anyway this afternoon). Thanks again for working on this!

@SteveBronder
Copy link
Collaborator Author

Yep!

@avehtari
Copy link
Contributor

The output from sampling is a bit confusing

Chain 1 
Chain 1 Chain [1] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [1] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [4] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [4] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [2] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [2] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [3] Iteration: 100 / 200 [ 50%]  (Warmup) 
Chain 1 Chain [3] Iteration: 101 / 200 [ 50%]  (Sampling) 
Chain 1 Chain [1] Iteration: 200 / 200 [100%]  (Sampling) 
Chain 1 finished in 4.6 seconds.
MCMC sampling from model 1 posterior took 6.1 sec

All lines start with "Chain 1", and then have "Chain [number]"

@avehtari
Copy link
Contributor

If I try to initialize with a wrong fit object missing a parameter

fit1 <- model1$sample(data=standata1, iter_warmup=100, iter_sampling=100,
                      chains=4, threads=4,
                      init=fit2)

the error message is

Error: The following variables are missing in the draws object: {'intercept'}

It would be better if it would say missing in the fit object, but if that information is not available at the point of error, then this is acceptable.

@avehtari
Copy link
Contributor

> pth5 <- model5$pathfinder(data = standata5, init=0.1,
                          num_paths=10, single_path_draws=40, draws=400,
                          history_size=50, max_lbfgs_iters=100, refresh=0, seed=1)
Pareto k value (19.100021) is greater than 0.7. Importance resampling was not able to improve the approximation, which may indicate that the approximation itself is poor. 
Finished in  11.6 seconds.

> fit5 <- model5$sample(data=standata5, iter_warmup=100, iter_sampling=100,
                      chains=4, threads=4,
                      init=pth5)
Error in process_init_approx(init, num_procs, model_variables, warn_partial) : 
  Not enough distinct draws (4) to create inits.
  • Here the message Error in process_init_approx could be dropped.
  • "Not enough distinct draws (4) to create inits." could say
    "Not enough distinct draws (4) in the Pathfinder object to create inits.
    Try running Pathfinder with psis_resample=FALSE"
> pth5b <- model5$pathfinder(data = standata5, init=0.1,
                          num_paths=10, single_path_draws=40, draws=400,
                          history_size=50, max_lbfgs_iters=100, refresh=0,
                          seed=1, psis_resample=FALSE)
Finished in  11.7 seconds.
> fit5 <- model5$sample(data=standata5, iter_warmup=100, iter_sampling=100,
                      chains=4, threads=4,
                      init=pth5b)

Pareto k-hat = 19.1. Mean does not exist, making empirical mean estimate of the draws not applicable.
Error in posterior::pareto_smooth(exp(draws_df$lw - max(draws_df$lw)),  : 
  subscript out of bounds
In addition: Warning message:
In read_csv_metadata(output_file) : NAs introduced by coercion
  • Something wrong here, as my R code create_inits() works for this

@SteveBronder
Copy link
Collaborator Author

but if that information is not available at the point of error, then this is acceptable.

That warning is made at a much lower level. I'll see if I can move it up

All lines start with "Chain 1", and then have "Chain [number]"

Yes. tbh I think I'm going to close this PR and open up a simpler one that just does the inits. I thought doing multi path pathfinder inits would be really messy without also enabling parallelism from within Stan, but now that I understand the codebase more I think it's actually pretty simple. To get parallelism from within cmdstan it's going to require a lot of changes and I think needs to be a separate PR

"Not enough distinct draws (4) in the Pathfinder object to create inits. Try running Pathfinder with psis_resample=FALSE"

Fixed

Something wrong here, as my R code create_inits() works for this

Can you send me your standata5? Just to make sure I can reproduce the error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants