-
-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create inits from fit and draws objects #937
Conversation
@avehtari Sorry for closing the other PR but I started doing some heavy feature creep there and this one is much simpler |
An example Pathfinder object which caused a failure with the previous PR is at https://users.aalto.fi/~ave/casestudies/Birthdays/pth5.rds |
@avehtari when I try loading that into R I'm getting load("~/Downloads/pth5.rds")
Error in load("~/Downloads/pth5.rds") :
bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning messages:
1: In readChar(con, 5L, useBytes = TRUE) :
truncating string with embedded nuls
2: file ‘pth5.rds’ has magic number 'X'
Use of save versions prior to 2 is deprecated Does that rds file come from one of the scripts in that folder? |
The R code is at https://github.com/avehtari/casestudies/blob/master/Birthdays/birthdays.R Which R version you are using? |
4.3.0. Maybe I need to upgrade? But I'll run the scripts you sent over and see if that causes my code to break |
@avehtari running pth5 from the script I got the error about not enough distinct draws which I think is what we want.
Turning off psis resampling stopped the error so I think that is what we want |
Yes, this is what we want. It just didn't work like that with the previous PR. I'll test with this PR |
@SteveBronder was using an older |
Ok thanks @avehtari. I think the windows unit tests are failing due to an unrelated reason. Looks like some issue setting up R, which often is resolved at some point without us having to do anything. But I will merge master into this branch just in case (there was an update to something related to GitHub actions). @SteveBronder I've started taking a look and so far it looks great, but I will do a deeper dive after @avehtari finishes his testing in case he finds any larger issues. |
just a comment on the windows tests, not related to this PR per se - we are seeing the same failures for other packages, so definitely unrelated to the PR. They seem intermittent, sometimes passing, sometimes failing... |
1 similar comment
just a comment on the windows tests, not related to this PR per se - we are seeing the same failures for other packages, so definitely unrelated to the PR. They seem intermittent, sometimes passing, sometimes failing... |
My Birthdays case study runs now completely with new inits and other recent CmdStanR features. I will next check the documentation added in this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments. In addition, the PR seems to be missing a doc update for the init arguments in model.R
R/args.R
Outdated
#' @param init A `CmdStanMCMC` class | ||
#' @param num_procs Number of inits requested | ||
#' @param model_variables A list of all parameters with their types and | ||
#' number of dimensions. Typically the output of model$variables(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output of model$variables() includes also other variables than parameters (data, transformed parameters, and generated quantities). More accurate would be "output of model$variables()$parameters". The same comment repeats in many function docs. As the dimensionality of generated quantities is often much bigger than the parameters, it is not useful to include them in init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/args.R
Outdated
init_draws_df = posterior::resample_draws(draws_df, ndraws = num_procs, | ||
weights = draws_df$pareto_weight, method = "simple_no_replace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read correctly, this resampling is done always, even if Stan did already do resampling. If model$pathfinder(..., psis_resample=TRUE) was used and there are enough unique draws, then the resampling here should be done with uniform weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense!
R/args.R
Outdated
init_draws_df = posterior::resample_draws(draws_df, ndraws = num_procs, | ||
method = "simple") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CmdStanMLE should only have one draw. Resampling just copies that one draw several times. There is no performance hit, but it just looks strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll fix this to just rep the draw a bunch of times
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #937 +/- ##
==========================================
- Coverage 88.33% 88.31% -0.02%
==========================================
Files 12 12
Lines 4553 4640 +87
==========================================
+ Hits 4022 4098 +76
- Misses 531 542 +11 ☔ View full report in Codecov by Sentry. |
Additional observations using the Birthday case study
|
…s draws than procs
Yes I changed this now so that if the number of draws is less than the number of processes it repeats the inits so that we have the amount of inits as processes. Since all the other process_init functions always return the same number of draws as the number of processes this logic will only happen when the user passes a draws object.
Oh that should just give a warning. I think it makes sense to only have some parameters if for instance you are doing iterative model building
Agree! Just added that with a nice error message @jgabry one more thing. It would be pretty easy to add something here so that if a user passes a sampling object, has not included their own inverse metric, and is doing sampling with the same model / parameters we can use the sampling init object's inverse metric. Do you want me to include that here or should that be another PR? |
@SteveBronder Anything I can do to help test/debug this? |
also add @venpopov as contributor
@SteveBronder Anything I can do to help test/debug this? Also I just fixed a merge conflict with master, moved you up in the DESCRIPTION file with the other full authors (which we probably should have done when you added the pathfinder method initially), and added @venpopov to DESCRIPTION as a contributor. I think we're super close to merging this. |
Not yet, I want to test if I get this same issue on master from just running a bunch of the algorithms multiple times. It looks like this code is not touching anything related to reading / writing so I'm pretty confused by the failures |
If you have time to test that today it would be nice but otherwise I'll try that tmrw |
I can make a branch off of master than runs basically the same test file you added but overrides the |
Ok this should be running the same code as on your branch, just replacing the inits with an init function instead of using the fitted model objects: https://github.com/stan-dev/cmdstanr/pull/954/files Let's see if it fails. |
Much appreciated! |
This and master are getting a weird error in windows latest when trying to build stringi? |
Besides that though I think I figured out the original bug with the tests |
Yeah
Brilliant! Let me know when you're done with changes and I'll do a final look-over before merging |
I think it is ready to merge! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is going to be very well-appreciated!
Submission Checklist
Summary
Adds a
process_inits
function to turn fit objects into a list of list of inits. All types of fits from cmdstanr as well as draws objects can be used as inits now.Tests can be run by calling
Copyright and Licensing
Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Simon's Foundation
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: