[WIP] MCMC sampling of rates and Tprobs #187
base: master
Conversation
Here is the driver code:
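A minimal sketch of the kind of driver call being discussed, assuming the mcmc.TransitionSampler interface mentioned later in this thread; the import path, constructor signature, and trace name are guesses, and the iter/burn/thin arguments mirror PyMC's MCMC.sample():

```python
import numpy as np
from msmbuilder import mcmc  # hypothetical import path for this PR

# Dense transition count matrix (rows = "from" state, columns = "to" state).
counts = np.array([[100.0, 10.0,  1.0],
                   [ 10.0, 50.0,  5.0],
                   [  1.0,  5.0, 20.0]])

sampler = mcmc.TransitionSampler(counts)           # guessed constructor signature
sampler.sample(iter=100000, burn=10000, thin=100)  # PyMC-style sampling controls
tprob_samples = sampler.trace("tprob")[:]          # guessed trace name
print(tprob_samples.mean(axis=0))                  # posterior mean transition matrix
```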
The reason I use the
I'll add more comments and docstrings later. It might be useful for people to just ask questions about whatever isn't intuitive about the PyMC interface.
Looks cool. What is the 'thin' kwarg supposed to do? Will this only work for dense matrices? Do you think this is going to be possible for a 10,000-state model?
The current implementation will only work for dense models. I have ideas about how to scale things up better, but it's definitely going to be a challenge. I think there are open questions about what priors to use, etc. For example, transition matrices are never sparse, so we shouldn't really be assuming sparsity during estimation. This leads to huge problems with 10,000-state models. I think things might be better for rate matrices, but then we have the issue of matrix exponentiation.
Also: the current code uses an Uninformative prior, not a Dirichlet prior. I'm still figuring out the best way to do a Dirichlet prior. The issue is that I need to "stack" a bunch of Dirichlet random variables, but PyMC doesn't have a very elegant way to do this. I think I might end up just using an Uninformative prior and throwing the prior on as an external "potential".
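A minimal sketch of that "external potential" idea for a single row of T, assuming PyMC 2: keep an Uninformative stochastic for the unnormalized weights and add the Dirichlet log-density by hand as a potential. The names and the abs/normalize re-parameterization are illustrative only (and, as noted later in this thread, such re-parameterizations carry Jacobian subtleties):

```python
import numpy as np
import pymc

n_states = 3
alpha = 2.0 * np.ones(n_states)  # illustrative Dirichlet hyperparameters

# Unnormalized, positive weights for one row of the transition matrix.
row_weights = pymc.Uninformative("row_weights", value=np.ones(n_states))

@pymc.deterministic
def row(w=row_weights):
    w = np.abs(w)
    return w / w.sum()  # map the weights onto the probability simplex

@pymc.potential
def dirichlet_prior(p=row):
    # Dirichlet log-density (up to a constant) added as an "external" potential.
    if np.any(p <= 0.0):
        return -np.inf
    return np.sum((alpha - 1.0) * np.log(p))

model = pymc.MCMC([row_weights, row, dirichlet_prior])
model.sample(iter=20000, burn=5000, thin=10)
row_samples = model.trace("row")[:]  # samples of the row under the prior alone
```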
Speed is definitely an issue here--these chains are super slow to converge...
How slow do you mean? Waiting a day probably isn't a big deal. Can you make a parallel version?
It seems like we need ~1-2 minutes to converge for small matrices. I think the code is still worth having, as it's simple and easy to understand. Some error analysis is better than none...
Agreed. -Robert
So for estimating rates, it might be natural to use an exponential distribution as a prior. This should place maximum prior weight at K_ij = 0, inducing sparsity in the rate matrix. We would then have a single hyperparameter. I can't remember if I've ever seen this in the literature, but it seems like the right thing to do.
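A sketch of that idea in PyMC 2 terms, with names and the single hyperparameter lam chosen for illustration: independent Exponential priors on the off-diagonal rate elements, with the diagonal fixed by the requirement that rows sum to zero.

```python
import numpy as np
import pymc

n_states, lam = 3, 1.0  # lam is the single sparsity-controlling hyperparameter
n_offdiag = n_states * (n_states - 1)

# Off-diagonal rate elements K_ij >= 0; the prior density peaks at K_ij = 0.
k_offdiag = pymc.Exponential("k_offdiag", beta=lam, size=n_offdiag)

@pymc.deterministic
def rate_matrix(k=k_offdiag):
    K = np.zeros((n_states, n_states))
    K[~np.eye(n_states, dtype=bool)] = k           # fill off-diagonal entries
    K[np.diag_indices(n_states)] = -K.sum(axis=1)  # rows sum to zero
    return K
```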
I just checked in a revised codebase that appears to be working. Here is a driver program that checks all 4 types of model:
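For orientation, "all 4 types" here means the general and reversible variants of both transition-matrix and rate-matrix estimation. A hypothetical loop over them might look like the following; ReversibleTransitionSampler is a guessed class name (TransitionSampler, RateSampler, and ReversibleRateSampler appear elsewhere in this thread), and the constructor and method signatures are assumptions:

```python
import numpy as np
from msmbuilder import mcmc  # hypothetical import path for this PR

counts = np.loadtxt("counts.dat")  # dense count matrix

# {transition matrix, rate matrix} x {general, reversible}
samplers = {
    "general tprob":    mcmc.TransitionSampler(counts),
    "reversible tprob": mcmc.ReversibleTransitionSampler(counts),
    "general rate":     mcmc.RateSampler(counts),
    "reversible rate":  mcmc.ReversibleRateSampler(counts),
}

for name, sampler in samplers.items():
    sampler.sample(iter=100000, burn=10000, thin=100)
    print("finished sampling:", name)
```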
So I'm finding some differences between the different models. The differences are most obvious in the few-count regime, as we would expect. Thus, we have an interesting question: what is the right answer? I suspect there might not be a "right" answer.
Hm, there might still be something funny with the reversible rate sampler. I'm looking into it.
I think I fixed the issue with the ReversibleRateSampler. I needed to discard the early samples (burn-in). I added this feature and now I think things are working much better. I also added tests, which are disabled by default due to slowness.
Here is an updated driver program:
I think this is almost ready, so people can comment on what else they want to see.
Some things I didn't implement: estimation with a fixed value of the equilibrium populations. These are considerably more difficult, but doable. We can implement them as people find it necessary.
So I have been experimenting with NumExpr for speeding up likelihood calculations (this is for my ensemble stuff, but it would also apply here and to the MLE code in MSMBuilder). For expressions involving vectorized logarithms and algebra, it's trivial to get a ~3X speedup by just throwing in something like the following:
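A minimal reconstruction of that kind of snippet (the array names are illustrative, not the original code): the log-likelihood term sum(C_ij * log(T_ij)) is evaluated in one fused numexpr pass instead of building intermediate arrays with numpy.

```python
import numpy as np
import numexpr as ne

counts = np.random.rand(500, 500)
tprob = np.random.rand(500, 500)
tprob /= tprob.sum(axis=1, keepdims=True)

# Pure-numpy version: allocates temporaries for log(tprob) and the product.
log_like_np = np.sum(counts * np.log(tprob))

# numexpr version: fuses the log, multiply, and sum reduction into one pass.
log_like_ne = ne.evaluate("sum(counts * log(tprob))")
```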
Sweet. numexpr is already a dependency, via tables, so this is all upside. -Robert
@kyleabeauchamp, is this identical to what's proposed in J. Chem. Phys. 128, 244103 (2008)? If not, what are the differences?
Similar, but not identical. I think pymc uses normally distributed proposal steps, with a proposal scale that is tuned to optimize the acceptance rate. In Noe's paper, he analytically calculates a bunch of proposal distributions. For a model that is on the boundary of the feasible space, I bet Frank's stuff works better. For models in the interior, I bet ours might be better. I'm not sure--I'm not an expert. I think the real value in my method is simplicity. All I had to do was find a parameterization with the correct constraints (detailed balance, normalization, rate normalization, etc.), but pymc does the rest. We could possibly get performance enhancements by using a Gibbs-sampling-type approach, where we manually work out how we propose samples. That would be more work...
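To make the pymc side of that comparison concrete, here is a small self-contained sketch (a stand-in Gaussian target, not the PR's model) of attaching a Metropolis step method with normally distributed proposals; proposal_sd sets the initial proposal scale, and pymc adapts it during tuning toward a reasonable acceptance rate.

```python
import numpy as np
import pymc

x = pymc.Uninformative("x", value=np.zeros(4))

@pymc.potential
def target(v=x):
    return -0.5 * np.sum(v ** 2)  # stand-in log-density, not an MSM likelihood

model = pymc.MCMC([x, target])
# Normally distributed proposal steps, with the scale tuned during sampling.
model.use_step_method(pymc.Metropolis, x, proposal_sd=0.1)
model.sample(iter=20000, burn=5000, thin=10)
x_samples = model.trace("x")[:]
```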
But it's sampling the same distributions. The difference, you're saying, is just efficiency?
Well, there's also the question of the prior. For a general transition matrix, the Dirichlet prior is pretty standard. I don't think it's as clear-cut how to parameterize the prior for constrained models or for rate models.
Fair.
I think there's a lot that could be done to generalize and accelerate this code, but I think it might be wise to wait to see how we plan to build models in the future. In particular, large sparse models are going to be a pain in the ass. If we can avoid them using some sort of TICA trick, then we can really use MCMC throughout the pipeline.
Yeah, I'm okay with slow code, really. I think we want flexibility and generality first. We can deal with speed later. -Robert
Regarding speed, there are several things that might be rate limiting, depending on what regime we work in:
OK, so it appears that I will likely need to rewrite this code to better handle NaNs from nonzero counts.
Zero counts, rather.
Argh. So I think there's a bug in pymc that won't let you initialize a Dirichlet RV at a boundary point. If you have alpha=1 (e.g. uniform), then that point is perfectly valid. The upshot is that we therefore cannot use the unregularized MLE as the starting point for MCMC.
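One simple workaround, sketched here with made-up names: nudge the zero entries of the MLE off the simplex boundary before handing it to pymc as the starting value.

```python
import numpy as np

def regularized_start(counts, eps=1e-8):
    """Row-normalize a count matrix, nudging zero entries off the boundary.

    pymc cannot initialize a Dirichlet-distributed variable with any component
    exactly equal to zero, so the unregularized MLE is not a usable starting
    point; adding a tiny eps before normalizing avoids that. Illustrative only.
    """
    tprob = np.asarray(counts, dtype=float) + eps
    return tprob / tprob.sum(axis=1, keepdims=True)
```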
@schwancr Do you have a preference for which type of estimator you want? Rate or TProb? General or reversible? Also, how many states (and how sparse)? This will help me design tests.
So these issues aren't a problem if you use a prior, right? I mean, it would be nice to be able to do this for microstate models (thousands of states), but I'm ok if we can only do macrostate models (dense only).
I've been using tprob, but it doesn't hurt to have both.
So I'm going to assert that microstate models are totally infeasible with the current tool. I'd even guess that we have a limit of ~10 states or so...
Yeah, that's fine. I did get an answer with a 200-state model, by the way.
I think this serves as a good start to what will become an msmbuilder.error module. We can add bootstrapping to this, and maybe implement Nina's method.
So I think I should probably throw a Dirichlet prior on the reversible transition matrix as well. The idea is to just use a potential to add the Dirichlet log-probability term.
Sure, is that the same as feeding in a count matrix with pseudocounts?
I don't think they're exactly the same for the case of reversible matrices--the integrals no longer work out exactly because the domain is restricted, so there's no analytic formula for the posterior.
It's only conjugate for the unrestricted, general transition matrix.
Ah, ok. Will you add the prior behind the scenes? How does one specify the
But that does remind me--we should be using the analytic answer for the unrestricted case.
So I'll add an extra term.
By unrestricted you mean not constrained to be reversible?
Yeah, correct.
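For reference, here is what the analytic answer for the unrestricted (non-reversible) case looks like, as a sketch with illustrative names: with a Dirichlet(alpha) prior on each row and multinomial transition counts, the posterior factorizes row-by-row into Dirichlet(counts[i] + alpha), so it can be sampled directly without MCMC.

```python
import numpy as np

def sample_general_tprob(counts, alpha=1.0, n_samples=100):
    """Draw posterior transition-matrix samples for the unrestricted case."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    samples = np.empty((n_samples, n, n))
    for s in range(n_samples):
        for i in range(n):
            # Conjugacy: row i's posterior is Dirichlet(counts[i] + alpha).
            samples[s, i] = np.random.dirichlet(counts[i] + alpha)
    return samples
```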
So I'm currently writing up my notes on this stuff into a PDF so that we can make sure we're calculating the right thing. Once I finish with my notes, I can then fix various code things and write the tests.
Still working on this. I'm currently trying to figure out a way to get a Dirichlet prior into the reversible TProb sampler. I think the approach I'm currently using probably won't work for that, as I'm essentially doing a coordinate transformation without paying attention to the Jacobian--which essentially means sampling using a slightly modified prior...
This is to give people a feel for the interface. At this point, mcmc.TransitionSampler and mcmc.RateSampler are working.