[WIP] MCMC sampling of rates and Tprobs #187
base: master
Conversation
Here is the driver code:
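A minimal sketch of the kind of driver call being discussed, assuming the mcmc.TransitionSampler interface mentioned later in this thread; the import path, constructor signature, and trace name are guesses, and the iter/burn/thin arguments mirror PyMC's MCMC.sample():

```python
import numpy as np
from msmbuilder import mcmc  # hypothetical import path for this PR

# Dense transition count matrix (rows = "from" state, columns = "to" state).
counts = np.array([[100.0, 10.0,  1.0],
                   [ 10.0, 50.0,  5.0],
                   [  1.0,  5.0, 20.0]])

sampler = mcmc.TransitionSampler(counts)           # guessed constructor signature
sampler.sample(iter=100000, burn=10000, thin=100)  # PyMC-style sampling controls
tprob_samples = sampler.trace("tprob")[:]          # guessed trace name
print(tprob_samples.mean(axis=0))                  # posterior mean transition matrix
```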
The reason I use the
I'll add more comments and docstrings later. It might be useful for people to just ask questions about whatever isn't intuitive about the PyMC interface.
Looks cool. What is the 'thin' kwarg supposed to do? Will this only work for dense matrices? Do you think this is going to be possible for a 10,000-state model?
The current implementation will only work for dense models. I have ideas about how to scale things up better, but it's definitely going to be a challenge. I think there are open questions about what priors to use, etc. For example, transition matrices are never sparse, so we shouldn't really be assuming sparsity during estimation. This leads to huge problems with 10,000-state models. I think things might be better for rate matrices, but then we have the issue of matrix exponentiation.
Also: the current code uses an Uninformative prior, not a Dirichlet prior. I'm still figuring out the best way to do a Dirichlet prior. The issue is that I need to "stack" a bunch of Dirichlet random variables, but PyMC doesn't have a very elegant way to do this. I think I might end up just using an Uninformative prior and throwing the prior on as an external "potential".
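A minimal sketch of that "external potential" idea for a single row of T, assuming PyMC 2: keep an Uninformative stochastic for the unnormalized weights and add the Dirichlet log-density by hand as a potential. The names and the abs/normalize re-parameterization are illustrative only (and, as noted later in this thread, such re-parameterizations carry Jacobian subtleties):

```python
import numpy as np
import pymc

n_states = 3
alpha = 2.0 * np.ones(n_states)  # illustrative Dirichlet hyperparameters

# Unnormalized, positive weights for one row of the transition matrix.
row_weights = pymc.Uninformative("row_weights", value=np.ones(n_states))

@pymc.deterministic
def row(w=row_weights):
    w = np.abs(w)
    return w / w.sum()  # map the weights onto the probability simplex

@pymc.potential
def dirichlet_prior(p=row):
    # Dirichlet log-density (up to a constant) added as an "external" potential.
    if np.any(p <= 0.0):
        return -np.inf
    return np.sum((alpha - 1.0) * np.log(p))

model = pymc.MCMC([row_weights, row, dirichlet_prior])
model.sample(iter=20000, burn=5000, thin=10)
row_samples = model.trace("row")[:]  # samples of the row under the prior alone
```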
Speed is definitely an issue here--these chains are super slow to converge...
How slow do you mean? Waiting a day probably isn't a big deal. Can you make a parallel version?
It seems like we need ~1-2 minutes to converge for small matrices. I think the code is still worth having, as it's simple and easy to understand. Some error analysis is better than none...
Agreed. -Robert
So for estimating rates, it might be natural to use an exponential distribution as a prior. This should place maximum prior weight at K_ij = 0, inducing sparsity in the rate matrix. We would then have a single hyperparameter. I can't remember if I've ever seen this in the literature, but it seems like the right thing to do.
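A sketch of that idea in PyMC 2 terms, with names and the single hyperparameter lam chosen for illustration: independent Exponential priors on the off-diagonal rate elements, with the diagonal fixed by the requirement that rows sum to zero.

```python
import numpy as np
import pymc

n_states, lam = 3, 1.0  # lam is the single sparsity-controlling hyperparameter
n_offdiag = n_states * (n_states - 1)

# Off-diagonal rate elements K_ij >= 0; the prior density peaks at K_ij = 0.
k_offdiag = pymc.Exponential("k_offdiag", beta=lam, size=n_offdiag)

@pymc.deterministic
def rate_matrix(k=k_offdiag):
    K = np.zeros((n_states, n_states))
    K[~np.eye(n_states, dtype=bool)] = k           # fill off-diagonal entries
    K[np.diag_indices(n_states)] = -K.sum(axis=1)  # rows sum to zero
    return K
```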
I just checked in a revised codebase that appears to be working. Here is a driver program that checks all 4 types of model:
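For orientation, "all 4 types" here means the general and reversible variants of both transition-matrix and rate-matrix estimation. A hypothetical loop over them might look like the following; ReversibleTransitionSampler is a guessed class name (TransitionSampler, RateSampler, and ReversibleRateSampler appear elsewhere in this thread), and the constructor and method signatures are assumptions:

```python
import numpy as np
from msmbuilder import mcmc  # hypothetical import path for this PR

counts = np.loadtxt("counts.dat")  # dense count matrix

# {transition matrix, rate matrix} x {general, reversible}
samplers = {
    "general tprob":    mcmc.TransitionSampler(counts),
    "reversible tprob": mcmc.ReversibleTransitionSampler(counts),
    "general rate":     mcmc.RateSampler(counts),
    "reversible rate":  mcmc.ReversibleRateSampler(counts),
}

for name, sampler in samplers.items():
    sampler.sample(iter=100000, burn=10000, thin=100)
    print("finished sampling:", name)
```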
So I'm finding some differences between the different models. The differences are most obvious in the few-count regime, as we would expect. Thus, we have an interesting question: what is the right answer? I suspect there might not be a "right" answer.
Hm, there might still be something funny with the reversible rate sampler. I'm looking into it.
I think I fixed the issue with the ReversibleRateSampler. I needed to discard the early samples (burn-in). I added this feature and now I think things are working much better. I also added tests, which are disabled by default due to slowness.
Here is an updated driver program:
I think this is almost ready, so people can comment on what else they want to see.
Some things I didn't implement: estimation with a fixed value of the equilibrium populations. These are considerably more difficult, but doable. We can implement them as people find it necessary.
So I have been experimenting with NumExpr for speeding up likelihood calculations (this is for my ensemble stuff, but it would also apply here and to the MLE code in MSMBuilder). For expressions involving vectorized logarithms and algebra, it's trivial to get a ~3X speedup by just throwing in something like the following:
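A minimal reconstruction of that kind of snippet (the array names are illustrative, not the original code): the log-likelihood term sum(C_ij * log(T_ij)) is evaluated in one fused numexpr pass instead of building intermediate arrays with numpy.

```python
import numpy as np
import numexpr as ne

counts = np.random.rand(500, 500)
tprob = np.random.rand(500, 500)
tprob /= tprob.sum(axis=1, keepdims=True)

# Pure-numpy version: allocates temporaries for log(tprob) and the product.
log_like_np = np.sum(counts * np.log(tprob))

# numexpr version: fuses the log, multiply, and sum reduction into one pass.
log_like_ne = ne.evaluate("sum(counts * log(tprob))")
```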
Sweet. numexpr is already a dependency, via tables, so this is all upside. -Robert
@kyleabeauchamp, is this identical to what's proposed in J. Chem. Phys. 128, 244103 (2008)? If not, what are the differences?
Similar, but not identical. I think pymc uses normally distributed proposal steps, with a proposal scale that is tuned to optimize the acceptance rate. In Noe's paper, he analytically calculates a bunch of proposal distributions. For a model that is on the boundary of the feasible space, I bet Frank's stuff works better. For models in the interior, I bet ours might be better. I'm not sure--I'm not an expert. I think the real value in my method is simplicity. All I had to do was find a parameterization with the correct constraints (detailed balance, normalization, rate normalization, etc.), but pymc does the rest. We could possibly get performance enhancements by using a Gibbs-sampling-type approach, where we manually work out how we propose samples. That would be more work...
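To make the pymc side of that comparison concrete, here is a small self-contained sketch (a stand-in Gaussian target, not the PR's model) of attaching a Metropolis step method with normally distributed proposals; proposal_sd sets the initial proposal scale, and pymc adapts it during tuning toward a reasonable acceptance rate.

```python
import numpy as np
import pymc

x = pymc.Uninformative("x", value=np.zeros(4))

@pymc.potential
def target(v=x):
    return -0.5 * np.sum(v ** 2)  # stand-in log-density, not an MSM likelihood

model = pymc.MCMC([x, target])
# Normally distributed proposal steps, with the scale tuned during sampling.
model.use_step_method(pymc.Metropolis, x, proposal_sd=0.1)
model.sample(iter=20000, burn=5000, thin=10)
x_samples = model.trace("x")[:]
```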
But it's sampling the same distributions. The difference, you're saying, is just efficiency?
Well, there's also the question of the prior. For a general transition matrix, the Dirichlet prior is pretty standard. I don't think it's as clear-cut how to parameterize the prior for constrained models or for rate models.
Fair.
I think there's a lot that could be done to generalize and accelerate this code, but I think it might be wise to wait to see how we plan to build models in the future. In particular, large sparse models are going to be a pain in the ass. If we can avoid them using some sort of TICA trick, then we can really use MCMC throughout the pipeline.
Yeah, I'm okay with slow code, really. I think we want flexibility and generality first. We can deal with speed later. -Robert
Regarding speed, there are several things that might be rate limiting, depending on what regime we work in:
OK, so it appears that I will likely need to rewrite this code to better handle NaNs from nonzero counts.
Zero counts, rather.
Argh. So I think there's a bug in pymc that won't let you initialize a Dirichlet RV at a boundary point. If you have alpha=1 (e.g. uniform), then that point is perfectly valid. The upshot is that we therefore cannot use the unregularized MLE as the starting point for MCMC.
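One simple workaround, sketched here with made-up names: nudge the zero entries of the MLE off the simplex boundary before handing it to pymc as the starting value.

```python
import numpy as np

def regularized_start(counts, eps=1e-8):
    """Row-normalize a count matrix, nudging zero entries off the boundary.

    pymc cannot initialize a Dirichlet-distributed variable with any component
    exactly equal to zero, so the unregularized MLE is not a usable starting
    point; adding a tiny eps before normalizing avoids that. Illustrative only.
    """
    tprob = np.asarray(counts, dtype=float) + eps
    return tprob / tprob.sum(axis=1, keepdims=True)
```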
@schwancr Do you have a preference for which type of estimator you want? Rate or TProb? General or reversible? Also, how many states (and how sparse)? This will help me design tests.
So these issues aren't a problem if you use a prior, right? I mean, it would be nice to be able to do this for microstate models (thousands of states), but I'm ok if we can only do macrostate models (dense only).
I've been using tprob, but it doesn't hurt to have both.
So I'm going to assert that microstate models are totally infeasible with the current tool. I'd even guess that we have a limit of ~10 states or so...
Yeah, that's fine. I did get an answer with a 200-state model, by the way.
I think this serves as a good start to what will become an msmbuilder.error module. We can add bootstrapping to this, and maybe implement Nina's method.
So I think I should probably throw a Dirichlet prior on the reversible transition matrix as well. The idea is to just use a potential to add the Dirichlet log-probability term.
Sure, is that the same as feeding in a count matrix with pseudocounts?
I don't think they're exactly the same for the case of reversible matrices--the integrals no longer work out exactly because the domain is restricted, so there's no analytic formula for the posterior.
It's only conjugate for the unrestricted, general transition matrix.
Ah, ok. Will you add the prior behind the scenes? How does one specify the
But that does remind me--we should be using the analytic answer for the unrestricted case.
So I'll add an extra term.
By unrestricted you mean not constrained to be reversible?
Yeah, correct.
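For reference, here is what the analytic answer for the unrestricted (non-reversible) case looks like, as a sketch with illustrative names: with a Dirichlet(alpha) prior on each row and multinomial transition counts, the posterior factorizes row-by-row into Dirichlet(counts[i] + alpha), so it can be sampled directly without MCMC.

```python
import numpy as np

def sample_general_tprob(counts, alpha=1.0, n_samples=100):
    """Draw posterior transition-matrix samples for the unrestricted case."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    samples = np.empty((n_samples, n, n))
    for s in range(n_samples):
        for i in range(n):
            # Conjugacy: row i's posterior is Dirichlet(counts[i] + alpha).
            samples[s, i] = np.random.dirichlet(counts[i] + alpha)
    return samples
```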
So I'm currently writing up my notes on this stuff into a PDF so that we can make sure we're calculating the right thing. Once I finish with my notes, I can then fix various code things and write the tests.
Still working on this. I'm currently trying to figure out a way to get a Dirichlet prior into the reversible TProb sampler. I think the approach I'm currently using probably won't work for that, as I'm essentially doing a coordinate transformation without paying attention to the Jacobian--which essentially means sampling using a slightly modified prior...
This is to give people a feel for the interface. At this point, mcmc.TransitionSampler and mcmc.RateSampler are working.