This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

[WIP] MCMC sampling of rates and Tprobs #187

Open
kyleabeauchamp wants to merge 5 commits into master

Conversation

kyleabeauchamp
Contributor

This is to give people a feel for the interface. At this point, mcmc.TransitionSampler and mcmc.RateSampler are working.

@kyleabeauchamp
Contributor Author

Here is the driver code:

import numpy as np
from msmbuilder import mcmc

num_states = 3

# Count matrix with a strongly metastable first state.
C = np.ones((num_states, num_states))
C[0, 0] = 1000

# Sample transition matrices from the posterior.
sampler = mcmc.TransitionSampler(C)
sampler.sample(50000, thin=20)

Ti = []
for T in sampler.iterate_T():
    Ti.append(T)

Ti = np.array(Ti)
T = Ti.mean(0)

# Sample rate matrices from the posterior.
sampler = mcmc.RateSampler(C)
sampler.sample(50000, thin=20)

Ki = []
for K in sampler.iterate_K():
    Ki.append(K)

Ki = np.array(Ki)
K = Ki.mean(0)

@kyleabeauchamp
Contributor Author

The reason I use the iterate_K() function is that we have to handle two cases (see the sketch below):

  1. The samples all fit in memory.
  2. The samples are too big to fit in memory.
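
A minimal sketch of the two usage patterns, assuming sampler is the mcmc.RateSampler from the driver above:

import numpy as np

# Case 1: everything fits in memory -- just collect the samples.
Ki = np.array(list(sampler.iterate_K()))
K_mean = Ki.mean(0)

# Case 2: too big for memory -- stream the samples and keep a running mean.
K_sum = None
n = 0
for K in sampler.iterate_K():
    K_sum = K.copy() if K_sum is None else K_sum + K
    n += 1
K_mean = K_sum / n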

@kyleabeauchamp
Contributor Author

I'll add more comments and docstrings later. It might be useful for people to just ask questions about what isn't intuitive about the PyMC interface.

@schwancr
Contributor

Looks cool. What is the 'thin' kwarg supposed to do?

Will this only work for dense matrices? Do you think this is going to be possible for a 10,000 state model?

@kyleabeauchamp
Contributor Author

thin sets the subsampling interval for the MCMC samples; keeping only every thin-th sample reduces the autocorrelation between retained samples. For example, sample(50000, thin=20) retains 2500 samples.

The current implementation will only work for dense models. I have ideas about how to scale things up better, but it's definitely going to be a challenge. I think there are open questions about what priors to use, etc.

For example, transition matrices are never sparse, so we shouldn't really be assuming sparsity during estimation. This leads to huge problems with 10,000-state models.

I think things might be better for rate matrices, but then we have the issue of matrix exponentiation.

@kyleabeauchamp
Contributor Author

Also: the current code uses an Uninformative prior, not a Dirichlet prior. I'm still figuring out the best way to do a Dirichlet prior. The issue is that I need to "stack" a bunch of Dirichlet random variables, but PyMC doesn't have a very elegant way to do this.

I think I might end up just using an Uninformative prior and throwing the prior on as an external "potential".

@kyleabeauchamp
Contributor Author

Speed is definitely an issue here--these chains are super slow to converge...

@schwancr
Contributor

How slow do you mean?

Waiting a day probably isn't a big deal. Can you make a parallel version?

@kyleabeauchamp
Contributor Author

It seems like we need ~1-2 minutes to converge for small matrices. I think the code is still worth having, as it's simple and easy to understand. Some error analysis is better than none...

@rmcgibbo
Contributor

Agreed. 

-Robert

@kyleabeauchamp
Contributor Author

So for estimating rates, it might be natural to use an exponential distribution as a prior. This should place maximum prior weight at $K_{ij} = 0$, inducing sparsity in the rate matrix.

We would then have a single hyperparameter, $\lambda$, that would be shared for all the rates. We could estimate $\lambda$ by cross validation.

I can't remember if I've ever seen this in the literature, but it seems like the right thing to do.
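
A rough sketch of what that prior could look like in pymc, using a potential to add the log-density term (the rate variable K here is a stand-in, not the actual RateSampler internals):

import numpy as np
import pymc

lam = 10.0  # shared hyperparameter; would be set by cross-validation

# Stand-in for the sampler's nonnegative off-diagonal rate stochastic.
K = pymc.Uniform("K", lower=0.0, upper=100.0, size=(3, 3))

@pymc.potential
def exponential_prior(K=K):
    # log p(K) = sum_ij [log(lam) - lam * K_ij], up to an additive constant.
    return -lam * np.sum(K)

model = pymc.MCMC([K, exponential_prior])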

@kyleabeauchamp
Contributor Author

I just checked in a revised codebase that appears to be working. Here is a driver program that checks all four model types:

import numpy as np
import scipy.sparse
from msmbuilder import mcmc, MSMLib

num_states = 2

# Count matrix with a strongly metastable first state.
C = 100 * np.ones((num_states, num_states))
C[0, 0] *= 1000
C[0, 1] += 1

# Reference MLEs: general and reversible.
T0 = MSMLib.estimate_transition_matrix(C)
X0 = MSMLib.mle_reversible_count_matrix(scipy.sparse.csr_matrix(C)).toarray()
T0_rev = MSMLib.estimate_transition_matrix(X0)

sampler = mcmc.TransitionSampler(C)
sampler.sample(1000000, thin=500)
T = sampler.mcmc.trace("T")[:].mean(0)
T_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.RateSampler(C)
sampler.sample(1000000, thin=500)
T_K = sampler.mcmc.trace("T")[:].mean(0)
T_K_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleTransitionSampler(C)
sampler.sample(1000000, thin=500)
T_rev = sampler.mcmc.trace("T")[:].mean(0)
T_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleRateSampler(C)
sampler.sample(1000000, thin=500)
T_K_rev = sampler.mcmc.trace("T")[:].mean(0)
T_K_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

So I'm finding some differences between the different models. The differences are most obvious in the few-count regime, as we would expect. Thus, we have an interesting question: what is the right answer? I suspect there might not be a "right" answer.

@kyleabeauchamp
Contributor Author

Hm, there might still be something funny with the reversible rate sampler. I'm looking into it.

@kyleabeauchamp
Contributor Author

I think I fixed the issue with the ReversibleRateSampler. I needed to discard the early samples (burn-in). I added this feature and now I think things are working much better.

I also added tests, which are disabled by default due to slowness.

@kyleabeauchamp
Contributor Author

Here is an updated driver program:

import numpy as np
import scipy.sparse
from msmbuilder import mcmc, MSMLib

num_states = 2

num_steps = 1000000
thin = 500
burn = 50000

# Count matrix with a strongly metastable first state.
C = 1000000 * np.ones((num_states, num_states))
C[0, 0] *= 1000
C[0, 1] += 1

# Reference MLEs: general and reversible.
T0 = MSMLib.estimate_transition_matrix(C)
X0 = MSMLib.mle_reversible_count_matrix(scipy.sparse.csr_matrix(C)).toarray()
T0_rev = MSMLib.estimate_transition_matrix(X0)

sampler = mcmc.TransitionSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T = sampler.mcmc.trace("T")[:].mean(0)
T_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.RateSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T_K = sampler.mcmc.trace("T")[:].mean(0)
T_K_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleTransitionSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T_rev = sampler.mcmc.trace("T")[:].mean(0)
T_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleRateSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T_K_rev = sampler.mcmc.trace("T")[:].mean(0)
T_K_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

# Compare posterior means against the MLE references.
print(T0 - T)
print(T0 - T_K)
print(T0_rev - T_rev)
print(T0_rev - T_K_rev)

@kyleabeauchamp
Contributor Author

I think this is almost ready, so people can comment on what else they want to see.

@kyleabeauchamp
Contributor Author

Here are some things that I didn't implement:

  1. Estimation with a fixed value of the equilibrium populations.
  2. Sparse estimation.
  3. Estimation with a fixed sparsity structure.

These are considerably more difficult, but doable. We can implement them as people find them necessary.

@kyleabeauchamp
Contributor Author

So I have been experimenting with NumExpr for speeding up likelihood calculations (this is for my ensemble stuff, but it would also apply here and to the MLE code in MSMBuilder). For expressions involving vectorized logarithms and algebra, it's trivial to get a ~3X speedup by just throwing in something like the call sketched below.
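
A representative numexpr call for a log-likelihood of the form sum(C * log(T)) might look like this (illustrative arrays, not the actual msmbuilder code):

import numexpr as ne
import numpy as np

C = np.random.rand(500, 500)           # counts (illustrative)
T = np.random.rand(500, 500) + 1e-10   # transition probabilities (illustrative)

# Single multithreaded numexpr call; avoids the temporaries that
# np.sum(C * np.log(T)) would allocate.
logl = ne.evaluate("sum(C * log(T))")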


@rmcgibbo
Contributor

Sweet. numexpr is already a dependency, via tables, so this is all upside.

-Robert


@rmcgibbo
Contributor

@kyleabeauchamp, is this identical to what's proposed in J. Chem. Phys. 128, 244103 (2008)? If not, what are the differences?

@kyleabeauchamp
Contributor Author

Similar, but not identical. I think pymc uses normally distributed proposal steps, with a proposal scale that is tuned to optimize the acceptance rate.

In Noé's paper, he analytically calculates a bunch of proposal distributions.

For a model that is on the boundary of the feasible space, I bet Frank's stuff works better. For models in the interior, I bet ours might be better. I'm not sure--I'm not an expert.

I think the real value in my method is simplicity. All I had to do was find a parameterization with the correct constraints (detailed balance, normalization, rate normalization, etc.); pymc does the rest.

We could possibly get performance enhancements by using a Gibbs sampling type approach, where we manually work out how we propose samples. That would be more work...
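
For concreteness, this is roughly what pymc's adaptive Metropolis machinery looks like when attached to a single stochastic (toy names, not the sampler's actual variables):

import pymc

x = pymc.Uniform("x", lower=0.0, upper=1.0)

@pymc.potential
def toy_logp(x=x):
    return -100.0 * (x - 0.5) ** 2  # some log-density term

M = pymc.MCMC([x, toy_logp])
# Normal random-walk proposals; pymc adapts proposal_sd during tuning
# to keep the acceptance rate in a reasonable range.
M.use_step_method(pymc.Metropolis, x, proposal_sd=0.1,
                  proposal_distribution="Normal")
M.sample(10000, burn=1000, thin=10)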

@rmcgibbo
Contributor

But it's sampling the same distributions. The difference, you're saying, is just efficiency?

@kyleabeauchamp
Contributor Author

Well, there's also the question of the prior.

For a general transition matrix, the Dirichlet prior is pretty standard. I don't think it's as clear-cut how to parameterize the prior for constrained models or for rate models.

@rmcgibbo
Contributor

Fair.


@kyleabeauchamp
Contributor Author

I think there's a lot that could be done to generalize and accelerate this code, but I think it might be wise to wait to see how we plan to build models in the future. In particular, large sparse models are going to be a pain in the ass. If we can avoid them using some sort of TICA trick, then we can really use MCMC throughout the pipeline.

@rmcgibbo
Contributor

Yeah, I'm okay with slow code, really. I think we want flexibility and generality first. We can deal with speed later.

-Robert


@kyleabeauchamp
Contributor Author

Regarding speed, there are several things that might be rate limiting, depending on what regime we work in:

  1. Large numbers of pymc Potential function calls (fix with Cython?)
  2. Large numbers of exp() calls with small inputs (fix with Cython)
  3. exp() calls with large inputs (fix with numexpr)
  4. Dense or sparse matrix math.

@kyleabeauchamp
Contributor Author

OK, so it appears that I will likely need to rewrite this code to better handle NaNs from nonzero counts.

@kyleabeauchamp
Contributor Author

zero counts, rather.

@kyleabeauchamp
Contributor Author

Argh. So I think there's a bug in pymc that won't let you initialize a Dirichlet RV at a boundary point. If you have alpha=1 (i.e., uniform), then such a point is perfectly valid. The upshot is that we therefore cannot use the unregularized MLE as the starting point for MCMC.
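
A possible workaround (not in the current code) would be to nudge the MLE off the boundary before handing it to pymc as the initial value; a minimal sketch:

import numpy as np

def off_boundary(T0, eps=1e-8):
    # Push zero entries of a row-stochastic matrix slightly into the
    # interior, then renormalize the rows.
    T = T0 + eps
    return T / T.sum(axis=1)[:, np.newaxis]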

@kyleabeauchamp
Contributor Author

@schwancr Do you have a preference for which type of estimator you want? Rate or TProb? General or reversible?

Also, how many states (and how sparse)? This will help me design tests.

@schwancr
Contributor

So these issues aren't a problem if you use a prior, right?

I mean it would be nice to be able to do this for microstate models (thousands of states), but I'm ok if we can only do macrostate models (dense only).

@schwancr
Contributor

I've been using tprob, but it doesn't hurt to have both.

@kyleabeauchamp
Contributor Author

So I'm going to assert that microstate models are totally infeasible with the current tool. I'd even guess that we have a limit of ~10 states or so...

@schwancr
Contributor

Yeah, that's fine. I did get an answer with a 200-state model, by the way, so we might be able to go further than 10.


@schwancr
Contributor

I think this serves as a good start to what will become an msmbuilder.error module.

We can add bootstrapping to this, and maybe implement Nina's method, and when we come up with something better, it can be added easily.


@kyleabeauchamp
Contributor Author

So I think I should probably throw a Dirichlet prior on the reversible transition matrix as well. The idea is to just use a potential to add the Dirichlet likelihood term.
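
A sketch of that idea in pymc; T here is a stand-in for the sampler's constrained transition-matrix stochastic:

import numpy as np
import pymc

alpha = 1.5  # Dirichlet concentration; alpha = 1 recovers the uniform prior

# Stand-in for the reversible transition-matrix stochastic (2 states).
T = pymc.Uniform("T", lower=0.01, upper=0.99, size=(2, 2))

@pymc.potential
def dirichlet_prior(T=T):
    # Row-wise Dirichlet log-density, up to normalization:
    # log p(T) = sum_ij (alpha - 1) * log(T_ij)
    return (alpha - 1.0) * np.sum(np.log(T))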

@schwancr
Contributor

Sure, is that the same as feeding in a count matrix with pseudocounts everywhere?


@kyleabeauchamp
Contributor Author

I don't think they're exactly the same for the case of reversible matrices--the integrals no longer work out exactly because the domain is restricted, so there's no analytic formula for the posterior.

@kyleabeauchamp
Contributor Author

It's only conjugate for the unrestricted, general transition matrix.

@schwancr
Contributor

Ah, ok. Will you add the prior behind the scenes? How does one specify the strength of the prior?


@kyleabeauchamp
Contributor Author

But that does remind me--we should be using the analytic answer for the unrestricted case.
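
(For reference, the standard conjugate result: with an independent Dirichlet($\alpha$) prior on each row and multinomial row likelihoods, the posterior for row $i$ is $\mathrm{Dirichlet}(C_{i1} + \alpha, \ldots, C_{in} + \alpha)$, so the posterior mean is simply $(C_{ij} + \alpha) / \sum_k (C_{ik} + \alpha)$, with no sampling required.)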

@kyleabeauchamp
Contributor Author

So I'll add an extra parameter $\alpha$ to control the strength of the prior.

@schwancr
Contributor

By unrestricted you mean unconstrained to be reversible?


@kyleabeauchamp
Contributor Author

Yeah, correct.

@kyleabeauchamp
Contributor Author

So I'm currently writing up my notes on this stuff into a PDF so that we can make sure we're calculating the right thing. Once I finish with my notes, I can then fix various code things and write the tests.

@kyleabeauchamp
Contributor Author

Still working on this. I'm currently trying to figure out a way to get a Dirichlet prior into the reversible TProb sampler. I think the approach I'm currently using probably won't work for that, as I'm essentially doing a coordinate transformation without paying attention to the Jacobian--which effectively means sampling with a slightly modified prior...
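
(The relevant change-of-variables fact: if we sample in coordinates $\theta$ with $T = f(\theta)$, the induced density is $p_\theta(\theta) = p_T(f(\theta)) \, |\det J_f(\theta)|$, so dropping the Jacobian factor $|\det J_f|$ silently swaps in a different prior on $T$.)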
