fitting with emcee works poorly #601
Comments
I think that MCMC-methods like It's very well possible that the May I also recommend you send this to the mailinglist to solicit input from users who might use the |
@reneeotten OK, I basically agree with all of that, at least in principle. My main concern is that it is actually not working well. It seems like the main argument in favor of using I was just looking at the one nearly-trivial-but-heavily-cooked example in the
but then added
While they label "m*x+b" as "truth", they have also added an additional term to their model that is not linear in I can believe it is useful in some contexts. I am also certain that I do not ever intend to use this code, and cannot really envision ever recommending using this to anyone. |
My formal mathematics and statistics are not really up to the task of articulating this well, but it seems that you may have a misapprehension of the case for employing the MCMC algorithm. emcee and the MCMC algorithm itself are not solvers or optimizers, but rather a means of sampling a particular parameter space given a user-defined prior and target function. The benefit is not the ability of the algorithm to reveal a global optimum in parameter space, but the ability to estimate the parameter distributions themselves, which provide a richer context for understanding the model you are applying to your data. I think this has great utility, separate from extracting correctly weighted uncertainties. What may be missing is the mathematical formalism for correctly constructing the target function. Perhaps @dfm can make a comment here? A discussion has also been started here.
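For readers following along, here is a minimal sketch of what "a user-defined prior and target function" can look like for a straight-line model. It is only an illustration: the function names, the flat box prior, the data, and the fixed noise level `sigma` are all assumptions made for this example, and none of it is taken from the `lmfit` or `emcee` code.

```python
import math


def log_prior(theta, bounds):
    """Flat (uniform) prior: 0 inside the box given by `bounds`, -inf outside."""
    for value, (lo, hi) in zip(theta, bounds):
        if not (lo < value < hi):
            return -math.inf
    return 0.0


def log_likelihood(theta, x, y, sigma):
    """Gaussian log-likelihood for the line y = m*x + b with known noise sigma."""
    m, b = theta
    total = 0.0
    for xi, yi in zip(x, y):
        resid = (yi - (m * xi + b)) / sigma
        total += -0.5 * resid * resid - math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


def log_posterior(theta, x, y, sigma, bounds):
    """Target function for a sampler: log prior + log likelihood."""
    lp = log_prior(theta, bounds)
    if not math.isfinite(lp):
        return -math.inf  # outside the prior: zero probability
    return lp + log_likelihood(theta, x, y, sigma)


# Hypothetical data, roughly following y = 2*x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
bounds = [(-10.0, 10.0), (-10.0, 10.0)]

lp_near = log_posterior((2.0, 1.0), x, y, 0.2, bounds)   # close to the data
lp_far = log_posterior((-3.0, 5.0), x, y, 0.2, bounds)   # poor description
lp_out = log_posterior((20.0, 1.0), x, y, 0.2, bounds)   # outside the prior
print(lp_near, lp_far, lp_out)
```

A sampler only ever asks this function "how probable is this point?". It draws points in proportion to that probability, which is a different job from searching for the single best point.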
MCMC algorithms are not optimizers and should not be used as optimizers. Best practice is to run an optimizer first and then start the MCMC algorithm around any high-probability points you find. If you just start emcee sampling randomly over some search space and want it to find the high-probability areas, it could take a VERY long time, since that is not what MCMC is built for. I have seen systems where it is likely the sun will go out before it finds the high-probability regions, depending on the size of the search space. I have used emcee in my own research and it does work very well. For many problems, getting a good distribution might require chains of thousands to tens of thousands of steps. There are even metrics, like the integrated autocorrelation time, to determine when you have enough samples.
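The "optimize first, then sample around the optimum" workflow can be sketched in a few lines. To keep the example dependency-free, a closed-form least-squares fit stands in for the optimizer and a plain random-walk Metropolis chain stands in for the sampler. That is not the ensemble sampler that emcee implements. The autocorrelation estimate is also a crude one, truncated at the first non-positive lag, and is not emcee's estimator. All names, data, and step sizes are assumptions for illustration.

```python
import math
import random


def log_prob(theta, x, y, sigma):
    """Gaussian log-probability (up to a constant) for the line y = m*x + b."""
    m, b = theta
    return -0.5 * sum(((yi - (m * xi + b)) / sigma) ** 2 for xi, yi in zip(x, y))


def least_squares_line(x, y):
    """Closed-form straight-line fit; stands in for the 'optimizer first' step."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def metropolis(start, n_steps, step_size, x, y, sigma, rng):
    """Random-walk Metropolis chain of length n_steps + 1, beginning at `start`."""
    current = tuple(start)
    current_lp = log_prob(current, x, y, sigma)
    chain = [current]
    for _ in range(n_steps):
        proposal = tuple(c + rng.gauss(0.0, step_size) for c in current)
        proposal_lp = log_prob(proposal, x, y, sigma)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if proposal_lp >= current_lp or rng.random() < math.exp(proposal_lp - current_lp):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def integrated_autocorr_time(values, max_lag=200):
    """Crude estimate: 1 + 2 * (sum of autocorrelations up to the first non-positive lag)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return math.inf  # the chain never moved, so no estimate is possible
    tau = 1.0
    for lag in range(1, min(max_lag, n - 1)):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


rng = random.Random(0)
sigma = 0.3
x = [0.25 * i - 2.5 for i in range(21)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, sigma) for xi in x]  # synthetic data

m_fit, b_fit = least_squares_line(x, y)                              # step 1: optimize
n_steps = 4000
chain = metropolis((m_fit, b_fit), n_steps, 0.05, x, y, sigma, rng)  # step 2: sample

m_samples = [theta[0] for theta in chain]
mean_m = sum(m_samples) / len(m_samples)
tau_m = integrated_autocorr_time(m_samples)
print(f"fit m={m_fit:.3f}, chain mean m={mean_m:.3f}, tau~{tau_m:.1f}")
```

Dividing the chain length by the autocorrelation time gives a rough count of effectively independent samples, which is one way to judge whether the chain is long enough.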
@PetMetz @Immudzen Thanks -- I think that we all basically agree that MCMC is really not designed for optimization itself. Here in We had a recent request for what I thought would be some minor changes with this This issue is completely within the context of lmfit: Can we support an optimizer based on Of course, MCMC has real merit in exploring parameter space and can be useful for understanding parameter uncertainties and correlations after a fit is complete. It turns out that we have other routines for such exploration. We are not opposed to supporting MCMC methods for doing this. Perhaps we could remove Given both of your assessments that "MCMC should not be used for optimization", it is certainly curious that one of the few tutorial examples for To be clear, saying we may need to drop support for using
@newville just to clarify, what do you mean by "does not work well". You mean that the |
@newville Perhaps the solution here, if support for MCMC methods within Without looking too closely at the
dear all, I do use It's worth considering though, as is brought up here, to move Finally, I will not have time myself to do any of this before the holidays and with the currently planned time-frame of releasing v1.0 ("within a few weeks"), it will probably not happen for that version. |
@Gabriel-p Well, most of all I mean that the @PetMetz yes, thanks - those are all fine approaches to think about. @reneeotten I agree: let's release 1.0 with no real code changes. Maybe we could document a warning about using At some time in the nebulous future, we can try to think about dropping the |
Hello, |
@reneeotten OK, I'm back to: why in the world are we trying to support this code? #781 still states "fitting with emcee". Ugh. I propose the removal of any code in |
Fitting with `emcee` works poorly. Poorly enough that claiming to have an `emcee` "solver" that can be used to solve minimization problems is kind of a disservice to our users. I do not know if there is something wrong with the code that is new to `emcee v3`, but I suspect this may have actually never worked that well.

As a minimal, complete example showing the problem, let's look at
https://lmfit.github.io/lmfit-py/examples/example_emcee_Model_interface.html?highlight=emcee

This shows `emcee` getting a solution: it looks like we're claiming that it works. And the `corner` plot is cool. But that solution is actually found by starting with the (much, much faster) solution from `Nelder-Mead`. In fact, that solution is not actually very close to the right answer itself, and `emcee` does not improve upon the problem stemming from the huge correlation in the double-exponential. If you change that example to have `emcee` start at the same starting point as `Nelder-Mead`, the results are much worse. And 100x slower.

I'm perfectly willing to believe that the code in `lmfit` using `emcee` has some error that is making it not work as well as it should. I don't see anything obvious, but I'm definitely not an MCMC expert. Some of the code did change for #600 and `emcee3`. So, suggestions for fixing the code would be most welcome.

Having submitted #600, I can definitely say that "fitting with `emcee`" takes up quite a bit of code and has been a fair amount of maintenance. So the cost is definitely not zero, and the benefit seems vanishingly small to me. I never use `emcee`. With `ampgo` and `brute` available, I'm not sure anyone should use this.

Is there a compelling reason to retain `emcee` in `lmfit`? Can anyone provide an example of fitting with `emcee` in which it finds a solution that is not derived from another solver or actually improves upon a previous fit result?

And, yes, to be blunt and clear: I would propose we drop `emcee` unless it can be shown to give correct results.
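The dependence on the starting point can be illustrated with a toy one-parameter problem. This sketch does not reproduce the `lmfit` example linked above and does not use `emcee`. It uses a plain random-walk Metropolis chain, which behaves differently from emcee's ensemble moves, and every name and number in it is an assumption for illustration. It runs the same fixed number of steps twice, once starting at the least-squares optimum and once starting far from it, and compares how far each chain mean ends up from the optimum.

```python
import math
import random


def log_prob(m, x, y, sigma):
    """Gaussian log-probability (up to a constant) for the one-parameter model y = m*x."""
    return -0.5 * sum(((yi - m * xi) / sigma) ** 2 for xi, yi in zip(x, y))


def chain_mean(start, n_steps, step_size, x, y, sigma, rng):
    """Mean of a random-walk Metropolis chain, with no burn-in discarded."""
    current, current_lp = start, log_prob(start, x, y, sigma)
    total = current
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal, x, y, sigma)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if proposal_lp >= current_lp or rng.random() < math.exp(proposal_lp - current_lp):
            current, current_lp = proposal, proposal_lp
        total += current
    return total / (n_steps + 1)


rng = random.Random(1)
sigma = 0.5
x = [0.5 * i for i in range(1, 11)]
y = [2.0 * xi + rng.gauss(0.0, sigma) for xi in x]  # synthetic data, true m = 2

# Closed-form least-squares optimum for y = m*x.
m_best = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

err_near = abs(chain_mean(m_best, 1000, 0.05, x, y, sigma, rng) - m_best)
err_far = abs(chain_mean(m_best + 30.0, 1000, 0.05, x, y, sigma, rng) - m_best)
print(f"error when started at the optimum: {err_near:.3f}")
print(f"error when started far away:       {err_far:.3f}")
```

With the same step budget, the chain started far away spends all of its steps walking toward the high-probability region, so its mean is dominated by that transit. That is the toy version of an MCMC run doing the optimizer's job slowly.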