Gradient descent fails to converge (to the right result). #45
Comments
Do you have any reason to think that it isn't converging to the minimum? What you show is that in 10000 iterations it hasn't made much progress. But it does seem to be going in the right direction, albeit very slowly! Seriously, this |
I didn't realize that :-/ I've been trying to make serious use of it. |
@barak : Do you know what the best way to calculate |
Sorry. I blame ... not sure, but I'll think of someone. As for a better routine, the present conjugate gradient routine looks reasonable for a high-dimensional system, although the line search could use a bit of robustness-enhancing love. And it works well here, moving rapidly to a point close to the optimum:
For something low-dimensional like the test problem you give, a simple Newton's method would also make sense; or, for robustness, one with trust regions; or, for something of intermediate dimensionality, BFGS or Levenberg-Marquardt. There is a curated collection of optimization routines at http://lmfit.github.io/lmfit-py/ and I think it would make a lot of sense to translate them from Python to Haskell, using the |
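To make the Newton's-method suggestion concrete, here is a minimal 1-D sketch in plain Haskell. The objective f(x) = x - log x (minimized at x = 1, with f'(x) = 1 - 1/x and f''(x) = 1/x²) is my own illustrative choice, not the function from this issue, and the derivatives are written by hand rather than computed with the ad package.

```haskell
-- Newton's method for a smooth 1-D function: iterate
-- x_{k+1} = x_k - f'(x_k) / f''(x_k). Near the optimum the error is
-- roughly squared each step, which is why it suits low-dimensional
-- problems where the second derivative is cheap.
newton :: (Double -> Double) -> (Double -> Double) -> Double -> [Double]
newton f' f'' = iterate (\x -> x - f' x / f'' x)

main :: IO ()
main =
  -- f(x) = x - log x, so f'(x) = 1 - 1/x and f''(x) = 1/x^2;
  -- starting from 0.5, eight steps reach the minimizer x = 1
  -- to double precision.
  print (newton (\x -> 1 - 1/x) (\x -> 1/(x*x)) 0.5 !! 8)
```

Starting from 0.5 the errors go 0.5, 0.25, 0.0625, 0.0039, ... — the quadratic convergence the comment above alludes to.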
I've found the optimization library, which seems designed to fit with |
On this topic, what properties hold for conjugate descent? I'm noticing that the progression is actually ascending, not descending. This is for a parabolic function with a log barrier, so I'm guessing there is an issue with infinity.
Notice the lack of descent:
|
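One plausible mechanism for the apparent ascent, sketched with a hypothetical barrier function of my own choosing (not the one from this issue): once a step lands outside the barrier's domain, log of a negative number yields NaN, and every ordering comparison involving NaN is False, so a line search testing "f new < f old" can accept arbitrary points.

```haskell
-- A parabola plus a log barrier, defined only for x < 1. Outside the
-- domain, log of a negative argument produces NaN rather than an error.
barrier :: Double -> Double
barrier x = x * x - log (1 - x)

main :: IO ()
main = do
  print (barrier 1.5)                 -- NaN: 1.5 is outside the domain
  print (barrier 1.5 < barrier 0.0)   -- False: NaN compares False to everything
  print (barrier 1.5 > barrier 0.0)   -- False as well, so "is this step
                                      -- an improvement?" tests misbehave
```

If this is what is happening, guarding the line search with an isNaN/isInfinite check (or clamping steps to the feasible region) would be the robustness-enhancing love mentioned above.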
An update (and a bit of a hijack of this issue): the above objective function, Perhaps I should send a pull request indicating
Thoughts? |
Consider the function,
It's a smooth convex function, and gradient descent should converge to
[m,b] = [2.911525576, 7196.512447]
See Wolfram Alpha. Instead:
[6.503543385615585,1.0522541316691529e-3]
[4.074543700217947,1.0251230667415617e-3]
[4.028579379271611,4.621971888645496e-3]
[4.02857388945333,4.100383031651195e-2]
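The issue's actual data set isn't shown, so the following is a hypothetical reconstruction on synthetic data whose optimum has the same character: an intercept (7200) that dwarfs the slope (3). That scale mismatch makes the least-squares Hessian ill-conditioned, and fixed-step gradient descent then needs thousands of iterations to move b, even though m settles quickly — consistent with the iterates above, where the second coordinate barely budges.

```haskell
-- Synthetic line-fit data: y = 3*x + 7200 (illustrative, not the
-- issue's data). Objective: sum of squared residuals over (xs, ys).
xs, ys :: [Double]
xs = [0 .. 10]
ys = map (\x -> 3 * x + 7200) xs

-- Gradient of the sum-of-squares objective with respect to (m, b).
grad :: (Double, Double) -> (Double, Double)
grad (m, b) =
  let rs = zipWith (\x y -> m * x + b - y) xs ys
  in ( 2 * sum (zipWith (*) rs xs)   -- d/dm
     , 2 * sum rs )                  -- d/db

-- One fixed-step gradient-descent step; the step size is capped by the
-- curvature in the stiff (m-ish) direction, so the shallow (b-ish)
-- direction crawls.
step :: Double -> (Double, Double) -> (Double, Double)
step eta p@(m, b) = let (gm, gb) = grad p in (m - eta * gm, b - eta * gb)

descend :: Int -> (Double, Double)
descend n = iterate (step 1e-3) (0, 0) !! n

main :: IO ()
main = mapM_ (print . descend) [100, 20000]
```

After 100 iterations b is still thousands away from 7200; only after tens of thousands does it converge. Rescaling the variables (or any of the preconditioned methods suggested above) removes the problem, which is why the conjugate gradient routine fares better here.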