
Chapter 10 Adagrad text. #11

Open
avs20 opened this issue May 24, 2021 · 0 comments

avs20 commented May 24, 2021

In chapter 10, after the square root example, the text reads:

> Overall, the impact is the learning rates for parameters with smaller gradients are decreased slowly, while the parameters with larger gradients have their learning rates decreased faster

I am confused by this: we are not updating the learning rate anywhere (other than through the learning rate decay). Yes, the weights with bigger gradients will be updated faster than those with smaller gradients, but much more slowly than they would be if no normalization were used.

Am I correct, or am I misunderstanding this?
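For concreteness, here is a minimal NumPy sketch of the update as I read it from the chapter (variable names such as grad_cache and effective_lr are mine, not necessarily the book's exact code):

```python
import numpy as np

learning_rate = 1.0
epsilon = 1e-7

params = np.array([0.0, 0.0])       # two weights
grad_cache = np.zeros_like(params)  # per-parameter sum of squared gradients

for step in range(1, 6):
    gradients = np.array([0.1, 10.0])  # one small, one large gradient each step
    grad_cache += gradients ** 2
    # The global learning_rate never changes; only this per-parameter factor shrinks.
    effective_lr = learning_rate / (np.sqrt(grad_cache) + epsilon)
    params -= effective_lr * gradients
    print(step, effective_lr, effective_lr * gradients)
```

Running this, effective_lr for the large-gradient parameter is always roughly 100x smaller than for the small-gradient one (its cache grows 10000x faster), while the global learning_rate itself never changes; the actual updates (effective_lr * gradients) come out the same size for both here only because the gradients are constant.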
