
Faster and more stable training #12

YuhengHuang42 opened this issue Nov 18, 2020 · 0 comments

Hi, first of all, thank you for your insightful work.

During my experiments, I found that the training results vary and depend heavily on how the weights are initialized.

For example, if you use kaiming_uniform initialization for the update convolutional layer, the image the CA generates soon "blows up": within a few iterations, every pixel becomes NaN.

I suspect this is because in the train_step function the number of iteration steps is sampled from range(64, 96), so gradient descent cannot correct the weights in time. If the initial weights are chosen badly, you never reach a satisfactory result.

To avoid this, I suggest doing a "warm up" first: start with the iteration steps sampled from a very small range ((1, 9), for example) and gradually raise it as the model trains.
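A minimal sketch of such a warm-up schedule, in plain Python. The function name and the warm-up length are my own illustrative choices, not from the repo; the idea is just to interpolate the (min, max) iteration range from a small starting range to the final range(64, 96) used in train_step:

```python
import random

def warmup_step_range(epoch, warmup_epochs=2000,
                      start=(1, 9), target=(64, 96)):
    """Linearly interpolate the (min, max) CA rollout range from
    `start` to `target` over the first `warmup_epochs` epochs.
    All names here are hypothetical, for illustration only."""
    t = min(epoch / warmup_epochs, 1.0)  # warm-up progress in [0, 1]
    lo = round(start[0] + t * (target[0] - start[0]))
    hi = round(start[1] + t * (target[1] - start[1]))
    return lo, hi

# Inside the training loop, sample the rollout length from the
# current range instead of a fixed range(64, 96):
# n_steps = random.randint(*warmup_step_range(epoch))
```

With this schedule, early epochs only roll the CA out for a handful of steps, so the gradient reaches the update weights almost immediately; after the warm-up, training proceeds with the original (64, 96) range.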

I have run some tests, and it seems that with this "warm up" you get good results regardless of the weight-initialization method. What's more, it also speeds up training.

This is like first teaching the model some simple characteristics of the picture (such as color and position), and only later giving it more room to tackle harder tasks (such as the fine details of the picture).
