Hi, first thank you for your insightful work.
During my experiments I found that the training results vary considerably, and that they can depend heavily on how the weights are initialized.

For example, if you use kaiming_uniform for the update convolutional layer, the image the CA generates quickly "blows up": within a few iterations every pixel becomes NaN.
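As a toy illustration of why a long rollout can amplify a poor initialization, here is a minimal NumPy sketch (a dense stand-in for the convolutional update, not the actual CA code — the sizes and step count are illustrative assumptions): repeatedly applying a residual update whose output layer is kaiming_uniform-initialized makes the state explode, while a zero-initialized output layer leaves it stable.

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_uniform(fan_in, fan_out):
    # Kaiming-uniform bound for a ReLU layer: sqrt(6 / fan_in)
    bound = np.sqrt(6.0 / fan_in)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

def rollout(w_last, steps=64):
    # Toy stand-in for a CA rollout: the state is updated residually,
    # state += update(state), for `steps` iterations (cf. range(64, 96)).
    state = rng.normal(size=(1, 16))
    w_hidden = kaiming_uniform(16, 16)
    for _ in range(steps):
        hidden = np.maximum(state @ w_hidden, 0.0)  # ReLU
        state = state + hidden @ w_last             # residual update
    return float(np.abs(state).max())

blown_up = rollout(kaiming_uniform(16, 16))  # random output layer
stable   = rollout(np.zeros((16, 16)))       # zero-initialized output layer
```

With the zero-initialized output layer the update is exactly zero at every step, so the state never drifts; with the random one, small amplifications compound over 64 steps into astronomically large values — the same mechanism that produces NaN pixels in the real model.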
I suspect this is because, in the `train_step` function, the number of CA iterations is sampled from range(64, 96). Errors compound over many steps before gradient descent can correct the weights, so with an unlucky initialization you cannot reach a satisfactory result.

To avoid this, I suggest a "warm-up" phase: start by sampling the iteration count from a very small range ((1, 9), for example) and gradually raise it as training progresses.
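The warm-up schedule described above could be sketched as follows. This is only one possible implementation; the 2000-iteration ramp length and the linear interpolation are my own illustrative assumptions, not values from the original code.

```python
import random

def step_range(iteration, warmup_iters=2000, lo=(1, 9), hi=(64, 96)):
    # Linearly interpolate the CA rollout-length range from a short
    # warm-up range (e.g. (1, 9)) up to the full range (64, 96) over
    # the first `warmup_iters` training iterations.
    t = min(iteration / warmup_iters, 1.0)
    low  = round(lo[0] + t * (hi[0] - lo[0]))
    high = round(lo[1] + t * (hi[1] - lo[1]))
    return low, high

def sample_steps(iteration):
    # Drop-in replacement for sampling from the fixed range(64, 96).
    low, high = step_range(iteration)
    return random.randint(low, high)
```

At iteration 0 this samples from (1, 9); by iteration 2000 it has reached the original (64, 96) and stays there, so the trained behavior is unchanged — only the early training dynamics differ.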
In my tests, with this warm-up the model reaches a good result regardless of the weight-initialization method. What's more, it also speeds up training.

Intuitively, this is like first teaching the model the simple characteristics of the image (such as color and position), and only later giving it room for harder tasks (such as the fine details).