Hi, first thank you for your insightful work.
During my experiments I found that the training results vary considerably, and that they can depend heavily on how the weights are initialized.

For example, if you use kaiming_uniform for the update convolutional layer, the image the CA generates quickly "blows up": within a few iterations every pixel becomes NaN.
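As a toy illustration of why a long rollout can amplify a poor initialization, here is a minimal NumPy sketch (a dense stand-in for the convolutional update, not the actual CA code — the sizes and step count are illustrative assumptions): repeatedly applying a residual update whose output layer is kaiming_uniform-initialized makes the state explode, while a zero-initialized output layer leaves it stable.

```python
import numpy as np

rng = np.random.default_rng(0)

def kaiming_uniform(fan_in, fan_out):
    # Kaiming-uniform bound for a ReLU layer: sqrt(6 / fan_in)
    bound = np.sqrt(6.0 / fan_in)
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

def rollout(w_last, steps=64):
    # Toy stand-in for a CA rollout: the state is updated residually,
    # state += update(state), for `steps` iterations (cf. range(64, 96)).
    state = rng.normal(size=(1, 16))
    w_hidden = kaiming_uniform(16, 16)
    for _ in range(steps):
        hidden = np.maximum(state @ w_hidden, 0.0)  # ReLU
        state = state + hidden @ w_last             # residual update
    return float(np.abs(state).max())

blown_up = rollout(kaiming_uniform(16, 16))  # random output layer
stable   = rollout(np.zeros((16, 16)))       # zero-initialized output layer
```

With the zero-initialized output layer the update is exactly zero at every step, so the state never drifts; with the random one, small amplifications compound over 64 steps into astronomically large values — the same mechanism that produces NaN pixels in the real model.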
I suspect this is because, in the `train_step` function, the number of CA iterations is sampled from range(64, 96). Errors compound over many steps before gradient descent can correct the weights, so with an unlucky initialization you cannot reach a satisfactory result.

To avoid this, I suggest a "warm-up" phase: start by sampling the iteration count from a very small range ((1, 9), for example) and gradually raise it as training progresses.
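The warm-up schedule described above could be sketched as follows. This is only one possible implementation; the 2000-iteration ramp length and the linear interpolation are my own illustrative assumptions, not values from the original code.

```python
import random

def step_range(iteration, warmup_iters=2000, lo=(1, 9), hi=(64, 96)):
    # Linearly interpolate the CA rollout-length range from a short
    # warm-up range (e.g. (1, 9)) up to the full range (64, 96) over
    # the first `warmup_iters` training iterations.
    t = min(iteration / warmup_iters, 1.0)
    low  = round(lo[0] + t * (hi[0] - lo[0]))
    high = round(lo[1] + t * (hi[1] - lo[1]))
    return low, high

def sample_steps(iteration):
    # Drop-in replacement for sampling from the fixed range(64, 96).
    low, high = step_range(iteration)
    return random.randint(low, high)
```

At iteration 0 this samples from (1, 9); by iteration 2000 it has reached the original (64, 96) and stays there, so the trained behavior is unchanged — only the early training dynamics differ.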
In my tests, with this warm-up the model reaches a good result regardless of the weight-initialization method. What's more, it also speeds up training.

Intuitively, this is like first teaching the model the simple characteristics of the image (such as color and position), and only later giving it room for harder tasks (such as the fine details).