Potential bug in lbann.Scale #2433
Comments
From the log file, it seems we have vanishing gradients, since the loss is barely changing. I should mention including …
Reproduced. Thanks for pointing this out. We'll get this fixed very soon. In the meantime, if memory isn't an issue, you can use …
I've traced this back to an issue with in-place operator layers. Still working on a fix. In the meantime, you can set …
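As a conceptual illustration of why in-place operator layers are delicate (plain NumPy, not LBANN internals, and not necessarily the exact mechanism of this bug): if the forward pass reuses its input buffer, a backward pass that still needs the original input ends up reading clobbered data.

```python
import numpy as np

# Toy operator: y = x**2, whose gradient dL/dx = 2*x*dL/dy needs the
# ORIGINAL input x in the backward pass.
def forward_square_inplace(x):
    np.square(x, out=x)       # reuses the input buffer ("in-place")
    return x

def backward_square(x_saved, grad_out):
    return 2.0 * x_saved * grad_out

x = np.array([1.0, 2.0, 3.0])
x_saved = x                   # alias, not a copy, so the forward pass clobbers it
y = forward_square_inplace(x)

grad = backward_square(x_saved, np.ones_like(y))
print(grad)                   # [ 2.  8. 18.] -- wrong; should be [2. 4. 6.]
```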
Ok, thanks, I added …
Quick question: how much extra overhead do you expect the environment variable …
@jvwilliams23 I think this issue should have been fixed by #2442. Can you confirm? To answer your other question about …
Will check the fix tomorrow morning (UK time). Just a quick comment regarding the performance issue I mentioned: I don't think it is the in-place issue. I profiled my job; see discussion #2438.
Working now on …
Hi,

After a series of turning things on and off in my code, I seem to have found that lbann.Scale is giving some problems, or at least what I would call unexpected behaviour, even when using

x = lbann.Scale(x, constant=1)

I tested this also on the LeNet example to get an MWE for sharing.

Output from lenet.py (unmodified)

Output from modified lenet.py with lbann.Scale

Here is the modified code, with some annotations:

At first, when applying a constant != 1, I thought perhaps this was only modifying the forward pass and not being back-propagated, so the gradients were not right. But I would have expected that scaling the activations by 1 would have no effect at all on the training. So I have no idea where this issue lies (or is this behaviour expected?)

Best wishes,
Josh
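The attached outputs and the modified lenet.py are not preserved above. As a rough sketch of the kind of change described, assuming standard LBANN Python front-end layer names (everything except the lbann.Scale call is an illustrative assumption, not the author's code):

```python
import lbann

# Illustrative model fragment; the surrounding layers are assumptions
# about a LeNet/MLP-style graph, only lbann.Scale(..., constant=1) is
# taken from the report above.
images = lbann.Input(data_field='samples')
labels = lbann.Input(data_field='labels')

x = lbann.FullyConnected(images, num_neurons=500, has_bias=True)
x = lbann.Relu(x)

# The layer under discussion: scaling activations by 1 should be a
# mathematical no-op, yet training behaves differently once it is added.
x = lbann.Scale(x, constant=1)

x = lbann.FullyConnected(x, num_neurons=10, has_bias=True)
probs = lbann.Softmax(x)
loss = lbann.CrossEntropy(probs, labels)
```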