The grads returned by self.get_G_wrt_shared are always 0 #6
Comments
Hello, it seems the image you uploaded is not visible.
When using MTL-Aligned, the gradients are always zero: for example, torch.autograd.grad(fusion_loss, list(fusion.parameters())) returns all-zero gradients of the loss with respect to the model parameters.
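One way to narrow this down (a minimal sketch, not the repository's code; `fusion_net` and `fusion_loss` are placeholder names) is to check whether the loss is actually connected to the fusion parameters in the autograd graph:

```python
import torch

# Placeholder names: fusion_net is the fusion model, fusion_loss the scalar loss.
params = list(fusion_net.parameters())
grads = torch.autograd.grad(
    fusion_loss,
    params,
    retain_graph=True,   # keep the graph if another backward pass follows
    allow_unused=True,   # return None instead of raising for unused parameters
)
for p, g in zip(params, grads):
    if g is None:
        print('not in graph:', tuple(p.shape))
    else:
        print('grad norm:', g.norm().item())
```

If every entry is `None` or exactly zero, the loss was most likely computed from tensors detached from `fusion_net` (e.g. via `.detach()` or inside `torch.no_grad()`).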
Could you provide a more detailed screenshot of your debugging code?
def after_train_iter(self, runner):
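For reference, a minimal sketch (assuming the mmcv-style `Hook` interface; `GradCheckHook` is a made-up name) of a hook that reports zero gradients after each iteration:

```python
from mmcv.runner import HOOKS, Hook

@HOOKS.register_module()
class GradCheckHook(Hook):
    """Sketch only: log which parameters end the iteration with a zero gradient."""

    def after_train_iter(self, runner):
        for name, p in runner.model.named_parameters():
            if p.grad is not None and p.grad.abs().sum().item() == 0:
                runner.logger.info(f'zero grad: {name}')
```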
Hi, did you manage to solve this? I'm running into the same problem... When training reaches iteration 1000 and the alignment starts, I get a CUDA out of memory error...
I ran into this problem too, on an A100 40G GPU. The network itself plus the fusion net already increases the training-time memory footprint, and the GMTA operation increases it further. When I disabled the GMTA operation and reduced the batch size to 2, the network's performance did not drop much, less than about two points compared with the paper. But to reproduce the full network, even 40G of memory is not enough; you probably need 80G, like the authors, to reproduce it completely.
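As a rough way to see how much of the footprint comes from the alignment step (purely illustrative; `train_one_iter` and `use_gmta` are placeholders for whatever the training loop exposes):

```python
import torch

torch.cuda.reset_peak_memory_stats()
train_one_iter(use_gmta=False)          # placeholder training step without GMTA
print('peak w/o GMTA :', torch.cuda.max_memory_allocated() / 1024 ** 3, 'GiB')

torch.cuda.reset_peak_memory_stats()
train_one_iter(use_gmta=True)           # placeholder training step with GMTA
print('peak with GMTA:', torch.cuda.max_memory_allocated() / 1024 ** 3, 'GiB')
```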
Hello, I'd like to ask how the gradient-update code gets called. My skills are limited, and I've searched for a long time without finding where GMTA is invoked. Every time I step past forward_train in the debugger, the program doesn't seem to go any further.
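Not an authoritative answer, but as a general sketch of how a GMTA-style update is usually wired into an mmcv training loop (every name below is an assumption, not this repository's API): the per-task losses come out of `forward_train`, while the per-task gradients and the aligned update typically run inside an optimizer hook's `after_train_iter`, so single-stepping through `forward_train` never reaches that code directly.

```python
from mmcv.runner import HOOKS, OptimizerHook
import torch

@HOOKS.register_module()
class GradAlignOptimizerHook(OptimizerHook):
    """Illustrative sketch; attribute names are assumptions, not the repo's API."""

    def after_train_iter(self, runner):
        runner.optimizer.zero_grad()
        # Assumed: runner.outputs carries one scalar loss per task.
        task_losses = runner.outputs['task_losses']
        shared = [p for p in runner.model.parameters() if p.requires_grad]

        # Per-task gradients w.r.t. the shared parameters
        # (conceptually what a get_G_wrt_shared-style helper would return).
        per_task_grads = [
            torch.autograd.grad(loss, shared, retain_graph=True, allow_unused=True)
            for loss in task_losses
        ]

        # Plain averaging stands in for the actual alignment rule.
        for p, grads in zip(shared, zip(*per_task_grads)):
            valid = [g for g in grads if g is not None]
            if valid:
                p.grad = torch.stack(valid).mean(dim=0)

        runner.optimizer.step()
```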