This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

Fix zero gradient for subtensor assignment. #127

Open
bartvm wants to merge 2 commits into master

Conversation

bartvm (Contributor) commented May 27, 2016

@alexbw Partially fixes #126 and adds a unit test. It gives the correct answer in direct mode now, but still crashes in optimized mode.
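A minimal sketch of the kind of function that triggers the bug (hypothetical code, not the unit test added in this PR; it assumes torch-autograd's usual grad-of-a-function API and made-up tensor shapes):

```lua
local autograd = require 'autograd'

-- Assign one parameter into a slice of another, then reduce.
-- Before this fix, the gradient w.r.t. params.y came back silently
-- as all zeros in direct mode.
local function f(params)
   local x = torch.clone(params.x)
   x[1] = params.y               -- subtensor assignment
   return torch.sum(x)
end

local params = {x = torch.zeros(3, 3), y = torch.ones(3)}
local df = autograd(f)
local grads = df(params)
print(grads.y)                   -- expected: a tensor of ones, not zeros
```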

alexbw (Collaborator) commented May 28, 2016

Will merge when Assignment test passes on CI


bartvm (Contributor, Author) commented May 29, 2016

The test that I added fails on the current master as well; it's a bug somewhere in optimized mode. I'd have to get a bit more familiar with the details of optimized mode before I can fix it.

bartvm (Contributor, Author) commented Jul 11, 2016

Maybe someone with better knowledge of the optimized-mode internals can figure out what is going on. The problem seems to be this line in Node.lua, which specifically deals with Value.__internal_set:

mutationFlow:alias(self.inputs[1], valueAlias)

Unlike valueAlias = Value.from(outputs[1], Source.computed(self, 1)), self.inputs[1] here is not transformed into a Value instance. This line in runtime/codegen/backend/lua/init.lua takes issue with that:

aliasOp = graph.mutationFlow.history[i]
addNodeTargets(aliasOp.from.source.node, hazardNodes)

aliasOp.from.source doesn't have a node; it's simply a table that looks like this:

from    {
  type : "tensor"
  source :
    {
      type : "table"
      parent :
        {
          type : "param"
          name : 1
        }
      key : "x"
    }
  raw : DoubleTensor - size: 10x10
}
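For what it's worth, the crash itself could presumably be sidestepped by guarding the hazard-tracking loop against alias sources that don't originate from a node, although that would only paper over the missing Value wrapping rather than handle the alias properly. An untested sketch against the init.lua lines quoted above:

```lua
-- Untested sketch: skip hazard tracking for alias sources that have no
-- originating node (e.g. the table/param source dumped above).
local aliasOp = graph.mutationFlow.history[i]
local source = aliasOp.from.source
if source.node ~= nil then
   addNodeTargets(source.node, hazardNodes)
end
```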

Commit message: The variable that is being assigned has its gradient correctly calculated (g[k]), but later on, when the gradient of the variable being assigned to is calculated, g[k] is set to 0. This gives the correct gradient for the variable being assigned to, but because it shares the same storage, it incorrectly overwrites the earlier gradient with zeros. This fixes that.
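Conceptually (a paraphrase, not the literal src/gradfuns.lua patch), the gradient of the assigned value has to be copied out before the corresponding slice of the container's gradient is zeroed, because the two views share storage:

```lua
-- Conceptual sketch of the backward pass for a set operation x[k] = v;
-- names and signature are illustrative only.
local function setGradient(g, k)
   local gradV = torch.clone(g[k])  -- copy the assigned value's gradient first
   g[k] = 0                         -- then zero that slice for the container's gradient
   return g, gradV                  -- gradient w.r.t. x, gradient w.r.t. v
end
```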

bartvm (Contributor, Author) commented Jul 13, 2016

Because I need this for something else now, here's another stab at it. I'm using torch.clone to sidestep the issue of params.x not being a node (as was done in the other tests).

However, the test for optimized mode now fails for another reason: the gradient gets computed correctly the first time grad(f) is called, but when it is called a second time it overwrites the gradient in place and sets it to zero. This surfaced accidentally because gradcheck happens to call grad(f) twice to check for determinism. I made that check more explicit, but I haven't figured out how to fix the underlying bug yet.
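To make that failure mode concrete, the explicit determinism check amounts to something like this (hypothetical scaffolding reusing the f and params from the earlier sketch, not the actual test code):

```lua
autograd.optimize(true)            -- the bug only shows up in optimized mode
local df = autograd(f)

local g1 = df(params)
local snapshot = g1.y:clone()      -- copy before the second call can mutate it

local g2 = df(params)              -- this call zeroed the gradient in place
assert((snapshot - g2.y):abs():max() == 0,
       'gradient changed between two identical calls')
```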

bartvm (Contributor, Author) commented Jul 13, 2016

@luketwitter @alexbw Although I can't figure out the current bug in optimized mode triggered by the new unit test I added, can I propose at least merging my changes to src/gradfuns.lua? That way the gradients for index assignment in direct mode aren't silently set to zero anymore, which is a really nasty bug because it can make models completely fail to train without giving a single error.

I made a new PR for this: #139

CLAassistant commented Jul 18, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


Successfully merging this pull request may close these issues: Gradient direct assignment is missing.