Fix zero gradient for subtensor assignment. #127
base: master
Conversation
Will merge when Assignment test passes on CI.
The test that I added fails on the current master as well; it's a bug somewhere in optimized mode. I'd have to get a bit more familiar with the details of optimized mode before I can fix that.
Maybe someone with better knowledge of the optimized mode internals can understand what is going on. The problem seems to be this line:

```lua
mutationFlow:alias(self.inputs[1], valueAlias)
```

Unlike:

```lua
aliasOp = graph.mutationFlow.history[i]
addNodeTargets(aliasOp.from.source.node, hazardNodes)
```
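A rough reading of the two fragments above, going by the names alone (an interpretation, not verified against torch-autograd's internals): the first records at runtime that the op's input is now aliased, while the second, run over the recorded history, marks each alias's source node as a hazard so the optimizer does not reorder work across the mutation. Sketched as a loop:

```lua
-- Interpretation only: how the hazard pass presumably consumes the
-- mutation history recorded by mutationFlow:alias(...).
for i = 1, #graph.mutationFlow.history do
   local aliasOp = graph.mutationFlow.history[i]
   -- Mark the node that produced the aliased value as a hazard, so
   -- later optimization passes treat it as order-sensitive.
   addNodeTargets(aliasOp.from.source.node, hazardNodes)
end
```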
The variable that is being assigned has its gradient correctly calculated (g[k]), but later, when the gradient of the variable being assigned to is calculated, g[k] is set to 0. That gives the correct gradient for the variable being assigned to, but because the two gradients share the same storage, it incorrectly overwrites the earlier gradient with zero. This fixes that.
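A schematic of that failure mode in Torch-style code (g and k follow the comment above; this is not the PR's actual code):

```lua
-- Backward pass for the assignment x[k] = y, schematically.
local gy = g[k]        -- gradient w.r.t. y: a view sharing storage with g
g[k]:fill(0)           -- gradient w.r.t. x: overwritten entries get zero
-- Because gy is a view into g, the fill above also zeroes gy,
-- silently destroying the gradient computed first.
-- The fix is to detach the slice before zeroing it:
local gyFixed = g[k]:clone()
g[k]:fill(0)
```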
Because I need this for something else as well now, I took another stab at it. I'm using … However, now the test for optimized mode fails for another reason: the gradient gets computed correctly the first time …
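For what it's worth, that symptom can be probed from outside using the global optimize switch documented in torch-autograd's README (a sketch; f and params are assumed to be defined elsewhere):

```lua
local autograd = require 'autograd'
autograd.optimize(true)   -- compile and cache the backward pass
local df = autograd(f)

local g1 = df(params)     -- first call: reportedly correct
local g2 = df(params)     -- later calls reuse the cached pass
-- Comparing g1 against g2 (or against direct mode) localizes whether
-- the remaining failure is in the cached/compiled path.
```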
@luketwitter @alexbw Although I can't figure out the current bug in optimized mode triggered by the new unit test I added, can I propose at least merging my changes to …

I made a new PR for this: #139
@alexbw Partially fixes #126 and adds a unit test. It gives the correct answer in direct mode now, but still crashes in optimized mode.
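A minimal sketch of the kind of gradient check such a unit test might perform (hypothetical names and shapes, not the exact test added in the PR), assuming autograd differentiates through subtensor assignment as this change intends:

```lua
local autograd = require 'autograd'

-- f overwrites the first row of x with y, then reduces to a scalar.
local function f(params)
   params.x[1] = params.y
   return torch.sum(params.x)
end

local df = autograd(f)
local grads, loss = df({ x = torch.ones(3, 3), y = torch.ones(3) })
-- Expected: grads.x has a zero first row (those entries were
-- overwritten in the forward pass) and ones elsewhere;
-- grads.y is all ones.
```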