Support vectorize/devectorize inside gradients #1533
I know that I can use `y = ~VEC[1 1] |> vectorize(:bar)` and then `Foo.f_and_grad(x, y)` to get the result I expect, but in practice …
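Spelled out, that workaround looks roughly like this (a sketch: the concrete values for `x` are my assumption, and `Foo` is the module defined further down in this thread):

```elixir
import Nx, only: :sigils

# Workaround sketch: give `y` its own vectorized axis (:bar), so the
# gradient with respect to `y` is not collapsed into a single tensor.
x = ~VEC[0 1] |> Nx.vectorize(:foo)
y = ~VEC[1 1] |> Nx.vectorize(:bar)

Foo.f_and_grad(x, y)
```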
I think this makes sense because the grad is computed over `y`, but I would like to see if @polvalente has a different opinion.
I tried checking whether it would still be efficient to broadcast:

```elixir
x = ~VEC[0 0] |> vectorize(:foo)
y = ~VEC[1]
[x, y] = Nx.broadcast_vectors([x, y])
y |> Nx.byte_size()
# 16
# if more elements are added to `x`, this evaluates to 24, etc.
```

So I would still be interested in whether there is a way to get the non-summed gradient, although I understand if it's not possible with this API.
I agree with @jyc that the grad should have the same vector shape as the output. That is, the correct result for the example should be […]. The mental model I have is that […]
Memory-wise, vectorization will end up doing the explicit broadcasting, if applicable, regardless of the backend (although some backends might end up fusing things).
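This is easy to confirm with the `broadcast_vectors` snippet above (a sketch; the exact byte counts depend on your default integer type):

```elixir
import Nx, only: :sigils

x = ~VEC[0 0] |> Nx.vectorize(:foo)
y = ~VEC[1]
[x, y] = Nx.broadcast_vectors([x, y])

# `y` now carries the vectorized :foo axis, so devectorizing it shows the
# broadcast was materialized: its data is repeated once per entry of `x`.
y |> Nx.devectorize() |> Nx.shape()
# {2, 1}
Nx.byte_size(y)
# 16 with 64-bit integers, and it grows with the size of `x`
```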
```elixir
import Nx, only: :sigils

defmodule Foo do
  import Nx.Defn

  defn f(x, y) do
    x + y
  end

  defn f_and_grad(x, y) do
    value_and_grad(y, fn y -> Foo.f(x, y) end)
  end
end

# `x` is vectorized along :foo, as described in the issue text
x = ~VEC[0 1 2] |> Nx.vectorize(:foo)
y = ~VEC[1]
Foo.f_and_grad(x, y)
# {~VEC[1, 2, 3], ~VEC[3]}
```
Actually, I have confused myself! I don't believe it's a red herring, because it's the other axis that is vectorized. I misunderstood; please ignore my last comment, sorry for the noise. In other words, I agree with your comment here: […]
The problem here is that, for that example:

```elixir
x = Nx.tensor([0, 1, 2])
y = 1

{_, grad0} = Foo.f_and_grad(x[0], y)
{_, grad1} = Foo.f_and_grad(x[1], y)
{_, grad2} = Foo.f_and_grad(x[2], y)

expected_result = Nx.stack([grad0, grad1, grad2]) |> Nx.vectorize(:foo)
actual_result = Foo.f_and_grad(Nx.vectorize(x, :foo), y)
```

```
iex(19)> expected_result = Nx.stack([grad0, grad1, grad2]) |> Nx.vectorize(:foo)
#Nx.Tensor<
  vectorized[foo: 3]
  f32
  [1.0, 1.0, 1.0]
>
iex(21)> actual_result = Foo.f_and_grad(Nx.vectorize(x, :foo), y)
{#Nx.Tensor<
   vectorized[foo: 3]
   s32
   [1, 2, 3]
 >,
 #Nx.Tensor<
   f32
   3.0
 >}
```
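A sketch that generalizes the manual stacking above into a loop, in case it is useful as an interim workaround (the variable names are mine; it just repeats what the reproduction does for each entry of `x`):

```elixir
x = Nx.tensor([0, 1, 2])
y = 1

# Compute the gradient entry by entry outside of the vectorized call,
# then re-vectorize the stacked results along :foo.
{values, grads} =
  x
  |> Nx.to_list()
  |> Enum.map(fn xi -> Foo.f_and_grad(xi, y) end)
  |> Enum.unzip()

per_element_value = values |> Nx.stack() |> Nx.vectorize(:foo)
per_element_grad = grads |> Nx.stack() |> Nx.vectorize(:foo)
```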
You are right! Sorry for the noise.
Reopening because we still need to support vectorize/devectorize inside the gradient. :)
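For concreteness, the kind of pattern this would allow is something like the following hypothetical sketch, where the function being differentiated manipulates vectorized axes itself (module and function names are illustrative, and this is exactly the pattern that is not supported today):

```elixir
defmodule Bar do
  import Nx.Defn

  # Hypothetical: the function under value_and_grad calls
  # Nx.devectorize/Nx.vectorize itself. Differentiating through these
  # calls is what this issue tracks, so it is not expected to work yet.
  defn g_and_grad(x, y) do
    value_and_grad(y, fn y ->
      x
      |> Nx.devectorize()
      |> Nx.add(y)
      |> Nx.vectorize(:foo)
    end)
  end
end
```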
Thanks for making Nx!

I tried to use `value_and_grad` on a function that takes two inputs: a vectorized tensor and a non-vectorized tensor. This evaluates to the result shown in the `Foo` example above: the value is correct and maintains the vectorized axis of the vectorized input `x`, but the gradient surprises me. I would have expected a vectorized rank-1, dimension-2 tensor with the same `:foo` axis which is everywhere `1`; it looks like Nx is instead summing up the two gradients.

Is this behavior expected? If so, is there any way to make Nx return a vectorized gradient?

Thanks!