Handle multiple inplace update input output aliasing #7023
Conversation
@alanwaketan @wonjoolee95 I think this one is ready for review.
```diff
@@ -2538,7 +2539,38 @@ void XLANativeFunctions::_propagate_xla_data(const at::Tensor& input,
   // 1) Aid XLA's InputOutputAlias.
   auto input_tensor = bridge::GetXlaTensor(input);
   auto output_tensor = bridge::GetXlaTensor(output);
-  output_tensor->data()->alias_id = input_tensor->GetUniqueId();
+  if (input_tensor->CurrentDataHandle() != nullptr ||
```
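The hunk is truncated right after the start of the new condition. As a rough sketch of the branching the review thread below converges on (the second half of the condition and the comments are assumptions, not the verbatim PR code):

```cpp
// Sketch only: the real condition is cut off above and may include extra
// checks for inputs that are already backed by device data.
if (input_tensor->CurrentDataHandle() != nullptr /* || ... */) {
  // The input is a device data tensor for the upcoming graph, so its own
  // tensor id is what the executor will see on the input buffer.
  output_tensor->data()->alias_id = input_tensor->GetUniqueId();
} else {
  // The input is itself the result of an earlier in-place update in the
  // same step; carry over its alias_id, which already points at the
  // original device data tensor.
  output_tensor->data()->alias_id = input_tensor->data()->alias_id;
}
```

With that split, multiple in-place updates within one step keep pointing at the real input buffer, while the cross-`mark_step` case (the example below) still picks up the fresh tensor id.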
I guess we can always use `alias_id`?
haha that's what I thought but actually no. Look at my example down below:

```python
# x.tensor_id = 1, x.alias_id = 1
x = torch.randn(5, 5).to(xla_device())
# x.tensor_id = 2, x.alias_id should be 1
x += 1
xm.mark_step()
# x.tensor_id = 3, x.alias_id should be 2 since the input tensor id will be 2
# for this graph
x *= 1
xm.mark_step()
```
If we always use `alias_id`, the `alias_id` of `x` in the second graph would be 1, but we need it to be 2.
In the second execution the input tensor id is 2, so we need the alias id to always match the input tensor id. In other words, we should not carry `alias_id` across `mark_step`.
This is a bit tricky: even though the underlying buffer is aliased, we still create a new PjRtBuffer object for `x` after the first `mark_step`. That DeviceData object (which wraps the PjRtBuffer) will have `data_info` with `tensor_id` 2, since `x`'s tensor id is 2 after the first `mark_step`.
I guess resetting `alias_id` after `mark_step` is probably very complicated. This is more like a simplified way to achieve that, assuming IR/outputs become DeviceData/inputs.
We can do that too (reset `alias_id` to the tensor id after processing the input_output_alias info). That might make this code less confusing haha.
That sounds like a good follow-up, but feel free to skip it.
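Purely to illustrate the reset idea suggested above (the helper name and signature are hypothetical, not part of this PR or of torch_xla):

```cpp
// Hypothetical sketch: after the executor has consumed the input/output
// alias info for the current graph, point each tensor's alias_id back at
// its own tensor id so a stale alias_id cannot leak across mark_step.
void ResetAliasIds(const std::vector<XLATensorPtr>& tensors) {
  for (const XLATensorPtr& tensor : tensors) {
    tensor->data()->alias_id = tensor->GetUniqueId();
  }
}
```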
LGTM.
Will this go into 2.4? Any chance it can be backported to 2.3?
This will be part of 2.4; we don't do dot releases, so it is unlikely for this one to be in the 2.3 release.
Description
Fix the bug where, if an in-place operation is applied multiple times, the aliasing won't happen.
Consider the case without this PR where an in-place update is applied more than once before a `mark_step`: at `mark_step` time we check that the input buffer has tensor ID 1 while the output alias id is 2 (the id of the intermediate tensor), hence we skip donating the input buffer of size (5, 5).
xla/torch_xla/csrc/xla_graph_executor.cpp, lines 1249 to 1253 in d123585
xla/torch_xla/csrc/xla_graph_executor.cpp, lines 1261 to 1269 in d123585
The alias ID should track the tensor ID of the input buffer, not the tensor ID of the last base.
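The permalinked executor code is not inlined above; as a hypothetical sketch of the check it performs (the map name and loop structure are illustrative, not the actual xla_graph_executor.cpp implementation), donation is only set up when an output's alias_id matches the tensor ID of one of the graph's input buffers:

```cpp
// Hypothetical sketch of the donation check; `input_tensor_id_to_param` is
// an illustrative name, not a real torch_xla field.
for (const XLATensorPtr& output : output_tensors) {
  int64_t alias_id = output->data()->alias_id;
  auto it = input_tensor_id_to_param.find(alias_id);
  if (it != input_tensor_id_to_param.end()) {
    // alias_id matches the tensor ID of an input buffer: mark that
    // parameter as donatable / alias it to this output.
  }
  // Before this PR, a second in-place update in the same step set alias_id
  // to the intermediate tensor's ID (2) while the live input buffer carried
  // tensor ID 1, so this lookup missed and the (5, 5) buffer was never
  // donated.
}
```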