
[Core ATen Opset] Lower aten_full_like #5866

Closed
wonjoolee95 opened this issue Nov 21, 2023 · 4 comments
wonjoolee95 commented Nov 21, 2023

In order for PyTorch/XLA to support the PyTorch core ATen opset, it requires lowering each core ATen op in PyTorch/XLA. This issue is used to track the PyTorch/XLA lowering for aten_full_like.

Here are some general guidelines to lowering this op:

  • Uncomment @unittest.skip or @unittest.expectedFailure and run the unit test in test_core_aten_ops.py, e.g. pytest test/test_core_aten_ops.py -k test_aten_full_like_0 (a sketch of such a test follows this list).
  • Make code changes until the test passes. Read and follow fix_lowering_for_core_aten_ops.md for ideas to fix.
    • There may be multiple unit tests for a single op. For this op, the corresponding unit tests are:
      • test_aten_full_like_0
      • test_aten_full_like_1
      • test_aten_full_like_2
    • Please also uncomment the skips for all of these tests and ensure they all pass.
    • Note that sometimes the fix may be to the unit tests themselves. Please take a look at the corresponding unit tests to make sure they are valid.
  • Submit the PR!
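
As mentioned above, here is a rough sketch of the shape these unit tests take. The real test_core_aten_ops.py exports the op through the XLA backend and compares against eager execution, so treat this as an illustration of the skip/uncomment workflow rather than the actual test code:

import unittest

import torch

class TestCoreAtenOps(unittest.TestCase):
    # This decorator marks the test as skipped until the lowering lands;
    # the lowering work is done when the test passes without it.
    # @unittest.skip("aten_full_like not lowered yet")
    def test_aten_full_like_0(self):
        # full_like(self, fill_value) returns a tensor with the same
        # shape/dtype as `self`, every element set to `fill_value`.
        actual = torch.ops.aten.full_like(torch.empty(10, 10), 0.123)
        expected = torch.full((10, 10), 0.123)
        torch.testing.assert_close(actual, expected)

if __name__ == "__main__":
    unittest.main()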

For any questions, feel free to leave a comment in this issue.


danielvegamyhre commented Nov 28, 2023

@wonjoolee95 I lowered this in #5781 but it appears this new unit test for the op is failing, so I can take a look at this issue.

I made some progress on this by updating the args to match the required args of full(), namely a size (e.g. (10, 10)) and a fill_value (e.g. 0.123).
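
For reference, a minimal call matching that signature would be something like (my own example, not the test's exact code):

import torch

# aten.full's required args: a size and a fill_value.
t = torch.full((10, 10), 0.123)
print(t.shape)  # torch.Size([10, 10])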

Now the arguments match those defined in the function signature, but there is a new error: AssertionError: Cannot find fake_mode attatched to the graph's placeholders. The call stack begins at the call to torch.export.export(...) and ends with the error on this line.

I didn't know what this error meant, but I found some docs on Fake Tensors and Fake Tensor Modes here that seem related. After reading them, I think I now understand why fake tensors are used in the export process.
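
For anyone following along, here is a small example based on my reading of those docs (FakeTensorMode lives in a private module, so this may shift across versions):

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Fake tensors carry shape/dtype/device metadata but no real storage,
# which lets export trace a program without running actual kernels.
fake_mode = FakeTensorMode()
with fake_mode:
    x = fake_mode.from_tensor(torch.randn(10, 10))
    y = torch.full_like(x, 0.123)  # computed on metadata only
print(type(y).__name__, y.shape)  # FakeTensor torch.Size([10, 10])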

I then printed some metadata about each node in gm.graph.nodes, and sure enough, none of the nodes with op type "placeholder" have a corresponding node.meta["val"] key, which is what the _convert_input_to_fake function is looking for.
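
The loop I used was along these lines (gm being the GraphModule from the export trace):

from pprint import pprint

# gm is the torch.fx.GraphModule produced during the export trace.
for node in gm.graph.nodes:
    print(f"node: {node.name}")
    print(f"op: {node.op}")
    print("meta:")
    pprint(node.meta)
    print()

which printed: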

node: arg0
op: placeholder
meta:
{}

node: arg1
op: placeholder
meta:
{}

node: arg2
op: placeholder
meta:
{}

node: full
op: call_function
meta:
{'example_value': FakeTensor(..., size=(10, 10)),
 'from_node': [('full', <OpOverloadPacket(op='aten.full')>)],
 'seq_nr': -1,
 'source_fn_stack': [('full', <OpOverloadPacket(op='aten.full')>)],
 'stack_trace': '  File '
                '"/usr/local/lib/python3.8/site-packages/torch/_dynamo/external_utils.py", '
                'line 17, in inner\n'
                '    return fn(*args, **kwargs)\n'}

node: output
op: output
meta:
{}

This comment indicates the fake_mode comes from dynamo, and I see dynamo being used inside the _export_to_torch_ir function here, so I'm not sure why it isn't attaching the fake mode for this particular op.

I also thought dynamo was for JIT compilation, and the term "export" seems to imply AOT compilation, so I'm confused about dynamo's role in this process.

Have you seen this error before, or have any insight into what could cause it?

Also, just to check my understanding of the e2e export process: Dynamo does the tracing using the CPython frame evaluation API to produce an FX graph of tensor operations, which is sent to the XLA bridge. The XLA bridge is then supposed to return a graph of corresponding XLA operations. This graph of XLA ops can then be converted to StableHLO and fed into the XLA compiler, which produces the actual machine code that can be executed on the target device. Is this correct?
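
In code form, my mental model of that pipeline would be roughly the following (using torch_xla's StableHLO export API; exact names may differ by version):

import torch
from torch.export import export
from torch_xla.stablehlo import exported_program_to_stablehlo

def fn(x):
    return torch.full_like(x, 0.123)

# 1) Dynamo traces fn into an FX graph of ATen ops (an ExportedProgram).
exported = export(fn, (torch.randn(10, 10),))
# 2) The XLA bridge lowers that graph to StableHLO; the XLA compiler
#    then turns StableHLO into machine code for the target device.
stablehlo_program = exported_program_to_stablehlo(exported)
print(stablehlo_program.get_stablehlo_text())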


qihqi commented Dec 1, 2023

Hi Daniel,

For some reason I don't see the test test_aten_full_0 anymore; I can only find test_aten_full_like_*. What script did you run to get AssertionError: Cannot find fake_mode attatched to the graph's placeholders? Thanks!

I also thought dynamo was for JIT compilation, and the term "export" seems to imply AOT compilation, so I'm confused about dynamo's role in this process.

Export actually uses dynamo under the hood: it calls dynamo and asserts that there are no graph breaks. So if the JIT happens to capture everything in a single graph, that is equivalent to AOT compilation.
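
A quick illustration of that relationship (my own example):

import torch

def f(x):
    if x.sum() > 0:  # data-dependent branch -> graph break
        return x + 1
    return x - 1

x = torch.randn(4)

# JIT path: torch.compile tolerates the graph break by falling back
# to eager Python for the unsupported part.
torch.compile(f)(x)

# Export-style path: fullgraph=True makes dynamo assert no graph breaks,
# so the same branch raises instead of falling back.
try:
    torch.compile(f, fullgraph=True)(x)
except Exception as e:
    print("graph break:", type(e).__name__)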


wonjoolee95 commented Dec 2, 2023

Hi, it seems like this issue was mistakenly referring to the wrong ATen op because it was generated from an older version of the file -- https://raw.githubusercontent.com/pytorch/xla/5e63756c3438e0d25e32ba5dceac68d82d23993a/test/test_core_aten_ops.py.

This should really be test_aten_full_like. Let me manually update this issue for now and check if we have any other issues that we need to change/update.

Thanks!

@wonjoolee95 wonjoolee95 changed the title [Core ATen Opset] Lower aten_full [Core ATen Opset] Lower aten_full_like Dec 2, 2023
@wonjoolee95 wonjoolee95 assigned qihqi and wonjoolee95 and unassigned qihqi Jan 8, 2024
wonjoolee95 commented

This is already passing, closing.
