
Export debug information to StableHLO #7014

Closed
thong3le opened this issue May 1, 2024 · 11 comments · Fixed by #7046
Labels: stablehlo StableHLO related work


thong3le commented May 1, 2024

❓ Questions and Help

Hi team, the debug information is lost during exported_program_to_stablehlo. Is there a way to export this information?

For example, torch.export records the file and line number for each op:

import torch
import torch.nn as nn
from torch_xla.stablehlo import exported_program_to_stablehlo

class Test(nn.Module):
    def forward(self, a, b):
        a += 1
        b += 2
        return a + b

ep = torch.export.export(Test(), (torch.randn(1, 5), torch.randn(1, 5)))
print(ep)
# ExportedProgram:
#     class GraphModule(torch.nn.Module):
#         def forward(self, arg0_1: "f32[1, 5]", arg1_1: "f32[1, 5]"):
#             # File: /home/thonle/ai/data/stablehlo/add/add.py:7 in forward, code: a += 1
#             add: "f32[1, 5]" = torch.ops.aten.add.Tensor(arg0_1, 1);  arg0_1 = None
            
#             # File: /home/thonle/ai/data/stablehlo/add/add.py:8 in forward, code: b += 2
#             add_1: "f32[1, 5]" = torch.ops.aten.add.Tensor(arg1_1, 2);  arg1_1 = None
            
#             # File: /home/thonle/ai/data/stablehlo/add/add.py:9 in forward, code: return a + b
#             add_2: "f32[1, 5]" = torch.ops.aten.add.Tensor(add, add_1)
#             return (add, add_1, add_2)

However, when we export to StableHLO, this information cannot be found in the StableHLOModelBundle:

om = exported_program_to_stablehlo(ep)
print(om._bundle)

# StableHLOModelBundle(state_dict={}, additional_constants=[array(2., dtype=float32)], stablehlo_funcs=[StableHLOFunc(meta=StableHLOFunctionMeta(name='forward', stablehlo_version='0.0.0', input_signature=[VariableSignature(shape=[1, 5], dtype='float32', dynamic_dims=[]), VariableSignature(shape=[], dtype='float32', dynamic_dims=[]), VariableSignature(shape=[1, 5], dtype='float32', dynamic_dims=[])], output_signature=[VariableSignature(shape=[1, 5], dtype='float32', dynamic_dims=[]), VariableSignature(shape=[1, 5], dtype='float32', dynamic_dims=[]), VariableSignature(shape=[1, 5], dtype='float32', dynamic_dims=[])], input_locations=[InputLocation(type_=<VariableType.INPUT_ARG: 'input_arg'>, position=0, name=''), InputLocation(type_=<VariableType.CONSTANT: 'constant'>, position=0, name=''), InputLocation(type_=<VariableType.INPUT_ARG: 'input_arg'>, position=1, name='')], unused_inputs=[], input_pytree_spec='[1, {"type": "builtins.tuple", "context": "null", "children_spec": [{"type": "builtins.tuple", "context": "null", "children_spec": [{"type": null, "context": null, "children_spec": []}, {"type": null, "context": null, "children_spec": []}]}, {"type": "builtins.dict", "context": "[]", "children_spec": []}]}]', output_pytree_spec='[1, {"type": null, "context": null, "children_spec": []}]'), bytecode=b"ML\xefR\rStableHLO_v0.19.1\x00\x01\x1d\x05\x01\x05\r\x01\x03\x0b\x03\x0b\x0f\x13\x17\x1b\x1f\x03S1\x0f\x01%\x07\x0f#\x0b\x0b\x0b\x0b\x0b\x0f\x0b\x0f\x0b\x0f\x0b\x0f\x0b\x0f\x0b\x03\r\x0b\x0b\x0b\x0b\x1f\x0f\x01\x03\x0b\x03\r\x17\x07\x0f'\x13\x07\x02\xb5\x1f\x11\x01\x00\x03\x07\x07\t\x0b\x03\r\x03\x05\x11\x01\x01\x05\x13\x05\x15\x05\x17\x1d\x13\x01\x05\x19\x1d\x17\x01\x05\x1b\x1d\x1b\x01\x05\x1d\x1d\x1f\x01\x05\x1f\x1d#\x01\x05!\x03\x01#\t\x1d#\x1d%\x1f\x03\t\x00\x00\x80?\x1f\x0b\x01\x01\t)\x05\x05\x15\x05\t)\x01\x05\x11\x07\x03\x07\x03\x07\x03\x03\x03)\x03\x01\r\x1d\x04\x91\x05\x01Q\x01\x05\x01\x07\x04\x7f\x03\x01\x05\x05P\x01\x03\x07\x04k\x03\x11\x1b\x07\x05\r\x05\x00\x07B\x11\x05\x03\x03\x03\x06\x15\x03\x03\x05\x01\x07\tF\x19\x07\x03\x03\x03\x03\x03\x06\x1d\x03\x03\x05\x05\x0b\x03\x06!\x03\x03\x05\t\r\x0b\x04\x01\x07\t\r\x0f\x06\x03\x01\x05\x01\x00\xb6\x03'\x03\x0b\x0f\x0f\x1b\r\x19\x17A!=\x15)\x19\x11\x0f\x0f\x0b\x11builtin\x00vhlo\x00module\x00add_v1\x00func_v1\x00constant_v1\x00broadcast_in_dim_v1\x00return_v1\x00mhlo.cross_program_prefetches\x00mhlo.is_dynamic\x00mhlo.use_auto_spmd_partitioning\x00IrToHlo.18\x00broadcast.5\x00add.6\x00broadcast.11\x00add.12\x00add.16\x00main\x00\x00\x08\x1d\t\x05\x1f\x01\x0b%'%)+\x03-\x03/", text='module @IrToHlo.18 attributes {mhlo.cross_program_prefetches = [], mhlo.is_dynamic = false, mhlo.use_auto_spmd_partitioning = false} {\n  func.func @main(%arg0: tensor<1x5xf32>, %arg1: tensor<f32>, %arg2: tensor<1x5xf32>) -> (tensor<1x5xf32>, tensor<1x5xf32>, tensor<1x5xf32>) {\n    %0 = stablehlo.constant dense<1.000000e+00> : tensor<1x5xf32>\n    %1 = stablehlo.add %arg0, %0 : tensor<1x5xf32>\n    %2 = stablehlo.broadcast_in_dim %arg1, dims = [] : (tensor<f32>) -> tensor<1x5xf32>\n    %3 = stablehlo.add %arg2, %2 : tensor<1x5xf32>\n    %4 = stablehlo.add %1, %3 : tensor<1x5xf32>\n    return %1, %3, %4 : tensor<1x5xf32>, tensor<1x5xf32>, tensor<1x5xf32>\n  }\n}\n')])

thong3le commented May 1, 2024

cc @JackCaoG


JackCaoG commented May 1, 2024

This will be hard. The way we consume the FX graph is to actually run it and lower the ops; during this process all of the comments are ignored.

@lsy323 @qihqi in case you guys have other ideas.


thong3le commented May 1, 2024

@JackCaoG I see. The stack trace can also be found in the metadata of the FX node, e.g. node.meta:

ipdb> nodes = list(ep.graph.nodes)
ipdb> nodes[2].meta
{'stack_trace': '  File "/home/thonle/ai/data/stablehlo/add/add.py", line 7, in forward\n    a += 1\n', 'nn_module_stack': {'L__self__': ('', <class '__main__.Test'>)}, 'source_fn_stack': [('iadd', <built-in function iadd>)], 'original_aten': <OpOverload(op='aten.add', overload='Tensor')>, 'from_node': [('a', <built-in function iadd>)], 'seq_nr': -1, 'val': FakeTensor(..., size=(1, 5)), 'tensor_meta': TensorMetadata(shape=torch.Size([1, 5]), dtype=torch.float32, requires_grad=False, stride=(5, 1), memory_format=torch.contiguous_format, is_quantized=False, qparams={})}
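
As an aside, a minimal sketch that walks the exported graph and prints the recorded source line for each call node (reusing the ep from above; nodes without a stack_trace entry are skipped):

# Sketch: dump the source location recorded for each op in the graph.
for node in ep.graph.nodes:
    if node.op == "call_function":
        trace = node.meta.get("stack_trace")
        if trace:
            # Print the op name and the first line of its recorded stack trace.
            print(node.name, "->", trace.strip().splitlines()[0])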

Two follow-up questions:

  1. Is there a way to export some FX node metadata into attributes of the StableHLO ops?
  2. What is the best way to debug if the StableHLO function produces incorrect output?


JackCaoG commented May 2, 2024

For 1 I am not sure; @lsy323 and @qihqi might know better.
For 2, you can do a binary search, I guess: reduce the length of the model and figure out which PyTorch op/layer gave you the incorrect answer.
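
A minimal sketch of that idea (a linear scan rather than a true binary search), assuming the model can be expressed as an nn.Sequential and that run_reference is a hypothetical callback that runs the same prefix of layers through the StableHLO path:

import torch
import torch.nn as nn

# Sketch: find the first layer whose eager output disagrees with a reference
# backend. `run_reference` is a hypothetical callback, e.g. one that exports
# layers[:i+1] to StableHLO and executes it on the same input.
def find_first_divergence(layers: nn.Sequential, x, run_reference, atol=1e-5):
    out = x
    for i, layer in enumerate(layers):
        out = layer(out)                         # eager result up to layer i
        ref = run_reference(layers[: i + 1], x)  # same prefix via the other path
        if not torch.allclose(out, ref, atol=atol):
            return i  # index of the first diverging layer
    return None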


thong3le commented May 2, 2024

Thanks @JackCaoG, are you aware of an existing tool for (2)?


JackCaoG commented May 2, 2024

There is #5461, but I have never used it myself.


lsy323 commented May 2, 2024

@thong3le If you turn on the env var XLA_HLO_DEBUG=1, you can get some debug info in the exported StableHLO today, but it is different from and less useful than the nn_module_stack in the FX node. The nn_module_stack in the FX node cannot be propagated through the StableHLO export right now, but it is possible to add.
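
A minimal sketch of that flag in use (setting the variable before torch_xla is imported, to be safe; Test is the same toy module as in the first comment):

import os
os.environ["XLA_HLO_DEBUG"] = "1"  # enable extra debug info in the lowering

import torch
import torch.nn as nn
from torch_xla.stablehlo import exported_program_to_stablehlo

class Test(nn.Module):  # same toy module as in the first comment
    def forward(self, a, b):
        a += 1
        b += 2
        return a + b

ep = torch.export.export(Test(), (torch.randn(1, 5), torch.randn(1, 5)))
om = exported_program_to_stablehlo(ep)
# Inspect the textual StableHLO, which should now carry the extra debug info.
print(om._bundle.stablehlo_funcs[0].text)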


thong3le commented May 3, 2024

@lsy323 Thanks, is there any plan to propagate nn_module_stack to StableHLO?

tlsdmstn56 commented

I also have a similar feature request and wonder if there is a plan to propagate any metadata in fx.Node.meta as op attributes.


GleasonK commented May 9, 2024

I'm curious: which bits of the metadata are important? File/line/column info? arg0_1 = None? Everything?

@lsy323 lsy323 self-assigned this May 10, 2024
@lsy323 lsy323 added the stablehlo StableHLO related work label May 10, 2024

lsy323 commented May 10, 2024

Support was added in #7046.
