ND WH watcher error when attempting to turn on watcher in all post-commit pipelines #6763

Closed
TT-billteng opened this issue Mar 26, 2024 · 4 comments
Labels
bug Something isn't working ci-bug bugs found in CI P1

Comments

@TT-billteng
Collaborator

TT-billteng commented Mar 26, 2024

I'm trying to enable watcher on all non-perf pipelines so that device-side issues reported by watcher can be caught sooner.

On the branch where I enable watcher, I see this error when running the post-commit action:

https://github.com/tenstorrent-metal/tt-metal/actions/runs/8429687483/job/23084407648

2024-03-26T02:04:38.0322916Z tests/tt_eager/python_api_testing/unit_testing/misc/test_optimized_conv_v2.py::test_optimized_conv_v2[pack_l1-LoFi-activations_BFLOAT16-weights_BFLOAT8_B-8-128-128-28-28-3-3-1-1-1-1-True-True-False-False] Metal | INFO | Initializing device 0
2024-03-26T02:04:38.0684146Z Metal | INFO | AI CLK for device 0 is:   800 MHz
2024-03-26T02:04:38.0746331Z LLRuntime | INFO | Watcher log file: /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/generated/watcher/watcher.log
2024-03-26T02:04:38.0749693Z LLRuntime | INFO | Watcher attached device 0
2024-03-26T02:04:38.0751749Z LLRuntime | INFO | Watcher thread watching...
2024-03-26T02:04:38.1134412Z 2024-03-26 02:04:38.113 | INFO     | tests.tt_eager.python_api_testing.unit_testing.misc.test_optimized_conv_v2:test_optimized_conv_v2:160 - Conv output shape - [8, 28, 28, 128]
2024-03-26T02:05:38.0751917Z LLRuntime | INFO | Watcher checking device 0
2024-03-26T02:05:38.0954477Z terminate called after throwing an instance of 'std::runtime_error'
2024-03-26T02:05:38.0956670Z   what():  Read 0xffffffff from ARC scratch[6]: auto-reset succeeded.
2024-03-26T02:05:38.0957520Z Fatal Python error: Aborted
2024-03-26T02:05:38.0965755Z 
2024-03-26T02:05:38.0993090Z Thread 0x00007f2d1214e740 (most recent call first):
2024-03-26T02:05:38.0995270Z   File "/home/ubuntu/actions-runner/_work/tt-metal/tt-metal/tt_eager/tt_dnn/op_library/sliding_window_op_infra/tt_py_composite_conv.py", line 1104 in copy_output_from_device
2024-03-26T02:05:38.0997252Z   File "/home/ubuntu/actions-runner/_work/tt-metal/tt-metal/tests/tt_eager/python_api_testing/unit_testing/misc/test_optimized_conv_v2.py", line 233 in test_optimized_conv_v2
2024-03-26T02:05:38.0998926Z   File "/home/ubuntu/python_env/lib/python3.8/site-packages/_pytest/python.py", line 195 in pytest_pyfunc_call
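
For triage, the timestamps in the log above suggest the abort happens on the watcher's first check of the device, roughly a minute after the watcher thread starts. A minimal sketch of that arithmetic, assuming the GitHub Actions timestamp prefix format shown in the log; the helper name `parse_ci_timestamp` is hypothetical and the three timestamps are copied from the log lines:

```python
from datetime import datetime


def parse_ci_timestamp(stamp: str) -> datetime:
    """Parse a CI log timestamp such as 2024-03-26T02:04:38.0751749Z.

    datetime keeps microseconds only, so the 7-digit fraction is
    truncated to 6 digits before parsing.
    """
    date_part, frac = stamp.rstrip("Z").split(".")
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


# Timestamps copied from the log excerpt above.
watching = parse_ci_timestamp("2024-03-26T02:04:38.0751749Z")  # "Watcher thread watching..."
checking = parse_ci_timestamp("2024-03-26T02:05:38.0751917Z")  # "Watcher checking device 0"
aborted = parse_ci_timestamp("2024-03-26T02:05:38.0954477Z")   # "terminate called after throwing..."

# Time from the watcher thread starting to its first device check.
poll_delay = (checking - watching).total_seconds()
# Time from the start of that check to the process abort.
abort_delay = (aborted - checking).total_seconds()

print(f"first watcher check after {poll_delay:.3f} s")
print(f"abort {abort_delay * 1000:.1f} ms after the check started")
```

The first check lands about 60 seconds after the watcher thread starts, and the `std::runtime_error` follows about 20 ms later. That is consistent with the failure being raised during the watcher's first read of the device, although the log does not state the configured watcher interval or the root cause.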
@TT-billteng
Collaborator Author

@jliangTT I need some help figuring out who should own this bug, as I don't see a clear "owner" for this file.

@TT-billteng
Collaborator Author

@jliangTT

tests/tt_eager/python_api_testing/unit_testing/misc/test_optimized_conv_v2.py

@tt-nshanker, is this the test case related to the 2.0 development?

@prajaramanTT

@TT-billteng Can we close this issue?

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in External Requests and Reports Nov 28, 2024
@TT-billteng TT-billteng closed this as not planned (won't fix, can't repro, duplicate, stale) Nov 28, 2024

3 participants