Uplift third_party/tt-metal to a6ec1c0a02c3b9dc55e769618639e0439ef9c06b 2025-01-13 #1752
Not clean - this uplift to tt-metal at ca2c8677eb1ff3e2531c030070fa28554edb382c brought 35 tt-metal commits:
It fails a test in tt-torch (https://github.com/tenstorrent/tt-torch/actions/runs/12722170639/job/35466363484):
It also fails a bunch in tt-forge-fe (https://github.com/tenstorrent/tt-forge-fe/actions/runs/12722170869). Here are the 9 failures in the runner1 job (2 xpass):
A quick look at the tt-metal tree shows they had a bad day yesterday, with lots of regressions. The latest commit at the time this job was kicked off still had failures. One commit that went in a few hours later, tenstorrent/tt-metal@323a5d7, might be worth picking up since it improves their pass rate. In the interest of low-effort debugging, I'm going to push it to this branch to kick off CI again (here and downstream) and cross my fingers.
Same results with that run. Several bisects running on CI in the background show that all the new failures are caused by this commit, which happened to be the very first commit in the range.
I pinged Stas in the Slack metal pipelines thread, but we probably need someone to help follow up.
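For context on the bisect mentioned above, here is a minimal sketch of how a bisect narrows a commit range down to the first failing commit. The helper `first_bad_commit` and the `is_bad` callback (standing in for "build tt-metal at this commit and run the failing test") are hypothetical; this is not the CI tooling that was actually used. It assumes, like `git bisect`, that the range is monotonic: every commit before the culprit passes and every commit from the culprit onward fails. Under that assumption, a 35-commit range needs at most six test runs.

```python
from typing import Callable, Optional, Sequence


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Optional[str]:
    """Return the earliest commit for which is_bad() is True, or None.

    Hypothetical helper: assumes commits are ordered oldest to newest and
    that failures are monotonic (once a commit fails, all later ones fail).
    """
    # Invariant: the first bad index lies in [lo, hi]; hi == len(commits)
    # means "no bad commit found".
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid + 1  # culprit is strictly after mid
    return commits[lo] if lo < len(commits) else None


if __name__ == "__main__":
    # Hypothetical range where the very first commit is the culprit,
    # mirroring the situation described in this thread.
    commits = [f"c{i}" for i in range(35)]
    print(first_bad_commit(commits, lambda c: True))
```

Binary search is the design choice here because each probe means a full build plus test run, so cutting the range in half every step keeps the CI cost logarithmic in the number of uplifted commits.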
Managed to reproduce the failure with the following TTNN test on the latest tt-metal main:

```python
import torch
import ttnn

with ttnn.manage_device(0) as device:
    # Reference result computed on the host with PyTorch (1D tensor of 14 elements)
    torch_input = torch.rand(14)
    torch_output = torch.log(torch_input)

    # Same computation on device: convert, tilize, move to DRAM, run log
    ttnn_input = ttnn.from_torch(torch_input, dtype=ttnn.float32)
    ttnn_input = ttnn.to_layout(ttnn_input, ttnn.TILE_LAYOUT)
    ttnn_input = ttnn.to_device(ttnn_input, device, memory_config=ttnn.DRAM_MEMORY_CONFIG)
    ttnn_output = ttnn.log(ttnn_input)

    # Bring the result back to the host in row-major layout
    ttnn_output = ttnn.from_device(ttnn_output)
    ttnn_output = ttnn.to_layout(ttnn_output, ttnn.ROW_MAJOR_LAYOUT)
    ttnn_output = ttnn.to_torch(ttnn_output)

    print("Torch Output: ")
    print(torch_output)
    print("TTNN Output: ")
    print(ttnn_output)
    torch.testing.assert_close(ttnn_output, torch_output, rtol=1e-5, atol=1e-5)
```

The failure log:
Created the following issue on the tt-metal side to track this:
…6b 2025-01-13 (1D tensor fix for tt-forge-fails)
Going to merge. I pulled in Stas' bug fix, which brought a few more small issues (the ShardSpec interface changed to remove halo; I made the corresponding updates here).
Since we are falling behind tt-metal and the above is the only failure, I have to force this through; I will flag it to folks in Slack. This now brings in 70 tt-metal commits.
This PR uplifts third_party/tt-metal to a6ec1c0a02c3b9dc55e769618639e0439ef9c06b.