Deterministic Hang with unpack_reconfig_data_format at a specific spot in Flash Decode Kernel #9608
Comments
An update to this issue: I found out in #13364 that input/output CBs do not support reconfig data format properly. You would get a hang for calling @ttmtrajkovic @rtawfik01 are you aware of this issue? It seems like this is a software design choice, not a hardware constraint, so I'd like to file an issue to have it supported. Do you know who I should talk to?
@caixunshiren @ttmtrajkovic The Similarly Right now, only intermediate CBs can be used for both input and output within compute kernels.
This should no longer be a problem with #14971.
@caixunshiren Is this still an open issue? If not, can you please mark this closed? Thanks.
@prajaramanTT No, it's no longer open. Closing this issue now. Thanks.
Description
In the flash decode op #9510, which I'm currently implementing, I'm seeing a hang when `unpack_reconfig_data_format` is called before a `mul_tiles_bcast_xxx` kernel. As shown below, uncommenting `line a` would cause a hang, but uncommenting `line b` would not. It is worth noting that all inputs and intermediates are bf16, and the CBs involved are only used in compute.
tt-metal/tt_eager/tt_dnn/op_library/sdpa/kernels/compute/sdpa_flash_decode.cpp
Line 629 in 191b108
Things we tried
- `TTI_STALLWAIT(p_stall::STALL_CFG, p_stall::UNPACK)` and `TTI_STALLWAIT(p_stall::STALL_UNPACK, p_stall::TRISC_CFG)` before and after `line a`; did not resolve the hang.
- `TTI_STALLWAIT(p_stall::STALL_UNPACK, p_stall::UNPACK)` before and after `line a`; did not resolve the hang.
- In the `mul_block_bcast_cols_inplace` function, commenting out the compute call `mul_tiles_bcast_scalar` resolves the hang.
To Repro
I pushed a debug branch which is rebased on latest main from June 20th:
https://github.com/tenstorrent/tt-metal/tree/xuncai/flash-decode-reconfig-dataformat-hang
After building the branch, (optional) enable dprint:
Then, run the following command:
You should see that `[C] R ckpt 3` is printed 3 times but `[C] R ckpt 3.1` does not print.
If you comment out line 629 in `sdpa_flash_decode.cpp`, then the test is expected to not hang.
tt-metal/tt_eager/tt_dnn/op_library/sdpa/kernels/compute/sdpa_flash_decode.cpp
Line 629 in 191b108
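If it helps when comparing runs, the checkpoint pattern above can be checked mechanically from a captured log. This is only a rough sketch, not part of the debug branch: `count_checkpoint` and `shows_hang_signature` are hypothetical helper names, and it assumes each checkpoint message sits on its own line with any device/core prefix already stripped.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: count log lines that consist exactly of `tag`,
// ignoring trailing spaces, tabs, and carriage returns.
// Exact matching matters here: "[C] R ckpt 3" is a prefix of
// "[C] R ckpt 3.1", so a substring search would count both.
std::size_t count_checkpoint(const std::string& log, const std::string& tag) {
    std::istringstream in(log);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        // Trim trailing whitespace so "tag\r" and "tag " still match.
        while (!line.empty() &&
               (line.back() == ' ' || line.back() == '\t' || line.back() == '\r')) {
            line.pop_back();
        }
        if (line == tag) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: true when the log matches the pattern described
// above, i.e. ckpt 3 printed exactly 3 times and ckpt 3.1 never printed.
bool shows_hang_signature(const std::string& log) {
    return count_checkpoint(log, "[C] R ckpt 3") == 3 &&
           count_checkpoint(log, "[C] R ckpt 3.1") == 0;
}
```

Passing the captured output as a single string to `shows_hang_signature` should return true for the hanging run and false once line 629 is commented out and `[C] R ckpt 3.1` appears.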
FYI @cglagovichTT