Describe the bug
In general, data flows through a circular buffer (CB) between the Data Movement Kernel (DMVK) and the Compute Kernel as follows.
Fig.1 Data Flow in CB between Data Movement Kernel and Compute Kernel
While implementing the operation, we encountered a hang at the seventh step, in the cb_reserve_back API call, when pop and push operations were executed repeatedly within the compute kernel, as shown in the diagram below.
Fig.2 Detailed Data Flow and Loop in Compute Kernel
I have added test code in the hang_in_compute_kernel branch by slightly modifying the moreh_sum code.
The sequence appears to be correct, but we need to verify whether this kernel implementation violates any guidelines or whether this is indeed a bug.
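To make the question concrete, here is a minimal sketch of the reasoning behind "the sequence appears to be correct". It is a toy, single-threaded page-count model, not the tt-metal implementation: `ToyCB`, `first_blocking_iteration`, and the pop-then-push loop shape are hypothetical stand-ins for the loop in Fig.2, and the model ignores that real CBs are shared between kernels running concurrently.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a circular buffer (CB), counted in pages.
// Hypothetical names; this is NOT the tt-metal implementation.
struct ToyCB {
    std::size_t capacity;    // total pages in the CB
    std::size_t filled = 0;  // pages pushed and not yet popped

    explicit ToyCB(std::size_t cap) : capacity(cap) {}

    // Would a blocking reserve of n pages return immediately?
    bool can_reserve_back(std::size_t n) const { return capacity - filled >= n; }

    // Would a blocking wait for n pages return immediately?
    bool can_wait_front(std::size_t n) const { return filled >= n; }

    void push_back(std::size_t n) {
        assert(can_reserve_back(n));  // pushing requires reserved space
        filled += n;
    }

    void pop_front(std::size_t n) {
        assert(can_wait_front(n));  // popping requires available pages
        filled -= n;
    }
};

// Runs `iters` rounds of pop-then-push of n pages on a single CB.
// Returns the 0-based iteration at which a call would block forever
// (nothing else touches the CB in this model), or -1 if no call blocks.
int first_blocking_iteration(ToyCB& cb, int iters, std::size_t n) {
    for (int i = 0; i < iters; ++i) {
        if (!cb.can_wait_front(n)) return i;    // wait on the front would stall
        cb.pop_front(n);
        if (!cb.can_reserve_back(n)) return i;  // reserve on the back would stall
        cb.push_back(n);
    }
    return -1;
}
```

In this idealized model, a CB that already holds the pages being popped never stalls in the loop, because each pop frees exactly the space the following reserve needs. That matches the expectation stated above, and it says nothing about why the hardware run hangs.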
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No kernel hang occurs.
Additional context
OS: Ubuntu 20.04 / 172.27.44.100 VM
Version of software (eg. commit)
commit 0ddee05 (HEAD -> pack_uint32_to_uint8, origin/main, origin/HEAD)
Author: Nenad Petrovic [email protected]
Date: Mon Jul 22 15:25:21 2024 +0200
RDIV migration to TTNN (tenstorrent/tt-metal#10389)
tenstorrent/tt-metal#10147: Add rdiv test corrections
tenstorrent/tt-metal#10147: Corrected the naming of rdiv op test
tenstorrent/tt-metal#10147: naming modify
tenstorrent/tt-metal#10147: Linter applied
tenstorrent/tt-metal#10147: Linter applied
tenstorrent/tt-metal#10147: fix opmap