DPRINT asserts hardcoded L1 address check causing dprint failure on WH_B0 #5980

Closed
yugaoTT opened this issue Mar 4, 2024 · 11 comments · Fixed by #6028
Labels: bug, P1, WH

Comments

@yugaoTT
Contributor

yugaoTT commented Mar 4, 2024

If the CB is allocated above 1024 * 1024 or below 120 * 1024, DPRINT fails with the error BAD TILE POINTER count=0.
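To illustrate the kind of check being described, here is a minimal sketch. It is not the actual DPRINT source: every name below is hypothetical, the hardcoded bounds are the two values from this report, and the "assumed" L1 window is made up purely for illustration. It shows how a fixed window rejects addresses that are valid on a device with a different L1 layout, and how passing the window in as a parameter avoids that. This is one possible direction, not necessarily what the linked fix does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is NOT the tt-metal DPRINT code.
// Bounds taken from the report: pointers below 120 KiB or above 1 MiB fail.
constexpr std::uint32_t kHardcodedLow  = 120 * 1024;
constexpr std::uint32_t kHardcodedHigh = 1024 * 1024;

// Hardcoded variant: the accepted window is baked into the check.
// (Whether the real bounds are inclusive is an assumption here.)
constexpr bool ptr_ok_hardcoded(std::uint32_t addr) {
    return addr >= kHardcodedLow && addr <= kHardcodedHigh;
}

// Parameterized variant: the caller supplies the usable L1 window
// [l1_base, l1_end) for the device being targeted.
constexpr bool ptr_ok(std::uint32_t addr,
                      std::uint32_t l1_base,
                      std::uint32_t l1_end) {
    return addr >= l1_base && addr < l1_end;
}

// Assumed L1 window, for illustration only (not a real device spec).
constexpr std::uint32_t kAssumedL1Base = 100 * 1024;
constexpr std::uint32_t kAssumedL1End  = 1400 * 1024;

// A CB placed at 1200 KiB is inside the assumed window but outside
// the hardcoded one, so only the hardcoded check rejects it.
static_assert(!ptr_ok_hardcoded(1200 * 1024), "rejected by fixed bounds");
static_assert(ptr_ok(1200 * 1024, kAssumedL1Base, kAssumedL1End),
              "accepted when the window matches the device");
```

With a check like `ptr_ok_hardcoded`, a tile pointer can be reported as bad even though the buffer itself is fine, which would match the symptom above.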

@tt-dma
Contributor

tt-dma commented Mar 4, 2024

Could you provide an example that reproduces this bug, so we can check whether the bounds need updating or whether there's a bug in the DPRINT addressing?

@yugaoTT
Contributor Author

yugaoTT commented Mar 4, 2024

Yes. On branch yugao/moreh-u-benchmark:
make build
make tests
run ./build/test/tt_metal/perf_microbenchmark/1_compute_mm/test_compute_mm --m 32 --n 32 --k 64 --dtype 1 --fidel 0 --block 1 --num-tests 1 --fast-dispatch --bypass-check --one-core 1 --num-blocks 1

@yugaoTT yugaoTT added the WH label Mar 4, 2024
@tt-dma
Contributor

tt-dma commented Mar 4, 2024

Are there additional defines that are needed to run this test? I'm getting:

(python_env) davidma@e04cs04:~/tt-metal$ ./build/test/tt_metal/perf_microbenchmark/1_compute_mm/test_compute_mm --m 32 --n 32 --k 64 --dtype 1 --fidel 0 --block 1  --num-tests 1 --fast-dispatch --bypass-check --one-core 1 --num-blocks 1
Remaining test_args:
        --dtype
        1
        --num-blocks
        1
                   Test | ERROR    | Command line arguments found exception
                 Always | ERROR    | In the slow dispatch mode, device profiler is used to measure the performance. Build the Metal library and test code with 'ENABLE_PROFILER=1'
                   Test | ERROR    | TT_ASSERT @ tests/tt_metal/tt_metal/perf_microbenchmark/1_compute_mm/test_compute_mm.cpp:199: false
backtrace:
 --- /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f6b9839e083]
 --- ./build/test/tt_metal/perf_microbenchmark/1_compute_mm/test_compute_mm(+0xa0ae) [0x55a4ed47f0ae]

                   Test | ERROR    | System error message: Success
                   Test | ERROR    | Test Failed

Seems it needs either profiler enabled or a certain dispatch mode?

@yugaoTT
Contributor Author

yugaoTT commented Mar 4, 2024 via email

@tt-dma
Contributor

tt-dma commented Mar 4, 2024

I checked out origin/yugao/moreh-u-benchmark

@yugaoTT
Contributor Author

yugaoTT commented Mar 4, 2024 via email

@yugaoTT
Contributor Author

yugaoTT commented Mar 4, 2024

Oh sorry, you are trying to use DPRINT. Then it shouldn't be built with the profiler enabled; make build and make test should work. If they don't, I will double-check after I get back home.

@tt-dma
Contributor

tt-dma commented Mar 4, 2024

I'm running on an n300 BM, and I'm able to run your test after checking out that commit (no extra env vars). I see a bunch of prints and then this error:

index: 2046 1 1
index: 2047 1 1
                   Test | ERROR    | validation single core failed
                  Metal | INFO     | Closing device 0
                 Always | INFO     | CSV_MICROBENCHMARK:title:test_compute_mm
                 Always | INFO     | CSV_INPUT:M:32:N:32:K:64:fast-dispatch:true

This looks separate from the DPRINT error that would be tripped by those bounds?

@yugaoTT
Contributor Author

yugaoTT commented Mar 4, 2024 via email

@tt-dma
Contributor

tt-dma commented Mar 4, 2024

Any idea on how I can reproduce the dprint issue that is filed here then?

@yugaoTT
Contributor Author

yugaoTT commented Mar 5, 2024

Please enable printing on core 0,0 with export TT_METAL_DPRINT_CORES="(0,0)". The DPRINT is in the file tests/tt_metal/tt_metal/perf_microbenchmark/1_compute_mm/kernels/bmm_large_block_zm_fused_bias_activation_block.cpp, at line 115.

4 participants