Remove restriction of input_nsticks_per_core % w == 0 #15205
Clang-Tidy
found issue(s) with the introduced code (1/1)
ttnn/cpp/ttnn/operations/pool/upsample/device/upsample_program_factory_multicore.cpp
…ded tensor inputs. Signed-off-by: Nilaykumar Patel <[email protected]>
Signed-off-by: Nilaykumar Patel <[email protected]>
ToDo: commonize code. Signed-off-by: Nilaykumar Patel <[email protected]>
607825e to e39f225
Signed-off-by: Nilaykumar Patel <[email protected]>
Signed-off-by: Nilaykumar Patel <[email protected]>
a6ede9f to 67eba82
Signed-off-by: Nilaykumar Patel <[email protected]>
…trictions-input_width Signed-off-by: Nilaykumar Patel <[email protected]>
Clang-Tidy
found issue(s) with the introduced code (1/1)
ttnn/cpp/ttnn/operations/pool/upsample/device/upsample_program_factory_multicore.cpp
Signed-off-by: Nilaykumar Patel <[email protected]>
…trictions-input_width
Suggested-by: Pavle Josipović <[email protected]> Signed-off-by: Nilaykumar Patel <[email protected]>
b677eea to 09b57cb
…trictions-input_width
ttnn/cpp/ttnn/operations/pool/upsample/device/upsample_program_factory_multicore.cpp
writer_rt_args[5] = out_w;
writer_rt_args[6] = 0;  // set for each core below
writer_rt_args[4] = input_nsticks_per_core;
writer_rt_args[5] = output_nsticks_per_core / 2;  // half of the outputs are processed by each core
What happens if output_nsticks_per_core is odd?
Moved the logic into the kernel, where the odd case is handled.
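To make the concern in this thread concrete, here is a minimal sketch of why passing `output_nsticks_per_core / 2` to both halves loses a stick when the count is odd, and how a ceil/floor split avoids that. The helper names `covered_with_plain_halving` and `split_in_two` are hypothetical and are not taken from the PR; this illustrates the arithmetic only, not the code that was merged.

```cpp
#include <cstdint>

// Hypothetical helper: number of sticks covered if BOTH halves are given n / 2.
// Integer division truncates, so an odd n leaves one stick unprocessed.
constexpr uint32_t covered_with_plain_halving(uint32_t n) {
    return 2 * (n / 2);
}

// Hypothetical helper: one half rounds up, the other rounds down,
// so the two shares always add up to n.
struct HalfSplit {
    uint32_t first;   // ceil(n / 2)
    uint32_t second;  // floor(n / 2)
};

constexpr HalfSplit split_in_two(uint32_t n) {
    return HalfSplit{(n + 1) / 2, n / 2};
}

// Even counts are fine either way; odd counts expose the difference.
static_assert(covered_with_plain_halving(8) == 8, "even case covers everything");
static_assert(covered_with_plain_halving(7) == 6, "odd case drops one stick");
static_assert(split_in_two(7).first + split_in_two(7).second == 7, "ceil/floor split covers all");
```

With 7 output sticks, plain halving covers only 6 of them, while the ceil/floor split assigns 4 and 3.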
...ttnn/operations/pool/upsample/device/kernels/dataflow/writer_upsample_multi_core_sharded.cpp
constexpr uint32_t config_cb_id = get_compile_time_arg_val(3);

uint32_t reader_nsticks_per_core = (in_nsticks_per_core + is_reader) / 2;
uint32_t writer_nsticks_per_core = in_nsticks_per_core / 2;
Same comment as above (what happens when the stick count is odd?).
I have removed this code and moved the output-sticks logic here. Output sticks are now derived from reader_nsticks_per_core, so both the even and odd cases are handled correctly.
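A rough sketch of the idea described in this reply: each of the two data-movement kernels computes its own share of the input sticks, and the output count is derived from that share rather than halved independently. Everything here is an assumption for illustration: the function names, the `scale_h`/`scale_w` parameters, and the premise that each input stick expands to `scale_h * scale_w` output sticks (as in nearest-neighbour upsampling). It is not the kernel code from the PR.

```cpp
#include <cstdint>

// Hypothetical: input sticks handled by one kernel. is_reader is 1 for the
// kernel that takes the extra stick when the count is odd, 0 for the other.
constexpr uint32_t input_share(uint32_t in_nsticks_per_core, uint32_t is_reader) {
    return (in_nsticks_per_core + is_reader) / 2;
}

// Hypothetical: index of the first input stick this kernel handles.
// The second kernel starts right after the first kernel's share.
constexpr uint32_t start_stick(uint32_t in_nsticks_per_core, uint32_t is_reader) {
    return is_reader ? 0 : input_share(in_nsticks_per_core, 1);
}

// Assumption: every input stick produces scale_h * scale_w output sticks,
// so the output share follows directly from the input share.
constexpr uint32_t output_share(uint32_t in_nsticks_per_core, uint32_t is_reader,
                                uint32_t scale_h, uint32_t scale_w) {
    return input_share(in_nsticks_per_core, is_reader) * scale_h * scale_w;
}

// Odd example: 5 input sticks, 2x2 upsample, 20 output sticks in total.
static_assert(input_share(5, 1) + input_share(5, 0) == 5, "input shares cover all sticks");
static_assert(output_share(5, 1, 2, 2) + output_share(5, 0, 2, 2) == 20, "output shares cover all sticks");
```

Deriving the output count from the input share keeps the two kernels consistent with each other even when the per-core stick count is odd.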
Signed-off-by: Nilaykumar Patel <[email protected]>
Signed-off-by: Nilaykumar Patel <[email protected]>
Approving to unblock, but please open an issue for this: #15205 (comment)
Please also run the nightly pipeline to make sure there are no perf regressions.
Done.
Ticket
Link
Problem description
Currently, a whole input row is processed per core, which is inefficient because other cores can sit idle.
What's changed
Distribute work to all possible cores.
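As an illustration of distributing sticks over cores without requiring the per-core count to be a multiple of the row width, the sketch below assigns each core a contiguous range using ceiling division and lets trailing cores receive a short or empty range. The `CoreRange` type and `sticks_for_core` function are hypothetical, `ncores` is assumed to be non-zero, and this is not necessarily how the sharding in this PR is computed.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical: contiguous range of sticks assigned to one core.
struct CoreRange {
    uint32_t start;  // index of the first stick
    uint32_t count;  // number of sticks, possibly 0 for trailing cores
};

// Hypothetical: split total_sticks over ncores (ncores > 0) using ceiling
// division. No divisibility by the row width is required; the last cores
// simply receive a shorter or empty range.
constexpr CoreRange sticks_for_core(uint32_t total_sticks, uint32_t ncores, uint32_t core) {
    const uint32_t per_core = (total_sticks + ncores - 1) / ncores;
    const uint32_t start = std::min(core * per_core, total_sticks);
    const uint32_t end = std::min(start + per_core, total_sticks);
    return CoreRange{start, end - start};
}

// 10 sticks over 4 cores: three cores get 3 sticks, the last gets 1.
static_assert(sticks_for_core(10, 4, 0).count == 3, "full share");
static_assert(sticks_for_core(10, 4, 3).count == 1, "remainder on the last core");
```

A range may begin or end in the middle of an input row, which is the situation the old `input_nsticks_per_core % w == 0` restriction ruled out.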
Checklist