Global CB Support #15180
When running the subdevice ops, we probably need to update the compute kernel pointers as well (an in-place CB update to the Global CB pointer). I wonder whether this should be done within the kernel, or whether there is a better way of hiding it.
tt_metal/host_api.hpp
Outdated
namespace experimental {

class GlobalCircularBuffer;
I would put this directly in namespace v1. We have announced that we are no longer changing v0 going forward.
I've moved it to the v1 namespace, please take another look.
From the conv/max_pool perspective, this looks good.
Update GlobalSemaphore tests to use DispatchFixture Add global circular buffer tests
… after integrating remote cb initialization in fw
Ticket
#14108
Problem description
Metal infra support needed for creating and dispatching global circular buffers.
What's changed
Add metal infra support for creating and dispatching global circular buffers.
Dispatch data is laid out as follows:
Local CB data from 0 to max index across all cores, followed by remote CB data from 31 to min index across all cores.
We require that the minimum remote start index be >= the maximum local end index so that the two initialization ranges do not overlap: local CBs are initialized sequentially from 0 upward, whereas remote CBs are initialized sequentially from 31 downward. For optimal dispatch performance, users should therefore assign remote indices sequentially starting from 31, just as local CB indices should be assigned sequentially starting from 0; otherwise extraneous data is sent and initialized.
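To make the layout constraint concrete, here is a minimal standalone sketch of the check described above. It is not the code in this PR: `CoreCbUsage`, `DispatchLayout`, and `compute_layout` are hypothetical names, and treating the "end local index" as one past the highest local index in use is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// CB indices run from 0 to 31, as stated in the description above.
constexpr uint32_t kNumCbIndices = 32;

// Hypothetical per-core summary of which CB indices are local vs. remote.
struct CoreCbUsage {
    std::vector<uint32_t> local_indices;
    std::vector<uint32_t> remote_indices;
};

// Hypothetical result: the two ranges that would be dispatched.
struct DispatchLayout {
    uint32_t max_local_end;     // assumption: one past the highest local index on any core
    uint32_t min_remote_start;  // lowest remote index on any core (32 if none)

    // Local data covers [0, max_local_end).
    uint32_t num_local_entries() const { return max_local_end; }
    // Remote data covers [min_remote_start, 31], sent from 31 downward.
    uint32_t num_remote_entries() const { return kNumCbIndices - min_remote_start; }
};

// Returns the layout, or std::nullopt if an index is out of range or the
// local and remote ranges would overlap.
std::optional<DispatchLayout> compute_layout(const std::vector<CoreCbUsage>& cores) {
    DispatchLayout layout{0, kNumCbIndices};
    for (const auto& core : cores) {
        for (uint32_t idx : core.local_indices) {
            if (idx >= kNumCbIndices) return std::nullopt;
            layout.max_local_end = std::max(layout.max_local_end, idx + 1);
        }
        for (uint32_t idx : core.remote_indices) {
            if (idx >= kNumCbIndices) return std::nullopt;
            layout.min_remote_start = std::min(layout.min_remote_start, idx);
        }
    }
    // The requirement from the description: min remote start >= max local end.
    if (layout.min_remote_start < layout.max_local_end) return std::nullopt;
    return layout;
}
```

Under these assumptions, a single core with one remote CB at index 28 yields four remote entries (28 through 31), three of them extraneous, whereas placing that CB at index 31 yields one. That is the performance point made above about assigning remote indices sequentially from 31.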
We decided to add the new global circular buffer support under an experimental namespace.
On the kernel side, we are constrained by the local memory available for the statically allocated interface array, so we cannot store as much data as would be needed to simplify the user experience. For example, users must manually call secondary initialization functions in their own kernels, which is fairly error-prone.
On the host side, we kept almost all APIs the same, adding only a new CircularBufferConfig constructor, to minimize API changes. However, this may not be the best approach, and new APIs should potentially be added instead. This is TBD until we finalize other CB API aspects, such as potentially autogenerating IDs.
Checklist