Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global CB Support #15180

Merged
merged 5 commits into from
Dec 3, 2024
Merged

Global CB Support #15180

merged 5 commits into from
Dec 3, 2024

Conversation

tt-aho
Copy link
Contributor

@tt-aho tt-aho commented Nov 18, 2024

Ticket

#14108

Problem description

Metal infra support needed for creating and dispatching global circular buffers.

What's changed

Add metal infra support for creating and dispatching global circular buffers.

  • Split CB constants from circular_buffer.h to avoid building kernel file when building metal lib
  • Add GlobalCircularBuffer implementation. a Global CB has 2 buffers
    • cb_buffer is the actual buffer space in L1 that is used for pushing/popping data
    • cb_config_buffer is the data needed to populate the remote cb interface when initializing/using remote cbs in kernels
    • config contains 2 arrays of ptrs (l1 aligned) used for tiles acked/sent sized based on num_receivers_per_sender
  • Add a RemoteCBInterface struct, that is a union with the original CBInterface (renamed to LocalCBInterface) struct in order to save memory when creating the sized array. Each cb_id in the array corresponds to either a local or remote config, not both. We assert the two structs are the same size as we provide pointer aliases to the array to simplify access for whether it's a local/remote cb
  • Add new apis/functions for initializing the remote interfaces, as well as apis to interact/use the remote cbs in a new remote_circular_buffer_api.h file taken from the original remote cb microbenchmarks
  • Add support in dispatch for dispatching remote cb configs. Currently the config is 1 word representing the address to the config buffer to read/use
  • Update cb dispatch to support dispatching remote cbs. Kernel launch msg now contains a remote cb offset, as well as the min_start_remote_cb_id.
    Dispatch data is laid out as follows:
    Local CB data from 0 to max index across all cores, followed by remote CB data from 31 to min index across all cores.
    We require that the min start remote index >= max end local index to avoid overlaps with initialization, since local cbs are sequentially initialized from 0 up, whereas remote cbs are sequentially initialized from 31 down. This also means for optimal dispatch performance, users should specify remote indices sequentially from 31, otherwise we are sending and initializing extraneous data, similarly to how local cb indices should go from 0 for optimal perf.

We decided to add the new global circular buffer support under an experimental namespace.
On the kernel side we are constrained with local memory size for the statically allocated interface array, so we cannot store as much data as we could for potentially simplifying user experience. Ex user must manually call secondary initialization functions in their own kernel, which is fairly error prone.
On the host side, we kept almost all apis the same except for a new CircularBufferConfig constructor to minimize api changes, however this may not be the best approach and potentially new apis should be added instead. TBD until we finalize some other cb api aspects regarding things such as potentially autogenerating ids.

Checklist

@tt-aho tt-aho force-pushed the aho/global-cbs branch 9 times, most recently from b9116e7 to 8033afe Compare November 20, 2024 06:40
tt_metal/hw/firmware/src/ncrisc.cc Outdated Show resolved Hide resolved
tt_metal/hw/firmware/src/ncrisc.cc Outdated Show resolved Hide resolved
tt_metal/hw/inc/circular_buffer.h Outdated Show resolved Hide resolved
Copy link
Contributor

@yugaoTT yugaoTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when running the subdevice ops, we probably need to update the compute kernel pointers as well (inplace CB update to the Global CB pointer). I wonder should this be done within the kernel or have a better way of hiding it.

tt_metal/hw/inc/circular_buffer.h Outdated Show resolved Hide resolved
@tt-aho tt-aho force-pushed the aho/global-cbs branch 2 times, most recently from 75ea2bf to a047d55 Compare November 27, 2024 19:49
namespace experimental {

class GlobalCircularBuffer;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this directly in namespace v1. We have announced that we are no longer changing v0 going forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it to v1 namespace, please take another look.

tt_metal/impl/buffers/circular_buffer_types.hpp Outdated Show resolved Hide resolved
@tt-aho tt-aho force-pushed the aho/global-cbs branch 9 times, most recently from fa5f5c5 to 218b0f9 Compare December 2, 2024 15:16
@tt-aho tt-aho marked this pull request as ready for review December 2, 2024 18:42
tt_metal/hw/inc/debug/sanitize_noc.h Outdated Show resolved Hide resolved
Copy link
Contributor

@pavlejosipovic pavlejosipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From conv/max_pool perspective this looks good.

@tt-aho tt-aho merged commit 8355789 into main Dec 3, 2024
9 checks passed
@tt-aho tt-aho deleted the aho/global-cbs branch December 3, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants