Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sub-Device fabric api updates, and make global sem/cb thread safe in metal #15936

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

tt-aho
Copy link
Contributor

@tt-aho tt-aho commented Dec 11, 2024

Ticket

#15944

Problem description

Want to clean up the api regarding how to instantiate the ethernet cores as its own sub-device by adding a new temporary api that adds it automatically when creating the sub-device manager. This will be removed when actual fabric is implemented.

Global CBs + global sems were not thread-safe with the metal api Normally thread safety was handled in ttnn apis by pushing from the top level to workers, but some of the use cases of these apis were calling the metal api on the main thread, so we need to make them internally thread-safe.

What's changed

Add a new api create_sub_device_manager_with_fabric which has the same signature as create_sub_device_manager, but will automatically instantiate the ethernet cores as a separate sub-device. This also allows us to remove the MeshDevice create_sub_device_manager overload that allowed users to specify a different sub-device configuration per device that was added due to eth cores being different per device. This api is temporary until we have proper fabric, where eth cores will no longer be user programmable and won't be a sub-device anymore.

Make global cbs + sems metal apis thread safe by pushing the relevant code (writing the config/initial value) to the worker thread.

Checklist

@tt-aho tt-aho changed the title Sub-Device fabric api updates, and make global sem/cb thread safe in c++ Sub-Device fabric api updates, and make global sem/cb thread safe in metal c++ Dec 11, 2024
@tt-aho tt-aho force-pushed the aho/global-updates branch 3 times, most recently from f3e0647 to eb76f95 Compare December 11, 2024 23:05
@tt-aho tt-aho changed the title Sub-Device fabric api updates, and make global sem/cb thread safe in metal c++ Sub-Device fabric api updates, and make global sem/cb thread safe in metal Dec 11, 2024
Comment on lines 34 to 40
GlobalCircularBuffer(
Device* device,
const std::unordered_map<CoreCoord, CoreRangeSet>& sender_receiver_core_mapping,
uint32_t size,
BufferType buffer_type,
tt::stl::Span<const SubDeviceId> sub_device_ids);
tt::stl::Span<const SubDeviceId> sub_device_ids,
Private);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the ctor itself private instead?

Same for GlobalSemaphore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was following the pattern that was added for the Buffer class, which enables using std::make_shared instead of std::shared_ptr(new T) in the creation functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sminakov-tt thoughts on this as you originally added this pattern for buffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By thoughts are basically the same as you described. It's the suggested way to be used for enable_shared_from_this: https://en.cppreference.com/w/cpp/memory/enable_shared_from_this.

But can you please move this "private" constructor closer to the bottom of the class and add a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've moved the creation fns to the top, constructors to the bottom and a comment saying they're "private" and to use the create fns instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I haven't noticed the shared_ptr! Why are we switching to the shared ownership though? If this is really desired / needed, another option is to create a shared_ptr using the move constructor at the client side, so that we are not imposing this on the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ s/switching/using - I see this is the existing usage.

tt_metal/impl/device/device.hpp Outdated Show resolved Hide resolved
@tt-aho tt-aho force-pushed the aho/global-updates branch 2 times, most recently from 5c0aafe to 189032a Compare December 12, 2024 02:53
…ager with a fabric sub-device. Remove create_sub_device_manager overload of MeshDevice that accepts a diffent sub-device config per device

This was added due to eth cores being different per device, but this is now handled internally
…o be thread-safe instead of depending on ttnn apis
@tt-aho tt-aho force-pushed the aho/global-updates branch from 189032a to bc81165 Compare December 12, 2024 16:10
@tt-aho tt-aho merged commit 8926ed0 into main Dec 12, 2024
127 of 128 checks passed
@tt-aho tt-aho deleted the aho/global-updates branch December 12, 2024 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants