Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core coordinates prototype #266

Merged
merged 13 commits into from
Dec 2, 2024
Merged

Core coordinates prototype #266

merged 13 commits into from
Dec 2, 2024

Conversation

pjanevskiTT
Copy link
Contributor

@pjanevskiTT pjanevskiTT commented Nov 5, 2024

Implement core coordinates translation

This PR is going to close #219 #243 #228 #212

As discussed, we are going to have CoordSystem enum, which is going to be used to determine the appropriate coordiante system of the coordinates that are used in the API

enum class CoordSystem {
    LOGICAL,
    PHYSICAL,
    VIRTUAL,
    TRANSLATED,
};

For now, we support these 4 coordinate systems. For explanations on each coordinate system there is a pending PR to cleanup the docs properly, but understanding these coordinate systems should not be prerequisite to review this PR. It is just needed to understand that each core coordinate can be in multiple coordinate systems.

Note for reviews

Since we have CoordinateManager class here, in order to better understand the use of that class it might be useful to skim the PR #202. In general it is mostly used to support the translation between coordinate systems.

The most useful parts of the PR to review are (in my opinion)

  1. For tt-metal people - test_core_coordinate.cpp file - these tests should represent how tt-metal (and other clients) should use coordinate types
  2. For tt-umd people - everything else :)
  3. Not in 1 or 2 - then probably same as tt-metal people, since this would be API change for all other clients

V1 coordinates

For V1 prototype, we have additional enum representing core type

enum class CoreType {
  ARC,
  DRAM,
  ETH,
  PCIE,
  TENSIX,
  ROUTER_ONLY,
};

which is used in the main struct

 struct CoreCoord_V1 : public tt_xy_pair {
    CoreType core_type;
    CoordSystem coord_system;
 };

When using this, it is needed to set both the core type and coordinate system, in order for API to work properly.

V2 coordinates (not going to be implemented, it was used for discussion)

For V2 prototype, we would have one struct for each core type that is represented using enum in V1. In PR, structs are implemented for Tensix and DRAM cores, just to see the effects of having multiple core types

struct CoreCoord_V2 : public tt_xy_pair {
    CoordSystem coord_system;
};

struct TensixCoreCoord_V2 : public CoreCoord_V2 {};
 
struct DramCoreCoord_V2 : public CoreCoord_V2 {};

When using this it is needed to set just the coordinate system, core type comes from struct type.

API changes

API is provided inside tt_SiliconDevice class for both the V1 and V2 core coordinates. Focus was on providing API for reads/writes as the example. I think looking this from PR is the best way to explain it.

Main point to make here is that API is the same for all coordinate systems, which was one of the points that tt-metal folks made in the meeting. UMD can figure out how to properly program the TLBs and write to device for any coordinate system. In the PR we rely on translating everything to physical (NOC) coordinates

Pros/cons

All pros/cons are my view, feel free to comment on everything

V1

Pros

  • coordinate translation logic is one place
  • less API functions

Cons

  • One extra step for figuring out core type on each API call
  • readability
  • core type can be concluded at runtime

V2

Pros

  • readability
  • core type is clear from the type

Cons

  • API blows up, need to provide functions for each core type

Questions for reviewers

  • Do we want to be able to get worker/DRAM cores in all coordinate systems from Soc descriptor?

TODOs

  • Think about CoreCoordRange, is that representation simply a set of CoreCoord
  • Finish translation for Blackhole translated coordinates (DRAM, ARC, PCIE)
  • Implement L2CPU cores if needed ?
  • Test custom soc descriptor coordinate manager
  • Rewrite tests to use CoordinateManager instead of tt_SocDescriptor

@pjanevskiTT pjanevskiTT self-assigned this Nov 5, 2024
@pjanevskiTT pjanevskiTT force-pushed the pjanevski/core_coordinates branch from 9a910ec to b160428 Compare November 5, 2024 13:22
@pjanevskiTT pjanevskiTT requested a review from broskoTT November 5, 2024 13:28
device/tt_soc_descriptor.h Outdated Show resolved Hide resolved
@pjanevskiTT
Copy link
Contributor Author

pjanevskiTT commented Nov 5, 2024

@pgkeller @eyonland @TT-billteng - - couldn't tag you in reviewers for some reason

device/tt_core_coordinates.h Outdated Show resolved Hide resolved
device/tt_core_coordinates.h Outdated Show resolved Hide resolved
device/tt_core_coordinates.h Outdated Show resolved Hide resolved
device/coordinate_manager.h Outdated Show resolved Hide resolved
device/tt_core_coordinates.h Outdated Show resolved Hide resolved
device/tt_soc_descriptor.cpp Outdated Show resolved Hide resolved
device/tt_soc_descriptor.h Outdated Show resolved Hide resolved
@broskoTT
Copy link
Contributor

broskoTT commented Nov 7, 2024

@pjanevskiTT feel free to just delete the V2 coords from this PR and continue working on this one, rather than creating a new PR. This would nicely keep all the conversation threads at the same place...

@tt-vjovanovic
Copy link
Member

In debuda (soon to be called tt-lens), we are using "exported" soc descriptor back in yaml format. It would be great if I could choose coordinates to be returned with tt_device::get_soc_descriptor...

@tt-vjovanovic
Copy link
Member

tt-vjovanovic commented Nov 12, 2024

In tt-lens JTAG scenario when there is no working PCI device attached to PC, we would still like to use CoordinateManager - just make constructor public...

@pjanevskiTT pjanevskiTT force-pushed the pjanevski/core_coordinates branch 4 times, most recently from bbd768b to 3fb5a0b Compare November 21, 2024 16:31
@pjanevskiTT
Copy link
Contributor Author

@abhullar-tt @tt-vjovanovic I implemented translation between different DRAM coordinate systems for CoreCoord in 3fb5a0b.

This is under assumption that harvesting mask is one bit per bank, I will follow up if something changes. When harvesting info is provided to SW, we will implement e2e support for this. For example on how clients (tt-metal, tt-lens) can interact with the translation you can take a look at test_core_coord_translation_bh.cpp

@pjanevskiTT pjanevskiTT force-pushed the pjanevski/core_coordinates branch from 7526902 to 5665a44 Compare November 22, 2024 17:28
@pjanevskiTT
Copy link
Contributor Author

@broskoTT @abhullar-tt @pgkeller @tt-vjovanovic @tt-asaigal The PR is taking final shape in terms of how translation is going to be implemented. One more thing I need to do is confirm all translated coordinates on Blackhole and implement that, but that will just require overriding few functions, in a way that has already been done. Can you all review parts of the PR that you are interested in, I will continue resolving the original comment all of you have left or I will open issues if those comments are not specific for translations, rather for API/other changes

@pjanevskiTT pjanevskiTT marked this pull request as ready for review November 25, 2024 19:25
Copy link
Contributor

@broskoTT broskoTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome that we will have this!
Sorry for leaving 32 comments, hopefully you'll also like the result in the end...

device/api/umd/device/coordinate_manager.h Show resolved Hide resolved
device/coordinate_manager.cpp Show resolved Hide resolved
device/api/umd/device/tt_core_coordinates.h Show resolved Hide resolved
device/tt_soc_descriptor.cpp Show resolved Hide resolved
device/coordinate_manager.cpp Outdated Show resolved Hide resolved
device/coordinate_manager.cpp Show resolved Hide resolved
tests/api/test_core_coord_translation_wh.cpp Show resolved Hide resolved
tests/api/test_core_coord_translation_bh.cpp Show resolved Hide resolved
@pjanevskiTT pjanevskiTT force-pushed the pjanevski/core_coordinates branch from 467ad4a to 2eb9088 Compare December 2, 2024 10:29
@pjanevskiTT pjanevskiTT merged commit 8cc7bb6 into main Dec 2, 2024
12 checks passed
@pjanevskiTT pjanevskiTT deleted the pjanevski/core_coordinates branch December 2, 2024 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Harvesting] Cover non-tensix cores
6 participants