Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create_for_grayskull_cluster -> create_mock_cluster #310

Merged
merged 5 commits into from
Nov 22, 2024
Merged

Conversation

broskoTT
Copy link
Contributor

@broskoTT broskoTT commented Nov 18, 2024

Issue

Contributes to #99 , and a continuation of #280

Description

This function is now only used for creating a cluster descriptor for simulation or mockup environment. Changing accordingly.

List of the changes

  • Renamed create_for_grayskull_cluster to create_mock_cluster
  • Changed arguments and functionality so that this function does not accept physical device ids at all.
  • Accepting std::vector instead of std::set for logical device ids

Testing

Code builds.

API Changes

This PR has API changes:

@broskoTT broskoTT added the changes api API changing PR, needs changes in client code label Nov 18, 2024
Copy link
Contributor

@pjanevskiTT pjanevskiTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Do we have any actual tests for this, maybe @vtangTT can provide something

@vtangTT
Copy link
Contributor

vtangTT commented Nov 18, 2024

Just FYI that I just uncovered last night that simulator was broken on the metal level due to changes made in this tenstorrent/tt-metal#14945.

Is there a branch with related metal changes? I want to see if this fixes the simulator problem.

@pjanevskiTT
Copy link
Contributor

Could you try and bisect the commits to see which one breaks the simulator? Not really sure there is a branch with related metal changes. So far uplift strategy was to uplift everything that changes the API plus Almeet uplifts some things that are useful for metal when there is some change inside UMD. If you can pinpoint the commit maybe we can debug it together

@vtangTT
Copy link
Contributor

vtangTT commented Nov 19, 2024

I have a small change to fix for simulator. Is it ok if I push to your branch @broskoTT

On the note of future verification/testing for changes that could affect simulator, I'm trying to setup a pipeline for it on the gitlab side (actually how I found out it was broken in the first place)

@broskoTT broskoTT merged commit e9dc0d1 into main Nov 22, 2024
19 of 22 checks passed
@broskoTT broskoTT deleted the brosko/grays-mock branch November 22, 2024 12:33
broskoTT added a commit to tenstorrent/tt-metal that referenced this pull request Nov 22, 2024
### Ticket
Related to #13948

### Problem description
Tied to UMD change tenstorrent/tt-umd#310

### What's changed
- Rename and clear up create_for_grayskull_cluster to
create_mock_cluster
- For grayskull, the CEM already works, and the create_from_yaml path is
successfully used already

### Checklist
- [x] Post commit CI passes :
https://github.com/tenstorrent/tt-metal/actions/runs/11972671759
- [x] Blackhole Post commit (if applicable): Not applicable
- [x] Model regression CI testing passes (if applicable): Not applicable
- [x] Device performance regression CI testing passes (if applicable):
Not applicable
- [x] New/Existing tests provide coverage for changes: Not applicable
broskoTT added a commit to tenstorrent/tt-metal that referenced this pull request Nov 24, 2024
### Ticket
Related to #13948

### Problem description
Tied to UMD change tenstorrent/tt-umd#310

### What's changed
- Rename and clear up create_for_grayskull_cluster to
create_mock_cluster
- For grayskull, the CEM already works, and the create_from_yaml path is
successfully used already

### Checklist
- [x] Post commit CI passes :
https://github.com/tenstorrent/tt-metal/actions/runs/11972671759
- [x] Blackhole Post commit (if applicable): Not applicable
- [x] Model regression CI testing passes (if applicable): Not applicable
- [x] Device performance regression CI testing passes (if applicable):
Not applicable
- [x] New/Existing tests provide coverage for changes: Not applicable
broskoTT added a commit to tenstorrent/tt-metal that referenced this pull request Nov 26, 2024
### Ticket
Related to #13948

### Problem description
Tied to UMD change tenstorrent/tt-umd#310

### What's changed
- Rename and clear up create_for_grayskull_cluster to
create_mock_cluster
- For grayskull, the CEM already works, and the create_from_yaml path is
successfully used already

### Checklist
- [x] Post commit CI passes :
https://github.com/tenstorrent/tt-metal/actions/runs/11972671759
- [x] Blackhole Post commit (if applicable): Not applicable
- [x] Model regression CI testing passes (if applicable): Not applicable
- [x] Device performance regression CI testing passes (if applicable):
Not applicable
- [x] New/Existing tests provide coverage for changes: Not applicable
broskoTT added a commit to tenstorrent/tt-metal that referenced this pull request Nov 26, 2024
This is PR similar to #15301. It previously broke CI pipelines, but this
PR has a fix for that.

### Ticket
Related to #13948

### Problem description
Tied to UMD change tenstorrent/tt-umd#310

### What's changed
- Rename and clear up create_for_grayskull_cluster to
create_mock_cluster
- For grayskull, the CEM already works, and the create_from_yaml path is
successfully used already
- eth_coord_t was updated in
tenstorrent/tt-umd#306 , there is a change on
how PhysicalCoordinate is created to reflect that.

### Checklist
- [x] All post-commit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033347997
- [x] Blackhole post-commit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033347995
- [ ] (Single-card) Model perf tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033439959
- [ ] (Single-card) Device perf regressions :
https://github.com/tenstorrent/tt-metal/actions/runs/12033442095
- [x] (T3K) T3000 unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033456773
- [x] (T3K) T3000 demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033517770
- [x] (TG) TG unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033520415
- [x] (TG) TG demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033522422
- [x] (TGG) TGG unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033524632
- [x] (TGG) TGG demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033526704
gfengTT pushed a commit to tenstorrent/tt-metal that referenced this pull request Nov 27, 2024
This is PR similar to #15301. It previously broke CI pipelines, but this
PR has a fix for that.

### Ticket
Related to #13948

### Problem description
Tied to UMD change tenstorrent/tt-umd#310

### What's changed
- Rename and clear up create_for_grayskull_cluster to
create_mock_cluster
- For grayskull, the CEM already works, and the create_from_yaml path is
successfully used already
- eth_coord_t was updated in
tenstorrent/tt-umd#306 , there is a change on
how PhysicalCoordinate is created to reflect that.

### Checklist
- [x] All post-commit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033347997
- [x] Blackhole post-commit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033347995
- [ ] (Single-card) Model perf tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033439959
- [ ] (Single-card) Device perf regressions :
https://github.com/tenstorrent/tt-metal/actions/runs/12033442095
- [x] (T3K) T3000 unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033456773
- [x] (T3K) T3000 demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033517770
- [x] (TG) TG unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033520415
- [x] (TG) TG demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033522422
- [x] (TGG) TGG unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033524632
- [x] (TGG) TGG demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12033526704
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changes api API changing PR, needs changes in client code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants