Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove set_driver_*_params and set_device_*_params #395

Merged
merged 2 commits into from
Dec 16, 2024

Conversation

broskoTT
Copy link
Contributor

@broskoTT broskoTT commented Dec 11, 2024

Issue

Fixes #106

Description

Part of this was done previously in #260 #261 and #234

List of the changes

  • Added default l1_address_params and dram_address_params
  • Remove set_device_l1_address_params, set_device_dram_address_params, set_driver_host_address_params, set_driver_eth_interface_params
  • Remove unnecessary code from tests

Testing

CI tests, but more importantly CI tests on the related tt_metal PR.

API Changes

This PR has API changes:

Copy link
Collaborator

@blozano-tt blozano-tt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!!! LGTM!

My only minor concern is that tt-metal uses: https://github.com/tenstorrent/tt-metal/blob/main/tt_metal/hw/inc/wormhole/dev_mem_map.h for L1 memory map.

While UMD is using: https://github.com/tenstorrent/tt-umd/blob/main/src/firmware/riscv/wormhole/l1_address_map.h

I actually like UMD's method of defining the address map... MUCH BETTER.

However, I don't like that we have two roots of truth.

What happens if these diverge?

@broskoTT
Copy link
Contributor Author

My only minor concern is that tt-metal uses

Asked on tt_metal PR. I'm also not sure why are these membar values different, and whether that can be chosen arbitrarily or not.

Copy link
Contributor

@pjanevskiTT pjanevskiTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@pjanevskiTT
Copy link
Contributor

@broskoTT can you check which constants from our own memory map are metal/buda specific and remove it? You can open issue as well if you don't want to this in this or some follow up PR right now

@joelsmithTT
Copy link
Contributor

From looking at earlier UMD code, I never understood whether these constants were passed into UMD because they were architecture-specific (and the structure of the code at the time favored handling GS vs WH in this manner) or because the application/firmware developers want to move them around (this certainly seems to be possible for the FW_VERSION_ADDR, not sure about the barrier addresses).

Copy link
Contributor

@joelsmithTT joelsmithTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth understanding what determines the barrier addresses: hardware or firmware? I.e. can they move?

Moving the FW_VERSION_ADDR is asking for trouble, so probably safe to assume that is fixed.

@broskoTT broskoTT force-pushed the brosko/deprecate_sets branch from 3fc4e7c to 6e2d0bb Compare December 13, 2024 10:52
@broskoTT
Copy link
Contributor Author

Moving the FW_VERSION_ADDR is asking for trouble, so probably safe to assume that is fixed.

Yes

For barrier parameters, they can be moved around.
I changed the PR accordingly, to allow setting up only mem barriers.
All the other parameters (previously removed, and now removed fw_version_addr) should be constant.

However, I don't like that we have two roots of truth.

Memory map MIGHT consists of some constants which are truly defined by hardware, I can't find any at the moment, but maybe L1_MEM_SIZE, etc. Although most of these hardware specific are in noc_params I think.
Memory map actually consists of custom memory layout used for loading firmware and relevant queues onto the chip. It is fine to have two divergent memory maps. One is used by one client, tt_metal, and another is used by another client, umd unit tests.
Probably the one defined in umd should be cleared up since it was probably just copied from Buda, and lot of constants are necessary.
Additionally, it should be made more obvious that this is the case (by putting all of the maps into tests/ folder for example). I'll probably do that.

barrier params

some changes

minor modification

fix grayskull barrier tests
@broskoTT broskoTT force-pushed the brosko/deprecate_sets branch from e14c7fa to 969afb9 Compare December 16, 2024 20:39
@broskoTT broskoTT merged commit cfadca1 into main Dec 16, 2024
21 of 24 checks passed
@broskoTT broskoTT deleted the brosko/deprecate_sets branch December 16, 2024 21:22
@broskoTT broskoTT added the changes api API changing PR, needs changes in client code label Dec 17, 2024
broskoTT added a commit to tenstorrent/tt-metal that referenced this pull request Dec 17, 2024
### Ticket
Related to #13948

### Problem description
Related to UMD change: tenstorrent/tt-umd#395
These parameters are defined by hardware.
The only parameters to be defined by the client are membars. This change
adds set_barrier_address_params accordingly.

### What's changed
Removed set_device_dram_address_params and set_device_l1_address_params,
and related constants.
Replace with call to set_barrier_address_params.

### Checklist
- [ ] All post-commit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361649190 - main
generally unhealthy, I haven't seen new fails related to this change.
- [x] Blackhole post-commit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361650912
- [ ] (Single-card) Model perf tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361652594 - N300
job failing on main
- [x] (Single-card) Device perf regressions :
https://github.com/tenstorrent/tt-metal/actions/runs/12361654455
- [x] (T3K) T3000 unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361656223
- [ ] (T3K) T3000 demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361657485 -
falcon job failing on main
- [ ] (TG) TG unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361658889 - unit
test failing on main
- [ ] (TG) TG demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361660547 -
failing on main
- [ ] (TGG) TGG unit tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361662343 -
failing on main for a long time due to hugepages
- [x] (TGG) TGG demo tests :
https://github.com/tenstorrent/tt-metal/actions/runs/12361664160
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changes api API changing PR, needs changes in client code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisit device_params and driver_params
4 participants