-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[UMD] Removed set_*_params calls and constants #15908
Conversation
@@ -36,7 +36,6 @@ HalCoreInfoType create_idle_eth_mem_map() { | |||
|
|||
mem_map_bases.resize(static_cast<std::size_t>(HalL1MemAddrType::COUNT)); | |||
mem_map_bases[static_cast<std::size_t>(HalL1MemAddrType::BASE)] = MEM_ETH_BASE; | |||
mem_map_bases[static_cast<std::size_t>(HalL1MemAddrType::BARRIER)] = MEM_L1_BARRIER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgkeller why is this 12 in tt_metal, but here it takes some other value? Is this barrier a fixed address, or can it be chosen arbitrarily?
static constexpr std::uint32_t L1_BARRIER_BASE = 0x16dfc0;
https://github.com/tenstorrent/tt-umd/blob/main/src/firmware/riscv/blackhole/l1_address_map.h#L45
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using 0x16dfc0 in metal would cause problems, that could potentially overwrite tensor data
metal puts all its reserved addresses low in memory for tensix (eth is a little different due to the base firmware)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to add, the barrier doesn't have to be at a fixed address. It is up to the memory map definers to allocate space for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so there are two sets of parameters:
- Parameters which are constant for our hardware, and UMD's client doesn't have freedom to choose it, but has to use that one value.
- Parameters which can freely be chosen by the user, like how l1 memory is layed out.
I'd obviously want to remove the first category but not the latter. If l1_barrier can be chosen, I'll change this PR.
There is probably also a third category, something you might want to change theoretically, but some value is highly suggested, or it would require a power user to make sure our device works properly.
So I obviously need help with this :)
Can you comment on which category do these parameters fall into?
The tt_driver_host_address_params was already removed (just want to recheck)
The tt_driver_eth_interface_params was already removed (just want to recheck)
Fw_Base was removed from l1_address_params because it wasn't used in UMD.
struct tt_device_dram_address_params {
std::uint32_t DRAM_BARRIER_BASE = 0;
};
/**
- Struct encapsulating all L1 Address Map parameters required by UMD.
- These parameters are passed to the constructor.
*/
struct tt_device_l1_address_params {
std::uint32_t ncrisc_fw_base = 0;
std::uint32_t fw_base = 0;
std::uint32_t trisc0_size = 0;
std::uint32_t trisc1_size = 0;
std::uint32_t trisc2_size = 0;
std::uint32_t trisc_base = 0;
std::uint32_t tensix_l1_barrier_base = 0;
std::uint32_t eth_l1_barrier_base = 0;
std::uint32_t fw_version_addr = 0;
};
/**
- Struct encapsulating all Host Address Map parameters required by UMD.
- These parameters are passed to the constructor and are needed for non-MMIO transactions.
*/
struct tt_driver_host_address_params {
std::uint32_t eth_routing_block_size = 0;
std::uint32_t eth_routing_buffers_start = 0;
};
/**
- Struct encapsulating all ERISC Firmware parameters required by UMD.
- These parameters are passed to the constructor and are needed for non-MMIO transactions.
*/
struct tt_driver_eth_interface_params {
std::uint32_t noc_addr_local_bits = 0;
std::uint32_t noc_addr_node_id_bits = 0;
std::uint32_t eth_rack_coord_width = 0;
std::uint32_t cmd_buf_size_mask = 0;
std::uint32_t max_block_size = 0;
std::uint32_t request_cmd_queue_base = 0;
std::uint32_t response_cmd_queue_base = 0;
std::uint32_t cmd_counters_size_bytes = 0;
std::uint32_t remote_update_ptr_size_bytes = 0;
std::uint32_t cmd_data_block = 0;
std::uint32_t cmd_wr_req = 0;
std::uint32_t cmd_wr_ack = 0;
std::uint32_t cmd_rd_req = 0;
std::uint32_t cmd_rd_data = 0;
std::uint32_t cmd_buf_size = 0;
std::uint32_t cmd_data_block_dram = 0;
std::uint32_t eth_routing_data_buffer_addr = 0;
std::uint32_t request_routing_cmd_queue_base = 0;
std::uint32_t response_routing_cmd_queue_base = 0;
std::uint32_t cmd_buf_ptr_mask = 0;
std::uint32_t cmd_ordered = 0;
std::uint32_t cmd_broadcast = 0;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tt_device_dram_address_params
and tt_device_l1_address_params
are freely chosen by users.
tt_device_l1_address_params
- Curious to know if
triscX_size/base
andncrisc_fw_base
are used by UMD - I also am not clear on need/usage of
fw_version_addr
. Metal has never really used this.
tt_driver_host_address_params/tt_driver_eth_interface_params
- This should only be required for WH
- I don't see Metal calling
set_driver_host_address_params
orset_driver_eth_interface_params
anywhere. This seems to pull out addresses needed by base eth routing FW so it falls into "constant for our hardware" category
I like Joel's idea of user supplying barrier address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You probably missed the info in my huge comment, but yeah the set_driver_host_address_params and set_driver_eth_interface_params were removed.
- Trisc and ncrisc fw_base have been removed indeed from l1_address_params
- fw_version_addr is used, it is used for verifying eth_firmware. But it obviously falls into "constant for our hardware", at least for now.
**If I understood correctly all the comments:
- All barriers should be set by the client, DRAM_BARRIER_BASE, tensix_l1_barrier_base, eth_l1_barrier_base
- All other constants fall under "constant for our hardware", or at least into "hard to come by to a scenario where you'd want to change this"**
Regarding @joelsmithTT 's comment:
pass barrier addresses when they are used
Not sure this is possible, there are many entry points which internally use these addresses, not always transparent to the client.
I'll make the changes accordingly, thanks all for feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have Almeet review as well.
How do L1 barriers work now? I think we initiate barriers from our device, where do they write to?
@@ -36,7 +36,6 @@ HalCoreInfoType create_idle_eth_mem_map() { | |||
|
|||
mem_map_bases.resize(static_cast<std::size_t>(HalL1MemAddrType::COUNT)); | |||
mem_map_bases[static_cast<std::size_t>(HalL1MemAddrType::BASE)] = MEM_ETH_BASE; | |||
mem_map_bases[static_cast<std::size_t>(HalL1MemAddrType::BARRIER)] = MEM_L1_BARRIER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using 0x16dfc0 in metal would cause problems, that could potentially overwrite tensor data
metal puts all its reserved addresses low in memory for tensix (eth is a little different due to the base firmware)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm to remove the set_*_params but I think UMD would need to accept the barrier address from Metal because UMD shouldn't enforce a particular L1 structure
@broskoTT what about a script that can read |
I can suggest two different proposals for how to deal with this.
|
I am in favour of option 2 |
c5f6ca8
to
c1c14a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks
### Issue Fixes #106 ### Description Part of this was done previously in #260 #261 and #234 ### List of the changes - Added default l1_address_params and dram_address_params - Remove set_device_l1_address_params, set_device_dram_address_params, set_driver_host_address_params, set_driver_eth_interface_params - Remove unnecessary code from tests ### Testing CI tests, but more importantly CI tests on the related tt_metal PR. ### API Changes This PR has API changes: - [x] tt_metal approved PR pointing to this branch: tenstorrent/tt-metal#15908 - [X] No breaking changes in tt_debuda
c1c14a9
to
71cb080
Compare
Ticket
Related to #13948
Problem description
Related to UMD change: tenstorrent/tt-umd#395
These parameters are defined by hardware.
The only parameters to be defined by the client are membars. This change adds set_barrier_address_params accordingly.
What's changed
Removed set_device_dram_address_params and set_device_l1_address_params, and related constants.
Replace with call to set_barrier_address_params.
Checklist