-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pack kernel binary memory spans into one #12977
Conversation
d67cc7c
to
01f59c8
Compare
@@ -191,15 +191,15 @@ int main(int argc, char **argv) { | |||
JitBuildProcessorType::DATA_MOVEMENT, | |||
0, | |||
get_latest_kernel_binary_path(mask, riscv0_kernel)); | |||
ll_api::memory brisc_binary = llrt::get_risc_binary(brisc_hex_path); | |||
ll_api::memory brisc_binary = llrt::get_risc_binary(brisc_hex_path, 0, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from api standpoint I recommend to replace bool arg to an enum, as it makes semantics of this true
clear.
https://github.com/tenstorrent/tt-metal/blob/main/best_practices.md#14-avoid-bool-arguments-in-apis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integer constant should be an enum too, AFAICT it's an 'id'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the bool. The riscv ID is entrenched throughtout the code base w/ many different definitions. It will get cleaned up when we start using the HAL to abstract this away.
01f59c8
to
4c006aa
Compare
#define MEM_IERISC_MAILBOX_SIZE 3072 | ||
#define MEM_IERISC_MAILBOX_END (MEM_IERISC_MAILBOX_BASE + MEM_IERISC_MAILBOX_SIZE) | ||
// TODO: this should be tightly packed against MAILBOX_END but that causes | ||
// problems w/ back to back runs, leaving the padding for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding TODO comment @ line 104, 109.
That is because there is data in 4K block @ 0x1000 that syseng tools read back when creating ethernet map.
idle erisc code has to start at 0x2000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, is that for Blackhole? Then I am not sure. Maybe syseng has the same assumptions/requirements for BH.
What I said above is for WH.
#define MEM_IERISC_MAILBOX_SIZE 3072 | ||
#define MEM_IERISC_MAILBOX_END (MEM_IERISC_MAILBOX_BASE + MEM_IERISC_MAILBOX_SIZE) | ||
// TODO: this should be tightly packed against MAILBOX_END but that causes | ||
// problems w/ back to back runs, leaving the padding for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments above in blackhole/dev_mem_map.h.
@@ -307,6 +328,7 @@ void wait_until_cores_done( | |||
// Print not-done cores | |||
if (loop_count % 1000 == 0) { | |||
log_debug(tt::LogMetal, "Not done phys cores: {}", fmt::join(not_done_phys_cores, " ")); | |||
usleep(100000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this sleep intended?
NOTE: slow dispatch doesn't check for kernel text+data overflowing the firmware hole size. This will be fixed w/ the ring buffer Split linker script builds into 2, fw and kernel Bump firmware hole size up by local sizes, kernel data now goes here Create scratch area for firmware locals, used only during init Check fw_size+kernel_size+kernel_local_size against max at runtime since now we could overflow after compile succeeds Use symbol for kernel data address, const for firmware data address Pack data span into text span
Some bad values just happened to work, sizes now line up
262b000
to
abaaf86
Compare
Ticket
#11789
Problem description
Kernel binaries are parsed into multiple memory address ranges (spans) and sent to the device individually which is inefficient. Also, the data section ends up at a magic address which is used prior to copying the data to local memory.
The kernel ring buffer needs the kernel in one blob in the ring buffer, not at multiple locations. This PR merges the spans into one in prep for the ring buffer.
What's changed
NOTE: slow dispatch doesn't check for kernel text+data overflowing
the firmware hole size. This will be fixed w/ the ring buffer
Split linker script builds into 2, fw and kernel
Bump firmware hole size up by local sizes, kernel data now goes here
Create scratch area for firmware locals, used only during init
Check fw_size+kernel_size+kernel_local_size against max at runtime since now we
could overflow after compile succeeds (only done for FD, will fix w/ ring buffer for SD)
Use symbol for kernel data address, const for firmware data address
Pack data span into text span
pgm_dispatch perf gains of 5-30%
Checklist