#13944: Redesign memory packing API #15980

Merged
merged 1 commit into from
Dec 13, 2024

nathan-TT
Contributor

Ticket

#13944

Problem description

Packing ELF segments together for deployment relied on HAL information. That information is present in the ELF itself and no HAL involvement is necessary.

In developing this patch I discovered a bug that was masked by a different bug and by obsolete code predating the use of ELF segments. Specifically, for erisc kernel deployments packing was requested, but the text memory range was incorrect. As a result the data segment was treated as text, and the obsolete code inserted padding between the real text and the data. The image was deployable, but larger than necessary, and it would clobber any memory objects placed in that padding area.
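
To make the size difference concrete, here is a minimal sketch comparing a padded image with a packed one. The `Segment` struct and both helper functions are hypothetical names chosen for illustration; they are not taken from the tt-metal sources, and the real loader works on ELF segments rather than this simplified record.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified segment record (not the tt-metal type).
struct Segment {
    std::uint64_t addr;  // address the segment is placed at
    std::uint32_t size;  // size in bytes
};

// Extent of an image that keeps every segment at its own address,
// so any gap between segments becomes padding inside the image.
std::uint64_t padded_extent(const std::vector<Segment>& segs) {
    if (segs.empty()) {
        return 0;
    }
    std::uint64_t lo = segs.front().addr;
    std::uint64_t hi = lo;
    for (const auto& s : segs) {
        if (s.addr < lo) lo = s.addr;
        if (s.addr + s.size > hi) hi = s.addr + s.size;
    }
    return hi - lo;
}

// Extent of an image that stores the segments back to back.
std::uint64_t packed_extent(const std::vector<Segment>& segs) {
    std::uint64_t total = 0;
    for (const auto& s : segs) {
        total += s.size;
    }
    return total;
}
```

With a text segment of 0x200 bytes at 0x1000 and a data segment of 0x100 bytes at 0x2000, the padded image spans 0x1100 bytes while the packed one needs only 0x300. Anything else that lives in the gap would be overwritten when the padded image is written.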

What's changed

*) Linker scripts now record the Load Memory Address (LMA) of data segments. This is used to ensure packing is consistent.
*) ELF loader now records the LMA. (Other cleanups are also included.)
*) The separate PackSpans and Relocate enums are replaced with a single Loading enum offering {DISCRETE, CONTIGUOUS, CONTIGUOUS_XIP} alternatives, dropping an unneeded DISCRETE_XIP variant. While the original enums are orthogonal, their use is not, and the bug described above arose from not paying attention to how they are related.
*) Removal of the core_type_idx, processor_class_idx, and processor_type_idx arguments to get_risc_binary.
*) Removal of Memory::pack_data_into_text; that functionality is now implemented directly during the conversion from ELF. (This routine contained the obsolete workaround and was given incorrect text and data base addresses for wormhole_b0 N300.)
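
The enum consolidation can be sketched roughly as follows. The enumerator names of the old `PackSpans` and `Relocate` enums (`NO_PACK`/`PACK`, `NONE`/`XIP`) and the `to_loading` helper are assumptions made for illustration; only the `Loading` alternatives are listed in this description, and the patch itself may not contain any such conversion function.

```cpp
#include <optional>

// Assumed shape of the two old, orthogonal enums (enumerator names are guesses).
enum class PackSpans { NO_PACK, PACK };
enum class Relocate { NONE, XIP };

// The single replacement enum, with the alternatives named in this PR.
enum class Loading { DISCRETE, CONTIGUOUS, CONTIGUOUS_XIP };

// Hypothetical helper: maps an old (PackSpans, Relocate) pair onto Loading.
// The unpacked-but-relocated pair has no counterpart, since the
// DISCRETE_XIP variant is not needed, so it maps to std::nullopt.
constexpr std::optional<Loading> to_loading(PackSpans pack, Relocate reloc) {
    if (pack == PackSpans::PACK) {
        return reloc == Relocate::XIP ? Loading::CONTIGUOUS_XIP : Loading::CONTIGUOUS;
    }
    if (reloc == Relocate::XIP) {
        return std::nullopt;  // would have been DISCRETE_XIP
    }
    return Loading::DISCRETE;
}
```

Collapsing the pair into one enum means a caller cannot request a combination that the loader does not support, which is the kind of mismatch behind the bug described above.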

Checklist

  • [YES] Post commit CI passes
  • [YES] Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new models tests passes
  • New/Existing tests provide coverage for changes

Contributor

@pgkeller pgkeller left a comment

This is great! A bunch of mess gone

@nathan-TT nathan-TT merged commit 434bd8e into main Dec 13, 2024
120 checks passed
@nathan-TT nathan-TT deleted the nsidwell/span-main branch December 13, 2024 15:55
SeanNijjar added a commit that referenced this pull request Dec 17, 2024
This reverts commit 434bd8e.

> Conflicts:
>	tests/tt_metal/tt_metal/test_compile_sets_kernel_binaries.cpp
nathan-TT added a commit that referenced this pull request Dec 17, 2024
nathan-TT added a commit that referenced this pull request Dec 18, 2024
### Ticket
N/A

### Problem description
This assert was added when kernel data packing was implemented (data
load address immediately after text). But that only worked for
(non-idle) erisc kernels by accident, due to another bug and an obsolete
workaround. I fixed this in
```
* 434bd8e 2024-12-13 | #13944: Redesign memory packing API (#15980)
```

by not packing such erisc kernels. That worked in production builds
because asserts are disabled, so I didn't trip over this problem.

### What's changed
Remove asserts and update comments to reflect reality.
Work on updating non-idle erisc to allow packing is in progress.
(Perhaps CI optimized builds should enable asserts. Remember that
CMAKE_BUILD_TYPE=RelWithDebInfo doesn't do that.)
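
As a reminder of why the assert went unnoticed: with GCC and Clang, CMake's default RelWithDebInfo flags include `-DNDEBUG`, and the standard `assert` macro expands to nothing when `NDEBUG` is defined. The sketch below shows that effect in isolation; both function names are hypothetical and unrelated to the tt-metal code.

```cpp
#include <cassert>

// Reports whether assert() is active in this translation unit.
constexpr bool asserts_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// Counts how often the expression inside assert() is evaluated.
// Under NDEBUG the expression is never evaluated, so the count stays 0.
int assert_evaluations() {
    int count = 0;
    assert(++count > 0);
    return count;
}
```

In a build without `NDEBUG` the function returns 1; in a build with it, the check and its side effect disappear and the function returns 0.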

### Checklist
- [Yes] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] New/Existing tests provide coverage for changes