Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#13609: Uplift dram and l1 allocators to use dram/l1 specific alignment #13762

Open
wants to merge 43 commits into
base: main
Choose a base branch
from

Conversation

abhullar-tt
Copy link
Contributor

@abhullar-tt abhullar-tt commented Oct 11, 2024

Ticket

#13609

Problem description

Using the max of DRAM and L1 alignment for both DRAM and L1 buffers was causing pcc mismatches in i2s and s2i.

What's changed

Use L1/DRAM specific alignment for respective allocations. This will require some ops to be uplifted to handle re-alignment
@yugaoTT and @ntarafdar to add corresponding op changes

Checklist

Below post commits were triggered 12/03

@abhullar-tt abhullar-tt linked an issue Oct 11, 2024 that may be closed by this pull request
@abhullar-tt
Copy link
Contributor Author

@llongTT could you re-run post-commit, BH post commit, Model tests, and ttnn sweeps and update the links in the PR description

@yugaoTT
Copy link
Contributor

yugaoTT commented Jan 6, 2025

Please also run the device perf and model pipelines, there are CNN models using group norm so need to make sure those are passing

@abhullar-tt
Copy link
Contributor Author

@cmaryanTT we may need to announce when this change goes in because using l1/dram specific alignment for allocating buffers could expose noc violations when doing misaligned dram -> l1 reads. We could explain how ops are handling this

@llongTT
Copy link
Contributor

llongTT commented Jan 6, 2025

@llongTT could you re-run post-commit, BH post commit, Model tests, and ttnn sweeps and update the links in the PR description

Have them running now.

@llongTT
Copy link
Contributor

llongTT commented Jan 6, 2025

Please also run the device perf and model pipelines, there are CNN models using group norm so need to make sure those are passing

Have them running now.

@cmaryanTT
Copy link

@cmaryanTT we may need to announce when this change goes in because using l1/dram specific alignment for allocating buffers could expose noc violations when doing misaligned dram -> l1 reads. We could explain how ops are handling this

Noted. @llongTT - please announce on #tt-metal-developers with a brief description of the change and the risks.

Also, @llongTT will be presenting this in the learning forum in a couple of weeks.

@llongTT llongTT self-assigned this Jan 6, 2025
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

@@ -449,7 +449,7 @@ DeviceAddr Buffer::bank_local_page_address(uint32_t bank_id, uint32_t page_index
}

uint32_t Buffer::alignment() const {
return this->allocator_->config.alignment;
return this->device_->get_allocator_alignment(this->buffer_type_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
no matching member function for call to get_allocator_alignment

@@ -1406,14 +1407,14 @@ allocator::Statistics Device::get_memory_allocation_statistics(const BufferType
return allocator::get_statistics(*allocator, buffer_type);
}

uint32_t Device::get_allocator_alignment() const {
uint32_t Device::get_allocator_alignment(const BufferType &buffer_type) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
out-of-line definition of get_allocator_alignment does not match any declaration in tt::tt_metal::Device

@@ -129,8 +129,8 @@ class Device : public IDevice {
allocator::Statistics get_memory_allocation_statistics(const BufferType &buffer_type) const override;
allocator::Statistics get_memory_allocation_statistics(const BufferType &buffer_type, SubDeviceId sub_device_id) const override;

uint32_t get_allocator_alignment() const override;
uint32_t get_allocator_alignment(SubDeviceId sub_device_id) const override;
uint32_t get_allocator_alignment(const BufferType &buffer_type) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
non-virtual member function marked override hides virtual member functions

uint32_t get_allocator_alignment() const override;
uint32_t get_allocator_alignment(SubDeviceId sub_device_id) const override;
uint32_t get_allocator_alignment(const BufferType &buffer_type) const override;
uint32_t get_allocator_alignment(const BufferType &buffer_type, SubDeviceId sub_device_id) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
non-virtual member function marked override hides virtual member functions

@@ -19,7 +19,7 @@ uint64_t get_alloc_limit(const tt::tt_metal::IDevice* device) {
auto dispatch_core_config = dispatch_core_manager::instance().get_dispatch_core_config(device->id());
auto storage_core_bank_size =
tt::get_storage_core_bank_size(device->id(), device->num_hw_cqs(), dispatch_core_config);
const uint32_t allocator_alignment = device->get_allocator_alignment();
const uint32_t allocator_alignment = device->get_allocator_alignment(BufferType::L1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
no matching member function for call to get_allocator_alignment

@@ -788,7 +788,7 @@ void detail::Program_::allocate_circular_buffers(const IDevice* device) {
}
}
}
computed_addr = align(computed_addr, device->get_allocator_alignment());
computed_addr = align(computed_addr, device->get_allocator_alignment(BufferType::DRAM));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
no matching member function for call to get_allocator_alignment

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

@@ -676,7 +676,6 @@
// face-lines from C tiles to form a single tile, we can load a single tile and then write out its face-lines to C
// tiles
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment()
: hal::get_l1_alignment();
bool misaligned = alignment > sub_tile_line_bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
expected ( for function-style cast or type construction

llongTT and others added 2 commits January 10, 2025 13:33
…ose_program_factory.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

@@ -675,8 +675,7 @@ operation::ProgramWithCallbacks transpose_hc_multi_core(
// TODO: noc_async_write only require 16B alignment for both DRAM and L1 for Blackhole, so instead of reading in
// face-lines from C tiles to form a single tile, we can load a single tile and then write out its face-lines to C
// tiles
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment()
: hal::get_l1_alignment();
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
expected :

Suggested change
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment();
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment(): ;

@@ -675,8 +675,7 @@
// TODO: noc_async_write only require 16B alignment for both DRAM and L1 for Blackhole, so instead of reading in
// face-lines from C tiles to form a single tile, we can load a single tile and then write out its face-lines to C
// tiles
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment()
: hal::get_l1_alignment();
uint32_t alignment = dst_buffer->buffer_type() == tt::tt_metal::BufferType::DRAM ? hal::get_dram_alignment();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ clang-diagnostic-error ⚠️
expected expression

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allocator uses 32B alignment for both DRAM and L1
9 participants