Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Infrastructure to exactly calculate L1 Memory Usage for Conv2D #15088 #15455

Merged
merged 10 commits into from
Jan 12, 2025

Conversation

sankarmanoj-tt
Copy link
Contributor

@sankarmanoj-tt sankarmanoj-tt commented Nov 26, 2024

Ticket

#15088

Problem description

Accurate ahead of time calculation of L1 usage is necessary for enabling lots of features and optimizations.

What's changed

Implemented code to calculate L1 usage, and cross-verify it with the actual L1 usage. The L1 usage of an op consists of the output tensor and the circular buffers.

The L1 usage for the output tensor can be obtained by checking the global L1 memory allocation stats before and after the op.
The L1 usage by Circular Buffers is calculated from the program object. A new method was added to the program object to be able to get the total memory usage by Circular Buffer before memory allocation.

Checklist

  • Post commit CI passes
  • New/Existing tests provide coverage for changes

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch 2 times, most recently from 2b3a64d to d153dac Compare December 2, 2024 13:12
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

tt_metal/impl/program/program.cpp Outdated Show resolved Hide resolved
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from d153dac to be2a301 Compare December 4, 2024 12:54
@sankarmanoj-tt sankarmanoj-tt changed the title Create Infrastructure to exactly calculate L1 Memory Usage for Width Sharded Conv2D #15088 Create Infrastructure to exactly calculate L1 Memory Usage for Conv2D #15088 Dec 4, 2024
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from be2a301 to 8d9ba30 Compare December 4, 2024 14:49
@sankarmanoj-tt sankarmanoj-tt marked this pull request as ready for review December 5, 2024 09:09
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch 3 times, most recently from 277ae36 to 73a0fe8 Compare December 11, 2024 09:18
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from d7e027d to 41c2469 Compare December 16, 2024 13:48
Copy link
Contributor

@pavlejosipovic pavlejosipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. (I was looking at the wrong diff)

Copy link
Contributor

@pavlejosipovic pavlejosipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@pavlejosipovic pavlejosipovic self-requested a review December 31, 2024 12:01
ttnn/cpp/ttnn/operations/conv/conv2d/conv2d_utils.cpp Outdated Show resolved Hide resolved
uint32_t input_channels_alignment = 32;
bool deallocate_activation = false;
bool reallocate_halo_output = false;
uint32_t act_block_h_override = 0; // This argument is ignored when shard_layout == WIDTH_SHARDED.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this config is user facing, i think would be nicer to organize comments for each of the parameter -- something like:

/* this arg does that
 */
dtype arg = default_val;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Is this okay?

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from bbfb12e to 9d1b535 Compare January 6, 2025 10:14
}

conv_op_l1_usage conv2d::estimate_L1_usage(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exact calculation, right? Perhaps rename to calculate_L1_usage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I have renamed it.

device->get_memory_allocation_statistics(tt::tt_metal::BufferType::L1).total_allocated_bytes;
auto actual_cb_size = program_with_cbs.program.get_cb_memory_size();

auto [calc_output_size, calc_CB_size] = estimate_L1_usage(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this have e2e perf degradation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function itself just does a handful of calculations, and so should take very little time. It's invoked when the program object is created, in operation::launch_op. AFAIK, that should only be called the first time, and subsequent calls should hit the cache.

Copy link
Contributor

@mywoodstock mywoodstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sankarmanoj-tt please wait merging until @pavlejosipovic also approves.

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from 42751fc to 4435b19 Compare January 8, 2025 15:44
@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from 7c7ac78 to cd60c9f Compare January 9, 2025 11:26
Copy link
Member

@ayerofieiev-tt ayerofieiev-tt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking. I am still not clear why existing tracing functionality can’t be leveraged, but I assume you did the research.

@sankarmanoj-tt sankarmanoj-tt force-pushed the smanoj/estimate_l1_conv branch from a01be9d to 93461c8 Compare January 12, 2025 04:50
@sankarmanoj-tt sankarmanoj-tt requested a review from a team as a code owner January 12, 2025 04:50
@sankarmanoj-tt sankarmanoj-tt merged commit 880f700 into main Jan 12, 2025
199 checks passed
@sankarmanoj-tt sankarmanoj-tt deleted the smanoj/estimate_l1_conv branch January 12, 2025 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants