Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#14257: few more optimization for yolo #15582

Merged
merged 2 commits into from
Dec 3, 2024
Merged

#14257: few more optimization for yolo #15582

merged 2 commits into from
Dec 3, 2024

Conversation

shwetankTT
Copy link
Contributor

@shwetankTT shwetankTT commented Dec 1, 2024

Ticket

#14257:

Problem description

YoloV4 Optimization.
Improvement:--> Getting 200fps from 170fps.

What's changed

Adding conv optimization flag for few layers. Removed few sharding which are not needed.

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@shwetankTT
Copy link
Contributor Author

@shwetankTT shwetankTT marked this pull request as ready for review December 2, 2024 07:40
Copy link
Contributor

@pavlejosipovic pavlejosipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any perf impact of this change?
Does trace size needs to be adjusted?

@shwetankTT
Copy link
Contributor Author

Is there any perf impact of this change? Does trace size needs to be adjusted?

Yes. The current perf is 200fps from 170. I ran the nightly CI no need for trace size adjastment.

@pavlejosipovic
Copy link
Contributor

Is there any perf impact of this change? Does trace size needs to be adjusted?

Yes. The current perf is 200fps from 170. I ran the nightly CI no need for trace size adjastment.

We don't have a perf CI in place for yolo ATM?

@shwetankTT
Copy link
Contributor Author

Is there any perf impact of this change? Does trace size needs to be adjusted?

Yes. The current perf is 200fps from 170. I ran the nightly CI no need for trace size adjastment.

We don't have a perf CI in place for yolo ATM?

Yeah we don't hv perf CI for yolo. I am getting the numbers from IRD machines(1st sheet with final name.).
https://docs.google.com/spreadsheets/d/1MRNswbQJeF-v9vzA_ajSRlBRHFTY8dAdDlnN6FiMRHQ/edit?gid=174224023#gid=174224023

@pavlejosipovic
Copy link
Contributor

Is there any perf impact of this change? Does trace size needs to be adjusted?

Yes. The current perf is 200fps from 170. I ran the nightly CI no need for trace size adjastment.

We don't have a perf CI in place for yolo ATM?

Yeah we don't hv perf CI for yolo. I am getting the numbers from IRD machines(1st sheet with final name.). https://docs.google.com/spreadsheets/d/1MRNswbQJeF-v9vzA_ajSRlBRHFTY8dAdDlnN6FiMRHQ/edit?gid=174224023#gid=174224023

Ok than, document the perf diff in PR description/commit message.
Btw I would prioritise adding perf CI before doing any perf optimisations.

@shwetankTT
Copy link
Contributor Author

To calculate the FPS we take the sum of device kernel duration(ns) Column divided by 10^9 (sec --> nano sec conversion).

@tt-rkim
Copy link
Collaborator

tt-rkim commented Dec 2, 2024

Please run:

  • Nightly fast dispatch

@shwetankTT
Copy link
Contributor Author

Please run:

  • Nightly fast dispatch

https://github.com/tenstorrent/tt-metal/actions/runs/12103838737

@mywoodstock
Copy link
Contributor

@shwetankTT Lets get this on the perf CI so that this does not regress.

@shwetankTT
Copy link
Contributor Author

shwetankTT commented Dec 2, 2024

@mywoodstock Yeah. Added the perf on CI. Validating it.

@shwetankTT shwetankTT merged commit 6a524ab into main Dec 3, 2024
168 of 176 checks passed
@shwetankTT shwetankTT deleted the shwetankTT/yolo_2 branch December 3, 2024 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants