
#13397: Add data parallel support for SqueezeBERT model (PR #13418)

Open · wants to merge 1 commit into main from keerthana/functional_squeezebert_dataparallel
Conversation

@kkeerthana0573 (Contributor) commented Oct 3, 2024

Ticket

Link to GitHub Issue: #13397

Problem description

The SqueezeBERT model is configured to run on either N150 or N300, depending on the available machine.
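
For reference, a minimal sketch of how this single- versus two-device dispatch looks, based on the device check that appears in the review snippets later in this conversation (the import path and the per-device batch size of 8 are assumptions):

```python
import ttnn
from models.utility_functions import is_wormhole_b0  # import path is an assumption

# N300 exposes two Wormhole devices; N150 exposes one.
mesh_device_flag = is_wormhole_b0() and ttnn.GetNumAvailableDevices() == 2
num_devices = 2 if mesh_device_flag else 1

# Scale the global batch across the mesh; each device keeps its own batch of 8.
batch_size_per_device = 8  # assumed per-device batch, matching the demo's batch_size=8
global_batch_size = batch_size_per_device * num_devices  # 16 on N300, 8 on N150
```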

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch from e698cc6 to e041c8a on October 4, 2024 03:35
@tt-rkim (Collaborator) commented Nov 18, 2024

Can you post passing links...

@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch from e02f5ea to 7842fe6 on November 19, 2024 08:03
@tt-rkim (Collaborator) commented Nov 19, 2024

You posted the wrong device perf link.

I found it: https://github.com/tenstorrent/tt-metal/actions/runs/11908678446/job/33185055179

Please post the right link next time.

By the way, it seems to have failed on your model.

@kkeerthana0573 (Contributor Author):

@tt-rkim,
I might have overlooked the links. I'll update the PR.
Thank you.

@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch from 7842fe6 to 5189ca0 on November 20, 2024 07:55

@tt-rkim (Collaborator) commented Nov 20, 2024

I will approve to unblock since it's the only one left, but please ensure ttnn nightly passes.

@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch 2 times, most recently from 2f5a5b4 to 010fa8e on November 22, 2024 07:13
models/demos/wormhole/squeezebert/demo/demo.py (outdated, resolved)

del tt_output
i += 1
eval_score = squad_metric.compute(predictions=pred_labels, references=true_labels)
Contributor:

it would be safer if the test failed when eval_score is lower than some threshold

Contributor Author:

The evaluation scores should ideally remain consistent for the batch size and number of iterations specified in the demo, though they may vary if the batch size changes. I’ve now added an assertion to validate the expected scores.

Contributor:

the check should be tighter. The purpose of this assert is to catch regressions if a bad commit goes in. For example, if I push a change to a kernel and the score drops from 98 to 97.9, this is a bug, and this test should catch it.

So, we want to assert if there are any changes in the eval score, maybe with a small margin.
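
A hedged sketch of such a tightened check, assuming the demo uses the Hugging Face SQuAD metric (whose result dict carries an "f1" key) and a placeholder baseline recorded from a known-good run:

```python
# Baseline from a known-good run; 98.0 is a placeholder, not the model's
# actual score.
EXPECTED_F1 = 98.0
MARGIN = 0.05  # small tolerance for run-to-run numerical noise

eval_score = squad_metric.compute(predictions=pred_labels, references=true_labels)
assert eval_score["f1"] >= EXPECTED_F1 - MARGIN, (
    f"F1 regressed to {eval_score['f1']:.3f}; expected at least {EXPECTED_F1 - MARGIN:.3f}"
)
```

With a margin this small, even the 98 → 97.9 drop in the reviewer's example trips the assert.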



def get_expected_times(squeezebert):
    return {ttnn_functional_squeezebert: (29.29, 15.5)}[squeezebert]
Contributor:

what's the current run time? How close is it to the expected times?

@kkeerthana0573 (Contributor Author) commented Nov 29, 2024:

The current runtimes in the test file have been updated based on the average times observed during CI runs.
We’re uncertain about the target numbers. Is there any other metric, besides the average CI times, that we can use to determine the target numbers?

cc: @boris-drazic, @mbahnasTT.

Contributor:

You want the expected times to be close to the current time, but not so close that a small variation will cause the test to fail. Maybe a 10-20% margin is good.
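
A sketch of deriving the thresholds that way, assuming the test measures inference_time itself and treating the CI averages below as placeholders to be re-measured:

```python
# Averages observed across CI runs (placeholders, mirroring get_expected_times).
measured_compile_time_s = 29.29
measured_inference_time_s = 15.5
HEADROOM = 1.15  # 15% margin, inside the suggested 10-20% band

expected_compile_time_s = measured_compile_time_s * HEADROOM
expected_inference_time_s = measured_inference_time_s * HEADROOM

# inference_time is assumed to come from the test's own profiler measurement.
assert inference_time < expected_inference_time_s, (
    f"inference took {inference_time:.2f}s, expected under {expected_inference_time_s:.2f}s"
)
```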

@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch 3 times, most recently from 436aa1d to b25513c on November 29, 2024 11:22
@saichandax requested a review from uaydonat on December 3, 2024 04:56
@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch from b25513c to 8ddc243 on December 3, 2024 10:40
@kkeerthana0573 requested a review from a team as a code owner on December 3, 2024 10:40
@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch from 8ddc243 to 080e26d on December 3, 2024 12:58
mesh_device=mesh_device,
use_program_cache=use_program_cache,
model_name=model_name,
batch_size=8,
Contributor:

this should call with correct batch size


profiler.start(f"preprocessing_parameter")
mesh_device_flag = is_wormhole_b0() and ttnn.GetNumAvailableDevices() == 2
batch_size = batch_size * 2 if mesh_device_flag else batch_size
Contributor:

do not overwrite the batch_size that the caller gives

tt_model_name = f"ttnn_{model_name}_optimized"

mesh_device_flag = is_wormhole_b0() and ttnn.GetNumAvailableDevices() == 2
batch_size = batch_size * 2 if mesh_device_flag else batch_size
Contributor:

do not overwrite the batch_size that the caller gives
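
One way to honor both comments, sketched here with hypothetical names (run_demo stands in for the actual demo entry point, the fixtures are assumed to come from the repo's conftest, and the import path is an assumption): resolve the device count once at the call site and parametrize the test with the final batch size, so nothing downstream rescales it.

```python
import pytest
import ttnn
from models.utility_functions import is_wormhole_b0  # import path is an assumption

# Resolve the device count once, at collection time.
NUM_DEVICES = 2 if (is_wormhole_b0() and ttnn.GetNumAvailableDevices() == 2) else 1

@pytest.mark.parametrize("model_name", ["squeezebert/squeezebert-uncased"])
@pytest.mark.parametrize("batch_size", [8 * NUM_DEVICES])  # global batch, already scaled
def test_demo(mesh_device, use_program_cache, model_name, batch_size):
    # run_demo is a hypothetical stand-in for the demo entry point; it receives
    # the final batch size and must not multiply it again internally.
    run_demo(
        mesh_device=mesh_device,
        use_program_cache=use_program_cache,
        model_name=model_name,
        batch_size=batch_size,
    )
```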



@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch 3 times, most recently from 5dd5731 to 9d46e12 on December 9, 2024 14:54
@kkeerthana0573 force-pushed the keerthana/functional_squeezebert_dataparallel branch from 9d46e12 to 6a66479 on December 9, 2024 15:51