Re-land: Make as_strided_copy materialize a new tensor with index. #6697

Merged: 3 commits into master, Mar 19, 2024

Conversation

ysiraichi (Collaborator)

Re-land: #6624

This PR adds a fast path on top of #6624 changes.

Fast path: keeps the old behavior of as_strided_copy

  • Taken when the size and strides specify a non-overlapping and dense tensor

Slow path: the new behavior

  • Slower, due to CPU dispatch and index computation
  • Should work with any argument combination (see the sketch after this list)
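To make the dispatch concrete, here is a minimal Python sketch of the two paths. The helper names `is_non_overlapping_and_dense`, `as_strided_copy_slow`, and the wrapper are hypothetical, written for illustration only; they are not the actual C++ implementation in this PR, and the slow path assumes a contiguous input so that `flatten()` matches the underlying storage.

```python
import torch

def is_non_overlapping_and_dense(size, stride):
    # Illustrative fast-path check: visiting dimensions in order of
    # increasing stride, each dimension must begin exactly where the
    # previous one ends, so the output tiles its storage with no gaps
    # and no aliasing.
    dims = sorted(range(len(size)), key=lambda d: stride[d])
    expected = 1
    for d in dims:
        if size[d] == 0:
            return True  # empty tensors trivially qualify
        if size[d] == 1:
            continue  # singleton dimensions cannot overlap
        if stride[d] != expected:
            return False
        expected *= size[d]
    return True

def as_strided_copy_slow(t, size, stride, storage_offset=0):
    # Illustrative slow path: compute the flat storage index of every
    # output element, then materialize a fresh tensor with a gather.
    # This handles overlapping and non-dense (size, stride) pairs.
    idx = torch.full(size, storage_offset, dtype=torch.long)
    for d, (n, st) in enumerate(zip(size, stride)):
        shape = [1] * len(size)
        shape[d] = n
        idx = idx + (torch.arange(n) * st).view(shape)
    return t.flatten()[idx]  # assumes t is contiguous

def as_strided_copy(t, size, stride, storage_offset=0):
    if is_non_overlapping_and_dense(size, stride):
        # Fast path: the old behavior, a plain strided view plus copy.
        return t.as_strided(size, stride, storage_offset).clone()
    return as_strided_copy_slow(t, size, stride, storage_offset)
```

For example, `as_strided_copy(torch.arange(6.), (2, 3), (3, 1))` takes the fast path, while the overlapping layout `(2, 2), (1, 1)` falls through to the slow path and yields `[[0, 1], [1, 2]]`.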

cc @miladm @JackCaoG @lsy323

ysiraichi requested review from lsy323 and JackCaoG on March 8, 2024
ysiraichi (Collaborator, Author)

I will test for the regression described here on the GPU machine I have access to.

JackCaoG (Collaborator) commented Mar 8, 2024

The dynamo issue can be fixed by rebasing; fine to ignore.

ysiraichi force-pushed the ysiraichi/fix-asstrided branch from 84ec60b to 1a204f7 on March 8, 2024
ysiraichi (Collaborator, Author)

@lsy323 Could you help me check whether the regression is gone?

JackCaoG (Collaborator) commented Mar 8, 2024

Do we need this PR in the 2.3 release? It is a rather dangerous change; if we don't have a strong reason, I'd rather leave it in nightly for now.

miladm (Collaborator) commented Mar 11, 2024

@vanbasten23 Can you please help @ysiraichi benchmark this fix on TPU and confirm the perf outcome?

@JackCaoG Given the risk, I'd be OK with leaving this PR out of 2.3.

JackCaoG (Collaborator)

Yeah, unless there is a strong reason, I would prefer to leave this out of the 2.3 release.

JackCaoG (Collaborator)

Do we have bandwidth to test this one? Otherwise we can merge and see if the DDP test starts to fail tomorrow...

vanbasten23 (Collaborator)

> Do we have bandwidth to test this one? Otherwise we can merge and see if the DDP test starts to fail tomorrow...

I'm running the tests in #6624 (comment).

vanbasten23 (Collaborator)

@ysiraichi Sorry for the delayed response. I tested on my v3-8 TPU. Before this PR (master at 6ac3223):

root@67df528db184:/ansible# PJRT_DEVICE=TPU python pytorch/xla/test/test_train_mp_imagenet.py --model=resnet50 --log_steps=200 --ddp --pjrt_distributed --fake_data --batch_size=256
Epoch 1 train begin 03:32:23
| Training Device=xla:0/2 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:1/5 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:0/0 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:1/7 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:0/4 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:0/6 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:1/1 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:1/3 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=03:33:05
| Training Device=xla:1/3 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:1/1 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:1/7 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:0/2 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:1/5 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:0/6 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:0/0 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:0/4 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=03:36:09
| Training Device=xla:1/3 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:1/1 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:0/2 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:0/6 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:0/0 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:1/5 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:1/7 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42
| Training Device=xla:0/4 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=03:37:42

With the PR:

root@67df528db184:/ansible# PJRT_DEVICE=TPU python pytorch/xla/test/test_train_mp_imagenet.py --model=resnet50 --log_steps=200 --ddp --pjrt_distributed --fake_data --batch_size=256
| Training Device=xla:0/4 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:1/7 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
Epoch 1 train begin 04:06:25
| Training Device=xla:0/0 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:0/2 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:1/3 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:1/5 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:1/1 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:0/6 Epoch=1 Step=0 Loss=6.89620 Rate=0.00 GlobalRate=0.00 Time=04:07:07
| Training Device=xla:1/7 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:0/2 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:1/1 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:1/5 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:0/0 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:1/3 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:0/4 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:0/6 Epoch=1 Step=200 Loss=0.05069 Rate=0.00 GlobalRate=0.00 Time=04:09:56
| Training Device=xla:0/2 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:0/6 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:0/4 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:1/1 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:1/3 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:1/7 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:0/0 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29
| Training Device=xla:1/5 Epoch=1 Step=400 Loss=0.01512 Rate=0.00 GlobalRate=0.00 Time=04:11:29

Comparing step 0 to step 400: about 4m37s before (03:33:05 to 03:37:42) versus about 4m22s with this PR (04:07:07 to 04:11:29), so I don't see any slowdown. The change LGTM. Thanks Yukio.

ysiraichi (Collaborator, Author)

Thanks, @vanbasten23.

ysiraichi merged commit 27a7dd3 into master on Mar 19, 2024. 18 checks passed.