How to use it with Multi GPU #1

Open
Hesene opened this issue Aug 10, 2019 · 12 comments

@Hesene commented Aug 10, 2019

Thank you for sharing this! When I run with a single GPU it runs well, but when I run with multiple GPUs I get this error:

RuntimeError: Function CatBackward returned an invalid gradient at index 1 - expected device cuda:1 but got cuda:0

Could you give some advice on this error?

zhoudaxia233 added the help wanted label on Aug 13, 2019
@zhoudaxia233 (Owner)

@Hesene Hello Hesene, in my lab I only have a single 2080Ti, so I cannot replicate this issue. I'm sorry about that!

@Hesene (Author) commented Aug 13, 2019

> @Hesene Hello Hesene, in my lab I only have a single 2080Ti, so I cannot replicate this issue. I'm sorry about that!

OK, thank you for your code, it helped me a lot.

@AtsunoriFujita

I'm facing the same problem. Which part is the cause?

@goodgoodstudy92

Did you use torch.nn.DataParallel()?

@zhoudaxia233 (Owner)

> Did you use torch.nn.DataParallel()?

No, I didn't, but I think it may work.

@zhoudaxia233 (Owner)

> I'm facing the same problem. Which part is the cause?

I'm not sure, but I think you can try integrating nn.DataParallel() into the source code.
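
For reference, the usual wrapping pattern is sketched below. The `get_efficientunet_b0` constructor and its arguments are my assumption based on this repo's README, and, as tracebacks later in this thread show, wrapping alone may still hit the device-mismatch error here:

```python
# Minimal nn.DataParallel sketch; model constructor assumed from the README.
import torch
from torch import nn
from efficientunet import get_efficientunet_b0

model = get_efficientunet_b0(out_channels=2, concat_input=True, pretrained=False)
model = nn.DataParallel(model).cuda()   # replicas are created on every forward pass

x = torch.randn(8, 3, 224, 224).cuda()  # the batch is split along dim 0 across GPUs
out = model(x)                          # per-GPU outputs are gathered back on cuda:0
```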

@goodgoodstudy92

> > I'm facing the same problem. Which part is the cause?
>
> I'm not sure, but I think you can try integrating nn.DataParallel() into the source code.

I used EfficientNet as the backbone to train an object detection model, and nn.DataParallel() works fine; the only issue is that multi-GPU training is quite slow.
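
That slowness is expected: nn.DataParallel is single-process and re-replicates the model on every forward pass. torch.nn.parallel.DistributedDataParallel (one process per GPU) is the generally recommended, faster alternative. A minimal sketch, again assuming the constructor from this repo's README:

```python
# DistributedDataParallel sketch; launch with: torchrun --nproc_per_node=2 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from efficientunet import get_efficientunet_b0  # assumed from the README

def main():
    dist.init_process_group(backend="nccl")     # rank/world size come from torchrun
    local_rank = int(os.environ["LOCAL_RANK"])  # environment variable set by torchrun
    torch.cuda.set_device(local_rank)

    model = get_efficientunet_b0(out_channels=2, concat_input=True, pretrained=False)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])

    x = torch.randn(4, 3, 224, 224, device=f"cuda:{local_rank}")
    out = model(x)  # each process runs its own replica; grads sync during backward()

if __name__ == "__main__":
    main()
```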

@ryanstout

I'm seeing a similar issue when running with nn.DataParallel:

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/efficientunet/efficientunet.py", line 106, in forward
    x = torch.cat([x, blocks.popitem()[1]], dim=1)
RuntimeError: All input tensors must be on the same device. Received cuda:0 and cuda:1

Any ideas?

Thanks!

@Vipermdl

> I'm seeing a similar issue when running with nn.DataParallel: […] Any ideas?

Hi bro, have you solved the problem?

@If-only1 commented Nov 8, 2020

I suspect this problem is caused by a module being shared inside EfficientUnet, which leaves that module on only one GPU; perhaps the encoder…

@TianyiFranklinWang

> I suspect this problem is caused by a module being shared inside EfficientUnet, which leaves that module on only one GPU; perhaps the encoder…

I agree, I'm now facing the same problem.
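
If that hypothesis is right, one untested stopgap is to move each skip tensor onto the current replica's device just before the torch.cat call shown in the traceback above; the line in efficientunet.py's forward() might become something like:

```python
# Possible workaround sketch for the torch.cat line from the traceback.
# This only patches the device mismatch; if `blocks` really is state shared
# across DataParallel replicas, features from different replicas could still
# be mixed together, so treat this as a diagnostic aid, not a verified fix.
skip = blocks.popitem()[1]
x = torch.cat([x, skip.to(x.device)], dim=1)  # move the skip onto x's device
```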

@zhoudaxia233 (Owner)

@NPU-Franklin Franklin created a PR (#11) to support multiple GPUs. I do not have multiple cards, so I cannot test it, but maybe you can give it a try.
