Multi-GPU training: accuracy drops significantly #47

Closed
coldlarry opened this issue Oct 21, 2021 · 5 comments
Labels
good first issue Good for newcomers

Comments

@coldlarry

Hello author, I trained with multiple GPUs (4 cards) and also increased the default single-GPU batch size you set.

During training, the first mAP evaluation is close to 0 (with single-GPU training, the first evaluation is usually around 0.6).

Could you tell me what causes this? Is it a learning rate issue?

@coldlarry
Author

With single-GPU training, increasing the batch size also makes the accuracy drop significantly. Why is that?

@yuantn
Owner

yuantn commented Oct 22, 2021

Did you adjust the learning rate while increasing the batch size?
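
For context, a common heuristic when scaling the batch size is the linear scaling rule: if the effective batch size (samples per GPU × number of GPUs) grows by a factor k, scale the learning rate by roughly the same factor, ideally with a short warmup. A minimal sketch with placeholder numbers (not taken from this repository's config):

```python
# Linear scaling rule sketch -- the base values below are illustrative only.
base_lr = 0.001               # learning rate tuned for the default setting
base_batch_size = 2 * 1       # samples_per_gpu * number of GPUs in that setting

samples_per_gpu = 2
num_gpus = 4
new_batch_size = samples_per_gpu * num_gpus            # new effective batch size
new_lr = base_lr * new_batch_size / base_batch_size    # scale lr linearly

print(f"effective batch size: {new_batch_size}, scaled lr: {new_lr}")
# -> effective batch size: 8, scaled lr: 0.004
```

Without this adjustment, a larger batch means fewer optimizer steps per epoch at the same step size, which can explain the much lower mAP at the first evaluation.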

@coldlarry
Author

Hello, I indeed did not adjust the learning rate (I did not expect it to have such a large impact).
Also, I would like to ask: does the current code support multi-GPU training?

@coldlarry
Author

When I trained with multiple GPUs, it stopped printing anything partway through training and reported no errors, so I eventually killed the process.
I saw an earlier issue asking about multi-GPU training, so I would like to ask whether multi-GPU training is supported at the moment.

@yuantn
Owner

yuantn commented Oct 25, 2021

Can you provide the output log? One possible cause is that the processes on the different GPUs are not synchronized, so GPU utilization stays at 100% and training hangs.

The current code is best run on a single GPU. If you want to use multiple GPUs, you can refer to the third question of the Training and Test section in the FAQ and to Issue #11.
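
To illustrate the kind of desynchronization meant here: a collective operation such as all_reduce blocks every rank until all ranks have entered it, so if one process takes a different code path, the remaining processes busy-wait at full GPU utilization with no output and no error. A hypothetical sketch (not this repository's code) that hangs by design when launched with more than one process, e.g. `torchrun --nproc_per_node=2 hang_demo.py`:

```python
# hang_demo.py -- illustrates how mismatched collectives stall a multi-GPU job.
import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    x = torch.ones(1, device="cuda")

    # Hypothetical divergence: rank 0 skips the all_reduce that every other
    # rank calls, so those ranks block inside the collective indefinitely,
    # showing 100% GPU utilization while printing nothing.
    if rank != 0:
        dist.all_reduce(x)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The symptom described above (no new log lines, no error, GPUs pinned at 100%) matches this pattern, which is why the output log helps pinpoint where the ranks diverge.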

@yuantn yuantn added the good first issue Good for newcomers label Oct 27, 2021
@yuantn yuantn closed this as completed Nov 22, 2021
yuantn added a commit that referenced this issue Apr 21, 2023