Multi-GPU training causes a large drop in accuracy #47
Comments
With single-GPU training, increasing the batch size also causes a significant drop in accuracy. Why is that?
Did you adjust the learning rate when you increased the batch size?
Hello, I indeed did not adjust the learning rate (I didn't expect it to matter this much).
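For reference, a common heuristic here is the linear scaling rule: scale the learning rate in proportion to the effective batch size. The sketch below assumes a generic PyTorch setup; the base values and the `scaled_lr` helper are illustrative and are not taken from this repository.

```python
# Minimal sketch of the linear scaling rule (illustrative values, not this repo's defaults).
import torch

BASE_BATCH_SIZE = 4   # hypothetical default single-GPU batch size
BASE_LR = 1e-3        # hypothetical default learning rate

def scaled_lr(batch_size, num_gpus=1):
    """Scale the base LR by the ratio of the effective batch size to the base batch size."""
    effective_batch = batch_size * num_gpus
    return BASE_LR * effective_batch / BASE_BATCH_SIZE

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=scaled_lr(batch_size=16, num_gpus=4),  # larger batch -> proportionally larger LR
    momentum=0.9,
)
```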
When I train on multiple GPUs, training stops printing any information partway through, without reporting any error; I eventually killed the process.
Can you provide the output log? One possible cause is that the processes on the different GPUs are not synchronized, so GPU utilization gets stuck at 100% and training hangs. The current code is best run on a single GPU. If you want to use multiple GPUs, you can refer to the third question of the Training and Testing section in the FAQ and Issue #11.
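To illustrate the synchronization point above, here is a minimal DistributedDataParallel sketch in generic PyTorch (it is not this repository's training script; the model, data, and hyperparameters are placeholders). The key idea is that every rank must execute the same collective operations each step; rank-specific work such as logging or evaluation should be guarded so the other ranks do not block indefinitely.

```python
# Minimal DDP training loop sketch (placeholder model/data, not this repo's code).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # expects launch via torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 2).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(100):
        x = torch.randn(8, 10).cuda(local_rank)      # placeholder batch
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()                              # gradient all-reduce happens here
        optimizer.step()

        # Only rank 0 logs; all ranks still reach the barrier, so no rank is left waiting.
        if step % 10 == 0 and dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")
        dist.barrier()                               # keep ranks aligned each step

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Such a script would typically be launched with something like `torchrun --nproc_per_node=4 train_sketch.py` (the filename is hypothetical).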
Hi author, I trained with multiple GPUs (4 cards) and also increased the default single-GPU batch size you set.
During training, the mAP at the first evaluation is close to 0 (with single-GPU training, the first evaluation is usually around 0.6).
Could you explain what causes this? Is it a learning-rate issue?