NaN appears in the training loss #27

Open
FangJingYunner opened this issue Apr 20, 2023 · 2 comments
@FangJingYunner

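Hello, I am using your code to train on a rotated object detection dataset that I collected myself. Not every image in the dataset has rotated-box annotations. When the network computes the loss, some variables occasionally become NaN, for example offset0, loss_fg, and loss_neg. The problem is intermittent: sometimes changing the network parameters or the training strategy makes it go away, and sometimes, with certain network parameters, NaN appears after a few hundred training steps.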
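
Images without annotations are one common way for a foreground loss term to become NaN: if the loss is divided by the number of foreground samples, an empty image gives 0 / 0. Whether this repository's loss does that is not confirmed in this thread, so the following is only a minimal sketch of the guard. The names `normalized_loss`, `per_anchor_loss`, and `fg_mask` are hypothetical and do not come from the project.

```python
import torch


def normalized_loss(per_anchor_loss: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
    """Sum the loss over foreground anchors and divide by their count.

    Hypothetical helper: `fg_mask` is a float tensor of 0s and 1s.
    """
    num_fg = fg_mask.sum()
    # Clamp the divisor to at least 1 so an image with no labelled
    # objects yields 0 instead of 0 / 0 = nan.
    return (per_anchor_loss * fg_mask).sum() / torch.clamp(num_fg, min=1.0)


loss = torch.tensor([0.5, 1.5, 2.0])
empty_mask = torch.zeros(3)                    # image with no annotations
some_mask = torch.tensor([1.0, 0.0, 1.0])      # two foreground anchors

naive = (loss * empty_mask).sum() / empty_mask.sum()  # 0 / 0
print(torch.isnan(naive))                  # tensor(True)
print(normalized_loss(loss, empty_mask))   # tensor(0.)
print(normalized_loss(loss, some_mask))    # tensor(1.2500)
```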

@FangJingYunner
Author

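OK, I have now traced the problem to the line of code below. This variable grows very large within the first few training steps and eventually overflows, which causes the error. Is there any way to fix this?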

pred_l1234 = torch.exp(conv_raw_l1234) * stride  # l1-l4: top, right, bottom, left
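
One possible mitigation, not confirmed by the maintainers, is to bound the exponent before calling `torch.exp`, since in float32 the exponential overflows to `inf` once its input exceeds roughly 88. The sketch below assumes a hypothetical helper `decode_l1234` and a hypothetical bound `MAX_LOG_DIST`; the right value depends on the image size and strides actually used.

```python
import torch

# Hypothetical upper bound on the exponent (an assumption, not a project setting).
# exp(10) is about 2.2e4, far larger than any realistic box side in stride units.
MAX_LOG_DIST = 10.0


def decode_l1234(conv_raw_l1234: torch.Tensor, stride: float) -> torch.Tensor:
    """Decode raw outputs into distances l1-l4 (top, right, bottom, left)."""
    # Clamp only from above: very negative inputs give exp -> 0, which is harmless.
    clamped = torch.clamp(conv_raw_l1234, max=MAX_LOG_DIST)
    return torch.exp(clamped) * stride


raw = torch.tensor([0.0, 5.0, 100.0, -100.0])
unclamped = torch.exp(raw) * 8          # exp(100) overflows float32 to inf
decoded = decode_l1234(raw, 8)

print(torch.isfinite(unclamped).all())  # tensor(False)
print(torch.isfinite(decoded).all())    # tensor(True)
```

Note that clamped elements receive zero gradient, so this only stops the overflow and does not address why the raw outputs grow. Lowering the learning rate or clipping gradients with `torch.nn.utils.clip_grad_norm_` are other options worth trying alongside it.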

@Crescent-Ao
Collaborator

Do you have more detailed logs? We can help you troubleshoot this.
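
A more detailed log could be produced by checking each loss term for non-finite values at every step and recording which one fails first. The sketch below is an assumption about how such a check might look; `non_finite_names` is a hypothetical helper, and the dictionary values are made-up stand-ins for the real tensors named in this issue.

```python
import torch


def non_finite_names(named_tensors: dict) -> list:
    """Return the names of tensors that contain nan or inf (hypothetical helper)."""
    return [name for name, t in named_tensors.items()
            if not torch.isfinite(t).all()]


# Made-up values for illustration only; in training these would be the
# actual loss terms and intermediate tensors of the current step.
losses = {
    "loss_fg": torch.tensor(0.7),
    "loss_neg": torch.tensor(float("nan")),
    "offset0": torch.tensor([1.0, float("inf")]),
}

bad = non_finite_names(losses)
if bad:
    print("non-finite values in:", bad)
```

Logging the step index together with the returned names shows whether the overflow or the NaN comes first. `torch.autograd.set_detect_anomaly(True)` can additionally report which backward operation produced a NaN, at the cost of slower training.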
