NaN values for train_loss while training UNETR model #818
Replies: 3 comments
-
Normally this happens when there is a data issue.
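A quick way to check for that, assuming the images and labels are NIfTI files readable with nibabel (the directory path below is a placeholder):

```python
# Minimal sketch: scan a folder of NIfTI images/labels for NaN/Inf values.
# Assumes nibabel is installed; adjust DATA_DIR to your dataset location.
import glob
import os

import nibabel as nib
import numpy as np

DATA_DIR = "/path/to/dataset"  # placeholder

for path in sorted(glob.glob(os.path.join(DATA_DIR, "**", "*.nii.gz"), recursive=True)):
    data = nib.load(path).get_fdata()
    if not np.isfinite(data).all():
        print(f"Non-finite values found in {path}")
```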
-
Thanks for opening this discussion, @neerajamaha. I've faced the same problem before. There are a few things you can try:

1. Change the optimizer and/or learning rate (smaller than 1e-4): https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/trainers/deepedit.py#L77 - you could also try Novograd (see the sketch after this list).
2. Consider changing these random transforms used for training: https://github.com/Project-MONAI/MONAILabel/blob/main/sample-apps/radiology/lib/trainers/deepedit.py#L108-L112 - they may not be the best fit for your dataset.
3. If you're still facing NaN values after these changes, check whether the labels/ground truth are corrupted - happy to help check this if needed.

Hope this helps.
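For point 1, a minimal sketch of what the swap could look like, assuming `network` is the UNETR instance your trainer builds (the channel counts and image size below are illustrative, not the exact sample-app values):

```python
# Minimal sketch of a smaller learning rate and the Novograd optimizer from MONAI.
import torch
from monai.networks.nets import UNETR
from monai.optimizers import Novograd

network = UNETR(
    in_channels=1,
    out_channels=2,
    img_size=(128, 128, 128),
)

# Option A: plain Adam with a smaller learning rate than the default 1e-4.
optimizer = torch.optim.Adam(network.parameters(), lr=1e-5)

# Option B: Novograd, which is often more stable for this kind of training.
optimizer = Novograd(network.parameters(), lr=1e-5)
```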
-
@neerajamaha if this issue still persists at your end, please try turning off 'amp' by setting it to False. It's a known issue with UNETR that with 'amp' turned on it sometimes goes into NaNs (see the sketch below).
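If you end up building the trainer yourself, the flag in question is the `amp` argument of MONAI's `SupervisedTrainer`. A self-contained sketch with tiny random in-memory data, where everything except `amp=False` is a placeholder for what the app normally constructs:

```python
# Minimal sketch: disable automatic mixed precision on the MONAI trainer.
# The random dataset below only exists to make the example run end to end.
import torch
from monai.data import DataLoader, Dataset
from monai.engines import SupervisedTrainer
from monai.losses import DiceCELoss
from monai.networks.nets import UNETR

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
network = UNETR(in_channels=1, out_channels=2, img_size=(128, 128, 128)).to(device)

# One random sample, matching the 128x128x128 spatial size from the question.
data = [{
    "image": torch.rand(1, 128, 128, 128),
    "label": torch.randint(0, 2, (1, 128, 128, 128)),
}]
loader = DataLoader(Dataset(data), batch_size=1)

trainer = SupervisedTrainer(
    device=device,
    max_epochs=1,
    train_data_loader=loader,
    network=network,
    optimizer=torch.optim.Adam(network.parameters(), lr=1e-5),
    loss_function=DiceCELoss(to_onehot_y=True, softmax=True),
    amp=False,  # disable mixed precision; AMP + UNETR is the suspected NaN trigger
)
trainer.run()
```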
-
Hello,
I am training a DeepEdit model with a UNETR network from the radiology sample app on pediatric lung MRI images. I have noticed that during training, after 10 or so epochs, the train_loss values become 'NaN' and the Dice score is no longer updated.
I have used the default configuration of the sample app and have only changed the spatial size to 128x128x128 and the target spacing to 0.703x0.703x0.703 mm. Are there any suggestions to remedy this, and have others faced this issue as well?
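For reference, the preprocessing I am describing corresponds roughly to the following MONAI transforms (a sketch of the settings, not the exact sample-app transform chain):

```python
# Rough sketch of the spacing/resize settings used for training.
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, Resized, Spacingd

preprocess = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    # Resample to the 0.703 mm isotropic target spacing.
    Spacingd(keys=["image", "label"], pixdim=(0.703, 0.703, 0.703), mode=("bilinear", "nearest")),
    # Resize to the 128x128x128 spatial size fed to the network.
    Resized(keys=["image", "label"], spatial_size=(128, 128, 128), mode=("area", "nearest")),
])
```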
I have also trained a DeepEdit model with a DynUNET network and have not seen any 'NaN' values come up. I would appreciate any thoughts on why this divergence appears for the UNETR network but not for the DynUNET network.
Any feedback would be greatly appreciated.
Thank you!