RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered #34

Open
Tian14267 opened this issue Sep 18, 2019 · 3 comments

Tian14267 commented Sep 18, 2019

Hello Author, I get a "loss=inf" problem when I train on my data. It happens after several epochs, and the details are here:
..........
(650 / 2000) - Loss: 1.3488 - tr_loss: 0.2789 - tcl_loss: 0.4046 - sin_loss: 0.1108 - cos_loss: 0.0985 - radii_loss: 0.4560
(700 / 2000) - Loss: 0.9112 - tr_loss: 0.2656 - tcl_loss: 0.3378 - sin_loss: 0.1111 - cos_loss: 0.0474 - radii_loss: 0.1492
(750 / 2000) - Loss: 1.3603 - tr_loss: 0.2677 - tcl_loss: 0.3537 - sin_loss: 0.0388 - cos_loss: 0.0105 - radii_loss: 0.6896
(800 / 2000) - Loss: 1.5277 - tr_loss: 0.2856 - tcl_loss: 0.3284 - sin_loss: 0.1668 - cos_loss: 0.0420 - radii_loss: 0.7048
/home/hj/smbshare/fffan/Detector/TextSnake/TextSnake.pytorch-master/util/misc.py:85: RuntimeWarning: invalid value encountered in double_scalars
return v[1] / l
/home/hj/smbshare/fffan/Detector/TextSnake/TextSnake.pytorch-master/util/misc.py:91: RuntimeWarning: invalid value encountered in double_scalars
return v[0] / l

(850 / 2000) - Loss: 2.1355 - tr_loss: 0.4698 - tcl_loss: 0.4413 - sin_loss: 0.1460 - cos_loss: 0.1237 - radii_loss: 0.9547
(900 / 2000) - Loss: 1.9340 - tr_loss: 0.4183 - tcl_loss: 0.3667 - sin_loss: 0.0760 - cos_loss: 0.0620 - radii_loss: 1.0110
(950 / 2000) - Loss: 1.2544 - tr_loss: 0.2943 - tcl_loss: 0.3723 - sin_loss: 0.0869 - cos_loss: 0.0938 - radii_loss: 0.4072
(1000 / 2000) - Loss: 1.2810 - tr_loss: 0.3134 - tcl_loss: 0.4176 - sin_loss: 0.1045 - cos_loss: 0.0622 - radii_loss: 0.3833
(1050 / 2000) - Loss: 2.0816 - tr_loss: 0.2053 - tcl_loss: 0.3180 - sin_loss: 0.0600 - cos_loss: 0.0517 - radii_loss: 1.4467
(1100 / 2000) - Loss: 1.7673 - tr_loss: 0.2696 - tcl_loss: 0.4861 - sin_loss: 0.1194 - cos_loss: 0.0957 - radii_loss: 0.7965
(1150 / 2000) - Loss: inf - tr_loss: 0.3392 - tcl_loss: 0.4790 - sin_loss: inf - cos_loss: 0.1815 - radii_loss: 0.1718

Traceback (most recent call last):
File "train_textsnake.py", line 239, in <module>
main()
File "train_textsnake.py", line 224, in main
train(model, train_loader, criterion, scheduler, optimizer, epoch, logger)
File "train_textsnake.py", line 73, in train
criterion(output, tr_mask, tcl_mask, sin_map, cos_map, radius_map, train_mask)
File "/home/hj/.pyenv/versions/fffan-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hj/smbshare/fffan/Detector/TextSnake/TextSnake.pytorch-master/network/loss.py", line 61, in forward
loss_tr = self.ohem(tr_pred, tr_mask.long(), train_mask.long())
File "/home/hj/smbshare/fffan/Detector/TextSnake/TextSnake.pytorch-master/network/loss.py", line 24, in ohem
loss_neg, _ = torch.topk(loss_neg, n_neg)
RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

I also always get this warning during training: "util/misc.py:91: RuntimeWarning: invalid value encountered in double_scalars"
Can you help me solve these problems?
PS: My dataset is ICPR.

Owner

princewang1994 commented Sep 20, 2019

It seems to be a problem with the dataset.
vector_sin and vector_cos are called like this:

            sin_theta = vector_sin(c2 - c1)
            cos_theta = vector_cos(c2 - c1)

In these lines, if c2 and c1 are the same point, c2 - c1 is a zero-length vector, and normalizing it causes a division by zero.
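
One way to guard against this is sketched below. It assumes vector_sin and vector_cos divide v[1] and v[0] by the vector length, as the warnings at util/misc.py:85 and :91 suggest. The names safe_vector_sin, safe_vector_cos, and EPS are hypothetical and not part of the repository, and falling back to sin = 0, cos = 1 for a degenerate vector is an arbitrary choice. Note that with NumPy scalars, 0 / 0 yields NaN plus the "invalid value" warning rather than raising an exception, so the bad value can silently reach the loss.

```python
import math

EPS = 1e-8  # hypothetical threshold; adjust to the scale of your coordinates


def safe_vector_sin(v, eps=EPS):
    """Sine of the direction of 2-D vector v, or 0.0 if v is (near-)zero."""
    assert len(v) == 2
    length = math.hypot(v[0], v[1])
    if length < eps:
        # Assumption: treat a degenerate vector as horizontal (sin = 0).
        return 0.0
    return v[1] / length


def safe_vector_cos(v, eps=EPS):
    """Cosine of the direction of 2-D vector v, or 1.0 if v is (near-)zero."""
    assert len(v) == 2
    length = math.hypot(v[0], v[1])
    if length < eps:
        # Assumption: treat a degenerate vector as horizontal (cos = 1).
        return 1.0
    return v[0] / length
```

This only hides the symptom; the duplicated points are still in the annotation, so cleaning the data is the more reliable fix.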

So in the data loading process, you can check whether there are consecutive duplicated points.
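
A minimal sketch of such a check, assuming each text instance is a sequence of (x, y) points. The helper name drop_consecutive_duplicates and the eps tolerance are hypothetical; where to call it depends on how your dataset class loads polygons.

```python
def drop_consecutive_duplicates(points, eps=1e-8):
    """Return the points with consecutive (near-)identical entries removed."""
    cleaned = []
    for p in points:
        p = (float(p[0]), float(p[1]))
        if cleaned:
            prev = cleaned[-1]
            # Skip a point that coincides with the previously kept point.
            if abs(p[0] - prev[0]) <= eps and abs(p[1] - prev[1]) <= eps:
                continue
        cleaned.append(p)
    return cleaned


polygon = [(0, 0), (0, 0), (1, 1), (1, 1), (2, 0)]
cleaned = drop_consecutive_duplicates(polygon)
if len(cleaned) != len(polygon):
    print("removed", len(polygon) - len(cleaned), "duplicated points")
```

This sketch does not handle a closed polygon whose last point repeats the first; if your annotations are stored that way, that pair needs a separate check.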

@Tian14267
Author

How can I solve this? I still have this problem when I use the "SynthText" data,
so I think the data is OK.

E:\fffan\Detec\TextSnake\TextSnake.pytorch-master\util\misc.py:85: RuntimeWarning: invalid value encountered in double_scalars
return v[1] / l
E:\fffan\Detec\TextSnake\TextSnake.pytorch-master\util\misc.py:91: RuntimeWarning: invalid value encountered in double_scalars
return v[0] / l
E:\fffan\Detec\TextSnake\TextSnake.pytorch-master\util\misc.py:197: RuntimeWarning: invalid value encountered in double_scalars
ratio = end_shift / edge_length[cur_node]
.........

@Tian14267
Author

@princewang1994
