The problem of fsns train_fsns.py #69

Open
MS-MA opened this issue Mar 31, 2019 · 6 comments

MS-MA commented Mar 31, 2019

There is a problem with fsns train_fsns.py.
First: NCCL is already installed in my new environment, using the following steps:
1. https://developer.nvidia.com/nccl
2. sudo dpkg -i nccl-repo-ubuntu1604-2.2.12-ga-cuda8.0_1-1_amd64.deb
3. sudo apt update
4. sudo apt-get install libnccl2=2.2.12-1+cuda8.0 libnccl-dev=2.2.12-1+cuda8.0
5. sudo cp /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib64
6. sudo cp /usr/include/nccl.h /usr/local/cuda/include/
7. chmod a+r /usr/local/cuda/include/nccl.h /usr/local/cuda/lib64/libnccl.so.2
Second: when I try to execute python train_fsns.py, the problem below occurs.

(SEE) mayongjuan@visionGroup:/home/code/mayongjuan/see/chainer$ python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see/fsns-model --blank-label 0 --char-map ../datasets/fsns/fsns_char_map.json -b 50
Traceback (most recent call last):
  File "train_fsns.py", line 169, in <module>
    updater = MultiprocessParallelUpdater(train_iterators, optimizer, devices=args.gpus)
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 116, in __init__
    'NCCL is not enabled. MultiprocessParallelUpdater '
Exception: NCCL is not enabled. MultiprocessParallelUpdater requires NCCL.
Please reinstall chainer after you install NCCL.
(see https://github.com/chainer/chainer#installation).
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987f0>>
Traceback (most recent call last):
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987b8>>
Traceback (most recent call last):
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in __del__
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
  File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable

I can't figure out why this problem occurs. I'm looking forward to your answer. Thank you very much.
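
Copying libnccl.so.2 into the CUDA directory, as in the install steps above, only helps if the dynamic loader can actually open the library. A minimal sketch for checking that from Python follows. The helper name `find_loadable` is hypothetical (it is not part of Chainer, CuPy, or NCCL), and the candidate names are assumptions taken from the steps in this post.

```python
import ctypes


def find_loadable(names):
    """Return the first library name in `names` the dynamic loader can open, else None."""
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # The loader could not find or open this candidate; try the next one.
            continue
        return name
    return None


if __name__ == "__main__":
    # Candidate names are assumptions based on the install steps in this post.
    candidates = ["libnccl.so.2", "/usr/local/cuda/lib64/libnccl.so.2"]
    print("loadable NCCL library:", find_loadable(candidates))
```

If this prints None, the library is not visible to the loader in that environment, which is worth sorting out before reinstalling anything.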

Owner

Bartzi commented Apr 3, 2019

Did you do what the error message says: "Please reinstall chainer after you install NCCL."?
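
A quick way to see whether a reinstall picked up NCCL is to try importing the NCCL binding directly. The sketch below assumes that the updater's check comes down to whether the module `cupy.cuda.nccl` can be imported; that module name is an assumption here, and `nccl_import_status` is a hypothetical helper, not a Chainer or CuPy function.

```python
import importlib


def nccl_import_status(module_name="cupy.cuda.nccl"):
    """Try to import the NCCL binding and return (ok, detail)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Typically an ImportError when cupy is missing or was built without NCCL.
        return False, "{}: {}".format(type(exc).__name__, exc)
    return True, getattr(module, "__file__", "<unknown location>")


if __name__ == "__main__":
    ok, detail = nccl_import_status()
    print("NCCL binding importable:", ok)
    print("detail:", detail)
```

Run it with the same interpreter and conda environment that runs train_fsns.py, otherwise the result says nothing about the failing setup.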

Author

MS-MA commented Apr 4, 2019

@Bartzi
Yes, I uninstalled the previously installed chainer 3.2.0 and installed chainer==6.0.0b3 from https://github.com/chainer/chainer, but when I execute the command "python train_fsns.py /home/data/fsns /image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3", I run into the following problem:
--------------------------------------------------------------------------------
CuPy (cupy) version 2.2.0 may not be compatible with this version of Chainer.
Please consider installing the supported version by running:
$ pip install 'cupy==6.0.0b3'
See the following page for more details:
https://docs-cupy.chainer.org/en/latest/install.html

So I ran "pip uninstall cupy==2.2.0" and then installed "cupy-cuda80==6.0.0b3".
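
After swapping packages like this, it can help to confirm what is actually installed. The sketch below lists every installed distribution whose name starts with "chainer" or "cupy" and flags two situations: more than one CuPy package present, and a CuPy version that differs from the chainer version. The exact-match rule is an assumption taken from the warning above, which suggested cupy==6.0.0b3 for chainer 6.0.0b3. Both helper names are hypothetical. `importlib.metadata` needs Python 3.8 or newer; on the Python 3.6 environment in this thread, `pip list` gives the same information.

```python
from importlib import metadata


def installed_versions(prefixes=("chainer", "cupy")):
    """Map distribution name -> version for installed packages matching a prefix."""
    found = {}
    for dist in metadata.distributions():
        name = (dist.metadata.get("Name") or "").lower()
        if name.startswith(tuple(prefixes)):
            found[name] = dist.version
    return found


def find_conflicts(versions):
    """Return human-readable problems found in a name -> version mapping."""
    problems = []
    cupy_pkgs = sorted(n for n in versions if n.startswith("cupy"))
    if len(cupy_pkgs) > 1:
        problems.append("multiple CuPy packages installed: " + ", ".join(cupy_pkgs))
    chainer_version = versions.get("chainer")
    for name in cupy_pkgs:
        # Assumption: matching versions are expected, as the warning above suggests.
        if chainer_version and versions[name] != chainer_version:
            problems.append("{} {} != chainer {}".format(name, versions[name], chainer_version))
    return problems


if __name__ == "__main__":
    versions = installed_versions()
    print(versions)
    for problem in find_conflicts(versions):
        print("WARNING:", problem)
```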

Next I executed the same command, "python train_fsns.py /home/data/fsns /image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3", but now the following problem occurs:

/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 2e-08 by MultiprocessParallelUpdater for new batch size.
format(optimizer.eps))
Segmentation fault (core dumped)

Could you tell me why?
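
A segmentation fault happens in native code, so the interpreter dies without a Python traceback. The standard library's `faulthandler` module can dump the Python stack at the moment of the crash, which at least shows which call triggered it. A minimal sketch, where `enable_crash_traceback` is a hypothetical helper that would be called at the top of the training script:

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump the Python stacks of all threads on SIGSEGV and similar fatal signals."""
    target = stream if stream is not None else sys.__stderr__
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    print("faulthandler enabled:", enable_crash_traceback())
```

The same effect is available without editing the script by starting it as `python -X faulthandler train_fsns.py ...` with the usual arguments.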

Owner

Bartzi commented Apr 4, 2019

Are you sure you are using CUDA 8.0 on your machine?
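
The cupy-cuda80 package targets CUDA 8.0, so the toolkit version matters here. One way to check it is to read the release that `nvcc --version` reports. The sketch below is only an approximation: it assumes `nvcc` is on PATH and that its output contains a "release X.Y" string, and the toolkit found that way is not necessarily the one the training process loads at runtime. Both function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the 'X.Y' release from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def toolkit_release():
    """Return the CUDA release reported by nvcc on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release from nvcc:", toolkit_release())
```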

Author

MS-MA commented Apr 4, 2019

Yes, I am sure.

Owner

Bartzi commented Apr 4, 2019

Well, then I don't know... I have not used cupy version 6 yet, so this might be the issue.
Did you try to use the docker container?

Author

MS-MA commented Apr 8, 2019

No, I haven't used the docker container before, but I can try it. Thanks a lot, @Bartzi.
