Problem with fsns train_fsns.py #69
Did you do this?
@Bartzi So I executed the two commands "pip uninstall cupy==2.2.0" and then installed "cupy-cuda80==6.0.0b3". Next, I executed this command: python train_fsns.py /home/data/fsns /image/curriculum.json /home/code/mayongjuan/see-master/fsns-model --char-map ../datasets/fsns/fsns_char_map.json --blank-label 0 -b 20 -g 0 3
But the following warning is raised:
/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 2e-08 by MultiprocessParallelUpdater for new batch size.
Could you tell me why?
Are you sure you are using CUDA 8.0 on your machine?
Yeah, I am sure.
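For reference, here is one way to verify which CUDA runtime the installed cupy wheel targets (a minimal sketch added for illustration, not part of the original exchange; a cupy-cuda80 wheel should report 8000):

import cupy
# Prints the CUDA runtime version as an integer, e.g. 8000 for CUDA 8.0
# (my assumption about the expected value for a cupy-cuda80 wheel).
print(cupy.cuda.runtime.runtimeGetVersion())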
Well, then I don't know... I have never used cupy version 6 yet, so that might be the issue.
No, I haven't used the Docker container before; maybe I can try it. Many thanks, @Bartzi.
There is a problem with fsns train_fsns.py:
First: NCCL is already installed in my new environment, following these steps (a verification sketch follows the list):
1. Download NCCL from https://developer.nvidia.com/nccl
2. sudo dpkg -i nccl-repo-ubuntu1604-2.2.12-ga-cuda8.0_1-1_amd64.deb
3. sudo apt update
4. sudo apt-get install libnccl2=2.2.12-1+cuda8.0 libnccl-dev=2.2.12-1+cuda8.0
5. sudo cp /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/lib64
6. sudo cp /usr/include/nccl.h /usr/local/cuda/include/
7. chmod a+r /usr/local/cuda/include/nccl.h /usr/local/cuda/lib64/libnccl.so.2
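After these steps, a quick way to confirm that the installed cupy can actually see NCCL (a minimal sketch for illustration, assuming a cupy build that exposes cupy.cuda.nccl and its get_version() helper when NCCL support is compiled in):

# If this import fails, chainer's MultiprocessParallelUpdater will also
# report that NCCL is not enabled.
try:
    from cupy.cuda import nccl
    print("NCCL is available to cupy, version:", nccl.get_version())
except ImportError as exc:
    print("NCCL is NOT available to cupy:", exc)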
Second: when I try to execute python train_fsns.py, the problem below occurs.
(SEE) mayongjuan@visionGroup:/home/code/mayongjuan/see/chainer$ python train_fsns.py /home/data/fsns/image/curriculum.json /home/code/mayongjuan/see/fsns-model --blank-label 0 --char-map ../datasets/fsns/fsns_char_map.json -b 50
Traceback (most recent call last):
File "train_fsns.py", line 169, in
updater = MultiprocessParallelUpdater(train_iterators, optimizer, devices=args.gpus)
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 116, in init
'NCCL is not enabled. MultiprocessParallelUpdater '
Exception: NCCL is not enabled. MultiprocessParallelUpdater requires NCCL.
Please reinstall chainer after you install NCCL.
(see https://github.com/chainer/chainer#installation).
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987f0>>
Traceback (most recent call last):
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in del
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
Exception ignored in: <bound method MultiprocessIterator.__del__ of <chainer.iterators.multiprocess_iterator.MultiprocessIterator object at 0x7f431e1987b8>>
Traceback (most recent call last):
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 117, in del
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/site-packages/chainer/iterators/multiprocess_iterator.py", line 244, in terminate
File "/home/code/mayongjuan/anaconda3/envs/SEE/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
I can't figure out why this problem occurs; I'm looking forward to your answer. Many thanks.
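As a follow-up diagnostic (my own sketch, not part of the original report): the exception above is raised exactly when chainer cannot reach NCCL through cupy. Assuming MultiprocessParallelUpdater exposes its available() helper as in recent chainer releases, that condition can be checked directly:

# Mirrors the check that raises "NCCL is not enabled" in the traceback above.
from chainer.training.updaters import MultiprocessParallelUpdater
print("Multi-GPU updater usable:", MultiprocessParallelUpdater.available())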