Error raised when running baseline #19

Open
DonghyunAhn opened this issue Apr 27, 2020 · 1 comment

Comments

DonghyunAhn commented Apr 27, 2020

Hello, we are team kaist_8. We ran into an error when running the baseline.
Here is the full output:

TensorFlow Version 1.13.1

Container image Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
Copyright 2017-2019 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

Traceback (most recent call last):
  File "main.py", line 22, in <module>
    import torchvision
  File "/usr/local/lib/python3.5/dist-packages/torchvision/__init__.py", line 3, in <module>
    from torchvision import models
  File "/usr/local/lib/python3.5/dist-packages/torchvision/models/__init__.py", line 12, in <module>
    from . import detection
  File "/usr/local/lib/python3.5/dist-packages/torchvision/models/detection/__init__.py", line 1, in <module>
    from .faster_rcnn import *
  File "/usr/local/lib/python3.5/dist-packages/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
    from torchvision.ops import misc as misc_nn_ops
  File "/usr/local/lib/python3.5/dist-packages/torchvision/ops/__init__.py", line 13, in <module>
    _register_custom_op()
  File "/usr/local/lib/python3.5/dist-packages/torchvision/ops/_register_onnx_ops.py", line 51, in _register_custom_op
    register_custom_op_symbolic('torchvision::_new_empty_tensor_op', new_empty_tensor_op, _onnx_opset_version)
  File "/usr/local/lib/python3.5/dist-packages/torch/onnx/__init__.py", line 200, in register_custom_op_symbolic
    return utils.register_custom_op_symbolic(symbolic_name, symbolic_fn, opset_version)
  File "/usr/local/lib/python3.5/dist-packages/torch/onnx/utils.py", line 793, in register_custom_op_symbolic
    .format(symbolic_name))
RuntimeError: Failed to register operator torchvision::_new_empty_tensor_op.                            The symbolic name must match the format Domain::Name,                            and sould start with a letter and contain only                            alphanumerical characters
User session exited

When I searched for this error on Google, the results pointed to a torch version problem.
We tried changing the version by modifying requirements.txt, but the error still occurs.
I would appreciate any suggestions for solving this problem.

@nsml-admin

You can change the TensorFlow version or the Docker image by editing requirements.txt.

Alternatively, I recommend changing your code as shown in this PR.
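
Before editing requirements.txt again, it may help to confirm which versions the session actually ended up with, since the traceback fails inside `import torchvision` itself. The following is only a minimal sketch: `installed_version` is a hypothetical helper, not part of the baseline, and it reads package metadata instead of importing torch or torchvision, so it should still run in an environment where that import raises the error above.

```python
# Hypothetical helper (not part of the baseline): report installed package
# versions from metadata, without importing the packages themselves.
try:
    # Available on Python 3.8 and later.
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Older interpreters (the traceback shows Python 3.5) use the fallback below.
    version = None


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    if version is not None:
        try:
            return version(package)
        except PackageNotFoundError:
            return None
    import pkg_resources  # setuptools fallback for older interpreters
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    for name in ("torch", "torchvision", "tensorflow"):
        print("{}: {}".format(name, installed_version(name) or "not installed"))
```

If the printed torch and torchvision versions differ from what requirements.txt pins, the change most likely did not take effect in the container image, which would be consistent with the error persisting.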
