
ConnectionResetError at word segmentation #40

Closed
jbuehler1337 opened this issue May 6, 2020 · 3 comments

Comments

@jbuehler1337

Hi, I already mentioned my problem, but I didn't find an existing issue describing what I am experiencing. When I start the 2_line_word_segmentation.ipynb notebook, I get the following error:

ConnectionResetErrorTraceback (most recent call last)
<ipython-input-13-fbd64d2ad138> in <module>
      3     cls_metric = mx.metric.Accuracy()
      4     box_metric = mx.metric.MAE()
----> 5     train_loss = run_epoch(e, net, train_data, trainer, log_dir, print_name="train", is_train=True, update_metric=False)
      6     test_loss = run_epoch(e, net, test_data, trainer, log_dir, print_name="test", is_train=False, update_metric=True)
      7     if test_loss < best_test_loss:

<ipython-input-6-6b90c6f2ae19> in run_epoch(e, network, dataloader, trainer, log_dir, print_name, is_train, update_metric)
     32 
     33     total_losses = [0 for ctx_i in ctx]
---> 34     for i, (X, Y) in enumerate(dataloader):
     35         X = gluon.utils.split_and_load(X, ctx)
     36         Y = gluon.utils.split_and_load(Y, ctx)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in __next__(self)
    503         try:
    504             if self._dataset is None:
--> 505                 batch = pickle.loads(ret.get(self._timeout))
    506             else:
    507                 batch = ret.get(self._timeout)

/usr/local/lib/python3.6/dist-packages/mxnet/gluon/data/dataloader.py in rebuild_ndarray(pid, fd, shape, dtype)
     59             fd = multiprocessing.reduction.rebuild_handle(fd)
     60         else:
---> 61             fd = fd.detach()
     62         return nd.NDArray(nd.ndarray._new_from_shared_mem(pid, fd, shape, dtype))
     63 

/usr/lib/python3.6/multiprocessing/resource_sharer.py in detach(self)
     55         def detach(self):
     56             '''Get the fd.  This should only be called once.'''
---> 57             with _resource_sharer.get_connection(self._id) as conn:
     58                 return reduction.recv_handle(conn)
     59 

/usr/lib/python3.6/multiprocessing/resource_sharer.py in get_connection(ident)
     85         from .connection import Client
     86         address, key = ident
---> 87         c = Client(address, authkey=process.current_process().authkey)
     88         c.send((key, os.getpid()))
     89         return c

/usr/lib/python3.6/multiprocessing/connection.py in Client(address, family, authkey)
    491 
    492     if authkey is not None:
--> 493         answer_challenge(c, authkey)
    494         deliver_challenge(c, authkey)
    495 

/usr/lib/python3.6/multiprocessing/connection.py in answer_challenge(connection, authkey)
    730     import hmac
    731     assert isinstance(authkey, bytes)
--> 732     message = connection.recv_bytes(256)         # reject large message
    733     assert message[:len(CHALLENGE)] == CHALLENGE, 'message = %r' % message
    734     message = message[len(CHALLENGE):]

/usr/lib/python3.6/multiprocessing/connection.py in recv_bytes(self, maxlength)
    214         if maxlength is not None and maxlength < 0:
    215             raise ValueError("negative maxlength")
--> 216         buf = self._recv_bytes(maxlength)
    217         if buf is None:
    218             self._bad_message_length()

/usr/lib/python3.6/multiprocessing/connection.py in _recv_bytes(self, maxsize)
    405 
    406     def _recv_bytes(self, maxsize=None):
--> 407         buf = self._recv(4)
    408         size, = struct.unpack("!i", buf.getvalue())
    409         if maxsize is not None and size > maxsize:

/usr/lib/python3.6/multiprocessing/connection.py in _recv(self, size, read)
    377         remaining = size
    378         while remaining > 0:
--> 379             chunk = read(handle, remaining)
    380             n = len(chunk)
    381             if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer

I am using a Docker image on a Linux system. Can you help me get the notebook to run?

@jbuehler1337
Author

Hey, I just ran through all the notebooks without an error. The pickle files are generated properly, but I am still getting the same error: ConnectionResetError: [Errno 104] Connection reset by peer

@jbuehler1337
Author

Hey again. I solved the problem: a num_workers value of 2 was too high. I set num_workers to 1 and now it works fine.

@jonomon
Contributor

jonomon commented May 8, 2020

Great!

@jonomon jonomon closed this as completed May 8, 2020