I have no idea why this happens. Even when bootstrapping from a local DHT node, initialization may fail with all kinds of errors:
Sep 04 06:37:01.905 [INFO] Server started with 3 modules:
Sep 04 06:37:01.905 [INFO] expert.0: PraxisMLP, 525568 parameters
Sep 04 06:37:01.905 [INFO] expert.1: PraxisMLP, 525568 parameters
Sep 04 06:37:01.905 [INFO] expert.2: PraxisMLP, 525568 parameters
Sep 04 06:37:01.936 [ERROR] [hivemind.moe.server.connection_handler._run:63] ConnectionHandler failed to start:
Traceback (most recent call last):
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/transforms.py", line 86, in bytes_iter
proto = protocol_with_code(code)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/protocols.py", line 290, in protocol_with_code
return REGISTRY.find_by_code(code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/protocols.py", line 260, in find_by_code
raise exceptions.ProtocolNotFoundError(code, "code")
multiaddr.exceptions.ProtocolNotFoundError: No protocol with code 465 found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/hivemind/moe/server/connection_handler.py", line 59, in _run
self._p2p = await self.dht.replicate_p2p()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/hivemind/dht/dht.py", line 327, in replicate_p2p
self._p2p_replica = await P2P.replicate(daemon_listen_maddr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/hivemind/p2p/p2p_daemon.py", line 312, in replicate
await self._ping_daemon()
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/hivemind/p2p/p2p_daemon.py", line 317, in _ping_daemon
logger.debug(f"Launched p2pd with peer id = {self.peer_id}, host multiaddrs = {self._visible_maddrs}")
^^^^^^^^^^^^^^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/multiaddr.py", line 147, in __repr__
return "<Multiaddr %s>" % str(self)
^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/multiaddr.py", line 135, in __str__
return bytes_to_string(self._bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/transforms.py", line 30, in bytes_to_string
for _, proto, codec, part in bytes_iter(buf):
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/multiaddr/transforms.py", line 89, in bytes_iter
raise exceptions.BinaryParseError(
multiaddr.exceptions.BinaryParseError: Invalid binary MultiAddr protocol 465: Unknown Protocol
Sep 04 06:37:01.940 [ERROR] [hivemind.utils.mpfuture._process_updates_in_background:198] Could not retrieve update: caught TypeError("BinaryParseError.__init__() missing 2 required positional arguments: 'binary' and 'protocol'") (pid=242958)
Traceback (most recent call last):
File "/home/crow/repos/praxis/venv/lib/python3.12/site-packages/hivemind/utils/mpfuture.py", line 177, in _process_updates_in_background
uid, update_type, payload = receiver_pipe.recv()
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: BinaryParseError.__init__() missing 2 required positional arguments: 'binary' and 'protocol'
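For reference, the failing setup is roughly the following; the bootstrap address and the server-start step are illustrative placeholders, not the exact Praxis code:

```python
import hivemind

# Local bootstrap node (first process).
bootstrap_dht = hivemind.DHT(start=True)

# Second node bootstraps from the local DHT node above; this is the process
# whose ConnectionHandler intermittently fails with the multiaddr errors shown
# in the log.
dht = hivemind.DHT(
    initial_peers=bootstrap_dht.get_visible_maddrs(),
    start=True,
)

# ... the hivemind.moe server (with the PraxisMLP experts) is then started
# against this `dht` instance, which is where initialization fails.
```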
I was able to make the problem somewhat less frequent by adding a delay before startup, but that doesn't work reliably, if at all.
Could use some help with this one. I've been running into issues like this in Hivemind for years.
I found a solution to this problem. Long story short: if you call dht.get_visible_maddrs() before attempting to start the server, it never hangs. Clearly this is not intended behavior, and that method should have no bearing on server bootstrapping... but it does. So we fixed it with a hack until upstream fixes this.
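Concretely, the workaround is a single call before server startup. The DHT construction and the server-start comment below are sketched from typical hivemind usage (the bootstrap address is a placeholder); only the get_visible_maddrs() call is the actual fix:

```python
import hivemind

# Connect to the swarm as usual.
dht = hivemind.DHT(
    initial_peers=["/ip4/127.0.0.1/tcp/<port>/p2p/<peer_id>"],  # placeholder bootstrap address
    start=True,
)

# Workaround: fetch the visible multiaddrs *before* starting the server.
# This call should have no side effects, but in practice it prevents the
# ConnectionHandler / multiaddr failures shown above.
_ = dht.get_visible_maddrs()

# ... now start the hivemind.moe server against this `dht` instance as usual.
```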