ENH IPC Improvements - Support TCP, ZMQ and Better multi-node handling #32
…ling for sockets.
…ements raw and zmq.
- In new_task.md, line 662, there is a very minor typo: "tricky-to-track-dow" -> "tricky-to-track-down".
- The ZMQ and SSH code could be removed; up to you. I would remove it just to keep the code cleaner, but whatever you prefer.
- If I understand correctly, the communicator will scan the ports to find an empty one. Is this because the ports could be used by other communicators? Could this be flagged as an attack in the future, if we run some security software on our machines?
- HEAD[::-1] Love this!! Sequence in reverse! Very cool! HELLO -> OLLEH :-)
I'll address the typo (and something else I just thought of). For the other comments:
The port scanning was just a thought. Let's merge it for now.
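
For reference, the kind of port search discussed in this thread could look roughly like the sketch below. This is not the code from this PR: the function name `find_free_port`, the loopback host, and the port range are all hypothetical choices for illustration. The sketch only tries to bind local ports and never connects to other machines, which is a different operation from scanning a remote host.

```python
import socket
from typing import Optional


def find_free_port(
    host: str = "127.0.0.1", start: int = 41000, stop: int = 41100
) -> Optional[int]:
    """Return the first port in [start, stop) that can be bound, else None.

    Hypothetical helper: the name, host, and range are assumptions.
    """
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Port is already in use (or not permitted); try the next one.
                continue
            # The probe socket is closed on return, so the port is released.
            return port
    return None


if __name__ == "__main__":
    print(find_free_port())
```

Since the probe socket is closed before the port is actually used, another process could claim the port in between. Binding to port 0 and reading the assigned port back with `getsockname()` lets the operating system pick a free port and avoids the search entirely, at the cost of not knowing the port in advance.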
Description
This PR overhauls socket-based IPC. Support for TCP is added, and SSH tunnels are provided as a backup if Unix sockets are used across multiple nodes. A first iteration of ZMQ is added as an alternative to raw sockets for future use. The use of ZMQ is currently controlled by a global `bool`, but this will be configurable in a more standard manner later.

Note: Message order is not controlled. If running with multiple ranks, the execution order of the ranks as well as the read schedule (and the network) will determine when messages arrive. Messages arriving via the `PipeCommunicator` are not synchronized with those arriving via the `SocketCommunicator` and may also arrive out of order.

Checklist
- `SocketCommunicator` supports TCP sockets.
- `SocketCommunicator`: `Task`-side will open SSH tunnel if on different machine and using Unix sockets.
- `SocketCommunicator` adds ZMQ support.
- `Executor` passes along information about what host it is on, or what ports to use.
- `LUTE_USE_TCP` environment variable.

PR Type:
Address issues:
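
As a rough illustration of the `LUTE_USE_TCP` switch listed in the checklist, the sketch below selects between a TCP and a Unix listening socket based on that environment variable. Only the variable name comes from this PR. The accepted values, the helper names `use_tcp` and `make_listener`, and the choice of port 0 are assumptions made for the example, and the actual `SocketCommunicator` may work differently.

```python
import os
import socket


def use_tcp() -> bool:
    """Assumed convention: unset, empty, "0", or "false" means Unix sockets."""
    value = os.environ.get("LUTE_USE_TCP", "")
    return value.strip().lower() not in ("", "0", "false")


def make_listener(unix_path: str, tcp_host: str = "127.0.0.1") -> socket.socket:
    """Create a listening stream socket of the selected family.

    Hypothetical helper, not part of this PR.
    """
    if use_tcp():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Port 0 asks the OS for any free port; read it via getsockname().
        sock.bind((tcp_host, 0))
    else:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        # A Unix socket is addressed by a filesystem path on the local node.
        sock.bind(unix_path)
    sock.listen()
    return sock
```

A Unix socket path is only reachable from processes on the same node, which is why the description mentions SSH tunnels as a backup when Unix sockets are used across multiple nodes.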
Testing
Tested using direct Python submission and with SLURM across multiple nodes using all test classes.

Example output

Testing TCP communication with raw sockets
- `MultiNodeCommunicationTester` `Task` which uses MPI.
- `SocketTester`, a non-MPI task.

Testing Unix with raw sockets
- `MultiNodeCommunicationTester` `Task` which uses MPI.
- `SocketTester`, a non-MPI task. (no SSH)

Testing TCP with ZMQ
- `MultiNodeCommunicationTester` `Task` which uses MPI.

Testing Unix with ZMQ
Screenshots