This is the NOWLAB fork of upstream PyTorch. The key features we've added are:

- Support for building PyTorch's distributed module with any [CUDA/ROCm]-aware MPI library, including NOWLAB's MVAPICH2-GDR
- Support for MPI + fp16 communication operations
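As a rough illustration of the two features listed above, the sketch below runs an fp16 all-reduce through the `mpi` backend of `torch.distributed`. It is a minimal sketch, not the fork's documented usage: it assumes this fork was built against a CUDA-aware or ROCm-aware MPI library, and that the launcher exports a local-rank variable such as `MV2_COMM_WORLD_LOCAL_RANK` (MVAPICH2) or `OMPI_COMM_WORLD_LOCAL_RANK` (Open MPI). The helper `local_rank_from_env` and the function `main` are hypothetical names introduced only for this example.

```python
import os


def local_rank_from_env(env=os.environ):
    """Return the node-local rank exported by the MPI launcher (hypothetical helper).

    Falls back to 0 when neither known variable is set.
    """
    for key in ("MV2_COMM_WORLD_LOCAL_RANK", "OMPI_COMM_WORLD_LOCAL_RANK"):
        if key in env:
            return int(env[key])
    return 0


def main():
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    # Assumption: pick the GPU before MPI is initialized, one GPU per local rank.
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(local_rank_from_env() % torch.cuda.device_count())

    # With the MPI backend, rank and world size are supplied by the MPI launcher.
    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()

    # fp16 tensor; on a GPU-aware MPI build the device buffer is passed to MPI.
    device = torch.device("cuda" if use_cuda else "cpu")
    x = torch.ones(4, dtype=torch.float16, device=device)

    # Sum of ones across ranks: each element should equal the world size.
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    if rank == 0:
        print(x)

    dist.destroy_process_group()

# Call main() from a script started by the MPI launcher.
```

To try it, add a call to `main()` at the end of a script and start it with your MPI launcher, for example `mpirun -np 2 python allreduce_fp16.py` (the script name is hypothetical; launcher flags depend on your MPI installation).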