reduce_scatter on MacOS issue #365

Open
ChengjieLi28 opened this issue Jul 10, 2023 · 0 comments

Hi team,

I successfully compiled gloo on macOS with USE_LIBUV set to ON,
but when I test the reduce_scatter op, the process crashes with a core dump at runtime.

I use pybind11 to expose a Python interface. Here is the test code:

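import multiprocessing as mp
import os
import shutil
import time

import numpy as np

# fileStore_path and system_name are assumed to be defined at module level (not shown here).

def worker_reduce_scatter(rank):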
    from .. import xoscar_pygloo as xp

    if rank == 0:
        if os.path.exists(fileStore_path):
            shutil.rmtree(fileStore_path)
        os.makedirs(fileStore_path)
    else:
        time.sleep(0.5)

    context = xp.rendezvous.Context(rank, 3)

    if system_name == "Linux":
        attr = xp.transport.tcp.attr("localhost")
        dev = xp.transport.tcp.CreateDevice(attr)
    else:
        attr = xp.transport.uv.attr("localhost")
        dev = xp.transport.uv.CreateDevice(attr)

    fileStore = xp.rendezvous.FileStore(fileStore_path)
    store = xp.rendezvous.PrefixStore(str(3), fileStore)

    context.connectFullMesh(store, dev)

    sendbuf = np.array(
        [i + 1 for i in range(sum([j + 1 for j in range(3)]))], dtype=np.float32
    )
    print(f'Send buf: {sendbuf}')
    sendptr = sendbuf.ctypes.data

    recvbuf = np.zeros(2, dtype=np.float32)
    recvptr = recvbuf.ctypes.data
    recvElems = [2, 2, 2]

    # sendbuf is always a NumPy array here, so its size can be read directly
    data_size = sendbuf.size
    print(f'Data size: {data_size}')
    datatype = xp.glooDataType_t.glooFloat32
    op = xp.ReduceOp.SUM

    xp.reduce_scatter(context, sendptr, recvptr, data_size, recvElems, datatype, op)

    print(f"rank {rank} sends {sendbuf}, receives {recvbuf}")

def test_reduce_scatter():
    process1 = mp.Process(target=worker_reduce_scatter, args=(0,))
    process1.start()
    process2 = mp.Process(target=worker_reduce_scatter, args=(1,))
    process2.start()
    process3 = mp.Process(target=worker_reduce_scatter, args=(2,))
    process3.start()

    process1.join()
    process2.join()
    process3.join()
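
To narrow down which worker crashes and with which signal, a helper along these lines could replace the manual start/join calls above. This is only a sketch: `describe_exit` and `run_workers` are hypothetical names introduced for illustration, not part of gloo or xoscar_pygloo. It relies on `multiprocessing.Process.exitcode`, which is negative when the child was terminated by a signal.

```python
import multiprocessing as mp
import signal


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a readable description."""
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        # multiprocessing reports -N when the child was terminated by signal N
        return f"killed by {signal.Signals(-exitcode).name}"
    return f"exited with code {exitcode}"


def run_workers(target, world_size=3):
    """Start one process per rank, wait for all, and describe how each ended."""
    procs = [mp.Process(target=target, args=(rank,)) for rank in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [describe_exit(p.exitcode) for p in procs]
```

Calling `run_workers(worker_reduce_scatter)` from the test would then return one description per rank, which makes it easier to tell a segmentation fault from an abort.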

This test does not work on macOS, but it works on Linux.
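
For reference, the result expected from this test can be computed with plain NumPy, assuming the usual reduce_scatter semantics (elementwise SUM across ranks, then each rank receives its slice as given by recvElems). The sketch below does not call gloo; `expected_reduce_scatter` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def expected_reduce_scatter(sendbufs, recv_elems):
    """Reference result: elementwise sum over ranks, then split by recv_elems."""
    reduced = np.sum(np.stack(sendbufs), axis=0)
    # offsets[r]:offsets[r + 1] is the slice of the reduced buffer owned by rank r
    offsets = np.cumsum([0] + list(recv_elems))
    return [reduced[offsets[r]:offsets[r + 1]] for r in range(len(recv_elems))]


world_size = 3
# Every rank sends the same buffer [1, 2, 3, 4, 5, 6], as in the test above
sendbufs = [np.arange(1, 7, dtype=np.float32) for _ in range(world_size)]
expected = expected_reduce_scatter(sendbufs, [2, 2, 2])

for rank, chunk in enumerate(expected):
    print(f"rank {rank} should receive {chunk}")
```

Under these assumptions rank 0 should receive the values 3 and 6, rank 1 the values 9 and 12, and rank 2 the values 15 and 18.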

May I ask why this happens? Thank you very much.
