Hi,
We upgraded torch-ccl from 2021.1-beta07-1 to 1.10 and noticed a performance regression for all_to_all: overall, ccl 1.10 is about 2x slower than 2021.1-beta07-1.
System config: single node, 2 processes per node, so no network communication is involved.
Any idea on the root cause?
all_to_all profiling for torch ccl 1.10
all_to_all profiling for torch ccl 2021.1-beta07-1
test code:
import torch
import extend_distributed as ext_dist

if __name__ == "__main__":
    ext_dist.init_distributed(backend='ccl')
    input = []
    tensor = torch.ones(262144, 16, dtype=torch.bfloat16)
    input.append(tensor)
    with torch.autograd.profiler.profile(True) as prof:
        for _ in range(10):
            a2a_req = ext_dist.alltoall(input, None)
            ly_sparse = a2a_req.wait()
    print(prof.key_averages().table(sort_by="cpu_time_total"))
For extend_distributed, please refer to https://github.com/IntelAI/models/blob/master/models/recommendation/pytorch/dlrm/training/bfloat16/extend_distributed.py. Thanks!
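In case it helps reproduce the numbers without the DLRM helper, below is a rough standalone sketch of the same benchmark that calls torch.distributed.all_to_all_single directly on the ccl backend. The torch_ccl import name and the PMI_RANK/PMI_SIZE environment variables are assumptions about the 1.10-era wheel and an MPI-style launcher; adjust them to match your setup.

import os
import torch
import torch.distributed as dist

# Importing torch-ccl registers the "ccl" backend with torch.distributed.
# The module name differs across releases (torch_ccl for the 1.10-era wheel,
# oneccl_bindings_for_pytorch later); change the import to match your install.
import torch_ccl  # noqa: F401

if __name__ == "__main__":
    # Rank/world size are taken from MPI/PMI-style launcher variables here;
    # fall back to a single process if they are not set.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("PMI_RANK", os.environ.get("RANK", 0)))
    world_size = int(os.environ.get("PMI_SIZE", os.environ.get("WORLD_SIZE", 1)))
    dist.init_process_group("ccl", rank=rank, world_size=world_size)

    src = torch.ones(262144, 16, dtype=torch.bfloat16)
    dst = torch.empty_like(src)

    with torch.autograd.profiler.profile(True) as prof:
        for _ in range(10):
            # Default behavior splits dim 0 evenly across ranks,
            # roughly the same payload as the extend_distributed helper.
            dist.all_to_all_single(dst, src)

    if rank == 0:
        print(prof.key_averages().table(sort_by="cpu_time_total"))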