Communication and compute on separate Streams do not overlap #64

Open
garrett361 opened this issue May 28, 2024 · 0 comments
Cross-posting this issue from ipex, in case the torch-ccl team is not aware of it.

Key issues:

  • Compute and collective communications do not overlap on Intel GPU devices
  • Collectives block the host thread, rather than launching a kernel and returning immediately (as they do on NVIDIA devices); see the sketch below for the kind of pattern affected
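For context, here is a minimal sketch of the overlap pattern in question, assuming a standard `torch.distributed` setup with an async collective; it is illustrative only, and the actual reproducer is in the linked ipex issue:

```python
import torch
import torch.distributed as dist

# Illustrative sketch only, not the reproducer from the linked ipex issue.
# Assumes the default process group is already initialized (e.g. via torchrun)
# and that `device` is this rank's accelerator.
def overlapped_allreduce_and_matmul(device: torch.device) -> None:
    comm_tensor = torch.randn(2**24, device=device)
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)

    # Launch the collective asynchronously. On NVIDIA devices this call
    # returns immediately and the communication runs on a separate stream,
    # so the matmuls below can overlap with it.
    work = dist.all_reduce(comm_tensor, async_op=True)

    # Independent compute that should run concurrently with the all_reduce.
    for _ in range(10):
        a = a @ b

    # Only block once the reduced tensor is actually needed.
    work.wait()
```

On the Intel Max 1550 traces below, the `all_reduce` call in a pattern like this blocks the host until the collective finishes, so the compute never overlaps with the communication.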

The PyTorch profiler traces highlight the issues (copied from the other thread):

A100 Trace

[nvidia_a100_trace: profiler trace image]

Non-blocking kernel launch and comms/compute overlap.

Intel Max 1550 Trace

[intel_1550_trace: profiler trace image]

Blocking kernel launch and no comms/compute overlap.

See the other thread for more details.
