Improving communication overlap for the case of multi kernel queue usage #1308
base: main
Conversation
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
@erhoo82 Hi Sangkug, this is a PR for the launch ordering work. Could you please assign a reviewer?
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
@youngeunkwon0405 The TP overlap unit tests explicitly set […]. Also, please launch the L1 tests with […].
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
Hi @denera, I have updated the […]. Also, could you please elaborate on the following in more detail? I am new to writing tests and also new to the CI process.
I have tested only the modified test case, and the following was the new result:
../lustre/fsw/coreai_dlalgo_llm/youngeunk/nemo/nemo.dev/mount/TransformerEngine-youngeunk/tests/pytorch/distributed/test_comm_gemm_overlap.py::test_bulk_overlaps[ALL-GATHER - BF16 - 1 connections] PASSED
========================= 6 passed in 93.35s (0:01:33) =========================
/te-ci pytorch L1
LGTM, pending rebase on latest TE/main and clean CI results.
@denera Rebased onto the latest main. Could you please let me know what the next step would be?
/te-ci pytorch L1
/te-ci pytorch L1
Signed-off-by: Youngeun Kwon <[email protected]>
for more information, see https://pre-commit.ci
/te-ci pytorch L1
/te-ci pytorch L0 L1
LGTM! We can merge, pending clean CI results. Your CI permissions might not have taken effect yet because I don't see any pipelines for this PR from your trigger. I triggered it again and the pass/fail should show up on GitHub when it's done.
Signed-off-by: Youngeun Kwon <[email protected]>
Signed-off-by: Youngeun Kwon <[email protected]>
/te-ci pytorch L0 L1
Description
The current TP-overlap relies on a single kernel queue to configure the launch ordering that controls compute-communication overlap; this fails to overlap when multiple kernel queues are used.
This PR enforces the launch ordering between the communication kernel and the compute kernel using the LaunchCompletionEvent feature, ensuring the overlap.
This feature is specific to Hopper and applies only to bulk overlap cases.
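For reference, below is a minimal sketch of how a launch-completion event can enforce this kind of ordering across two streams. This is an illustration of the general mechanism (CUDA's cudaLaunchAttributeLaunchCompletionEvent, available in CUDA 12.3+ on Hopper), not the actual code added in this PR; the kernel names, streams, and launch dimensions are placeholders.

```cpp
// Illustrative sketch only -- not the code in this PR. Kernel names, grid/block
// sizes, and streams are placeholders; error checking is omitted.
#include <cuda_runtime.h>

__global__ void comm_kernel() { /* communication work (placeholder) */ }
__global__ void gemm_kernel() { /* compute work (placeholder) */ }

void launch_bulk_overlap(cudaStream_t comm_stream, cudaStream_t compute_stream) {
  cudaEvent_t comm_launched;
  cudaEventCreateWithFlags(&comm_launched, cudaEventDisableTiming);

  // Attach a launch-completion event to the communication kernel. The event
  // fires once all of the kernel's thread blocks have *begun* executing,
  // not when the kernel finishes.
  cudaLaunchAttribute attr = {};
  attr.id = cudaLaunchAttributeLaunchCompletionEvent;
  attr.val.launchCompletionEvent.event = comm_launched;
  attr.val.launchCompletionEvent.flags = 0;

  cudaLaunchConfig_t cfg = {};
  cfg.gridDim = dim3(8);
  cfg.blockDim = dim3(128);
  cfg.stream = comm_stream;
  cfg.attrs = &attr;
  cfg.numAttrs = 1;
  cudaLaunchKernelEx(&cfg, comm_kernel);

  // The compute stream waits only until the comm kernel has been launched,
  // so the GEMM cannot grab the SMs first, yet both kernels still overlap.
  cudaStreamWaitEvent(compute_stream, comm_launched, 0);
  gemm_kernel<<<64, 256, 0, compute_stream>>>();

  cudaStreamSynchronize(comm_stream);
  cudaStreamSynchronize(compute_stream);
  cudaEventDestroy(comm_launched);
}
```

Because the event fires when the communication kernel's blocks have started rather than when it completes, the compute kernel is held back only long enough to guarantee the communication kernel is resident on the SMs; the two kernels still execute concurrently afterwards.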
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: