Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code enhancements #1429

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from
Open

Code enhancements #1429

wants to merge 30 commits into from

Conversation

PJAvinash
Copy link
Contributor

Details

Enhancing code

Work item: "Internal"
What were the changes?
Modifying / enhancing failure path in ncclIBInit()

Why were the changes made?
Code improvements

How was the outcome achieved?
Analyzing the code with debugger
Additional Documentation:
pthread_detach() call is removed as it is a side effect of earlier nccl sync

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

Copy link
Contributor

@thananon thananon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@@ -505,9 +510,6 @@ ncclResult_t ncclIbInit(ncclDebugLogger_t logFunction) {
ncclIbMergedDevs[mergedDev].speed += ncclIbDevs[ncclNIbDevs].speed;
ncclNIbDevs++;
nPorts++;
// [RCCL]
pthread_detach(ncclIbAsyncThread);
// [/RCCL]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removing pthread_detach? This was added to fix a memory leak issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pthread_detach() is being called twice with same parameter, first time at line 486, second call will simply return -1 and it is not being checked, and has no effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants