Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: Fix bug in shmem_team_split_2d #1156

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

philipmarshall21
Copy link
Collaborator

PR #1153 adds a fix to shmem_team_split_strided to interpret the stride argument provided to the API as being relative to the parent team (i.e. consistent with OpenSHMEM spec).

The fix was implemented in shmem_internal_team_split_strided, which is also utilized by shmem_internal_split_2d. However, shmem_internal_team_split_2d was already passing parent team-relative stride values to its calls to shmem_internal_team_split_strided, so it is now being doubly converted following the recent changes.

@philipmarshall21
Copy link
Collaborator Author

Hey @davidozog! I think this is the fix needed for shmem_team_split_2d, but would need to test to be sure,

@philipmarshall21
Copy link
Collaborator Author

I think it is working as expected:

image

In the case above the "2d split failed" message is always from PEs 1 and 3 (i.e. odd-valued PEs), which is expected as even_team will be SHMEM_TEAM_INVALID on those PEs. It may be good to only perform the 2d split on even_team for even-numbered PEs (so check even_team != SHMEM_TEAM_INVALID prior to calls to check_2d(even_team, ...)) unless we are intentionally testing that ret is non-zero in such cases. Though if that is the case maybe there should some way to indicate if the split fail is expected because right now the branch in check_2d that prints that message does not increment errors, but I can imagine if there is a broken implementation of shmem_team_split_2d you could encounter a case where the operation fails when it is not expected to fail.

@philipmarshall21
Copy link
Collaborator Author

philipmarshall21 commented Oct 17, 2024

I think it is working as expected:

image

In the case above the "2d split failed" message is always from PEs 1 and 3 (i.e. odd-valued PEs), which is expected as even_team will be SHMEM_TEAM_INVALID on those PEs. It may be good to only perform the 2d split on even_team for even-numbered PEs (so check even_team != SHMEM_TEAM_INVALID prior to calls to check_2d(even_team, ...)) unless we are intentionally testing that ret is non-zero in such cases. Though if that is the case maybe there should some way to indicate if the split fail is expected because right now the branch in check_2d that prints that message does not increment errors, but I can imagine if there is a broken implementation of shmem_team_split_2d you could encounter a case where the operation fails when it is not expected to fail.

I've confirmed that putting check_2d(even_team, ...) calls inside an if (even_team != SHMEM_TEAM_INVALID) conditional does prevent "2d split failed" message from being printed

@davidozog
Copy link
Member

I've confirmed that putting check_2d(even_team, ...) calls inside an if (even_team != SHMEM_TEAM_INVALID) conditional does prevent "2d split failed" message from being printed

Yes, I think this is expected/intended. The spec behavior we are working with is:

If parent_team compares equal to SHMEM_TEAM_INVALID then no new teams will be created, both xaxis_team and yaxis_team will be assigned the value SHMEM_TEAM_INVALID, and shmem_team_split_2d will return an nonzero value.

Thanks!

@davidozog davidozog added this to the v1.5.3 milestone Oct 19, 2024
@davidozog davidozog added the bug label Oct 19, 2024
@davidozog davidozog self-assigned this Oct 19, 2024
@davidozog davidozog merged commit 099caf3 into Sandia-OpenSHMEM:main Oct 23, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants