Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add error checking for objects on external heap #1116

Conversation

philipmarshall21
Copy link
Collaborator

@philipmarshall21 philipmarshall21 commented Mar 14, 2024

Now checking if an object resides on the external heap when determining if said object is symmetric. This PR also revises the overlap check to allow overlap in target and source buffers for RMA operations if target_pe != source_pe

@philipmarshall21
Copy link
Collaborator Author

Not convinced the change to SHMEM_ERR_CHECK_SYMMETRIC_HEAP is quite correct yet

@philipmarshall21 philipmarshall21 force-pushed the add_error_check_to_external_heap branch 3 times, most recently from 70b9957 to fad6775 Compare March 19, 2024 15:14
@philipmarshall21 philipmarshall21 force-pushed the add_error_check_to_external_heap branch from fad6775 to 0734a51 Compare March 19, 2024 15:16
@philipmarshall21 philipmarshall21 marked this pull request as ready for review April 2, 2024 17:45
@philipmarshall21 philipmarshall21 force-pushed the add_error_check_to_external_heap branch from 941a87d to d1954f2 Compare April 12, 2024 21:56
@philipmarshall21
Copy link
Collaborator Author

Seems like the overlap checks are missing entirely from data_f.c, synchronization_f.c, and collectives_f.c. Are those files still being maintained / does the SHMEM_ERR_CHECK_OVERLAP check need to be added to them?

@davidozog
Copy link
Member

Seems like the overlap checks are missing entirely from data_f.c, synchronization_f.c, and collectives_f.c. Are those files still being maintained / does the SHMEM_ERR_CHECK_OVERLAP check need to be added to them?

That’s probably not a concern, those are for Fortran interfaces, which are all deprecated.

Copy link
Collaborator

@parkerha1 parkerha1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -287,7 +287,8 @@ SHMEM_PROF_DEF_CTX_PUT_N_SIGNAL_NBI(`mem')
SHMEM_ERR_CHECK_SYMMETRIC(target, sizeof(TYPE) * nelems); \
SHMEM_ERR_CHECK_NULL(source, nelems); \
SHMEM_ERR_CHECK_OVERLAP(target, source, sizeof(TYPE) * \
nelems, sizeof(TYPE) * nelems, 0); \
nelems, sizeof(TYPE) * nelems, 0, \
(shmem_internal_my_pe == pe)); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the checks in collectives_c get passed a 1 but these are passing the conditional? Do we know the collectives_c macros are already on the same PE?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The "1" value is indicating the source/destination overlap check should always occur for APIs such as the collectives. However, for RMA operations, the source and destination buffers can overlap so long as the destination PE and source PE are different (i.e. shmem_internal_my_pe == pe evaluates to 0, ignoring the overlap check).

ptr_ext, shmem_internal_heap_base, heap_ext); \
} \
} \
else if (shmem_external_heap_base) { \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of devices could provide an external heap? GPU's would be my first thought, but I was under the impression that SOS doesn't support using their memory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of Intel® SHMEM, we are using the external heap to allow SOS to interact with data in device (GPU) memory. SOS supports this usage when configured with --enable-ofi-hmem.

Copy link
Member

@davidozog davidozog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 💯, thanks @philipmarshall21!

@davidozog davidozog added this to the v1.5.3 milestone May 1, 2024
@philipmarshall21 philipmarshall21 merged commit 87442b2 into Sandia-OpenSHMEM:main May 1, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants