-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error checking for objects on external heap #1116
Add error checking for objects on external heap #1116
Conversation
Not convinced the change to SHMEM_ERR_CHECK_SYMMETRIC_HEAP is quite correct yet |
70b9957
to
fad6775
Compare
fad6775
to
0734a51
Compare
941a87d
to
d1954f2
Compare
Seems like the overlap checks are missing entirely from data_f.c, synchronization_f.c, and collectives_f.c. Are those files still being maintained / does the SHMEM_ERR_CHECK_OVERLAP check need to be added to them? |
That’s probably not a concern, those are for Fortran interfaces, which are all deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -287,7 +287,8 @@ SHMEM_PROF_DEF_CTX_PUT_N_SIGNAL_NBI(`mem') | |||
SHMEM_ERR_CHECK_SYMMETRIC(target, sizeof(TYPE) * nelems); \ | |||
SHMEM_ERR_CHECK_NULL(source, nelems); \ | |||
SHMEM_ERR_CHECK_OVERLAP(target, source, sizeof(TYPE) * \ | |||
nelems, sizeof(TYPE) * nelems, 0); \ | |||
nelems, sizeof(TYPE) * nelems, 0, \ | |||
(shmem_internal_my_pe == pe)); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do the checks in collectives_c get passed a 1 but these are passing the conditional? Do we know the collectives_c macros are already on the same PE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. The "1" value is indicating the source/destination overlap check should always occur for APIs such as the collectives. However, for RMA operations, the source and destination buffers can overlap so long as the destination PE and source PE are different (i.e. shmem_internal_my_pe == pe
evaluates to 0, ignoring the overlap check).
ptr_ext, shmem_internal_heap_base, heap_ext); \ | ||
} \ | ||
} \ | ||
else if (shmem_external_heap_base) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of devices could provide an external heap? GPU's would be my first thought, but I was under the impression that SOS doesn't support using their memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of Intel® SHMEM, we are using the external heap to allow SOS to interact with data in device (GPU) memory. SOS supports this usage when configured with --enable-ofi-hmem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 💯, thanks @philipmarshall21!
Now checking if an object resides on the external heap when determining if said object is symmetric. This PR also revises the overlap check to allow overlap in target and source buffers for RMA operations if
target_pe != source_pe