Fence on sample only #209
Putting in tool_invoked_fence code.
Fixing tool-induced fences to always fence on the device with DevID 0. Fencing with the actual DevID will be done in a subsequent patch, where a Pair object will be used in the hash table to capture the begin sample's information. Note that the pair/tuple object can capture other state information to store between the beginning of a sampling event and its end.
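The pair-in-a-hash-table idea described above could look roughly like the following. This is only a sketch of the planned follow-up, not code from this PR: the names `SampleInfo`, `active_samples`, `record_begin`, and `take_end` are hypothetical, and the choice of storing a child kernel ID next to the device ID is an assumption made for illustration.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical record kept between the begin and end of a sampled event:
// (ID returned by the child tool, device ID seen at begin).
using SampleInfo = std::pair<uint64_t, uint32_t>;

// Hypothetical table keyed by the sampler's kernel ID.
static std::unordered_map<uint64_t, SampleInfo> active_samples;

// Called at the begin of a sampled event to remember its state.
void record_begin(uint64_t kID, uint64_t childID, uint32_t devID) {
  active_samples[kID] = SampleInfo(childID, devID);
}

// Called at the end of an event. Returns true and fills `out` only if the
// matching begin was sampled; the entry is removed so it is used once.
bool take_end(uint64_t kID, SampleInfo& out) {
  auto it = active_samples.find(kID);
  if (it == active_samples.end()) return false;
  out = it->second;
  active_samples.erase(it);
  return true;
}
```

With a record like this, the end callback could fence on the same device ID that was stored at begin instead of the fixed value 0.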
…tools into fenceOnSampleOnly
Output for stream with the Kokkos CUDA backend on Perlmutter, with the sampler applied to the kernel logger and fences turned on. The output shows that the change gives the correct sampler behavior.
Only read the environment variable during Init
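A minimal sketch of what this suggestion amounts to, assuming the fence setting comes from a single environment variable: read it once at initialization, cache the result, and have the callbacks consult only the cached flag. The names `g_fence_on_sample`, `parse_fence_flag`, `sampler_init_sketch`, and `fence_requested` are hypothetical, the variable name is passed in rather than assumed, and treating "unset", empty, or "0" as off is an illustrative convention.

```cpp
#include <cstdlib>
#include <cstring>

// Cached setting; written once during init, read by the callbacks.
static bool g_fence_on_sample = false;

// Assumed convention: unset, empty, or "0" means off; anything else is on.
bool parse_fence_flag(const char* raw) {
  return raw != nullptr && raw[0] != '\0' && std::strcmp(raw, "0") != 0;
}

// Hypothetical init hook: the only place that touches the environment.
void sampler_init_sketch(const char* env_name) {
  g_fence_on_sample = parse_fence_flag(std::getenv(env_name));
}

// Begin/end callbacks would call this instead of std::getenv.
bool fence_requested() { return g_fence_on_sample; }
```

Keeping `std::getenv` out of the begin/end callbacks avoids a repeated lookup on every kernel launch.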
Passing devID to invoke_ktools_fence() instead of 0 is in a separate PR. Checking that the fence is done only on devID hasn't been tested in this PR and isn't directly related to it.
Here is another test with the runtime globFence check removed, i.e., the latest change as advised by @crtrott. The sampler skip rate is set to 7. The output is from Kokkos stream with the CUDA backend on Perlmutter, with the Kokkos sampler applied to the kernel logger. Note that the device printed is not the physical device ID (e.g., on a node of a supercomputer) but a Kokkos execution space identifier.
This is the same run as in the previous post, but with KOKKOS_TOOLS_SAMPLER_VERBOSE set to 2 instead of 1. The output shows the invocation of the Kokkos Tools tool-induced fence via the print statement. The print statement for this fence shows the physical device ID (converted from the execution space ID). We see from the output that the device ID is 0. This is correct, given that each begin/end tools callback invokes a tool-induced fence with the parameter 0.
This PR fences only when a sample is taken, i.e., at the beginning of the sample in kokkosp_begin_xyz(mykID) and at the end of the corresponding sample in kokkosp_end_xyz(mykID). This improves the efficiency of Kokkos Tools when sampling is done, since events that are not sampled no longer incur a fence.
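The behavior described above can be sketched as follows. This is a simplified illustration, not the PR's code: `SamplerSketch`, `tool_fence`, and the counters are hypothetical stand-ins, the fence is modeled as a counter rather than a real device fence, and the fixed device ID 0 mirrors what this PR does.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical model of a sampler that fences only around sampled events.
struct SamplerSketch {
  uint64_t skip_rate;      // sample every skip_rate-th begin call
  bool fence_enabled;      // whether tool-induced fences are requested
  uint64_t invocations;    // number of begin calls seen so far
  uint64_t fences;         // number of fences issued (stand-in for real fences)
  std::unordered_set<uint64_t> sampled;  // kernel IDs with an open sample

  SamplerSketch(uint64_t rate, bool fence)
      : skip_rate(rate), fence_enabled(fence), invocations(0), fences(0) {}

  // Stand-in for the tool-induced fence; a real tool would fence devID here.
  void tool_fence(uint32_t /*devID*/) { ++fences; }

  // Begin callback: fence and forward to the child tool only when sampled.
  bool begin(uint64_t kID) {
    ++invocations;
    if (invocations % skip_rate != 0) return false;  // not sampled: no fence
    if (fence_enabled) tool_fence(0);                // fixed device ID 0
    sampled.insert(kID);
    return true;  // the child tool's begin callback would be invoked here
  }

  // End callback: fence only if the matching begin was sampled.
  bool end(uint64_t kID) {
    if (sampled.erase(kID) == 0) return false;       // begin was skipped
    if (fence_enabled) tool_fence(0);
    return true;  // the child tool's end callback would be invoked here
  }
};
```

With a skip rate of 7, only every seventh begin/end pair pays for two fences; all other events pass through without any fence.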
Note that the "provide tools programming interface" hook must be exposed in profiling/all/kp_core.hpp. This wasn't done previously, i.e., it is not in the develop branch of Kokkos Tools, and it is useful for other tools that need the tools programming interface.
Notes from PR #194 are relevant to this PR.