ANN_BENCH: AnnGPU::uses_stream() for optional algo GPU sync (#2314)
Introduce a new virtual member `uses_stream()` for the `AnnGPU` class. Overriding it allows an algorithm to inform the benchmark whether stream synchronization is needed between benchmark iterations.

This is relevant for a potential persistent kernel where the CPU threads use an independent mechanism to synchronize with the GPU and get the results from it.
This is different from just not implementing `AnnGPU` for an algorithm in that it allows the algorithm to decide whether the synchronization is needed (depending on input parameters at runtime), while still providing the `get_sync_stream()` functionality.
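
As a minimal sketch of the description above, an algorithm could override `uses_stream()` based on a runtime parameter while still implementing `get_sync_stream()`. The class `PersistentKernelAlgo` and its `persistent` flag are hypothetical and not part of this commit, and `cudaStream_t` is replaced by a stand-in alias so that the snippet compiles without the CUDA headers:

```cpp
// Stand-in for the CUDA runtime type (assumption: the real code includes the CUDA headers).
using cudaStream_t = void*;

// Reduced copy of the interface touched by this commit.
class AnnGPU {
 public:
  [[nodiscard]] virtual auto get_sync_stream() const noexcept -> cudaStream_t = 0;
  [[nodiscard]] virtual auto uses_stream() const noexcept -> bool { return true; }
  virtual ~AnnGPU() noexcept = default;
};

// Hypothetical algorithm: it skips the benchmark's stream sync when it runs
// in a persistent-kernel mode that is selected at runtime.
class PersistentKernelAlgo : public AnnGPU {
 public:
  explicit PersistentKernelAlgo(bool persistent) : persistent_(persistent) {}

  // The stream stays available even when the benchmark does not sync on it.
  [[nodiscard]] auto get_sync_stream() const noexcept -> cudaStream_t override
  {
    return stream_;
  }

  // Report `false` only when results are already on the host once search returns.
  [[nodiscard]] auto uses_stream() const noexcept -> bool override { return !persistent_; }

 private:
  bool persistent_;
  cudaStream_t stream_ = nullptr;  // placeholder; a real algorithm would own a CUDA stream
};
```

Under these assumptions, `PersistentKernelAlgo(true)` asks the benchmark to skip the stream synchronization, while `PersistentKernelAlgo(false)` keeps the default behavior.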

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2314
achirkin authored May 15, 2024
1 parent 92d4301 commit 6cc7134
Showing 2 changed files with 13 additions and 2 deletions.
11 changes: 10 additions & 1 deletion cpp/bench/ann/src/common/ann_types.hpp
@@ -98,7 +98,16 @@ class AnnGPU {
    * end.
    */
   [[nodiscard]] virtual auto get_sync_stream() const noexcept -> cudaStream_t = 0;
-  virtual ~AnnGPU() noexcept = default;
+  /**
+   * By default a GPU algorithm uses a fixed stream to order GPU operations.
+   * However, an algorithm may need to synchronize with the host at the end of its execution.
+   * In that case, also synchronizing with a benchmark event would put it at disadvantage.
+   *
+   * We can disable event sync by passing `false` here
+   *   - ONLY IF THE ALGORITHM HAS PRODUCED ITS OUTPUT BY THE TIME IT SYNCHRONIZES WITH CPU.
+   */
+  [[nodiscard]] virtual auto uses_stream() const noexcept -> bool { return true; }
+  virtual ~AnnGPU() noexcept = default;
 };

 template <typename T>
4 changes: 3 additions & 1 deletion cpp/bench/ann/src/common/util.hpp
@@ -67,7 +67,9 @@ struct cuda_timer {
   static inline auto extract_stream(AnnT* algo) -> std::optional<cudaStream_t>
   {
     auto gpu_ann = dynamic_cast<AnnGPU*>(algo);
-    if (gpu_ann != nullptr) { return std::make_optional(gpu_ann->get_sync_stream()); }
+    if (gpu_ann != nullptr && gpu_ann->uses_stream()) {
+      return std::make_optional(gpu_ann->get_sync_stream());
+    }
     return std::nullopt;
   }
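
The change to `extract_stream` gives the benchmark three possible outcomes, which the sketch below tries to illustrate outside the benchmark. It is not the benchmark's actual code: `AlgoBase`, `CpuAlgo` and `GpuAlgo` are hypothetical types, `extract_stream` is written as a free function instead of a `cuda_timer` member, and `cudaStream_t` is a stand-in alias so the snippet compiles without CUDA:

```cpp
#include <optional>

// Stand-in for the CUDA runtime type (assumption).
using cudaStream_t = void*;

// Hypothetical polymorphic base of all benchmarked algorithms.
struct AlgoBase {
  virtual ~AlgoBase() = default;
};

// Reduced copy of the interface touched by this commit.
class AnnGPU {
 public:
  [[nodiscard]] virtual auto get_sync_stream() const noexcept -> cudaStream_t = 0;
  [[nodiscard]] virtual auto uses_stream() const noexcept -> bool { return true; }
  virtual ~AnnGPU() noexcept = default;
};

// Hypothetical CPU-only algorithm: does not implement AnnGPU at all.
struct CpuAlgo : AlgoBase {};

// Hypothetical GPU algorithm whose need for stream sync is chosen at runtime.
struct GpuAlgo : AlgoBase, AnnGPU {
  explicit GpuAlgo(bool sync_via_stream) : sync_via_stream_(sync_via_stream) {}
  [[nodiscard]] auto get_sync_stream() const noexcept -> cudaStream_t override { return nullptr; }
  [[nodiscard]] auto uses_stream() const noexcept -> bool override { return sync_via_stream_; }

 private:
  bool sync_via_stream_;
};

// Same logic as the patched function: a stream is returned only for GPU
// algorithms that ask the benchmark to synchronize on it.
template <typename AnnT>
auto extract_stream(AnnT* algo) -> std::optional<cudaStream_t>
{
  auto gpu_ann = dynamic_cast<AnnGPU*>(algo);
  if (gpu_ann != nullptr && gpu_ann->uses_stream()) {
    return std::make_optional(gpu_ann->get_sync_stream());
  }
  return std::nullopt;
}
```

With these types, `extract_stream<AlgoBase>` yields an empty optional for a `CpuAlgo` and for a `GpuAlgo(false)`, and a populated optional for a `GpuAlgo(true)`. A caller that receives an empty optional has no stream to synchronize on.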
