Add throughput metrics for REDUCTION_BENCH/REDUCTION_NVBENCH benchmarks (#16126)

This PR addresses #13735 for reduction benchmarks. Three new utils are added.

- `int64_t estimate_size(cudf::table_view)` returns a size estimate for the given table. #13984 was a previous attempt to add a similar utility, but this implementation uses `cudf::row_bit_count()`, as suggested in #13984 (comment), instead of manually estimating the size.
- `void set_items_processed(State& state, int64_t items_processed_per_iteration)` is a thin wrapper around `State.SetItemsProcessed()`. This wrapper takes `items_processed_per_iteration` as a parameter instead of `total_items_processed`, which avoids repeating `State.iterations() * items_processed_per_iteration` in each benchmark class.
- `void set_throughputs(nvbench::state& state)` is added as a workaround for NVIDIA/nvbench#175. We sometimes want to set throughput statistics after `state.exec()` calls, especially when it is hard to estimate the result size upfront.

Here are snippets of the reduction benchmark output after this change.

```
$ cpp/build/benchmarks/REDUCTION_BENCH
...
-----------------------------------------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
Reduction/bool_all/10000/manual_time        10257 ns        26845 ns        68185 bytes_per_second=929.907M/s items_per_second=975.078M/s
Reduction/bool_all/100000/manual_time       11000 ns        27454 ns        63634 bytes_per_second=8.46642G/s items_per_second=9.09075G/s
Reduction/bool_all/1000000/manual_time      12671 ns        28658 ns        55261 bytes_per_second=73.5018G/s items_per_second=78.922G/s
...
$ cpp/build/benchmarks/REDUCTION_NVBENCH
...
## rank_scan

### [0] NVIDIA RTX A5500

|  T  | null_probability | data_size | Samples |  CPU Time  | Noise  |  GPU Time  | Noise |  Elem/s  | GlobalMem BW |  BWUtil |
|-----|------------------|-----------|---------|------------|--------|------------|-------|----------|--------------|---------|
| I32 |                0 |     10000 |  16992x |  33.544 us | 14.95% |  29.446 us | 5.58% |  82.321M |   5.596 TB/s | 728.54% |
| I32 |              0.1 |     10000 |  16512x |  34.358 us | 13.66% |  30.292 us | 2.87% |  80.020M |   5.286 TB/s | 688.17% |
| I32 |              0.5 |     10000 |  16736x |  34.058 us | 14.31% |  29.890 us | 3.40% |  81.097M |   5.430 TB/s | 706.89% |
...
```

Note that, when the data type is 1 byte wide (as in `bool_all`), `bytes_per_second` appears to be smaller than `items_per_second` in the Google Benchmark result summary. This is because the former is scaled by powers of 1024 whereas the latter is scaled by powers of 1000. They are in fact the same number.

Implementation-wise, these are the decisions I'm not sure about:

- Each of the new utils above is declared and defined in a different file. I did this because I could not find a single good place for all of them, and they seem to belong to different kinds of utilities. Please let me know if there is a better place for them.
- All the new utils are defined in the global namespace, since other util functions seem to have been defined in the same way. Please let me know if this is not the convention.

Authors:
  - Jihoon Son (https://github.com/jihoonson)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - David Wendt (https://github.com/davidwendt)

URL: #16126
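The unit difference can be checked against the `Reduction/bool_all` rows in the Google Benchmark output. The following C++ sketch is an illustration, not Google Benchmark's own formatting code, and the helper name `to_prefixed` is hypothetical. It divides one raw per-second rate by a power of 1000 or of 1024; the results line up with the printed values, which is consistent with `bytes_per_second` using binary (1024-based) prefixes and `items_per_second` using decimal (1000-based) ones.

```cpp
#include <cmath>

// Hypothetical helper: express a raw per-second rate with a size prefix.
// `exponent` selects the prefix (2 = "M", 3 = "G") and `base` selects the
// prefix family (1000 for decimal prefixes, 1024 for binary prefixes).
double to_prefixed(double rate_per_second, double base, int exponent)
{
  return rate_per_second / std::pow(base, exponent);
}
```

For example, `to_prefixed(975.078e6, 1024.0, 2)` evaluates to about 929.907, the `bytes_per_second` value printed for `Reduction/bool_all/10000`, and `to_prefixed(9.09075e9, 1024.0, 3)` gives about 8.46642, matching the 100000-row case. For 1-byte `bool` data the raw byte rate and item rate are the same number, so only the prefix base differs.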
Showing 14 changed files with 314 additions and 15 deletions.
@@ -0,0 +1,27 @@

```cpp
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "benchmark_utilities.hpp"

void set_items_processed(::benchmark::State& state, int64_t items_processed_per_iteration)
{
  state.SetItemsProcessed(state.iterations() * items_processed_per_iteration);
}

void set_bytes_processed(::benchmark::State& state, int64_t bytes_processed_per_iteration)
{
  state.SetBytesProcessed(state.iterations() * bytes_processed_per_iteration);
}
```
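Both wrappers only multiply a per-iteration count by `state.iterations()`. The sketch below reproduces that arithmetic against a hypothetical `FakeState` stand-in (not part of Google Benchmark) so it compiles without linking the library; the templated `*_sketch` functions are illustrations, not the signatures added in this commit.

```cpp
#include <cstdint>

// Hypothetical stand-in for ::benchmark::State that exposes only the
// members the wrappers above touch.
struct FakeState {
  int64_t iters           = 0;
  int64_t items_processed = 0;
  int64_t bytes_processed = 0;

  int64_t iterations() const { return iters; }
  void SetItemsProcessed(int64_t n) { items_processed = n; }
  void SetBytesProcessed(int64_t n) { bytes_processed = n; }
};

// Same arithmetic as set_items_processed(), templated to accept FakeState.
template <typename State>
void set_items_processed_sketch(State& state, int64_t items_per_iteration)
{
  state.SetItemsProcessed(state.iterations() * items_per_iteration);
}

// Same arithmetic as set_bytes_processed(), templated to accept FakeState.
template <typename State>
void set_bytes_processed_sketch(State& state, int64_t bytes_per_iteration)
{
  state.SetBytesProcessed(state.iterations() * bytes_per_iteration);
}
```

In a real Google Benchmark function, a call such as `set_items_processed(state, num_rows)` (with a hypothetical `num_rows`) would presumably go after the `for (auto _ : state)` loop, once `state.iterations()` holds the final iteration count.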
@@ -0,0 +1,41 @@

```cpp
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#pragma once

#include <benchmark/benchmark.h>

/**
 * @brief Sets the number of items processed during the benchmark.
 *
 * This function could be used instead of ::benchmark::State.SetItemsProcessed()
 * to avoid repeatedly computing ::benchmark::State.iterations() * items_processed_per_iteration.
 *
 * @param state the benchmark state
 * @param items_processed_per_iteration number of items processed per iteration
 */
void set_items_processed(::benchmark::State& state, int64_t items_processed_per_iteration);

/**
 * @brief Sets the number of bytes processed during the benchmark.
 *
 * This function could be used instead of ::benchmark::State.SetBytesProcessed()
 * to avoid repeatedly computing ::benchmark::State.iterations() * bytes_processed_per_iteration.
 *
 * @param state the benchmark state
 * @param bytes_processed_per_iteration number of bytes processed per iteration
 */
void set_bytes_processed(::benchmark::State& state, int64_t bytes_processed_per_iteration);
```
@@ -0,0 +1,60 @@

```cpp
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "nvbench_utilities.hpp"

#include <nvbench/nvbench.cuh>

// This function is copied over from
// https://github.com/NVIDIA/nvbench/blob/a171514056e5d6a7f52a035dd6c812fa301d4f4f/nvbench/detail/measure_cold.cu#L190-L224.
void set_throughputs(nvbench::state& state)
{
  double avg_cuda_time = state.get_summary("nv/cold/time/gpu/mean").get_float64("value");

  if (const auto items = state.get_element_count(); items != 0) {
    auto& summ = state.add_summary("nv/cold/bw/item_rate");
    summ.set_string("name", "Elem/s");
    summ.set_string("hint", "item_rate");
    summ.set_string("description", "Number of input elements processed per second");
    summ.set_float64("value", static_cast<double>(items) / avg_cuda_time);
  }

  if (const auto bytes = state.get_global_memory_rw_bytes(); bytes != 0) {
    const auto avg_used_gmem_bw = static_cast<double>(bytes) / avg_cuda_time;
    {
      auto& summ = state.add_summary("nv/cold/bw/global/bytes_per_second");
      summ.set_string("name", "GlobalMem BW");
      summ.set_string("hint", "byte_rate");
      summ.set_string("description",
                      "Number of bytes read/written per second to the CUDA "
                      "device's global memory");
      summ.set_float64("value", avg_used_gmem_bw);
    }

    {
      const auto peak_gmem_bw =
        static_cast<double>(state.get_device()->get_global_memory_bus_bandwidth());

      auto& summ = state.add_summary("nv/cold/bw/global/utilization");
      summ.set_string("name", "BWUtil");
      summ.set_string("hint", "percentage");
      summ.set_string("description",
                      "Global device memory utilization as a percentage of the "
                      "device's peak bandwidth");
      summ.set_float64("value", avg_used_gmem_bw / peak_gmem_bw);
    }
  }
}
```
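`set_throughputs()` derives all three statistics from the mean cold GPU time. The sketch below restates that arithmetic with plain numbers so the relationships are easy to check; `throughput_summary` and `compute_throughputs` are hypothetical names, and nothing here uses the nvbench API. As in the code above, "BWUtil" is kept as a fraction of peak bandwidth; presumably the `percentage` hint is what makes nvbench print it scaled by 100.

```cpp
#include <cstdint>

// Hypothetical container for the three values set_throughputs() reports.
struct throughput_summary {
  double elements_per_second;  // "Elem/s"
  double bytes_per_second;     // "GlobalMem BW"
  double bandwidth_fraction;   // "BWUtil", as a fraction of peak bandwidth
};

// Restates the formulas above: each rate is a per-launch count divided by
// the mean GPU time in seconds. A zero count leaves the rate at 0 here,
// whereas set_throughputs() skips adding that summary entirely.
throughput_summary compute_throughputs(int64_t element_count,
                                       int64_t global_memory_rw_bytes,
                                       double avg_gpu_time_seconds,
                                       double peak_bandwidth_bytes_per_second)
{
  throughput_summary s{0.0, 0.0, 0.0};
  if (element_count != 0) {
    s.elements_per_second = static_cast<double>(element_count) / avg_gpu_time_seconds;
  }
  if (global_memory_rw_bytes != 0) {
    s.bytes_per_second   = static_cast<double>(global_memory_rw_bytes) / avg_gpu_time_seconds;
    s.bandwidth_fraction = s.bytes_per_second / peak_bandwidth_bytes_per_second;
  }
  return s;
}
```

For instance, 1000 elements and 8000 bytes at a mean of 0.25 s give 4000 Elem/s and 32000 B/s, which is 0.25 of a 128000 B/s peak.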
@@ -0,0 +1,31 @@

```cpp
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#pragma once

namespace nvbench {
struct state;
}

/**
 * @brief Sets throughput statistics, such as "Elem/s", "GlobalMem BW", and "BWUtil" for the
 * nvbench results summary.
 *
 * This function could be used to work around a known issue that the throughput statistics
 * should be added before the nvbench::state.exec() call, otherwise they will not be printed
 * in the summary. See https://github.com/NVIDIA/nvbench/issues/175 for more details.
 */
void set_throughputs(nvbench::state& state);
```
@@ -0,0 +1,41 @@

```cpp
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "table_utilities.hpp"

#include <cudf/reduction.hpp>
#include <cudf/transform.hpp>

#include <cmath>

int64_t estimate_size(cudf::column_view const& col)
{
  return estimate_size(cudf::table_view({col}));
}

int64_t estimate_size(cudf::table_view const& view)
{
  // Compute the size in bits for each row.
  auto const row_sizes = cudf::row_bit_count(view);
  // Accumulate the row sizes to compute a sum.
  auto const agg = cudf::make_sum_aggregation<cudf::reduce_aggregation>();
  cudf::data_type sum_dtype{cudf::type_id::INT64};
  auto const total_size_scalar = cudf::reduce(*row_sizes, *agg, sum_dtype);
  auto const total_size_in_bits =
    static_cast<cudf::numeric_scalar<int64_t>*>(total_size_scalar.get())->value();
  // Convert the size in bits to the size in bytes.
  return static_cast<int64_t>(std::ceil(static_cast<double>(total_size_in_bits) / 8));
}
```
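Because `estimate_size()` sums bit counts across all rows before converting to bytes, a row whose size is not a whole number of bytes is not rounded up on its own. The sketch below isolates that final step on a plain `std::vector` of hypothetical per-row bit counts; it does not call `cudf::row_bit_count()`, and `bits_to_bytes_total` is a hypothetical name. It uses integer ceiling division instead of `std::ceil` on a `double`; the two agree for non-negative totals, and the integer form also stays exact for totals above 2^53 bits.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper mirroring the last two steps of estimate_size():
// sum the per-row sizes in bits, then round the total up to whole bytes.
int64_t bits_to_bytes_total(std::vector<int64_t> const& row_bits)
{
  int64_t const total_bits = std::accumulate(row_bits.begin(), row_bits.end(), int64_t{0});
  // Integer ceiling division by 8 for a non-negative bit total.
  return (total_bits + 7) / 8;
}
```

With ten rows of 9 bits each, summing first gives 90 bits, or 12 bytes, whereas rounding each row up separately would give 20 bytes.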
@@ -0,0 +1,41 @@

```cpp
/*
 * Copyright (c) 2024, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#pragma once

#include <cudf/table/table_view.hpp>

/**
 * @brief Estimates the column size in bytes.
 *
 * @remark As this function internally uses cudf::row_bit_count() to estimate each row size
 * and accumulates them, the returned estimate may be an inexact approximation in some
 * cases. See cudf::row_bit_count() for more details.
 *
 * @param view The column view whose size to estimate
 */
int64_t estimate_size(cudf::column_view const& view);

/**
 * @brief Estimates the table size in bytes.
 *
 * @remark As this function internally uses cudf::row_bit_count() to estimate each row size
 * and accumulates them, the returned estimate may be an inexact approximation in some
 * cases. See cudf::row_bit_count() for more details.
 *
 * @param view The table view whose size to estimate
 */
int64_t estimate_size(cudf::table_view const& view);
```