Add `bytes_per_second` to groupby max benchmark. #13984

base: branch-23.10
Conversation
Pull requests from external contributors require approval from a
Hi, I added a few helper functions which I would also use for the other groupby benchmarks. PS: Should I add the other
Would other benchmarks benefit from those helper functions as well? If not, we can move them to
Sounds reasonable to me.
I think so. They could be used in a few of the benchmarks to simplify the code.
/ok to test
Force-pushed from 5879d40 to 405ca60
Hi, the PR is not ready; I haven't had enough time the last two days. So no need to review it yet 😃.
Force-pushed from 405ca60 to 9a54ab1
This patch adds memory statistics for the GROUPBY_NVBENCH benchmarks. For this purpose helper functions are introduced to compute the payload size for:

- Column
- Table
- Groupby execution results

This patch relates to rapidsai#13735.
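As a rough illustration of what such payload-size helpers could compute, here is a minimal sketch. This is not the PR's code: the real helpers operate on `cudf::column_view`/`cudf::table_view`, while the stand-in `ColumnDesc` type, its field names, and the null-mask rounding below are assumptions for a simple fixed-width-column model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a fixed-width column; real code would inspect
// cudf::column_view instead (hypothetical model, not the cudf API).
struct ColumnDesc {
  int64_t num_rows;
  int64_t dtype_size;  // bytes per element
  bool nullable;
};

// Payload bytes of one column: the data buffer plus, if nullable, a
// validity bitmask of one bit per row, rounded up here to 4-byte words.
int64_t estimate_column_bytes(ColumnDesc const& col) {
  int64_t bytes = col.num_rows * col.dtype_size;
  if (col.nullable) { bytes += ((col.num_rows + 31) / 32) * 4; }
  return bytes;
}

// Payload bytes of a table: the sum over its columns.
int64_t estimate_table_bytes(std::vector<ColumnDesc> const& cols) {
  int64_t total = 0;
  for (auto const& c : cols) { total += estimate_column_bytes(c); }
  return total;
}
```

A groupby-result helper in this spirit would then just sum the table estimates for the key columns and each aggregation result.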
Force-pushed from 9a54ab1 to c5f49ee
Hi @PointKernel, @davidwendt, in the current state of the PR every NVBENCH benchmark has

PS: Does something like a nested column iterator exist, which iterates over all columns of a table, including child columns? I've not found anything like this, and I'm also unsure if it is feasible to implement, but it would be great for

PPS: benchmark output: perf_log.txt
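The "nested column iterator" wished for above can be sketched as a depth-first traversal. This toy version uses a stand-in `Column` type rather than `cudf::column_view` (which exposes nested children via `num_children()`/`child(i)`), so all names here are illustrative assumptions, not cudf code.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a (possibly nested) column such as a struct or list
// column; only the parent/child shape matters for the traversal.
struct Column {
  int size;
  std::vector<Column> children;
};

// Visit a column and all of its descendants, depth first.
template <typename Fn>
void for_each_column(Column const& col, Fn&& fn) {
  fn(col);
  for (auto const& child : col.children) { for_each_column(child, fn); }
}

// Example use: count every column in a table, child columns included.
int count_columns(std::vector<Column> const& table) {
  int n = 0;
  for (auto const& c : table) {
    for_each_column(c, [&](Column const&) { ++n; });
  }
  return n;
}
```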
Hi @Blonck, this is nice work and I'd like to get this change merged. However, it seems that you haven't had a chance to finish it up. Would you mind if I take it from where you left off and finish it?
You may want to look into using
Thanks @davidwendt, this is very useful! I will check out the function you suggested.
…ks (#16126)

This PR addresses #13735 for reduction benchmarks. There are 3 new utils added.

- `int64_t estimate_size(cudf::table_view)` returns a size estimate for the given table. #13984 was a previous attempt to add a similar utility, but this implementation uses `cudf::row_bit_count()` as suggested in #13984 (comment) instead of manually estimating the size.
- `void set_items_processed(State& state, int64_t items_processed_per_iteration)` is a thin wrapper of `State.SetItemsProcessed()`. This wrapper takes `items_processed_per_iteration` as a parameter instead of `total_items_processed`, which avoids repeating `State.iterations() * items_processed_per_iteration` in each benchmark class.
- `void set_throughputs(nvbench::state& state)` is added as a workaround for NVIDIA/nvbench#175. We sometimes want to set throughput statistics after `state.exec()` calls, especially when it is hard to estimate the result size upfront.

Here are snippets of reduction benchmarks after this change.

```
$ cpp/build/benchmarks/REDUCTION_BENCH
...
-----------------------------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------
Reduction/bool_all/10000/manual_time    10257 ns        26845 ns        68185 bytes_per_second=929.907M/s items_per_second=975.078M/s
Reduction/bool_all/100000/manual_time   11000 ns        27454 ns        63634 bytes_per_second=8.46642G/s items_per_second=9.09075G/s
Reduction/bool_all/1000000/manual_time  12671 ns        28658 ns        55261 bytes_per_second=73.5018G/s items_per_second=78.922G/s
...

$ cpp/build/benchmarks/REDUCTION_NVBENCH
...
## rank_scan

### [0] NVIDIA RTX A5500

| T   | null_probability | data_size | Samples | CPU Time  | Noise  | GPU Time  | Noise | Elem/s  | GlobalMem BW | BWUtil  |
|-----|------------------|-----------|---------|-----------|--------|-----------|-------|---------|--------------|---------|
| I32 | 0                | 10000     | 16992x  | 33.544 us | 14.95% | 29.446 us | 5.58% | 82.321M | 5.596 TB/s   | 728.54% |
| I32 | 0.1              | 10000     | 16512x  | 34.358 us | 13.66% | 30.292 us | 2.87% | 80.020M | 5.286 TB/s   | 688.17% |
| I32 | 0.5              | 10000     | 16736x  | 34.058 us | 14.31% | 29.890 us | 3.40% | 81.097M | 5.430 TB/s   | 706.89% |
...
```

Note that when the data type is a 1-byte-width type, `bytes_per_second` appears smaller than `items_per_second` in the google benchmark result summary. This is because the former is scaled by powers of 1024 whereas the latter is scaled by powers of 1000; they are in fact the same rate.

Implementation-wise, here is what I'm not sure I made the best decision on.

- Each of the new utils above is declared and defined in a different file. I did this because I could not find a good place to have them all, and they seem to belong to different utilities. Please let me know if there is a better place for them.
- All the new utils are defined in the global namespace since other util functions seem to have been defined in the same way. Please let me know if this is not the convention.

Authors:
- Jihoon Son (https://github.com/jihoonson)

Approvers:
- Mark Harris (https://github.com/harrism)
- David Wendt (https://github.com/davidwendt)

URL: #16126
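The 1-byte case can be checked numerically against the first `bool_all` row of the Google Benchmark snippet (929.907M bytes/s vs 975.078M items/s): the raw rate is identical, only the prefix base differs. The helper names below are mine, not Google Benchmark API; only the arithmetic is the point.

```cpp
#include <cassert>
#include <cmath>

// Scale a raw per-second rate to "M" units two ways: the 1000-based SI
// prefix (used for items) and the 1024-based binary prefix (used for
// bytes) in the benchmark summary above.
double si_mega(double rate_per_sec)  { return rate_per_sec / 1.0e6; }
double bin_mega(double rate_per_sec) { return rate_per_sec / (1024.0 * 1024.0); }
```

With a raw rate of 975,078,000 elements (== bytes) per second, `si_mega` yields 975.078 and `bin_mega` yields about 929.907, reproducing both printed counters from the same number.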
This patch adds memory statistics for the GROUPBY_NVBENCH benchmark using the `max` aggregation. For this purpose helper functions are introduced to compute the payload size for:

- Column
- Table
- Groupby execution results
This patch relates to #13735.