-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Refactor hash-based algorithms with new cuco data structures #12261
Labels
2 - In Progress
Currently a work in progress
feature request
New feature or request
improvement
Improvement / enhancement to an existing function
libcudf
Affects libcudf (C++/CUDA) code.
Milestone
Comments
PointKernel
added
feature request
New feature or request
Needs Triage
Need team to review and classify
libcudf
Affects libcudf (C++/CUDA) code.
tech debt
improvement
Improvement / enhancement to an existing function
labels
Nov 29, 2022
GregoryKimball
added
0 - Blocked
Cannot progress due to external reasons
and removed
Needs Triage
Need team to review and classify
labels
Dec 1, 2022
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
May 15, 2023
This PR improves `distinct_count` with the new `cuco::static_set` data structure. Related to #12261 Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Nghia Truong (https://github.com/ttnghia) - Divye Gala (https://github.com/divyegala) URL: #13343
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Aug 28, 2023
In JSON tree algorithms of JSON reader, cuco static_map is used as a set. This PR replaces it with static_set. No tests are changed. No significant runtime changes. Addresses part of #12261 Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #13928
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Sep 26, 2023
Contributes to #12261 This PR refactors `contains_table` to use the new `cuco::static_set` data structure. It also adds a `contains_table` benchmark to track the performance before and after this work. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #14064
PointKernel
added a commit
to NVIDIA/cuCollections
that referenced
this issue
Sep 26, 2023
Depends on #349 This PR adds an example demonstrating how to create multiple subsets with one single storage. It includes necessary changes and cleanups that will unblock orc/parquet dictionary encoding (rapidsai/cudf#12261) to use the new map/set data structures. --------- Co-authored-by: Daniel Juenger <[email protected]>
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Oct 4, 2023
Part of ##12261 This PR simplifies the `contains_column` implementation by invoking `contains_table` and gets rid of the use of the cudf `unordered_multiset`. It also removes the `unordered_multiset` header file from libcudf. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #14238
3 tasks
6 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Feb 29, 2024
Depends on #14849 Contributes to #12261 This PR migrates hash groupby to use the new `cuco::static_set` data structure. It doesn't change any existing libcudf behavior but uncovers the fact that the cudf python `value_counts` doesn't guarantee output orders thus the PR becomes a breaking change. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - David Wendt (https://github.com/davidwendt) - Vyas Ramasubramani (https://github.com/vyasr) URL: #14813
3 tasks
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
May 23, 2024
This completes the final two steps and closes #15537. Also addresses one step of #12261. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - David Wendt (https://github.com/davidwendt) - Shruti Shivakumar (https://github.com/shrshi) - Matthew Roeschke (https://github.com/mroeschke) URL: #15813
3 tasks
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Aug 30, 2024
…::static_map` (#16541) Part of #12261. This PR refactors the dictionary encoding in Parquet writers to migrate from `cuco::legacy::static_map` to `cuco::static_map` to build the dictionaries. ### Performance Results The changes result in +0.08% average speed improvement and +16.22% average memory footprint increase (stems from the adjusted sizes by `cuco::make_window_extent` due to [prime gap](https://en.wikipedia.org/wiki/Prime_gap)) across the benchmark cases extended from #16591 Currently, we do see a roughly 8% speed improvement in map insert and find kernels which is counteracted by the map init and map collect kernels as they have to process 16.22% more slots. With a cuco bump, the average speed improvement will increase from 0.08% to +3% and the memory footprint change will go back from 16.22% to +0%. ### Hardware used for benchmarking ``` `NVIDIA RTX 5880 Ada Generation` * SM Version: 890 (PTX Version: 860) * Number of SMs: 110 * SM Default Clock Rate: 18446744071874 MHz * Global Memory: 23879 MiB Free / 48632 MiB Total * Global Memory Bus Peak: 960 GB/sec (384-bit DDR @10001MHz) * Max Shared Memory: 100 KiB/SM, 48 KiB/Block * L2 Cache Size: 98304 KiB * Maximum Active Blocks: 24/SM * Maximum Active Threads: 1536/SM, 1024/Block * Available Registers: 65536/SM, 65536/Block * ECC Enabled: No ``` Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - David Wendt (https://github.com/davidwendt) URL: #16541
rapids-bot bot
pushed a commit
that referenced
this issue
Sep 18, 2024
This PR refactors `mixed_semi_join` by replacing **cuco** legacy `static_map` with latest `static_set`. Contributes to #12261. Authors: - Srinivas Yadav (https://github.com/srinivasyadav18) - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: #16230
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Oct 12, 2024
…map` (#17049) Part of #12261. This PR refactors ORC writer's dictionary encoding to migrate from `cuco::legacy::static_map` to the new `cuco::static_map`. No performance impact measured. Results [here](#17049 (comment)). Authors: - Muhammad Haseeb (https://github.com/mhaseeb123) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Vukasin Milovanovic (https://github.com/vuule) URL: #17049
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
2 - In Progress
Currently a work in progress
feature request
New feature or request
improvement
Improvement / enhancement to an existing function
libcudf
Affects libcudf (C++/CUDA) code.
Is your feature request related to a problem? Please describe.
cuco::static_map
andcuco::static_multimap
are used to perform hash-based operations in libcudf. Depending on NVIDIA/cuCollections#110, a lot of existing use cases could be replaced withcuco::static_set
orcuco:static_multiset
since the payload part in the hash map is not used.Moreover, some libcudf implementations are still using
concurrent_unordered_map
orunordered_multiset
and they should be refactored withcuco::static_set/multiset
as well.Describe the solution you'd like
Replace existing implementations with new data structures like
cuco::static_set
,cuco::static_multiset
orcuco::static_map
.There should also be benchmarks showing the performance changes after replacement for most works listed below.
Note: under "Related APIs", the
ref::
prefix refers to the cuco device-side API.Part 1: Updates ready to be addressed in libcudf
cuco::static_set
insert
,insert_if
cuco::static_map
(legacy)ref::insert
,ref::find
cuco::static_map
insert_async
,ref::find
cuco::static_set
insert_if_async
,ref::find
cuco::static_set
insert_async
,insert_if_async
,contains_async
,contains_if_async
cudf::detail::unordered_multiset
cuco::static_set
insert_async
,insert_if_async
,contains_async
cudf::detail::unordered_multiset
.unordered_multiset
can be removed once this is done.cuco::static_set
ref::insert_and_find
,retrieve_all
concurrent_unordered_map
cuco::static_map
(legacy)insert_async
,ref::find
Needs migration fromDeleted in #15813concurrent_unordered_map
. Looks like a rational use of hash map. To be replaced bycuco::static_map
cuco::static_set
insert_async
retrieve_all
,ref::insert_and_find
cuco::static_map
, but it is blocked on performance concernscuco::static_map
(experimental)ref::insert
,ref::find
cuco::static_map
(legacy)cuco::static_set
insert
,insert_if
,ref::contains
cuco::static_map
(legacy)contains
cuco::static_set
insert_async
retrieve_all
,ref::insert_and_find
cuco::static_map
(experimental)ref::insert
,ref::find
cuco::static_map
(legacy)cuco::static_multiset
count
,count_outer
,retrieve
,retrieve_outer
cuco::static_multimap
(legacy) Related issues: #11176, #13116,cuco::static_multiset
cuco::static_multimap
(legacy)The text was updated successfully, but these errors were encountered: