Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Refactor hash-based algorithms with new cuco data structures #12261

Open
PointKernel opened this issue Nov 29, 2022 · 0 comments
Open

[FEA] Refactor hash-based algorithms with new cuco data structures #12261

PointKernel opened this issue Nov 29, 2022 · 0 comments
Labels
2 - In Progress Currently a work in progress feature request New feature or request improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code.

Comments

@PointKernel
Copy link
Member

PointKernel commented Nov 29, 2022

Is your feature request related to a problem? Please describe.
cuco::static_map and cuco::static_multimap are used to perform hash-based operations in libcudf. Depending on NVIDIA/cuCollections#110, a lot of existing use cases could be replaced with cuco::static_set or cuco:static_multiset since the payload part in the hash map is not used.

Moreover, some libcudf implementations are still using concurrent_unordered_map or unordered_multiset and they should be refactored with cuco::static_set/multiset as well.

Describe the solution you'd like
Replace existing implementations with new data structures like cuco::static_set, cuco::static_multiset or cuco::static_map.

There should also be benchmarks showing the performance changes after replacement for most works listed below.
Note: under "Related APIs", the ref:: prefix refers to the cuco device-side API.

Part 1: Updates ready to be addressed in libcudf

Algorithm Desired Data Structure Related APIs PRs Notes
distinct_count cuco::static_set insert, insert_if #13343
orc dictionary encoding (issue #10495) cuco::static_map (legacy) ref::insert, ref::find #13580 Similar to parquet dictionary encoding but the current implementation is still using a custom dictionary
byte_pair_encoding cuco::static_map insert_async, ref::find #13807 uses heterogeneous lookup
json_tree cuco::static_set insert_if_async, ref::find #13928
search/contains_table cuco::static_set insert_async, insert_if_async, contains_async, contains_if_async #14064 Needs migration from cudf::detail::unordered_multiset
search/contains_column cuco::static_set insert_async, insert_if_async, contains_async #14238 Needs migration from cudf::detail::unordered_multiset. unordered_multiset can be removed once this is done.
hash-based groupby (issue #10401) cuco::static_set ref::insert_and_find, retrieve_all #14813 Needs migration from concurrent_unordered_map
legacy json reader cuco::static_map (legacy) insert_async, ref::find #15813 Needs migration from concurrent_unordered_map. Looks like a rational use of hash map. To be replaced by cuco::static_map Deleted in #15813
distinct cuco::static_set insert_async retrieve_all, ref::insert_and_find #15984 We have work in flight to move this to cuco::static_map, but it is blocked on performance concerns ⚠️ #16484
parquet dictionary encoding cuco::static_map (experimental) ref::insert, ref::find #16541 Needs migration from cuco::static_map (legacy)
mixed_semi_joins cuco::static_set insert, insert_if, ref::contains #16230 Needs migration from cuco::static_map (legacy)
semi/anti joins Refactoring not needed anymore, as they use contains
histogram cuco::static_set insert_async retrieve_all, ref::insert_and_find #16485
orc dictionary encoding cuco::static_map (experimental) ref::insert, ref::find #17049 Needs migration from cuco::static_map (legacy)
joins cuco::static_multiset count, count_outer, retrieve, retrieve_outer Needs migration from cuco::static_multimap (legacy) Related issues: #11176, #13116,
mixed joins cuco::static_multiset Needs migration from cuco::static_multimap (legacy)
@PointKernel PointKernel added feature request New feature or request Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. tech debt improvement Improvement / enhancement to an existing function labels Nov 29, 2022
@GregoryKimball GregoryKimball added 0 - Blocked Cannot progress due to external reasons and removed Needs Triage Need team to review and classify labels Dec 1, 2022
@GregoryKimball GregoryKimball moved this to 23.06 in libcudf Jan 6, 2023
@GregoryKimball GregoryKimball removed the status in libcudf Jan 9, 2023
@PointKernel PointKernel removed the 0 - Blocked Cannot progress due to external reasons label May 12, 2023
rapids-bot bot pushed a commit that referenced this issue May 15, 2023
This PR improves `distinct_count` with the new `cuco::static_set` data structure.

Related to #12261

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Divye Gala (https://github.com/divyegala)

URL: #13343
@GregoryKimball GregoryKimball moved this to Story Issue in libcudf Aug 2, 2023
@PointKernel PointKernel removed the status in libcudf Aug 7, 2023
@GregoryKimball GregoryKimball moved this to Story Issue in libcudf Aug 8, 2023
@GregoryKimball GregoryKimball added the 2 - In Progress Currently a work in progress label Aug 10, 2023
rapids-bot bot pushed a commit that referenced this issue Aug 28, 2023
In JSON tree algorithms of JSON reader, cuco static_map is used as a set. This PR replaces it with static_set.
No tests are changed. No significant runtime changes.
Addresses part of  #12261

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #13928
rapids-bot bot pushed a commit that referenced this issue Sep 26, 2023
Contributes to #12261

This PR refactors `contains_table` to use the new `cuco::static_set` data structure. It also adds a `contains_table` benchmark to track the performance before and after this work.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #14064
PointKernel added a commit to NVIDIA/cuCollections that referenced this issue Sep 26, 2023
Depends on #349 

This PR adds an example demonstrating how to create multiple subsets
with one single storage. It includes necessary changes and cleanups that
will unblock orc/parquet dictionary encoding
(rapidsai/cudf#12261) to use the new map/set
data structures.

---------

Co-authored-by: Daniel Juenger <[email protected]>
rapids-bot bot pushed a commit that referenced this issue Oct 4, 2023
Part of ##12261

This PR simplifies the `contains_column` implementation by invoking `contains_table` and gets rid of the use of the cudf `unordered_multiset`. It also removes the `unordered_multiset` header file from libcudf.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #14238
@vyasr vyasr removed the tech debt label Feb 23, 2024
rapids-bot bot pushed a commit that referenced this issue Feb 29, 2024
Depends on #14849

Contributes to #12261

This PR migrates hash groupby to use the new `cuco::static_set` data structure. It doesn't change any existing libcudf behavior but uncovers the fact that the cudf python `value_counts` doesn't guarantee output orders thus the PR becomes a breaking change.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14813
rapids-bot bot pushed a commit that referenced this issue May 23, 2024
This completes the final two steps and closes #15537. Also addresses one step of #12261.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - David Wendt (https://github.com/davidwendt)
  - Shruti Shivakumar (https://github.com/shrshi)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #15813
rapids-bot bot pushed a commit that referenced this issue Aug 30, 2024
…::static_map` (#16541)

Part of #12261. This PR refactors the dictionary encoding in Parquet writers to migrate from `cuco::legacy::static_map` to `cuco::static_map` to build the dictionaries.

### Performance Results
The changes result in +0.08% average speed improvement and +16.22% average memory footprint increase (stems from the adjusted sizes by `cuco::make_window_extent` due to [prime gap](https://en.wikipedia.org/wiki/Prime_gap)) across the benchmark cases extended from #16591 

Currently, we do see a roughly 8% speed improvement in map insert and find kernels which is counteracted by the map init and map collect kernels as they have to process 16.22% more slots. With a cuco bump, the average speed improvement will increase from 0.08% to +3% and the memory footprint change will go back from 16.22% to +0%.

### Hardware used for benchmarking
```
 `NVIDIA RTX 5880 Ada Generation`
* SM Version: 890 (PTX Version: 860)
* Number of SMs: 110
* SM Default Clock Rate: 18446744071874 MHz
* Global Memory: 23879 MiB Free / 48632 MiB Total
* Global Memory Bus Peak: 960 GB/sec (384-bit DDR @10001MHz)
* Max Shared Memory: 100 KiB/SM, 48 KiB/Block
* L2 Cache Size: 98304 KiB
* Maximum Active Blocks: 24/SM
* Maximum Active Threads: 1536/SM, 1024/Block
* Available Registers: 65536/SM, 65536/Block
* ECC Enabled: No
```

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: #16541
rapids-bot bot pushed a commit that referenced this issue Sep 18, 2024
This PR refactors `mixed_semi_join` by replacing **cuco** legacy `static_map` with latest `static_set`. 
Contributes to #12261.

Authors:
  - Srinivas Yadav (https://github.com/srinivasyadav18)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Nghia Truong (https://github.com/ttnghia)

URL: #16230
rapids-bot bot pushed a commit that referenced this issue Oct 12, 2024
…map` (#17049)

Part of #12261. This PR refactors ORC writer's dictionary encoding to migrate from `cuco::legacy::static_map` to the new `cuco::static_map`. No performance impact measured. Results [here](#17049 (comment)).

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #17049
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress feature request New feature or request improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code.
Projects
Status: Story Issue
Development

No branches or pull requests

3 participants