-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward-merge branch-24.10 into branch-24.12 #16912
Conversation
For releases, since the polars release cadence is quite a lot faster than rapids, we propose to hard-pin to a known good version. In this case, 1.8.x. At the same time, remove pin in CI scripts and update list of xfailing tests in the polars test suite. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - James Lamb (https://github.com/jameslamb) - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #16886
FAILURE - Unable to forward-merge due to an error, manual merge is necessary. Do not use the IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the |
This PR adds `cudf-polars` to the top level build script. Authors: - https://github.com/brandon-b-miller - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Jake Awe (https://github.com/AyodeAwe) URL: #16898
Before when `columns=` was a `cudf.Series/Index` we would call `return array.unique.to_pandas()`, but `.unique` is a method not a property so this would have raised an error. Also took the time to refactor the helper methods here and push down the `errors=` keyword to `Frame._drop_column` Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Bradley Dice (https://github.com/bdice) URL: #16712
This PR is a first pass at #15937. We will close #15937 after #15162 is closed Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: #16810
Fixes #16625 This PR fixes a slow implementation of the centroid merging step during the tdigest merge aggregation. Previously it was doing a linear march over the individual tdigests per group and merging them one by one. This led to terrible performance for large numbers of groups. In principle though, all this really was doing was a segmented sort of centroid values. So that's what this PR changes it to. Speedup for 1,000,000 input tidests with 1,000,000 individual groups is ~1000x, ``` Old --------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------------------------------------------- TDigest/many_tiny_groups/1000000/1/1/10000/iterations:8/manual_time 7473 ms 7472 ms 8 TDigest/many_tiny_groups2/1000000/1/1/1000/iterations:8/manual_time 7433 ms 7431 ms 8 ``` ``` New --------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------------------------------------------------- TDigest/many_tiny_groups/1000000/1/1/10000/iterations:8/manual_time 6.72 ms 6.79 ms 8 TDigest/many_tiny_groups2/1000000/1/1/1000/iterations:8/manual_time 1.24 ms 1.32 ms 8 ``` Authors: - https://github.com/nvdbaranec - Muhammad Haseeb (https://github.com/mhaseeb123) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Nghia Truong (https://github.com/ttnghia) - Mike Wilson (https://github.com/hyperbolic2346) URL: #16780
This PR displays delta's for CPU and GPU usage metrics that are extracted from `cudf.pandas` pytests. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #16864
…15979) Part of #15903. 1. Introduces the Compressed Sparse Row (CSR) format to store the adjacency information of the column tree. 2. Analogous to `reduce_to_column_tree`, `reduce_to_column_tree_csr` reduces node tree representation to column tree stored in CSR format. TODO: - [x] Correctness test Authors: - Shruti Shivakumar (https://github.com/shrshi) - Vukasin Milovanovic (https://github.com/vuule) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) - Karthikeyan (https://github.com/karthikeyann) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #15979
Forward-merge triggered by push to branch-24.10 that creates a PR to keep branch-24.12 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.