diff --git a/CHANGELOG.md b/CHANGELOG.md index bce764f59e3..7ecad2c9c39 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,300 @@ +# cuDF 24.04.00 (10 Apr 2024) + +## 🚨 Breaking Changes + +- Restructure pylibcudf/arrow interop facilities ([#15325](https://github.com/rapidsai/cudf/pull/15325)) [@vyasr](https://github.com/vyasr) +- Change exceptions thrown by copying APIs ([#15319](https://github.com/rapidsai/cudf/pull/15319)) [@vyasr](https://github.com/vyasr) +- Change strings_column_view::char_size to return int64 ([#15197](https://github.com/rapidsai/cudf/pull/15197)) [@davidwendt](https://github.com/davidwendt) +- Upgrade to `arrow-14.0.2` ([#15108](https://github.com/rapidsai/cudf/pull/15108)) [@galipremsagar](https://github.com/galipremsagar) +- Add support for `pandas-2.2` in `cudf` ([#15100](https://github.com/rapidsai/cudf/pull/15100)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate cudf::hashing::spark_murmurhash3_x86_32 ([#15074](https://github.com/rapidsai/cudf/pull/15074)) [@davidwendt](https://github.com/davidwendt) +- Align MultiIndex.get_indexer with pandas 2.2 change ([#15059](https://github.com/rapidsai/cudf/pull/15059)) [@mroeschke](https://github.com/mroeschke) +- Raise an error on import for unsupported GPUs.
([#15053](https://github.com/rapidsai/cudf/pull/15053)) [@bdice](https://github.com/bdice) +- Deprecate datelike isin casting strings to dates to match pandas 2.2 ([#15046](https://github.com/rapidsai/cudf/pull/15046)) [@mroeschke](https://github.com/mroeschke) +- Align concat Series name behavior in pandas 2.2 ([#15032](https://github.com/rapidsai/cudf/pull/15032)) [@mroeschke](https://github.com/mroeschke) +- Add `future_stack` to `DataFrame.stack` ([#15015](https://github.com/rapidsai/cudf/pull/15015)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate groupby fillna ([#15000](https://github.com/rapidsai/cudf/pull/15000)) [@mroeschke](https://github.com/mroeschke) +- Deprecate replace with categorical columns ([#14988](https://github.com/rapidsai/cudf/pull/14988)) [@mroeschke](https://github.com/mroeschke) +- Deprecate delim_whitespace in read_csv for pandas 2.2 ([#14986](https://github.com/rapidsai/cudf/pull/14986)) [@mroeschke](https://github.com/mroeschke) +- Deprecate parameters similar to pandas 2.2 ([#14984](https://github.com/rapidsai/cudf/pull/14984)) [@mroeschke](https://github.com/mroeschke) +- Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. 
([#14962](https://github.com/rapidsai/cudf/pull/14962)) [@bdice](https://github.com/bdice) +- Add `pandas-2.x` support in `cudf` ([#14916](https://github.com/rapidsai/cudf/pull/14916)) [@galipremsagar](https://github.com/galipremsagar) +- Use cuco::static_set in the hash-based groupby ([#14813](https://github.com/rapidsai/cudf/pull/14813)) [@PointKernel](https://github.com/PointKernel) + +## 🐛 Bug Fixes + +- Fix an issue with creating a series from scalar when `dtype='category'` ([#15476](https://github.com/rapidsai/cudf/pull/15476)) [@galipremsagar](https://github.com/galipremsagar) +- Update pre-commit-hooks to v0.0.3 ([#15355](https://github.com/rapidsai/cudf/pull/15355)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- [BUG][JNI] Trigger MemoryBuffer.onClosed after memory is freed ([#15351](https://github.com/rapidsai/cudf/pull/15351)) [@abellina](https://github.com/abellina) +- Fix an issue with multiple short list rowgroups using the Parquet chunked reader. ([#15342](https://github.com/rapidsai/cudf/pull/15342)) [@nvdbaranec](https://github.com/nvdbaranec) +- Avoid importing dask-expr if "query-planning" config is `False` ([#15340](https://github.com/rapidsai/cudf/pull/15340)) [@rjzamora](https://github.com/rjzamora) +- Fix gtests/ERROR_TEST errors when run in Debug ([#15317](https://github.com/rapidsai/cudf/pull/15317)) [@davidwendt](https://github.com/davidwendt) +- Fix OOB read in `inflate_kernel` ([#15309](https://github.com/rapidsai/cudf/pull/15309)) [@vuule](https://github.com/vuule) +- Work around a cuFile error when running CSV tests with memcheck ([#15293](https://github.com/rapidsai/cudf/pull/15293)) [@vuule](https://github.com/vuule) +- Fix Doxygen upload directory ([#15291](https://github.com/rapidsai/cudf/pull/15291)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Fix Doxygen check ([#15289](https://github.com/rapidsai/cudf/pull/15289)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Reintroduce PANDAS_GE_220 import 
([#15287](https://github.com/rapidsai/cudf/pull/15287)) [@wence-](https://github.com/wence-) +- Fix mean computation for the geometric distribution in the data generator ([#15282](https://github.com/rapidsai/cudf/pull/15282)) [@vuule](https://github.com/vuule) +- Fix Parquet decimal64 stats ([#15281](https://github.com/rapidsai/cudf/pull/15281)) [@etseidl](https://github.com/etseidl) +- Make linking of nvtx3-cpp BUILD_LOCAL_INTERFACE ([#15271](https://github.com/rapidsai/cudf/pull/15271)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Workaround compute-sanitizer memcheck bug ([#15259](https://github.com/rapidsai/cudf/pull/15259)) [@davidwendt](https://github.com/davidwendt) +- Cleanup `hostdevice_vector` and add more APIs ([#15252](https://github.com/rapidsai/cudf/pull/15252)) [@ttnghia](https://github.com/ttnghia) +- Fix number of rows in randomly generated lists columns ([#15248](https://github.com/rapidsai/cudf/pull/15248)) [@vuule](https://github.com/vuule) +- Fix wrong output for `collect_list`/`collect_set` of lists column ([#15243](https://github.com/rapidsai/cudf/pull/15243)) [@ttnghia](https://github.com/ttnghia) +- Fix testchunkedPackTwoPasses to copy from the bounce buffer ([#15220](https://github.com/rapidsai/cudf/pull/15220)) [@abellina](https://github.com/abellina) +- Fix accessing `.columns` by an external API ([#15212](https://github.com/rapidsai/cudf/pull/15212)) [@galipremsagar](https://github.com/galipremsagar) +- [JNI] Disable testChunkedPackTwoPasses for now ([#15210](https://github.com/rapidsai/cudf/pull/15210)) [@abellina](https://github.com/abellina) +- Update labeler and codeowner configs for CMake files ([#15208](https://github.com/rapidsai/cudf/pull/15208)) [@PointKernel](https://github.com/PointKernel) +- Avoid dict normalization in ``__dask_tokenize__`` ([#15187](https://github.com/rapidsai/cudf/pull/15187)) [@rjzamora](https://github.com/rjzamora) +- Fix memcheck error in distinct inner join 
([#15164](https://github.com/rapidsai/cudf/pull/15164)) [@PointKernel](https://github.com/PointKernel) +- Remove unneeded script parameters in test_cpp_memcheck.sh ([#15158](https://github.com/rapidsai/cudf/pull/15158)) [@davidwendt](https://github.com/davidwendt) +- Fix `ListColumn.to_pandas()` to retain `list` type ([#15155](https://github.com/rapidsai/cudf/pull/15155)) [@galipremsagar](https://github.com/galipremsagar) +- Avoid factorization in MultiIndex.to_pandas ([#15150](https://github.com/rapidsai/cudf/pull/15150)) [@mroeschke](https://github.com/mroeschke) +- Fix GroupBy.get_group and GroupBy.indices ([#15143](https://github.com/rapidsai/cudf/pull/15143)) [@wence-](https://github.com/wence-) +- Remove `const` from `range_window_bounds::_extent`. ([#15138](https://github.com/rapidsai/cudf/pull/15138)) [@mythrocks](https://github.com/mythrocks) +- DataFrame.columns = ... retains RangeIndex & set dtype ([#15129](https://github.com/rapidsai/cudf/pull/15129)) [@mroeschke](https://github.com/mroeschke) +- Correctly handle output for `GroupBy.apply` when chunk results are reindexed series ([#15109](https://github.com/rapidsai/cudf/pull/15109)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Fix Series.groupby.shift with a MultiIndex ([#15098](https://github.com/rapidsai/cudf/pull/15098)) [@mroeschke](https://github.com/mroeschke) +- Fix reductions when DataFrame has MultiIndex columns ([#15097](https://github.com/rapidsai/cudf/pull/15097)) [@mroeschke](https://github.com/mroeschke) +- Fix deprecation warnings for deprecated hash() calls ([#15095](https://github.com/rapidsai/cudf/pull/15095)) [@davidwendt](https://github.com/davidwendt) +- Add support for arrow `large_string` in `cudf` ([#15093](https://github.com/rapidsai/cudf/pull/15093)) [@galipremsagar](https://github.com/galipremsagar) +- Fix `sort_values` pytest failure with pandas-2.x regression ([#15092](https://github.com/rapidsai/cudf/pull/15092))
[@galipremsagar](https://github.com/galipremsagar) +- Resolve path parsing issues in `get_json_object` ([#15082](https://github.com/rapidsai/cudf/pull/15082)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- Fix bugs in handling of delta encodings ([#15075](https://github.com/rapidsai/cudf/pull/15075)) [@etseidl](https://github.com/etseidl) +- Fix `is_device_write_preferred` in `void_sink` and `user_sink_wrapper` ([#15064](https://github.com/rapidsai/cudf/pull/15064)) [@vuule](https://github.com/vuule) +- Eliminate duplicate allocation of nested string columns ([#15061](https://github.com/rapidsai/cudf/pull/15061)) [@vuule](https://github.com/vuule) +- Raise an error on import for unsupported GPUs. ([#15053](https://github.com/rapidsai/cudf/pull/15053)) [@bdice](https://github.com/bdice) +- Align concat Series name behavior in pandas 2.2 ([#15032](https://github.com/rapidsai/cudf/pull/15032)) [@mroeschke](https://github.com/mroeschke) +- Fix `Index.difference` to handle duplicate values when one of the inputs is empty ([#15016](https://github.com/rapidsai/cudf/pull/15016)) [@galipremsagar](https://github.com/galipremsagar) +- Add `future_stack` to `DataFrame.stack` ([#15015](https://github.com/rapidsai/cudf/pull/15015)) [@galipremsagar](https://github.com/galipremsagar) +- Fix handling of values=None in pylibcudf GroupBy.get_groups ([#14998](https://github.com/rapidsai/cudf/pull/14998)) [@shwina](https://github.com/shwina) +- Fix `DataFrame.sort_index` to respect `ignore_index` on all axis ([#14995](https://github.com/rapidsai/cudf/pull/14995)) [@galipremsagar](https://github.com/galipremsagar) +- Raise for pyarrow array that is tz-aware ([#14980](https://github.com/rapidsai/cudf/pull/14980)) [@mroeschke](https://github.com/mroeschke) +- Direct ``SeriesGroupBy.aggregate`` to ``SeriesGroupBy.agg`` ([#14971](https://github.com/rapidsai/cudf/pull/14971)) [@rjzamora](https://github.com/rjzamora) +- Respect IntervalDtype and CategoricalDtype objects passed by 
users ([#14961](https://github.com/rapidsai/cudf/pull/14961)) [@mroeschke](https://github.com/mroeschke) +- unset `CUDF_SPILL` after a pytest ([#14958](https://github.com/rapidsai/cudf/pull/14958)) [@galipremsagar](https://github.com/galipremsagar) +- Fix Null literals to be not parsed as string when mixed types as string is enabled in JSON reader ([#14939](https://github.com/rapidsai/cudf/pull/14939)) [@karthikeyann](https://github.com/karthikeyann) +- Fix chunked reads of Parquet delta encoded pages ([#14921](https://github.com/rapidsai/cudf/pull/14921)) [@etseidl](https://github.com/etseidl) +- Fix reading offset for data stream in ORC reader ([#14911](https://github.com/rapidsai/cudf/pull/14911)) [@ttnghia](https://github.com/ttnghia) +- Enable sanitizer check for a test case testORCReadAndWriteForDecimal128 ([#14897](https://github.com/rapidsai/cudf/pull/14897)) [@res-life](https://github.com/res-life) +- Fix dask token normalization ([#14829](https://github.com/rapidsai/cudf/pull/14829)) [@rjzamora](https://github.com/rjzamora) +- Fix 24.04 versions ([#14825](https://github.com/rapidsai/cudf/pull/14825)) [@raydouglass](https://github.com/raydouglass) +- Ensure slow private attrs are maybe proxies ([#14380](https://github.com/rapidsai/cudf/pull/14380)) [@mroeschke](https://github.com/mroeschke) + +## 📖 Documentation + +- Ignore DLManagedTensor in the docs build ([#15392](https://github.com/rapidsai/cudf/pull/15392)) [@davidwendt](https://github.com/davidwendt) +- Revert "Temporarily disable docs errors. (#15265)" ([#15269](https://github.com/rapidsai/cudf/pull/15269)) [@bdice](https://github.com/bdice) +- Temporarily disable docs errors.
([#15265](https://github.com/rapidsai/cudf/pull/15265)) [@bdice](https://github.com/bdice) +- Update `developer_guide.md` with new guidance on quoted internal includes ([#15238](https://github.com/rapidsai/cudf/pull/15238)) [@harrism](https://github.com/harrism) +- Fix broken link for developer guide ([#15025](https://github.com/rapidsai/cudf/pull/15025)) [@sanjana098](https://github.com/sanjana098) +- [DOC] Update typo in docs example of structs_column_wrapper ([#14949](https://github.com/rapidsai/cudf/pull/14949)) [@karthikeyann](https://github.com/karthikeyann) +- Update cudf.pandas FAQ. ([#14940](https://github.com/rapidsai/cudf/pull/14940)) [@bdice](https://github.com/bdice) +- Optimize doc builds ([#14856](https://github.com/rapidsai/cudf/pull/14856)) [@vyasr](https://github.com/vyasr) +- Add developer guideline to use east const. ([#14836](https://github.com/rapidsai/cudf/pull/14836)) [@bdice](https://github.com/bdice) +- Document how cuDF is pronounced ([#14753](https://github.com/rapidsai/cudf/pull/14753)) [@pentschev](https://github.com/pentschev) +- Notes convert to Pandas-compat ([#12641](https://github.com/rapidsai/cudf/pull/12641)) [@Touutae-lab](https://github.com/Touutae-lab) + +## 🚀 New Features + +- Address inconsistency in single quote normalization in JSON reader ([#15324](https://github.com/rapidsai/cudf/pull/15324)) [@shrshi](https://github.com/shrshi) +- Use JNI pinned pool resource with cuIO ([#15255](https://github.com/rapidsai/cudf/pull/15255)) [@abellina](https://github.com/abellina) +- Add DELTA_BYTE_ARRAY encoder for Parquet ([#15239](https://github.com/rapidsai/cudf/pull/15239)) [@etseidl](https://github.com/etseidl) +- Migrate filling operations to pylibcudf ([#15225](https://github.com/rapidsai/cudf/pull/15225)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- [JNI] rmm based pinned pool ([#15219](https://github.com/rapidsai/cudf/pull/15219)) [@abellina](https://github.com/abellina) +- Implement zero-copy host buffer 
source instead of using an arrow implementation ([#15189](https://github.com/rapidsai/cudf/pull/15189)) [@vuule](https://github.com/vuule) +- Enable creation of columns from scalar ([#15181](https://github.com/rapidsai/cudf/pull/15181)) [@vyasr](https://github.com/vyasr) +- Use NVTX from GitHub. ([#15178](https://github.com/rapidsai/cudf/pull/15178)) [@bdice](https://github.com/bdice) +- Implement `segmented_row_bit_count` for computing row sizes by segments of rows ([#15169](https://github.com/rapidsai/cudf/pull/15169)) [@ttnghia](https://github.com/ttnghia) +- Implement search using pylibcudf ([#15166](https://github.com/rapidsai/cudf/pull/15166)) [@vyasr](https://github.com/vyasr) +- Add distinct left join ([#15149](https://github.com/rapidsai/cudf/pull/15149)) [@PointKernel](https://github.com/PointKernel) +- Add cardinality control for groupby benchs with flat types ([#15134](https://github.com/rapidsai/cudf/pull/15134)) [@PointKernel](https://github.com/PointKernel) +- Add ability to request Parquet encodings on a per-column basis ([#15081](https://github.com/rapidsai/cudf/pull/15081)) [@etseidl](https://github.com/etseidl) +- Automate include grouping order in .clang-format ([#15063](https://github.com/rapidsai/cudf/pull/15063)) [@harrism](https://github.com/harrism) +- Requesting a clean build directory also clears Jitify cache ([#15052](https://github.com/rapidsai/cudf/pull/15052)) [@robertmaynard](https://github.com/robertmaynard) +- API for JSON unquoted whitespace normalization ([#15033](https://github.com/rapidsai/cudf/pull/15033)) [@shrshi](https://github.com/shrshi) +- Implement concatenate, lists.explode, merge, sorting, and stream compaction in pylibcudf ([#15011](https://github.com/rapidsai/cudf/pull/15011)) [@vyasr](https://github.com/vyasr) +- Implement replace in pylibcudf ([#15005](https://github.com/rapidsai/cudf/pull/15005)) [@vyasr](https://github.com/vyasr) +- Add distinct key inner join 
([#14990](https://github.com/rapidsai/cudf/pull/14990)) [@PointKernel](https://github.com/PointKernel) +- Implement rolling in pylibcudf ([#14982](https://github.com/rapidsai/cudf/pull/14982)) [@vyasr](https://github.com/vyasr) +- Implement joins in pylibcudf ([#14972](https://github.com/rapidsai/cudf/pull/14972)) [@vyasr](https://github.com/vyasr) +- Implement scans and reductions in pylibcudf ([#14970](https://github.com/rapidsai/cudf/pull/14970)) [@vyasr](https://github.com/vyasr) +- Rewrite cudf internals using pylibcudf groupby ([#14946](https://github.com/rapidsai/cudf/pull/14946)) [@vyasr](https://github.com/vyasr) +- Implement groupby in pylibcudf ([#14945](https://github.com/rapidsai/cudf/pull/14945)) [@vyasr](https://github.com/vyasr) +- Support casting of Map type to string in JSON reader ([#14936](https://github.com/rapidsai/cudf/pull/14936)) [@karthikeyann](https://github.com/karthikeyann) +- POC for whitespace removal in input JSON data using FST ([#14931](https://github.com/rapidsai/cudf/pull/14931)) [@shrshi](https://github.com/shrshi) +- Support for LZ4 compression in ORC and Parquet ([#14906](https://github.com/rapidsai/cudf/pull/14906)) [@vuule](https://github.com/vuule) +- Remove supports_streams from cuDF custom memory resources. 
([#14857](https://github.com/rapidsai/cudf/pull/14857)) [@harrism](https://github.com/harrism) +- Migrate unary operations to pylibcudf ([#14850](https://github.com/rapidsai/cudf/pull/14850)) [@vyasr](https://github.com/vyasr) +- Migrate binary operations to pylibcudf ([#14821](https://github.com/rapidsai/cudf/pull/14821)) [@vyasr](https://github.com/vyasr) +- Add row index and stripe size options to Python ORC chunked writer ([#14785](https://github.com/rapidsai/cudf/pull/14785)) [@vuule](https://github.com/vuule) +- Support CUDA 12.2 ([#14712](https://github.com/rapidsai/cudf/pull/14712)) [@jameslamb](https://github.com/jameslamb) + +## 🛠️ Improvements + +- Use `conda env create --yes` instead of `--force` ([#15403](https://github.com/rapidsai/cudf/pull/15403)) [@bdice](https://github.com/bdice) +- Restructure pylibcudf/arrow interop facilities ([#15325](https://github.com/rapidsai/cudf/pull/15325)) [@vyasr](https://github.com/vyasr) +- Change exceptions thrown by copying APIs ([#15319](https://github.com/rapidsai/cudf/pull/15319)) [@vyasr](https://github.com/vyasr) +- Enable branch testing for `cudf.pandas` ([#15316](https://github.com/rapidsai/cudf/pull/15316)) [@galipremsagar](https://github.com/galipremsagar) +- Replace black with ruff-format ([#15312](https://github.com/rapidsai/cudf/pull/15312)) [@mroeschke](https://github.com/mroeschke) +- This fixes an NPE when trying to read empty JSON data by adding a new API for missing information ([#15307](https://github.com/rapidsai/cudf/pull/15307)) [@revans2](https://github.com/revans2) +- Address poor performance of Parquet string decoding ([#15304](https://github.com/rapidsai/cudf/pull/15304)) [@etseidl](https://github.com/etseidl) +- Update script input name ([#15301](https://github.com/rapidsai/cudf/pull/15301)) [@AyodeAwe](https://github.com/AyodeAwe) +- Make test_read_parquet_partitioned_filtered data deterministic ([#15296](https://github.com/rapidsai/cudf/pull/15296)) 
[@mroeschke](https://github.com/mroeschke) +- Add timeout for `cudf.pandas` pandas tests ([#15284](https://github.com/rapidsai/cudf/pull/15284)) [@galipremsagar](https://github.com/galipremsagar) +- Add upper bound to prevent usage of NumPy 2 ([#15283](https://github.com/rapidsai/cudf/pull/15283)) [@bdice](https://github.com/bdice) +- Fix cudf::test::to_host return of host_vector ([#15263](https://github.com/rapidsai/cudf/pull/15263)) [@davidwendt](https://github.com/davidwendt) +- Implement grouped product scan ([#15254](https://github.com/rapidsai/cudf/pull/15254)) [@wence-](https://github.com/wence-) +- Add CUDA 12.4 to supported PTX versions ([#15247](https://github.com/rapidsai/cudf/pull/15247)) [@brandon-b-miller](https://github.com/brandon-b-miller) +- Implement DataFrame|Series.squeeze ([#15244](https://github.com/rapidsai/cudf/pull/15244)) [@mroeschke](https://github.com/mroeschke) +- Roll back ipow changes due to register pressure. ([#15242](https://github.com/rapidsai/cudf/pull/15242)) [@pmattione-nvidia](https://github.com/pmattione-nvidia) +- Remove create_chars_child_column utility ([#15241](https://github.com/rapidsai/cudf/pull/15241)) [@davidwendt](https://github.com/davidwendt) +- Update dlpack to version 0.8 ([#15237](https://github.com/rapidsai/cudf/pull/15237)) [@dantegd](https://github.com/dantegd) +- Improve performance in JSON reader when `mixed_types_as_string` option is enabled ([#15236](https://github.com/rapidsai/cudf/pull/15236)) [@shrshi](https://github.com/shrshi) +- Remove row conversion code from libcudf ([#15234](https://github.com/rapidsai/cudf/pull/15234)) [@ttnghia](https://github.com/ttnghia) +- Use variable substitution for RAPIDS version in Doxyfile ([#15231](https://github.com/rapidsai/cudf/pull/15231)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Add ListColumns.to_pandas(arrow_type=) ([#15228](https://github.com/rapidsai/cudf/pull/15228)) [@mroeschke](https://github.com/mroeschke) +- Treat dask-cudf CI artifacts 
as pure wheels ([#15223](https://github.com/rapidsai/cudf/pull/15223)) [@bdice](https://github.com/bdice) +- Clean up usage of __CUDA_ARCH__ and other macros. ([#15218](https://github.com/rapidsai/cudf/pull/15218)) [@bdice](https://github.com/bdice) +- DOC: use constants in performance-comparisons.ipynb ([#15215](https://github.com/rapidsai/cudf/pull/15215)) [@raybellwaves](https://github.com/raybellwaves) +- Rewrite conversion in terms of column ([#15213](https://github.com/rapidsai/cudf/pull/15213)) [@vyasr](https://github.com/vyasr) +- Switch `pytest-xdist` algo to `worksteal` ([#15207](https://github.com/rapidsai/cudf/pull/15207)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate strings_column_view::offsets_begin() ([#15205](https://github.com/rapidsai/cudf/pull/15205)) [@davidwendt](https://github.com/davidwendt) +- Add `get_upstream_resource` method to `stream_checking_resource_adaptor` ([#15203](https://github.com/rapidsai/cudf/pull/15203)) [@miscco](https://github.com/miscco) +- Tune up row size estimation in the data generator ([#15202](https://github.com/rapidsai/cudf/pull/15202)) [@vuule](https://github.com/vuule) +- Fix `offset` value for generating test data in `parquet_chunked_reader_test.cu` ([#15200](https://github.com/rapidsai/cudf/pull/15200)) [@ttnghia](https://github.com/ttnghia) +- Change strings_column_view::char_size to return int64 ([#15197](https://github.com/rapidsai/cudf/pull/15197)) [@davidwendt](https://github.com/davidwendt) +- Fix includes for row_operators.cuh ([#15194](https://github.com/rapidsai/cudf/pull/15194)) [@davidwendt](https://github.com/davidwendt) +- Generalize GHA selectors for pure Python testing ([#15191](https://github.com/rapidsai/cudf/pull/15191)) [@bdice](https://github.com/bdice) +- Improvements for `__cuda_array_interface__` tests ([#15188](https://github.com/rapidsai/cudf/pull/15188)) [@bdice](https://github.com/bdice) +- Allow to_pandas to return pandas.ArrowDtype 
([#15182](https://github.com/rapidsai/cudf/pull/15182)) [@mroeschke](https://github.com/mroeschke) +- Ignore `byte_range` in `read_json` when the size is not smaller than the input data ([#15180](https://github.com/rapidsai/cudf/pull/15180)) [@vuule](https://github.com/vuule) +- Expose new stable_sort and finish stream_compaction in pylibcudf ([#15175](https://github.com/rapidsai/cudf/pull/15175)) [@wence-](https://github.com/wence-) +- [ci] update matrix filters for dask-cudf builds ([#15174](https://github.com/rapidsai/cudf/pull/15174)) [@jameslamb](https://github.com/jameslamb) +- Change make_strings_children to return uvector ([#15171](https://github.com/rapidsai/cudf/pull/15171)) [@davidwendt](https://github.com/davidwendt) +- Don't override to_pandas for Datelike columns ([#15167](https://github.com/rapidsai/cudf/pull/15167)) [@mroeschke](https://github.com/mroeschke) +- Drop python-snappy from dependencies. ([#15161](https://github.com/rapidsai/cudf/pull/15161)) [@bdice](https://github.com/bdice) +- Add microkernels for fixed-width and fixed-width dictionary in Parquet decode ([#15159](https://github.com/rapidsai/cudf/pull/15159)) [@abellina](https://github.com/abellina) +- Make HostColumnVector.DataType accessor methods public ([#15157](https://github.com/rapidsai/cudf/pull/15157)) [@jbrennan333](https://github.com/jbrennan333) +- Java bindings for left outer distinct join ([#15154](https://github.com/rapidsai/cudf/pull/15154)) [@jlowe](https://github.com/jlowe) +- Forward-merge branch-24.02 to branch-24.04 ([#15153](https://github.com/rapidsai/cudf/pull/15153)) [@bdice](https://github.com/bdice) +- Enable pandas pytests for `cudf.pandas` ([#15147](https://github.com/rapidsai/cudf/pull/15147)) [@galipremsagar](https://github.com/galipremsagar) +- Add java option to keep quotes for JSON reads ([#15146](https://github.com/rapidsai/cudf/pull/15146)) [@revans2](https://github.com/revans2) +- Change cross-pandas-version testing in `cudf` 
([#15145](https://github.com/rapidsai/cudf/pull/15145)) [@galipremsagar](https://github.com/galipremsagar) +- Use `hostdevice_vector` in `kernel_error` to avoid the pageable copy ([#15140](https://github.com/rapidsai/cudf/pull/15140)) [@vuule](https://github.com/vuule) +- Clean up Columns.astype & cudf.dtype ([#15125](https://github.com/rapidsai/cudf/pull/15125)) [@mroeschke](https://github.com/mroeschke) +- Simplify some to_pandas implementations ([#15123](https://github.com/rapidsai/cudf/pull/15123)) [@mroeschke](https://github.com/mroeschke) +- Java: Add leak tracking for Scalar instances ([#15121](https://github.com/rapidsai/cudf/pull/15121)) [@jlowe](https://github.com/jlowe) +- Remove calls to strings_column_view::offsets_begin() ([#15112](https://github.com/rapidsai/cudf/pull/15112)) [@davidwendt](https://github.com/davidwendt) +- Add support for Python 3.11, require NumPy 1.23+ ([#15111](https://github.com/rapidsai/cudf/pull/15111)) [@jameslamb](https://github.com/jameslamb) +- Compile-time ipow computation with array lookup ([#15110](https://github.com/rapidsai/cudf/pull/15110)) [@pmattione-nvidia](https://github.com/pmattione-nvidia) +- Upgrade to `arrow-14.0.2` ([#15108](https://github.com/rapidsai/cudf/pull/15108)) [@galipremsagar](https://github.com/galipremsagar) +- Dynamically set version in RAPIDS doc builds ([#15101](https://github.com/rapidsai/cudf/pull/15101)) [@jakirkham](https://github.com/jakirkham) +- Add support for `pandas-2.2` in `cudf` ([#15100](https://github.com/rapidsai/cudf/pull/15100)) [@galipremsagar](https://github.com/galipremsagar) +- Update devcontainers to CUDA Toolkit 12.2 ([#15099](https://github.com/rapidsai/cudf/pull/15099)) [@trxcllnt](https://github.com/trxcllnt) +- Fix `datetime` binop pytest failures in pandas-2.2 ([#15090](https://github.com/rapidsai/cudf/pull/15090)) [@galipremsagar](https://github.com/galipremsagar) +- Validate types in pylibcudf Column/Table constructors 
([#15088](https://github.com/rapidsai/cudf/pull/15088)) [@wence-](https://github.com/wence-) +- xfail test_join_ordering_pandas_compat for pandas 2.2 ([#15080](https://github.com/rapidsai/cudf/pull/15080)) [@mroeschke](https://github.com/mroeschke) +- Add general purpose host memory allocator reference to cuIO with a demo of pooled-pinned allocation. ([#15079](https://github.com/rapidsai/cudf/pull/15079)) [@nvdbaranec](https://github.com/nvdbaranec) +- Adjust test_binops for pandas 2.2 ([#15078](https://github.com/rapidsai/cudf/pull/15078)) [@mroeschke](https://github.com/mroeschke) +- Remove offsets_begin() call from nvtext::generate_ngrams ([#15077](https://github.com/rapidsai/cudf/pull/15077)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::detail::has_nonempty_null_rows ([#15076](https://github.com/rapidsai/cudf/pull/15076)) [@davidwendt](https://github.com/davidwendt) +- Deprecate cudf::hashing::spark_murmurhash3_x86_32 ([#15074](https://github.com/rapidsai/cudf/pull/15074)) [@davidwendt](https://github.com/davidwendt) +- Fix cudf::test::to_host to handle both offset types for strings columns ([#15073](https://github.com/rapidsai/cudf/pull/15073)) [@davidwendt](https://github.com/davidwendt) +- Add condition for test_groupby_nulls_basic in pandas 2.2 ([#15072](https://github.com/rapidsai/cudf/pull/15072)) [@mroeschke](https://github.com/mroeschke) +- xfail tests in test_udf_masked_ops due to pandas 2.2 bug ([#15071](https://github.com/rapidsai/cudf/pull/15071)) [@mroeschke](https://github.com/mroeschke) +- target branch-24.04 for GitHub Actions workflows ([#15069](https://github.com/rapidsai/cudf/pull/15069)) [@jameslamb](https://github.com/jameslamb) +- Implement stable version of `cudf::sort` ([#15066](https://github.com/rapidsai/cudf/pull/15066)) [@wence-](https://github.com/wence-) +- Fix ORC and JSON tests failures for pandas 2.2 ([#15062](https://github.com/rapidsai/cudf/pull/15062)) [@mroeschke](https://github.com/mroeschke) +- 
Adjust test_joining for pandas 2.2 ([#15060](https://github.com/rapidsai/cudf/pull/15060)) [@mroeschke](https://github.com/mroeschke) +- Align MultiIndex.get_indexer with pandas 2.2 change ([#15059](https://github.com/rapidsai/cudf/pull/15059)) [@mroeschke](https://github.com/mroeschke) +- Fix test_resample index dtype checking for pandas 2.2 ([#15058](https://github.com/rapidsai/cudf/pull/15058)) [@mroeschke](https://github.com/mroeschke) +- Split out strings/replace.cu and rework its gtests ([#15054](https://github.com/rapidsai/cudf/pull/15054)) [@davidwendt](https://github.com/davidwendt) +- Avoid incompatible value type setting in test_rolling for pandas 2.2 ([#15050](https://github.com/rapidsai/cudf/pull/15050)) [@mroeschke](https://github.com/mroeschke) +- Change chained replace inplace test to COW test for pandas 2.2 ([#15049](https://github.com/rapidsai/cudf/pull/15049)) [@mroeschke](https://github.com/mroeschke) +- Deprecate datelike isin casting strings to dates to match pandas 2.2 ([#15046](https://github.com/rapidsai/cudf/pull/15046)) [@mroeschke](https://github.com/mroeschke) +- Avoid chained indexing in test_indexing for pandas 2.2 ([#15045](https://github.com/rapidsai/cudf/pull/15045)) [@mroeschke](https://github.com/mroeschke) +- Avoid pandas 2.2 `DeprecationWarning` in test_hdf ([#15044](https://github.com/rapidsai/cudf/pull/15044)) [@mroeschke](https://github.com/mroeschke) +- Use appropriate make_offsets_child_column for building lists columns ([#15043](https://github.com/rapidsai/cudf/pull/15043)) [@davidwendt](https://github.com/davidwendt) +- Factor out position-offsets logic from strings split_helper utility ([#15040](https://github.com/rapidsai/cudf/pull/15040)) [@davidwendt](https://github.com/davidwendt) +- Forward-merge branch-24.02 to branch-24.04 ([#15039](https://github.com/rapidsai/cudf/pull/15039)) [@bdice](https://github.com/bdice) +- Clean up nvtx macros ([#15038](https://github.com/rapidsai/cudf/pull/15038))
[@PointKernel](https://github.com/PointKernel) +- Add xfailures for test_applymap for pandas 2.2 ([#15034](https://github.com/rapidsai/cudf/pull/15034)) [@mroeschke](https://github.com/mroeschke) +- Expose libcudf filter expression in read_parquet ([#15028](https://github.com/rapidsai/cudf/pull/15028)) [@wence-](https://github.com/wence-) +- Adjust tests in test_dataframe.py for pandas 2.2 ([#15023](https://github.com/rapidsai/cudf/pull/15023)) [@mroeschke](https://github.com/mroeschke) +- Adjust test_datetime_infer_format for pandas 2.2 ([#15021](https://github.com/rapidsai/cudf/pull/15021)) [@mroeschke](https://github.com/mroeschke) +- Performance optimizations for parquet sub-rowgroup reader. ([#15020](https://github.com/rapidsai/cudf/pull/15020)) [@nvdbaranec](https://github.com/nvdbaranec) +- JNI bindings for distinct_hash_join ([#15019](https://github.com/rapidsai/cudf/pull/15019)) [@jlowe](https://github.com/jlowe) +- Change copy_if_safe to call thrust instead of the overload function ([#15018](https://github.com/rapidsai/cudf/pull/15018)) [@davidwendt](https://github.com/davidwendt) +- Improve performance of copy_if_else for long strings ([#15017](https://github.com/rapidsai/cudf/pull/15017)) [@davidwendt](https://github.com/davidwendt) +- Fix is_string_dtype test for pandas 2.2 ([#15012](https://github.com/rapidsai/cudf/pull/15012)) [@mroeschke](https://github.com/mroeschke) +- Rework cudf::strings::detail::copy_range for offsetalator ([#15010](https://github.com/rapidsai/cudf/pull/15010)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::get_json_object() ([#15009](https://github.com/rapidsai/cudf/pull/15009)) [@davidwendt](https://github.com/davidwendt) +- Align integral types in ORC to specs ([#15008](https://github.com/rapidsai/cudf/pull/15008)) [@vuule](https://github.com/vuule) +- Clean up detail sequence header inclusion ([#15007](https://github.com/rapidsai/cudf/pull/15007)) [@PointKernel](https://github.com/PointKernel) +- 
Add groupby.apply(include_groups=) to match pandas 2.2 deprecation ([#15006](https://github.com/rapidsai/cudf/pull/15006)) [@mroeschke](https://github.com/mroeschke) +- Use offsetalator in cudf::interleave_columns() ([#15004](https://github.com/rapidsai/cudf/pull/15004)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::row_bit_count() ([#15003](https://github.com/rapidsai/cudf/pull/15003)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::strings::wrap() ([#15002](https://github.com/rapidsai/cudf/pull/15002)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::strings::reverse ([#15001](https://github.com/rapidsai/cudf/pull/15001)) [@davidwendt](https://github.com/davidwendt) +- Deprecate groupby fillna ([#15000](https://github.com/rapidsai/cudf/pull/15000)) [@mroeschke](https://github.com/mroeschke) +- Ensure to_* IO methods respect pandas 2.2 keyword only deprecation ([#14999](https://github.com/rapidsai/cudf/pull/14999)) [@mroeschke](https://github.com/mroeschke) +- Remove unneeded calls to create_chars_child_column utility ([#14997](https://github.com/rapidsai/cudf/pull/14997)) [@davidwendt](https://github.com/davidwendt) +- Add environment-agnostic scripts for running ctests and pytests ([#14992](https://github.com/rapidsai/cudf/pull/14992)) [@trxcllnt](https://github.com/trxcllnt) +- Filter all `DeprecationWarning`'s by `ArrowTable.to_pandas()` ([#14989](https://github.com/rapidsai/cudf/pull/14989)) [@galipremsagar](https://github.com/galipremsagar) +- Deprecate replace with categorical columns ([#14988](https://github.com/rapidsai/cudf/pull/14988)) [@mroeschke](https://github.com/mroeschke) +- Deprecate delim_whitespace in read_csv for pandas 2.2 ([#14986](https://github.com/rapidsai/cudf/pull/14986)) [@mroeschke](https://github.com/mroeschke) +- Deprecate parameters similar to pandas 2.2 ([#14984](https://github.com/rapidsai/cudf/pull/14984)) [@mroeschke](https://github.com/mroeschke) +- 
Ensure that `ctest` is called with `--no-tests=error`. ([#14983](https://github.com/rapidsai/cudf/pull/14983)) [@bdice](https://github.com/bdice) +- Deprecate non-integer `periods` in `date_range` and `interval_range` ([#14976](https://github.com/rapidsai/cudf/pull/14976)) [@galipremsagar](https://github.com/galipremsagar) +- Update ops-bot.yaml ([#14974](https://github.com/rapidsai/cudf/pull/14974)) [@AyodeAwe](https://github.com/AyodeAwe) +- Use page statistics in Parquet reader ([#14973](https://github.com/rapidsai/cudf/pull/14973)) [@etseidl](https://github.com/etseidl) +- Use fused types for overloaded function signatures ([#14969](https://github.com/rapidsai/cudf/pull/14969)) [@vyasr](https://github.com/vyasr) +- Deprecate certain frequency strings ([#14967](https://github.com/rapidsai/cudf/pull/14967)) [@galipremsagar](https://github.com/galipremsagar) +- Update copyrights for 24.04. ([#14964](https://github.com/rapidsai/cudf/pull/14964)) [@bdice](https://github.com/bdice) +- Add missing atomic operators, refactor atomic operators, move atomic operators to detail namespace. ([#14962](https://github.com/rapidsai/cudf/pull/14962)) [@bdice](https://github.com/bdice) +- Introduce `GetJsonObjectOptions` in `getJSONObject` Java API ([#14956](https://github.com/rapidsai/cudf/pull/14956)) [@SurajAralihalli](https://github.com/SurajAralihalli) +- JNI JSON read with DataSource and inferred schema, along with basic Java nested Schema JSON reads ([#14954](https://github.com/rapidsai/cudf/pull/14954)) [@revans2](https://github.com/revans2) +- Make codecov only informational (always pass). 
([#14952](https://github.com/rapidsai/cudf/pull/14952)) [@bdice](https://github.com/bdice) +- Replace legacy cudf and dask_cudf imports as (d)gd ([#14944](https://github.com/rapidsai/cudf/pull/14944)) [@mroeschke](https://github.com/mroeschke) +- Replace _is_datetime64tz/interval_dtype with isinstance ([#14943](https://github.com/rapidsai/cudf/pull/14943)) [@mroeschke](https://github.com/mroeschke) +- Update tests for pandas 2. ([#14941](https://github.com/rapidsai/cudf/pull/14941)) [@bdice](https://github.com/bdice) +- Use more public pandas APIs ([#14929](https://github.com/rapidsai/cudf/pull/14929)) [@mroeschke](https://github.com/mroeschke) +- Replace local copyright check with pre-commit-hooks verify-copyright ([#14917](https://github.com/rapidsai/cudf/pull/14917)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Add `pandas-2.x` support in `cudf` ([#14916](https://github.com/rapidsai/cudf/pull/14916)) [@galipremsagar](https://github.com/galipremsagar) +- Use offsetalator in nvtext::byte_pair_encoding ([#14888](https://github.com/rapidsai/cudf/pull/14888)) [@davidwendt](https://github.com/davidwendt) +- De-DOS line-endings ([#14880](https://github.com/rapidsai/cudf/pull/14880)) [@wence-](https://github.com/wence-) +- Add detail `cuco_allocator` ([#14877](https://github.com/rapidsai/cudf/pull/14877)) [@PointKernel](https://github.com/PointKernel) +- Move all core types to using enum class in Cython ([#14876](https://github.com/rapidsai/cudf/pull/14876)) [@vyasr](https://github.com/vyasr) +- Read `cudf.__version__` in Sphinx build ([#14872](https://github.com/rapidsai/cudf/pull/14872)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Use int64 offset types for accessing code-points in nvtext::normalize ([#14868](https://github.com/rapidsai/cudf/pull/14868)) [@davidwendt](https://github.com/davidwendt) +- Read version from VERSION file in CMake ([#14867](https://github.com/rapidsai/cudf/pull/14867)) 
[@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA) +- Update conda-cpp-post-build-checks to branch-24.04. ([#14854](https://github.com/rapidsai/cudf/pull/14854)) [@bdice](https://github.com/bdice) +- Update cudf for compatibility with the latest cuco ([#14849](https://github.com/rapidsai/cudf/pull/14849)) [@PointKernel](https://github.com/PointKernel) +- Remove deprecated strings functions ([#14848](https://github.com/rapidsai/cudf/pull/14848)) [@davidwendt](https://github.com/davidwendt) +- Fix CI workflows for pandas-tests and add test summary. ([#14847](https://github.com/rapidsai/cudf/pull/14847)) [@bdice](https://github.com/bdice) +- Use offsetalator in cudf::strings::copy_slice ([#14844](https://github.com/rapidsai/cudf/pull/14844)) [@davidwendt](https://github.com/davidwendt) +- Fix V2 Parquet page alignment for use with zStandard compression ([#14841](https://github.com/rapidsai/cudf/pull/14841)) [@etseidl](https://github.com/etseidl) +- Fix calls to deprecated strings factory API in examples. ([#14838](https://github.com/rapidsai/cudf/pull/14838)) [@bdice](https://github.com/bdice) +- Update pre-commit hooks ([#14837](https://github.com/rapidsai/cudf/pull/14837)) [@bdice](https://github.com/bdice) +- Use `rapids_cuda_set_runtime` to determine cuda runtime usage by target ([#14833](https://github.com/rapidsai/cudf/pull/14833)) [@vyasr](https://github.com/vyasr) +- Remove get_mem_info functions from custom memory resources ([#14832](https://github.com/rapidsai/cudf/pull/14832)) [@harrism](https://github.com/harrism) +- Fix debug build by splitting row_operator_tests_utilities.cu ([#14826](https://github.com/rapidsai/cudf/pull/14826)) [@davidwendt](https://github.com/davidwendt) +- Remove -DNVBench_ENABLE_CUPTI=OFF. 
([#14820](https://github.com/rapidsai/cudf/pull/14820)) [@bdice](https://github.com/bdice) +- Use cuco::static_set in the hash-based groupby ([#14813](https://github.com/rapidsai/cudf/pull/14813)) [@PointKernel](https://github.com/PointKernel) +- Branch 24.04 merge branch 24.02 ([#14809](https://github.com/rapidsai/cudf/pull/14809)) [@vyasr](https://github.com/vyasr) +- Branch 24.04 merge branch 24.02 ([#14806](https://github.com/rapidsai/cudf/pull/14806)) [@vyasr](https://github.com/vyasr) +- Introduce basic "cudf" backend for Dask Expressions ([#14805](https://github.com/rapidsai/cudf/pull/14805)) [@rjzamora](https://github.com/rjzamora) +- Remove `build_struct|list_column` ([#14786](https://github.com/rapidsai/cudf/pull/14786)) [@mroeschke](https://github.com/mroeschke) +- Use offsetalator in nvtext tokenize functions ([#14783](https://github.com/rapidsai/cudf/pull/14783)) [@davidwendt](https://github.com/davidwendt) +- Reduce execution time of Python ORC tests ([#14776](https://github.com/rapidsai/cudf/pull/14776)) [@vuule](https://github.com/vuule) +- Use offsetalator in cudf::strings::split functions ([#14757](https://github.com/rapidsai/cudf/pull/14757)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::strings::findall ([#14745](https://github.com/rapidsai/cudf/pull/14745)) [@davidwendt](https://github.com/davidwendt) +- Use offsetalator in cudf::strings::url_decode ([#14744](https://github.com/rapidsai/cudf/pull/14744)) [@davidwendt](https://github.com/davidwendt) +- Use get_offset_value utility in strings shift function ([#14743](https://github.com/rapidsai/cudf/pull/14743)) [@davidwendt](https://github.com/davidwendt) +- Use as_column instead of full ([#14698](https://github.com/rapidsai/cudf/pull/14698)) [@mroeschke](https://github.com/mroeschke) +- List all notable breaking changes ([#13535](https://github.com/rapidsai/cudf/pull/13535)) [@galipremsagar](https://github.com/galipremsagar) + # cuDF 24.02.00 (12 Feb 2024) ## 🚨 
Breaking Changes