Release [NIGHTLY] v25.02.00 · rapidsai/cudf

🔗 Links

🚨 Breaking Changes

Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
Rework minhash APIs for deprecation cycle (#17421) @davidwendt
Change indices for dictionary column to signed integer type (#17390) @davidwendt

🐛 Bug Fixes

Fix a minor potential i32 overflow in thrust::transform_exclusive_scan in PQ reader preprocessing (#17617) @mhaseeb123
Fix dask_cudf.read_csv (#17612) @rjzamora
Fix memcheck error in ReplaceTest.NormalizeNansAndZerosMutable gtest (#17610) @davidwendt
Correctly accept a pandas.CategoricalDtype(pandas.IntervalDtype(...), ...) type (#17604) @mroeschke
Ignore NaN correctly in .quantile (#17593) @mroeschke
Fix ctest fail running libcudf tests in a Debug build (#17576) @davidwendt
Specify a version for rapids_logger dependency (#17573) @jlowe
[JNI] remove rmm argument to set rw access for fabric handles (#17553) @abellina
Document undefined behavior in div_rounding_up_safe (#17542) @davidwendt
Fix nvcc-imposed UB in constexpr functions (#17534) @vuule
Add anonymous namespace to libcudf test source (#17529) @davidwendt
Propagate failures in pandas integration tests and Skip failing tests (#17521) @Matt711
Fix libcudf compile error when logging is disabled (#17512) @davidwendt
Fix Dask-cuDF clip APIs (#17509) @rjzamora
Fix groupby(as_index=False).size not resetting index (#17499) @mroeschke
Revert "Temporarily skip tests due to dask/distributed#8953" (#17492) @Matt711
Workaround for a misaligned access in read_csv on some CUDA versions (#17477) @vuule
Fix some possible thread-id overflow calculations (#17473) @davidwendt
Temporarily skip tests due to dask/distributed#8953 (#17472) @wence-
Support dask>=2024.11.2 in Dask cuDF (#17439) @rjzamora
Fix write_json failure for zero columns in table/struct (#17414) @karthikeyann
Fix Debug-mode failing Arrow test (#17405) @zeroshade
Fix all null list column with missing child column in JSON reader (#17348) @karthikeyann

📖 Documentation

Document interpreter install command for cudf.pandas (#17358) @bdice
add comment to Series.tolist method (#17350) @tequilayu

🚀 New Features

Add JSON reader options structs to pylibcudf (#17614) @Matt711
Add JSON Writer options classes to pylibcudf (#17606) @Matt711
Add ORC reader options structs to pylibcudf (#17601) @Matt711
Add Avro Reader options classes to pylibcudf (#17599) @Matt711
Plumb pylibcudf.io.parquet options classes through cudf python (#17506) @Matt711
Add partition-wise Select support to cuDF-Polars (#17495) @rjzamora
Migrate cudf::io::merge_row_group_metadata to pylibcudf (#17491) @Matt711
Add Parquet Reader options classes to pylibcudf (#17464) @Matt711
Add multi-partition DataFrameScan support to cuDF-Polars (#17441) @rjzamora
Return empty result for segmented_reduce if input and offsets are both empty (#17437) @davidwendt
Abstract polars function expression nodes to ensure they are serializable (#17418) @pentschev
Add CSV Reader options classes to pylibcudf (#17412) @Matt711
Add support for pylibcudf.DataType serialization (#17352) @pentschev
Enable rounding for Decimal32 and Decimal64 in cuDF (#17332) @a-hirota
Remove upper bounds on cuda-python to allow 12.6.2 and 11.8.5 (#17326) @bdice
Expose stream-ordering to groupby APIs (#17324) @shrshi
Migrate ORC Writer to pylibcudf (#17310) @Matt711

🛠️ Improvements

Remove patch that is only needed for clang-tidy to run on test files (#17618) @vyasr
update telemetry actions to fluent-bit friendly style (#17615) @msarahan
Bump the oldest pyarrow version to 14.0.2 in test matrix (#17611) @galipremsagar
Use [[nodiscard]] attribute before __device__ (#17608) @vuule
Use host_vector in flatten_single_pass_aggs (#17605) @vuule
Stop memory_resource.hpp from including itself (#17603) @vyasr
Check if nightlies have succeeded recently enough (#17596) @vyasr
A couple of fixes in rapids-logger usage (#17588) @vyasr
Remove unused functionality in cudf._lib.utils.pyx (#17586) @mroeschke
Use no-sync copy for fixed-width types in cudf::concatenate (#17584) @davidwendt
Remove unused code of json schema in JSON reader (#17581) @karthikeyann
Expose Scalar's constructor and Scalar#getScalarHandle() to public (#17580) @ttnghia
Allow large strings in nvtext benchmarks (#17579) @davidwendt
Remove cudf._lib.reduce in favor of inlining pylibcudf (#17574) @mroeschke
Use batched memcpy when writing ORC statistics (#17572) @vuule
Allow large strings in nvbench strings benchmarks (#17571) @davidwendt
Update version references in workflow (#17568) @AyodeAwe
Enable all json reader options in pylibcudf read_json (#17563) @karthikeyann
Remove cudf._lib.parquet in favor of inlining pylibcudf (#17562) @mroeschke
Fix CMake format in cudf/_lib/CMakeLists.txt (#17559) @mroeschke
Replace direct cudaMemcpyAsync calls with utility functions (within /include) (#17557) @vuule
Remove cudf._lib.interop in favor of inlining pylibcudf (#17555) @mroeschke
gate telemetry dispatch calls on TELEMETRY_ENABLED env var (#17551) @msarahan
Replace direct cudaMemcpyAsync calls with utility functions (within /src) (#17550) @vuule
Remove unused BufferArrayFromVector (#17549) @Matt711
Move cudf._lib.copying to cudf.core._internals (#17548) @mroeschke
Update cuda-python lower bounds to 12.6.2 / 11.8.5 (#17547) @bdice
Fix typos, rename types, and add null_probability benchmark axis for distinct (#17546) @PointKernel
Mark more constexpr functions as device-available (#17545) @vyasr
Use cooperative-groups instead of cub warp-reduce for strings contains (#17540) @davidwendt
Remove cudf._lib.nvtext in favor of inlining pylibcudf (#17535) @mroeschke
Remove unused masked keyword in column_empty (#17530) @mroeschke
Remove Thrust patch in favor of CMake definition for Thrust 32-bit offset types. (#17527) @bdice
[JNI] Enables fabric handles for CUDA async memory pools (#17526) @abellina
Force Thrust to use 32-bit offset type. (#17523) @bdice
Replace cudf::detail::copy_if logic with thrust::copy_if and gather (#17520) @davidwendt
Replaces uses of cudf._lib.Column.from_unique_ptr with pylibcudf.Column.from_libcudf (#17517) @Matt711
Move cudf._lib.aggregation to cudf.core._internals (#17516) @mroeschke
Migrate copy_column and Column.from_scalar to pylibcudf (#17513) @Matt711
Remove cudf._lib.transform in favor of inlining pylibcudf (#17505) @mroeschke
Remove cudf._lib.string.convert/split in favor of inlining pylibcudf (#17496) @mroeschke
Move cudf._lib.sort to cudf.core._internals (#17488) @mroeschke
Remove cudf._lib.csv in favor of inlining pylibcudf (#17485) @mroeschke
Update PyTorch to >=2.4.0 to get fix for CUDA array interface bug, and drop CUDA 11 PyTorch tests. (#17475) @bdice
Remove cudf._lib.binops in favor of inlining pylibcudf (#17468) @mroeschke
Remove cudf._lib.orc in favor of inlining pylibcudf (#17466) @mroeschke
skip most CI on devcontainer-only changes (#17465) @jameslamb
Set build type for all examples (#17463) @vyasr
Update the hook versions in pre-commit (#17462) @wence-
Remove cudf._lib.string_casting in favor of inlining pylibcudf (#17460) @mroeschke
Remove cudf._lib.filling in favor of inlining pylibcudf (#17459) @mroeschke
Update MurmurHash3_x64_128 to use the cuco equivalent implementation (#17457) @PointKernel
Move cudf._lib.stream_compaction to cudf.core._internals (#17456) @mroeschke
Clean up xxhash_64 implementations (#17455) @PointKernel
Update Hadoop dependency in Java pom (#17454) @jlowe
Adapt to rmm logger changes (#17451) @vyasr
Require approval to run CI on draft PRs (#17450) @bdice
Expose stream-ordering in nvtext API (#17446) @shrshi
Use exec_policy_nosync in write_json (#17445) @karthikeyann
Remove cudf._lib.json in favor of inlining pylibcudf (#17443) @mroeschke
Remove cudf._lib.null_mask in favor of inlining pylibcudf (#17440) @mroeschke
Expose stream-ordering in replace API (#17436) @shrshi
Apply clang-tidy autofixes from new rules (#17431) @vyasr
Remove cudf._lib.round in favor of inlining pylibcudf (#17430) @mroeschke
Update MurmurHash3_x86_32 to use the cuco equivalent implementation (#17429) @PointKernel
Remove cudf._lib.replace in favor of inlining pylibcudf (#17428) @mroeschke
Remove nvtx/ranges.hpp include from cuda.cuh (#17427) @davidwendt
Remove the unused detail int_fastdiv.h header (#17426) @PointKernel
Remove cudf._lib.lists in favor of inlining pylibcudf (#17425) @mroeschke
Remove cudf._lib.quantile (#17424) @mroeschke
Remove cudf._lib.rolling in favor of inlining pylibcudf (#17423) @mroeschke
Rework minhash APIs for deprecation cycle (#17421) @davidwendt
Use thread_index_type in binary-ops jit kernel.cu (#17420) @davidwendt
Change binops for-each kernel to thrust::for_each_n (#17419) @davidwendt
Move cudf._lib.search to cudf.core._internals (#17411) @mroeschke
Use grid_1d utilities in copy_range.cuh (#17409) @davidwendt
Remove cudf._lib.text in favor of inlining pylibcudf (#17408) @mroeschke
Run clang-tidy checks in PR CI (#17407) @bdice
Update strings/text source to use grid_1d for thread/block/stride calculations (#17404) @davidwendt
Expose stream-ordering to strings attribute APIs (#17398) @shrshi
Expose stream-ordering to interop APIs (#17397) @shrshi
Remove unused type aliases (#17396) @PointKernel
Remove some cudf._lib.strings files in favor of inlining pylibcudf (#17394) @mroeschke
Update xxhash_64 to utilize the cuco equivalent implementation (#17393) @PointKernel
Change indices for dictionary column to signed integer type (#17390) @davidwendt
Return categorical values in to_numpy/to_cupy (#17388) @mroeschke
Forward-merge branch-24.12 to branch-25.02 (#17379) @bdice
Remove unused IO utilities from cudf python (#17374) @Matt711
Remove cudf._lib.datetime in favor of inlining pylibcudf (#17372) @mroeschke
Remove cudf._lib.join in favor of inlining pylibcudf (#17371) @mroeschke
Remove cudf._lib.merge in favor of inlining pylibcudf (#17370) @mroeschke
Remove cudf._lib.partitioning in favor of inlining pylibcudf (#17369) @mroeschke
Remove cudf._lib.reshape in favor of inlining pylibcudf (#17368) @mroeschke
Remove cudf._lib.timezone in favor of inlining pylibcudf (#17366) @mroeschke
Remove cudf._lib.transpose in favor of inlining pylibcudf (#17365) @mroeschke
Move make_strings_column benchmark to nvbench (#17340) @davidwendt
Improve strings contains/find performance for smaller strings (#17330) @davidwendt
Use rapids-logger to generate the cudf logger (#17307) @vyasr
Add write_parquet to pylibcudf (#17263) @mroeschke
Single-partition Dask executor for cuDF-Polars (#17262) @rjzamora
Add breaking change workflow trigger (#17248) @AyodeAwe
Update to CCCL 2.7.0-rc2. (#17233) @bdice
Make column_empty mask buffer creation consistent with libcudf (#16715) @mroeschke
