v24.04.00
🐛 Bug Fixes
- Update pre-commit-hooks to v0.0.3 (#2239) @KyleFromNVIDIA
- MAINT: Simplify NCCL worker rank identification (#2228) @VibhuJawa
- Fix bug in blockRankedReduce (#2226) @akifcorduk
- Fix illegal access in mean/stdev and sum, add Kahan summation (#2223) @mfoerste4
- Batch cutlass distance kernels along N matrix dim (#2215) @mdoijade
- Fix out of bounds access in sum kernel (#2183) @tfeher
- Fix ANN bench ground truth generation for k>1024 (#2180) @tfeher
- Fixing cusparse aligned address issue and adding note (#2179) @cjnolet
- Launch `neighborhood_recall` kernel on CUDA stream (#2156) @divyegala
- Add `compile-library` by default on pylibraft build (#2090) @lowener
📖 Documentation
🚀 New Features
- Add CAGRA-Q to ANN benchmarks (#2233) @achirkin
- Add CAGRA-Q build (compression) (#2213) @achirkin
- CAGRA-Q search (#2206) @enp1s0
- Demangle backtrace symbols on raft error (#2188) @achirkin
- Reapply: Support for fp16 in CAGRA and IVF-PQ (#2172) @achirkin
- Remove supports_streams from custom RAFT memory resources (#2121) @harrism
- [FEA] Add support for bitmap_view & the API of `bitmap_to_csr` (#2109) @rhdong
🛠️ Improvements
- Use `conda env create --yes` instead of `--force` (#2247) @bdice
- Align ucx version pinning with ucx-py/ucxx. (#2227) @bdice
- Add upper bound to prevent usage of NumPy 2 (#2222) @bdice
- Performance optimization of IVF-flat / select_k (#2221) @mfoerste4
- Replace local copyright check with pre-commit-hooks verify-copyright (#2220) @KyleFromNVIDIA
- Remove hard-coding of RAPIDS version where possible (#2219) @KyleFromNVIDIA
- Fix style. (#2214) @bdice
- Add explicit instantiations for IVF-PQ search kernels used in tests (#2212) @tfeher
- Improve RBC eps-neighborhood query performance (#2211) @mfoerste4
- Add test for spmm (#2210) @mfoerste4
- Only install necessary components in conda packages. (#2209) @bdice
- Automate C++ include file grouping and ordering using clang-format (#2202) @harrism
- Add support for Python 3.11, require NumPy 1.23+ (#2200) @jameslamb
- Pass `std::optional` instead of `thrust::optional` to RMM (#2199) @trxcllnt
- Update devcontainers to CUDA Toolkit 12.2 (#2192) @trxcllnt
- target branch-24.04 for GitHub Actions workflows (#2189) @jameslamb
- Fixing workaround for cuSPARSE bug with correct copy dimensions (#2185) @mfoerste4
- Allow topk larger than 1024 in CAGRA (#2181) @benfred
- IVF-FLAT support k > 256 (#2169) @mfoerste4
- Add environment-agnostic scripts for running ctests and pytests (#2165) @trxcllnt
- Ensure that `ctest` is called with `--no-tests=error`. (#2163) @bdice
- Update ops-bot.yaml (#2158) @AyodeAwe
- random sampling of dataset rows with improved memory utilization (#2155) @tfeher
- [FIX] Ensure hnswlib can be found from RAFT's build dir (#2145) @trxcllnt
- Improve analysis experience for ANN benchmarks (#2139) @achirkin
- Enable CAGRA index building without adding dataset to the index (#2126) @tfeher
- Add fused cosine 1-NN cutlass based kernel (#2125) @mdoijade
- Update raft for compatibility with the latest cuco (#2118) @PointKernel
- Support CUDA 12.2 (#2092) @jameslamb
- Cache IVF-PQ and select-warpsort kernel launch parameters to reduce latency (#1786) @achirkin