Releases · NVIDIA/cuDecomp

30 Oct 17:52

v0.4.2

7703aa0

v0.4.2 Latest

What's Changed

This patch release fixes several build-related issues: it updates the CMake include search paths for NVSHMEM 3.x support and corrects the improper naming of the single-precision C2C benchmark executable. Other changes include small corrections to command-line argument handling in the benchmark program and functionality updates to the Taylor-Green example.

PRs included in this release

Update CMake NVSHMEM include search paths for NVSHMEM 3.x. (#34)
Fix integer conversion of skip_threshold in benchmark program. (#35)
Fix scaling overflow for large grids in R2C benchmark. Correct compilation defines for single precision C2C benchmark. (#37)
Taylor Green example updates. (#36)
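
The scaling overflow mentioned in #37 is the kind of issue that appears when a grid's total element count is computed in 32-bit integer arithmetic. The following is a minimal sketch of that general pattern, not the actual change made in the benchmark; the helper names `grid_elements` and `fft_scale` are hypothetical and do not come from cuDecomp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of cuDecomp): total number of elements in an
// nx x ny x nz grid. Widening the first factor to 64 bits before multiplying
// keeps the whole product in 64-bit arithmetic. With plain int, a
// 2048 x 2048 x 2048 grid (2^33 elements) would exceed the 32-bit range.
std::int64_t grid_elements(int nx, int ny, int nz) {
  assert(nx > 0 && ny > 0 && nz > 0);
  return static_cast<std::int64_t>(nx) * ny * nz;
}

// Hypothetical helper: normalization factor applied after a forward and
// inverse FFT pair, derived from the 64-bit element count.
double fft_scale(int nx, int ny, int nz) {
  return 1.0 / static_cast<double>(grid_elements(nx, ny, nz));
}
```

For example, `grid_elements(2048, 2048, 2048)` evaluates to 8589934592, which does not fit in a 32-bit signed integer.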

Full Changelog: v0.4.1...v0.4.2

20 Apr 23:44

romerojosh

v0.4.1

38b6dea

v0.4.1

What's Changed

This patch release fixes a bug, introduced in v0.4.0, in the handling of processor dimensions during autotuning when a fixed process grid is supplied.

PRs included in this release

Fix transposed pdims during autotuning. (#29)
Make CMake library include directory handling more robust. (#30)

Full Changelog: v0.4.0...v0.4.1

14 Mar 19:30

romerojosh

v0.4.0

b8ffecc

v0.4.0

What's Changed

This release includes a new CMake build process, new and improved autotuning configuration options, and compilation fixes for newer NVHPC releases with CUTENSOR 2.0. This release also includes initial opt-in support for NCCL User Buffer registration.

Breaking changes

#21 changed the attribute transpose_use_inplace_buffers in cudecompGridDescAutotuneOptions_t from a single value to an array of boolean values. C++ code that sets this autotuning option must be updated accordingly.
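
The kind of update this requires can be sketched as follows. This is an illustration only: `MockAutotuneOptions` is a hypothetical stand-in for cudecompGridDescAutotuneOptions_t (the real struct is declared in the cuDecomp headers and has many more fields), and the array length of 4, one entry per transpose operation, is an assumption made for this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for cudecompGridDescAutotuneOptions_t.
// Assumption: one flag per transpose operation, four operations in total.
struct MockAutotuneOptions {
  bool transpose_use_inplace_buffers[4];
};

// Before #21 the option was a single value, so code could write:
//   options.transpose_use_inplace_buffers = true;
// After #21 each array entry has to be set, for example with a loop.
void set_inplace_all(MockAutotuneOptions& options, bool value) {
  for (bool& flag : options.transpose_use_inplace_buffers) {
    flag = value;
  }
}

// Count how many transpose operations are marked as in-place.
int count_inplace(const MockAutotuneOptions& options) {
  int count = 0;
  for (bool flag : options.transpose_use_inplace_buffers) {
    if (flag) ++count;
  }
  return count;
}
```

Code that previously assigned one value can call a helper like `set_inplace_all` to keep the old behavior, or assign individual entries to use the new per-operation control.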

PRs included in this release

Allow skipping certain transpose operations during autotuning. (#16)
Remove unneeded 4 GPU restriction on Fortran autotune example. (#17)
Add CMake build (#15)
Enable autotuner to skip slow configurations via new skip_threshold option (#18)
Lower CMake build optimization level for host code (#19)
Add support for CUTENSOR 2.0. (#20)
Enable per operation setting for in-place usage when autotuning. (#21)
Enable applying weights to individual transpose operation timings during autotuning (#22)
Add support for NCCL user buffer registration (#23)
Add MPI_Barrier call in NCCL initialization code. (#24)
Make CMake detection of NVHPC compilers more robust (#26)
Move NVSHMEM kernels into separate file to limit application of -rdc=true. (#28)

Full Changelog: v0.3.1...v0.4.0

18 May 16:17

romerojosh

v0.3.1

3cbf456

v0.3.1 Pre-release

This patch release includes bug fixes in the handling of large message sizes with NVSHMEM backend.

Bugfixes:

Fixed handling of large message sizes in NVSHMEM backend. (#13)

24 Apr 17:01

romerojosh

v0.3.0

1e7c80e

v0.3.0 Pre-release

This release includes bug fixes in the handling of user-provided MPI communicators and processor grid configurations yielding empty pencils.

Bugfixes:

Fixed handling of user-provided MPI communicators. (#7)
Fixed handling of processor grid configurations yielding empty pencils. (#11, #12)

07 Sep 19:22

romerojosh

v0.2.0

e364ed5

v0.2.0 Pre-release

This release includes some minor bug fixes and quality of life improvements.

Changes:

Renaming of optional arguments in Fortran interface. (#2)

Bugfixes:

Fixed indexing bug in cudecompGetShiftedRank in Fortran interface. (#1)
Fixed bug with NCCL resource reclamation when using multiple grid descriptors. (#4)
