Releases: ROCm/rocPRIM
rocPRIM 2.10.13 for ROCm 5.1.1
rocPRIM code for ROCm 5.1.1 did not change. The library was rebuilt for the updated ROCm 5.1.1 stack.
rocPRIM 2.10.13 for ROCm 5.1.0
Fixed
- Fixed radix sort int64_t bug introduced in [2.10.11]
Added
- Future value
- Added device partition_three_way to partition input to three output iterators based on two predicates
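The semantics of a three-way partition can be illustrated with a small host-side reference. This is only a sketch, not the rocPRIM device API: the function name `partition_three_way_reference` and its parameters are hypothetical, and it assumes that an element matching the first predicate goes to the first output, an element matching only the second predicate goes to the second output, and everything else goes to the third.

```cpp
#include <vector>

// Hypothetical host-side reference, NOT the rocPRIM device function.
// Assumed semantics: the first predicate is tested first; the second
// predicate is only consulted for elements the first one rejected.
template <class T, class FirstPred, class SecondPred>
void partition_three_way_reference(const std::vector<T>& input,
                                   std::vector<T>& first_part,
                                   std::vector<T>& second_part,
                                   std::vector<T>& unselected,
                                   FirstPred select_first,
                                   SecondPred select_second)
{
    for (const T& value : input) {
        if (select_first(value)) {
            first_part.push_back(value);   // matched the first predicate
        } else if (select_second(value)) {
            second_part.push_back(value);  // matched only the second predicate
        } else {
            unselected.push_back(value);   // matched neither predicate
        }
    }
}
```

Relative order within each output is preserved here because the input is visited sequentially.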
Changed
- The reduce/scan algorithm precision issues in the tests have been resolved for half types.
Known issues
- device_segmented_radix_sort unit test failing for HIP on Windows
rocPRIM-2.10.12 for ROCm 5.0.2
rocPRIM code for ROCm 5.0.2 is unchanged from rocPRIM for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.
rocPRIM-2.10.12 for ROCm 5.0.1
rocPRIM code for ROCm 5.0.1 is unchanged from rocPRIM for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.
rocPRIM-2.10.12 for ROCm 5.0.0
Fixed
- Enable bfloat16 tests and reduce threshold for bfloat16
- Fix device scan limit_size feature
- Non-optimized builds no longer trigger local memory limit errors
Added
- Added scan size limit feature
- Added reduce size limit feature
- Added transform size limit feature
- Add block_load_striped and block_store_striped
- Add gather_to_blocked to gather values from other threads into a blocked arrangement
- The block sizes for the device merge sort's initial block sort and its merge steps are now separate in its kernel config
  - The block sort step supports multiple items per thread
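The difference between the blocked and striped arrangements referred to by `block_load_striped`, `block_store_striped` and `gather_to_blocked` can be sketched as two index mappings. The helper names below are hypothetical and the mappings assume the conventional definitions: in a blocked arrangement each thread owns a contiguous run of items, while in a striped arrangement consecutive items of one thread are `block_size` elements apart.

```cpp
#include <cstddef>

// Hypothetical helpers illustrating the two arrangements; not rocPRIM functions.

// Blocked: thread t owns items [t * items_per_thread, (t + 1) * items_per_thread).
constexpr std::size_t blocked_index(std::size_t thread_id,
                                    std::size_t item,
                                    std::size_t items_per_thread)
{
    return thread_id * items_per_thread + item;
}

// Striped: item i of thread t lives at i * block_size + t, so neighbouring
// threads touch neighbouring elements on every access.
constexpr std::size_t striped_index(std::size_t thread_id,
                                    std::size_t item,
                                    std::size_t block_size)
{
    return item * block_size + thread_id;
}
```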
Changed
- size_limit for scan, reduce and transform can now be set in the config struct instead of a parameter
- Device_scan and device_segmented_scan: inclusive_scan now uses the input-type as accumulator-type, exclusive_scan uses initial-value-type.
  - This particularly changes behaviour of small-size input types with large-size output types (e.g. short input, int output).
  - And low-res input with high-res output (e.g. float input, double output)
- Reverted the old Fiji workaround, because the issue was fixed on the compiler side
- Update README cmake minimum version number
- Block sort supports multiple items per thread
  - Currently only power-of-two block sizes and items per thread are supported, and only for full blocks
- Bumped the minimum required version of CMake to 3.16
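The effect of the accumulator-type change for inclusive_scan noted above can be seen in a host-side sketch. This is not rocPRIM code: `inclusive_scan_with_accumulator` is a hypothetical sequential helper, and `std::uint8_t` stands in for a small input type so that the wrap-around is well defined.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sequential scan whose accumulator type is chosen explicitly.
// Acc = accumulator type, Out = output element type, In = input element type.
template <class Acc, class Out, class In>
std::vector<Out> inclusive_scan_with_accumulator(const std::vector<In>& input)
{
    std::vector<Out> output;
    output.reserve(input.size());
    Acc running{};  // value-initialized to zero
    for (const In& value : input) {
        // The sum is narrowed back to Acc on every step, so a small
        // accumulator wraps even if the output type is wide.
        running = static_cast<Acc>(running + value);
        output.push_back(static_cast<Out>(running));
    }
    return output;
}
```

With the input `{200, 100}` stored as `std::uint8_t`, an `int` accumulator yields `{200, 300}`, whereas accumulating in the input type yields `{200, 44}` even though the output is `int`. Callers that relied on the wider output type for accumulation may need to convert the input first.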
Known issues
- Unit tests may soft hang on MI200 when running in hipMallocManaged mode.
- device_segmented_radix_sort, device_scan unit tests failing for HIP on Windows
- ReduceEmptyInput causes random failures with bfloat16
rocPRIM-2.10.11 for ROCm 4.5.2
rocPRIM code for ROCm 4.5.2 is unchanged from rocPRIM for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.
rocPRIM-2.10.11 for ROCm 4.5.0
Added
- Code coverage tools build option
- Address sanitizer build option
- gfx1030 support added.
- Experimental HIP-CPU support; build using GCC/Clang/MSVC on Win/Linux. It is a work in progress, and many algorithms are still known to fail.
- Initial HIP on Windows support. See README for instructions on how to build and install.
- bfloat16 support added.
Optimizations
- Added single tile radix sort for smaller sizes.
- Improved performance for radix sort for larger element sizes.
Changed
- Package renamed to rocprim-dev for .deb, and to rocprim-devel for .rpm. As rocPRIM is a header-only library, there is no associated runtime package, so for compatibility this development package provides the package rocprim. The provides feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.
Deprecated
- The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.
rocPRIM-2.10.10 for ROCm 4.3.1
No changes made for ROCm 4.3.1.
rocPRIM-2.10.10 for ROCm 4.3.0
Fixed
- Bugfix & minor performance improvement for merge_sort when input and output storage are the same.
Added
- gfx90a support added.
Deprecated
- The warp_size() function is now deprecated; please switch to host_warp_size() and device_warp_size() for host and device references respectively.
rocPRIM-2.10.9 for ROCm 4.2.0
Fixed
- Size zero inputs are now properly handled with newer ROCm builds that no longer allow zero-size kernel grid/block dimensions
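One common way to avoid zero-size launch dimensions is to compute the grid size on the host and skip the launch entirely for empty input. The sketch below shows that pattern only; `launch_plan` and `plan_launch` are hypothetical names, and this is not necessarily how rocPRIM implements the fix.

```cpp
#include <cstddef>

// Hypothetical description of a launch decision; not a rocPRIM type.
struct launch_plan
{
    bool        launch;     // false when there is nothing to process
    std::size_t grid_size;  // number of blocks, only meaningful if launch is true
};

// Assumes block_size > 0. An empty input produces no launch at all,
// so a grid dimension of zero is never handed to the runtime.
constexpr launch_plan plan_launch(std::size_t size, std::size_t block_size)
{
    return size == 0
               ? launch_plan{false, 0}
               : launch_plan{true, (size + block_size - 1) / block_size};
}
```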
Changed
- Minimum cmake version required is now 3.10.2
Known issues
- Device scan unit test currently failing due to LLVM bug.