[Opt] Optimizing the performance of `bitmap_to_csr` (#2516)
This PR optimizes the performance of the `bitmap_to_csr` related kernels by 14~1000 times. It could also benefit `bitset_to_csr` in the future.

#### After (Updated Dec 08)

```shell
---------------------------------------------------------------------------------------------------
Benchmark                                                      Time          CPU   Iterations
---------------------------------------------------------------------------------------------------
BitmapToCsrBench<uint32_t, int64_t, float>/0/manual_time   0.161 ms     0.197 ms         4350 rows*cols=1*100000000 sparsity=0.95
BitmapToCsrBench<uint32_t, int64_t, float>/1/manual_time   0.110 ms     0.147 ms         6363 rows*cols=1*100000000 sparsity=0.99
BitmapToCsrBench<uint32_t, int64_t, float>/2/manual_time    14.2 ms      14.2 ms           50 rows*cols=100*100000000 sparsity=0.95
BitmapToCsrBench<uint32_t, int64_t, float>/3/manual_time    8.76 ms      8.80 ms           80 rows*cols=100*100000000 sparsity=0.99
```

#### Before

```shell
---------------------------------------------------------------------------------------------------
Benchmark                                                      Time          CPU   Iterations
---------------------------------------------------------------------------------------------------
BitmapToCsrBench<uint32_t, int64_t, float>/0/manual_time     176 ms       176 ms            4 rows*cols=1*100000000 sparsity=0.95
BitmapToCsrBench<uint32_t, int64_t, float>/1/manual_time     146 ms       146 ms            5 rows*cols=1*100000000 sparsity=0.99
BitmapToCsrBench<uint32_t, int64_t, float>/2/manual_time     180 ms       180 ms            4 rows*cols=100*100000000 sparsity=0.95
BitmapToCsrBench<uint32_t, int64_t, float>/3/manual_time     148 ms       148 ms            5 rows*cols=100*100000000 sparsity=0.99
```

Authors:
- rhdong (https://github.com/rhdong)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #2516
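For readers unfamiliar with the conversion being benchmarked, here is a minimal CPU reference sketch of what a bitmap-to-CSR conversion computes. It is not the optimized kernel from this PR and not the library's API: the `CsrPattern` struct and `bitmap_to_csr_reference` function are hypothetical names, and the row-major layout with LSB-first bit order inside each 32-bit word is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output container for illustration; not a library type.
struct CsrPattern {
  std::vector<int64_t> indptr;   // row offsets, size n_rows + 1
  std::vector<int64_t> indices;  // column index of every set bit, row by row
};

// Sequential reference conversion (assumed layout, see lead-in):
// bit number (r * n_cols + c) of the bitmap marks a non-zero at (r, c),
// with bits counted from the least significant bit of each 32-bit word.
CsrPattern bitmap_to_csr_reference(const std::vector<uint32_t>& bitmap,
                                   int64_t n_rows,
                                   int64_t n_cols)
{
  CsrPattern csr;
  csr.indptr.assign(static_cast<std::size_t>(n_rows) + 1, 0);

  for (int64_t r = 0; r < n_rows; ++r) {
    for (int64_t c = 0; c < n_cols; ++c) {
      const int64_t bit   = r * n_cols + c;
      const uint32_t word = bitmap[static_cast<std::size_t>(bit / 32)];
      if ((word >> (bit % 32)) & 1u) {
        csr.indices.push_back(c);  // record the column of this non-zero
      }
    }
    // Everything pushed so far belongs to rows 0..r, so this is the end offset of row r.
    csr.indptr[static_cast<std::size_t>(r) + 1] = static_cast<int64_t>(csr.indices.size());
  }
  return csr;
}
```

As a usage example, a 2x3 matrix with non-zeros at (0, 0), (0, 2), and (1, 1) is stored as the single word `21` (bits 0, 2, and 4 set); the sketch returns `indptr = {0, 2, 3}` and `indices = {0, 2, 1}`. A GPU version would typically split this into a per-row population count, a prefix sum to build `indptr`, and a fill pass for `indices`, but that decomposition is a general pattern rather than a description of this PR's kernels.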