Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement in libcudf case conversion for long strings #15441

Merged
merged 15 commits into from
Apr 12, 2024

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Apr 2, 2024

Description

Improves logic efficiency overall strings case conversion and reworks the specialized kernels for long strings to improve parallelization within each string.
Closes #15406

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 2, 2024
@davidwendt davidwendt self-assigned this Apr 2, 2024
@davidwendt
Copy link
Contributor Author

Existing benchmark show good improvement for long strings

## [0] NVIDIA RTX A6000

| row_width | num_rows |   Ref Time |   Cmp Time |           Diff |   %Diff |
|-----------|----------|------------|------------|----------------|---------|
|    32     |   4096   | 105.801 us |  98.869 us |      -6.932 us |  -6.55% |
|    64     |   4096   | 142.955 us | 138.110 us |      -4.845 us |  -3.39% |
|    128    |   4096   | 186.503 us | 211.072 us |      24.569 us |  13.17% |
|    256    |   4096   | 288.387 us | 214.724 us |     -73.663 us | -25.54% |
|    512    |   4096   | 514.895 us | 224.731 us |    -290.164 us | -56.35% |
|   1024    |   4096   |   1.007 ms | 262.894 us |    -743.903 us | -73.89% |
|   2048    |   4096   |   1.992 ms | 348.787 us |   -1643.051 us | -82.49% |
|    32     |  32768   | 109.572 us | 102.730 us |      -6.843 us |  -6.24% |
|    64     |  32768   | 147.685 us | 143.177 us |      -4.508 us |  -3.05% |
|    128    |  32768   | 237.028 us | 237.712 us |       0.684 us |   0.29% |
|    256    |  32768   | 354.450 us | 340.402 us |     -14.048 us |  -3.96% |
|    512    |  32768   | 651.191 us | 529.697 us |    -121.494 us | -18.66% |
|   1024    |  32768   |   1.365 ms | 914.432 us |    -450.780 us | -33.02% |
|   2048    |  32768   |   2.668 ms |   1.707 ms |    -961.703 us | -36.04% |
|    32     |  262144  | 192.874 us | 174.467 us |     -18.406 us |  -9.54% |
|    64     |  262144  | 401.702 us | 409.239 us |       7.538 us |   1.88% |
|    128    |  262144  |   2.059 ms | 961.064 us |   -1097.503 us | -53.31% |
|    256    |  262144  |   6.995 ms |   1.587 ms |   -5407.260 us | -77.31% |
|    512    |  262144  |  18.476 ms |   2.916 ms |  -15560.385 us | -84.22% |
|   1024    |  262144  |  41.705 ms |   6.044 ms |  -35661.092 us | -85.51% |
|   2048    |  262144  |  98.148 ms |  12.489 ms |  -85658.725 us | -87.28% |
|    32     | 2097152  | 928.088 us | 762.243 us |    -165.845 us | -17.87% |
|    64     | 2097152  |   2.161 ms |   2.022 ms |    -138.884 us |  -6.43% |
|    128    | 2097152  |  14.398 ms |   6.582 ms |   -7816.360 us | -54.29% |
|    256    | 2097152  |  55.781 ms |  11.553 ms |  -44228.492 us | -79.29% |
|    512    | 2097152  | 150.897 ms |  22.285 ms | -128611.893 us | -85.23% |
|    32     | 16777216 |   7.042 ms |   5.710 ms |   -1331.347 us | -18.91% |
|    64     | 16777216 |  16.597 ms |  15.268 ms |   -1329.738 us |  -8.01% |

@davidwendt davidwendt changed the title Performance improvement for libcudf case conversion Performance improvement in libcudf case conversion for long strings Apr 2, 2024
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Apr 11, 2024
@davidwendt davidwendt marked this pull request as ready for review April 11, 2024 18:35
@davidwendt davidwendt requested a review from a team as a code owner April 11, 2024 18:35
@davidwendt davidwendt requested review from bdice and PointKernel April 11, 2024 18:35
to_char_utf8(str_ptr + i, u8);
size += converter.process_character(u8);
}
// this is every so slightly faster than using the cub::warp_reduce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably due to the high register usage with cub

cpp/src/strings/case.cu Outdated Show resolved Hide resolved
cpp/src/strings/case.cu Outdated Show resolved Hide resolved
@davidwendt davidwendt requested a review from PointKernel April 11, 2024 23:24
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean as always. LGTM

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit f5df665 into rapidsai:branch-24.06 Apr 12, 2024
70 checks passed
@davidwendt davidwendt deleted the perf-lower branch April 12, 2024 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] lower/upper are really slow for long strings
4 participants