Improve performance for long strings for nvtext::replace_tokens #15756

Merged
merged 17 commits into rapidsai:branch-24.08 from text-replace-perf
May 29, 2024

Conversation

davidwendt
Contributor

Description

Improves performance for nvtext::replace_tokens for long strings.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the 2 - In Progress, libcudf, strings, improvement, and non-breaking labels May 14, 2024
@davidwendt self-assigned this May 14, 2024
@davidwendt
Contributor Author

Some performance benchmark results (Ref is the baseline and Cmp is this PR; a negative Diff means this PR is faster):

|row_width|num_rows|   Ref Time |   Cmp Time |           Diff |   %Diff |
|---------|--------|------------|------------|----------------|---------|
|   32    |  4096  |  95.488 us | 100.999 us |       5.512 us |   5.77% |
|   64    |  4096  | 120.956 us | 126.934 us |       5.979 us |   4.94% |
|   128   |  4096  | 193.513 us | 225.669 us |      32.156 us |  16.62% |
|   256   |  4096  | 321.329 us | 232.318 us |     -89.011 us | -27.70% |
|   512   |  4096  | 594.923 us | 269.090 us |    -325.833 us | -54.77% |
|  1024   |  4096  |   1.155 ms | 342.871 us |    -812.532 us | -70.32% |
|   32    | 32768  | 117.734 us | 123.737 us |       6.003 us |   5.10% |
|   64    | 32768  | 164.444 us | 171.198 us |       6.754 us |   4.11% |
|   128   | 32768  | 275.025 us | 333.420 us |      58.395 us |  21.23% |
|   256   | 32768  | 530.392 us | 499.948 us |     -30.444 us |  -5.74% |
|   512   | 32768  |   1.008 ms | 816.931 us |    -190.869 us | -18.94% |
|  1024   | 32768  |   1.856 ms |   1.452 ms |    -404.337 us | -21.78% |
|   32    | 262144 | 307.452 us | 315.685 us |       8.233 us |   2.68% |
|   64    | 262144 | 576.788 us | 583.611 us |       6.823 us |   1.18% |
|   128   | 262144 |   3.243 ms |   1.430 ms |   -1813.488 us | -55.92% |
|   256   | 262144 |   8.539 ms |   2.706 ms |   -5833.149 us | -68.31% |
|   512   | 262144 |  24.445 ms |   5.429 ms |  -19016.762 us | -77.79% |
|  1024   | 262144 |  64.353 ms |  10.712 ms |  -53641.204 us | -83.35% |
|   32    |2097152 |   1.835 ms |   1.841 ms |       5.734 us |   0.31% |
|   64    |2097152 |   4.016 ms |   4.028 ms |      12.260 us |   0.31% |
|   128   |2097152 |  19.600 ms |  10.796 ms |   -8803.548 us | -44.92% |
|   256   |2097152 |  50.128 ms |  21.580 ms |  -28547.777 us | -56.95% |
|   512   |2097152 | 157.250 ms |  44.047 ms | -113202.559 us | -71.99% |
|   32    |16777216|  14.444 ms |  14.421 ms |     -22.290 us |  -0.15% |
|   64    |16777216|  31.994 ms |  31.937 ms |     -56.896 us |  -0.18% |
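
For readers checking the table, the Diff and %Diff columns appear to be Cmp minus Ref, and that difference relative to Ref. The following is a minimal sketch under that assumption; the helper names `diff_columns` and `speedup` are hypothetical and are not part of libcudf or the benchmark tooling.

```python
def diff_columns(ref_us: float, cmp_us: float) -> tuple[float, float]:
    """Return (Diff in microseconds, %Diff) for one benchmark row.

    Assumes Diff = Cmp - Ref and %Diff = 100 * Diff / Ref.
    """
    diff_us = cmp_us - ref_us
    pct_diff = 100.0 * diff_us / ref_us
    return diff_us, pct_diff


def speedup(ref_us: float, cmp_us: float) -> float:
    """Return how many times faster Cmp is than Ref (values above 1 mean faster)."""
    return ref_us / cmp_us


# Row from the table: row_width=256, num_rows=4096
diff_us, pct = diff_columns(321.329, 232.318)
print(f"{diff_us:.3f} us, {pct:.2f}%")  # -89.011 us, -27.70%

# Row from the table: row_width=1024, num_rows=262144 (64.353 ms vs 10.712 ms)
print(f"{speedup(64353.0, 10712.0):.1f}x")  # 6.0x
```

Rows whose times are reported in milliseconds are rounded to three decimals in the table, so recomputing Diff from them can differ slightly from the printed value.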

@davidwendt changed the base branch from branch-24.06 to branch-24.08 May 21, 2024 14:30
@davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label May 22, 2024
@davidwendt marked this pull request as ready for review May 24, 2024 18:35
@davidwendt requested a review from a team as a code owner May 24, 2024 18:35
@davidwendt requested a review from bdice May 24, 2024 18:35
@davidwendt requested a review from ttnghia May 24, 2024 18:35
Contributor

@bdice left a comment

Looks good to me.

@davidwendt
Contributor Author

/merge

@rapids-bot (bot) merged commit ff981a4 into rapidsai:branch-24.08 May 29, 2024
70 checks passed
@davidwendt deleted the text-replace-perf branch May 29, 2024 13:02
Labels
3 - Ready for Review, improvement, libcudf, non-breaking, strings
3 participants