Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve strings contains/find performance for smaller strings #17330

Merged
merged 26 commits into from
Dec 6, 2024

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Nov 14, 2024

Description

Replaces usage of cudf::string_view::find() with loop and call to cudf::string_view::compare() where possible.
This showed significant performance improvement.
This was also slightly faster than a KMP prototype implementation.
Also updates the find/contains benchmarks to remove the 2GB limit and include column versions of the find APIs.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 14, 2024
@davidwendt davidwendt self-assigned this Nov 14, 2024
@davidwendt davidwendt changed the title Use a KMP search algorithm for strings contains() Improve strings contains/find performance for smaller strings Nov 15, 2024
@davidwendt
Copy link
Contributor Author

Benchmarks show significant speed up for shorter strings for scalar contains and find

## [0] NVIDIA RTX 5880 Ada Generation

| width | num_rows | hits |   api    | target |   Ref Time |   Cmp Time |        Diff |   %Diff |
|-------|----------|------|----------|--------|------------|------------|-------------|---------|
|  32   |  32768   |  20  | contains | scalar |  42.512 us |  38.013 us |   -4.499 us | -10.58% |
|  64   |  32768   |  20  | contains | scalar |  71.327 us |  62.462 us |   -8.865 us | -12.43% |
|  128  |  32768   |  20  | contains | scalar |  55.435 us |  53.488 us |   -1.948 us |  -3.51% |
|  256  |  32768   |  20  | contains | scalar |  70.788 us |  68.642 us |   -2.146 us |  -3.03% |
|  32   |  262144  |  20  | contains | scalar |  75.077 us |  49.207 us |  -25.870 us | -34.46% |
|  64   |  262144  |  20  | contains | scalar | 178.340 us |  89.774 us |  -88.566 us | -49.66% |
|  128  |  262144  |  20  | contains | scalar | 249.517 us | 247.330 us |   -2.187 us |  -0.88% |
|  256  |  262144  |  20  | contains | scalar | 358.435 us | 356.482 us |   -1.953 us |  -0.54% |
|  32   | 2097152  |  20  | contains | scalar | 341.067 us | 170.980 us | -170.087 us | -49.87% |
|  64   | 2097152  |  20  | contains | scalar |   1.020 ms | 424.781 us | -595.535 us | -58.37% |
|  128  | 2097152  |  20  | contains | scalar |   1.797 ms |   1.799 ms |    1.739 us |   0.10% |
|  256  | 2097152  |  20  | contains | scalar |   2.648 ms |   2.651 ms |    2.949 us |   0.11% |
|  32   |  32768   |  80  | contains | scalar |  43.069 us |  36.631 us |   -6.438 us | -14.95% |
|  64   |  32768   |  80  | contains | scalar |  74.226 us |  61.119 us |  -13.107 us | -17.66% |
|  128  |  32768   |  80  | contains | scalar |  61.266 us |  57.107 us |   -4.159 us |  -6.79% |
|  256  |  32768   |  80  | contains | scalar |  76.492 us |  72.194 us |   -4.298 us |  -5.62% |
|  32   |  262144  |  80  | contains | scalar |  89.574 us |  53.098 us |  -36.476 us | -40.72% |
|  64   |  262144  |  80  | contains | scalar | 200.841 us |  96.890 us | -103.951 us | -51.76% |
|  128  |  262144  |  80  | contains | scalar | 279.909 us | 278.216 us |   -1.693 us |  -0.60% |
|  256  |  262144  |  80  | contains | scalar | 387.450 us | 386.376 us |   -1.074 us |  -0.28% |
|  32   | 2097152  |  80  | contains | scalar | 459.934 us | 209.319 us | -250.615 us | -54.49% |
|  64   | 2097152  |  80  | contains | scalar |   1.213 ms | 411.948 us | -800.896 us | -66.03% |
|  128  | 2097152  |  80  | contains | scalar |   2.042 ms |   2.045 ms |    2.777 us |   0.14% |
|  256  | 2097152  |  80  | contains | scalar |   2.882 ms |   2.885 ms |    3.259 us |   0.11% |
## [0] NVIDIA RTX 5880 Ada Generation

| width | num_rows | hits |  api  | target |   Ref Time |   Cmp Time |        Diff |   %Diff |
|-------|----------|------|-------|--------|------------|------------|-------------|---------|
|  32   |  32768   |  20  | find  | scalar |  46.198 us |  42.149 us |   -4.049 us |  -8.76% |
|  64   |  32768   |  20  | find  | scalar |  75.200 us |  70.511 us |   -4.690 us |  -6.24% |
|  128  |  32768   |  20  | find  | scalar |  84.968 us |  81.511 us |   -3.457 us |  -4.07% |
|  256  |  32768   |  20  | find  | scalar | 106.626 us | 100.825 us |   -5.801 us |  -5.44% |
|  32   |  262144  |  20  | find  | scalar |  75.483 us |  58.425 us |  -17.058 us | -22.60% |
|  64   |  262144  |  20  | find  | scalar | 179.695 us | 129.064 us |  -50.631 us | -28.18% |
|  128  |  262144  |  20  | find  | scalar | 437.466 us | 432.743 us |   -4.723 us |  -1.08% |
|  256  |  262144  |  20  | find  | scalar | 571.513 us | 568.183 us |   -3.330 us |  -0.58% |
|  32   | 2097152  |  20  | find  | scalar | 346.495 us | 229.112 us | -117.383 us | -33.88% |
|  64   | 2097152  |  20  | find  | scalar |   1.028 ms | 670.638 us | -357.811 us | -34.79% |
|  128  | 2097152  |  20  | find  | scalar |   3.243 ms |   3.242 ms |   -0.485 us |  -0.01% |
|  256  | 2097152  |  20  | find  | scalar |   4.292 ms |   4.289 ms |   -3.001 us |  -0.07% |
|  32   |  32768   |  80  | find  | scalar |  43.012 us |  41.202 us |   -1.810 us |  -4.21% |
|  64   |  32768   |  80  | find  | scalar |  70.353 us |  71.941 us |    1.588 us |   2.26% |
|  128  |  32768   |  80  | find  | scalar |  96.212 us | 100.672 us |    4.460 us |   4.64% |
|  256  |  32768   |  80  | find  | scalar | 117.319 us | 120.849 us |    3.530 us |   3.01% |
|  32   |  262144  |  80  | find  | scalar |  91.934 us |  67.151 us |  -24.783 us | -26.96% |
|  64   |  262144  |  80  | find  | scalar | 204.427 us | 130.435 us |  -73.992 us | -36.19% |
|  128  |  262144  |  80  | find  | scalar | 581.534 us | 577.699 us |   -3.835 us |  -0.66% |
|  256  |  262144  |  80  | find  | scalar | 709.497 us | 703.679 us |   -5.817 us |  -0.82% |
|  32   | 2097152  |  80  | find  | scalar | 464.751 us | 311.426 us | -153.325 us | -32.99% |
|  64   | 2097152  |  80  | find  | scalar |   1.220 ms | 649.425 us | -570.523 us | -46.77% |
|  128  | 2097152  |  80  | find  | scalar |   4.409 ms |   4.405 ms |   -4.608 us |  -0.10% |
|  256  | 2097152  |  80  | find  | scalar |   5.384 ms |   5.382 ms |   -2.057 us |  -0.04% |

@davidwendt
Copy link
Contributor Author

The contains column API also showed significant improvement overall with this change.

## [0] NVIDIA RTX 5880 Ada Generation

| width | num_rows | hits |   api    | target |   Ref Time |   Cmp Time |         Diff |   %Diff |
|-------|----------|------|----------|--------|------------|------------|--------------|---------|
|  32   |  32768   |  20  | contains | column |  43.186 us |  35.144 us |    -8.042 us | -18.62% |
|  64   |  32768   |  20  | contains | column |  75.852 us |  59.826 us |   -16.026 us | -21.13% |
|  128  |  32768   |  20  | contains | column | 163.517 us |  99.894 us |   -63.624 us | -38.91% |
|  256  |  32768   |  20  | contains | column | 302.529 us | 171.011 us |  -131.518 us | -43.47% |
|  32   |  262144  |  20  | contains | column |  76.081 us |  50.128 us |   -25.953 us | -34.11% |
|  64   |  262144  |  20  | contains | column | 182.916 us |  93.221 us |   -89.695 us | -49.04% |
|  128  |  262144  |  20  | contains | column | 595.423 us | 239.143 us |  -356.280 us | -59.84% |
|  256  |  262144  |  20  | contains | column |   1.173 ms | 456.718 us |  -716.329 us | -61.07% |
|  32   | 2097152  |  20  | contains | column | 386.547 us | 206.650 us |  -179.897 us | -46.54% |
|  64   | 2097152  |  20  | contains | column |   1.098 ms | 482.762 us |  -614.947 us | -56.02% |
|  128  | 2097152  |  20  | contains | column |   4.085 ms |   1.573 ms | -2512.355 us | -61.49% |
|  256  | 2097152  |  20  | contains | column |   8.244 ms |   3.110 ms | -5133.808 us | -62.28% |
|  32   |  32768   |  80  | contains | column |  42.589 us |  33.726 us |    -8.864 us | -20.81% |
|  64   |  32768   |  80  | contains | column |  73.789 us |  58.302 us |   -15.487 us | -20.99% |
|  128  |  32768   |  80  | contains | column | 164.130 us |  97.675 us |   -66.455 us | -40.49% |
|  256  |  32768   |  80  | contains | column | 307.640 us | 165.999 us |  -141.642 us | -46.04% |
|  32   |  262144  |  80  | contains | column |  90.312 us |  53.633 us |   -36.679 us | -40.61% |
|  64   |  262144  |  80  | contains | column | 204.043 us |  96.802 us |  -107.240 us | -52.56% |
|  128  |  262144  |  80  | contains | column | 559.216 us | 187.786 us |  -371.430 us | -66.42% |
|  256  |  262144  |  80  | contains | column | 978.985 us | 265.986 us |  -712.999 us | -72.83% |
|  32   | 2097152  |  80  | contains | column | 498.635 us | 243.698 us |  -254.938 us | -51.13% |
|  64   | 2097152  |  80  | contains | column |   1.261 ms | 451.149 us |  -810.320 us | -64.24% |
|  128  | 2097152  |  80  | contains | column |   3.957 ms |   1.334 ms | -2622.974 us | -66.28% |
|  256  | 2097152  |  80  | contains | column |   7.262 ms |   2.103 ms | -5158.243 us | -71.03% |

The find column API showed only significant improvement with shorter strings

## [0] NVIDIA RTX 5880 Ada Generation

| width | num_rows | hits |  api  | target |   Ref Time |   Cmp Time |        Diff |   %Diff |
|-------|----------|------|-------|--------|------------|------------|-------------|---------|
|  32   |  32768   |  20  | find  | column |  47.391 us |  42.670 us |   -4.721 us |  -9.96% |
|  64   |  32768   |  20  | find  | column |  96.955 us |  88.535 us |   -8.420 us |  -8.68% |
|  128  |  32768   |  20  | find  | column | 105.372 us | 102.967 us |   -2.405 us |  -2.28% |
|  256  |  32768   |  20  | find  | column | 126.030 us | 122.494 us |   -3.537 us |  -2.81% |
|  32   |  262144  |  20  | find  | column |  76.075 us |  62.058 us |  -14.017 us | -18.43% |
|  64   |  262144  |  20  | find  | column | 202.727 us | 150.489 us |  -52.238 us | -25.77% |
|  128  |  262144  |  20  | find  | column | 482.592 us | 481.785 us |   -0.807 us |  -0.17% |
|  256  |  262144  |  20  | find  | column | 624.475 us | 624.277 us |   -0.198 us |  -0.03% |
|  32   | 2097152  |  20  | find  | column | 367.564 us | 251.290 us | -116.274 us | -31.63% |
|  64   | 2097152  |  20  | find  | column |   1.103 ms | 751.289 us | -351.859 us | -31.90% |
|  128  | 2097152  |  20  | find  | column |   3.498 ms |   3.507 ms |    9.486 us |   0.27% |
|  256  | 2097152  |  20  | find  | column |   4.600 ms |   4.604 ms |    3.822 us |   0.08% |
|  32   |  32768   |  80  | find  | column |  46.525 us |  41.206 us |   -5.319 us | -11.43% |
|  64   |  32768   |  80  | find  | column |  95.081 us |  83.233 us |  -11.848 us | -12.46% |
|  128  |  32768   |  80  | find  | column | 120.962 us | 121.114 us |    0.151 us |   0.13% |
|  256  |  32768   |  80  | find  | column | 141.690 us | 141.546 us |   -0.144 us |  -0.10% |
|  32   |  262144  |  80  | find  | column |  92.277 us |  72.436 us |  -19.841 us | -21.50% |
|  64   |  262144  |  80  | find  | column | 223.713 us | 151.144 us |  -72.568 us | -32.44% |
|  128  |  262144  |  80  | find  | column | 627.288 us | 629.337 us |    2.049 us |   0.33% |
|  256  |  262144  |  80  | find  | column | 761.668 us | 764.135 us |    2.468 us |   0.32% |
|  32   | 2097152  |  80  | find  | column | 485.524 us | 335.784 us | -149.740 us | -30.84% |
|  64   | 2097152  |  80  | find  | column |   1.265 ms | 708.428 us | -556.300 us | -43.99% |
|  128  | 2097152  |  80  | find  | column |   4.683 ms |   4.695 ms |   11.324 us |   0.24% |
|  256  | 2097152  |  80  | find  | column |   5.714 ms |   5.724 ms |    9.203 us |   0.16% |

@davidwendt davidwendt changed the base branch from branch-24.12 to branch-25.02 November 21, 2024 16:11
@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 3, 2024
@davidwendt davidwendt marked this pull request as ready for review December 3, 2024 16:49
@davidwendt davidwendt requested a review from a team as a code owner December 3, 2024 16:49
@lamarrr lamarrr self-requested a review December 4, 2024 18:32
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two suggestions, otherwise LGTM.

cpp/src/strings/search/find.cu Outdated Show resolved Hide resolved
cpp/src/strings/search/find.cu Outdated Show resolved Hide resolved
@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 38261f8 into rapidsai:branch-25.02 Dec 6, 2024
104 of 105 checks passed
@davidwendt davidwendt deleted the kmp-contains branch December 6, 2024 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants