test_initcap failed with seed 14 #9247
Comments
This looks like we found a character that we do not translate to upper case properly. I'll try to do some more digging here.
So the characters in the output do not match the characters listed in the regexp. I suspect sre_yield is not doing what we want or expect here and is going off of bytes instead of characters. I need to dig in a bit more to understand whether this is expected.
Okay, now I am even more confused. I added in upper and lower in addition to initcap just to see what would happen, and the CPU is doing something rather odd.
We are consistent and make ß upper case when it is the first character, but the CPU keeps it lower case, and I really want to understand why...
OK, on further digging this is a real bug in our code: title case and upper case are not the same thing. We convert characters to upper case and lower case using the cudf strings::capitalize function, but it converts the values to upper case, which, at least in the case of ß, is not the same as title case. We will likely need some help from cudf to make this work properly.
cudf agreed to fix the issue, so I will keep this assigned to me to verify the fix.
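To illustrate the distinction on the JVM (where the CPU side runs), here is a minimal Java sketch. The class and method names are hypothetical and it does not call cudf or Spark; it only compares the JDK's per-code-point title-case mapping with the full upper-case mapping of a string.

```java
import java.util.Locale;

class TitleVsUpper {
    // Simple title-case mapping: one code point in, one code point out.
    public static String titleOf(int codePoint) {
        return new String(Character.toChars(Character.toTitleCase(codePoint)));
    }

    // Full upper-case mapping: may expand to more than one character.
    public static String upperOf(int codePoint) {
        return new String(Character.toChars(codePoint)).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF is ß, U+01C6 is the dz-with-caron digraph, 'a' is a control case.
        int[] samples = {0x00DF, 0x01C6, 'a'};
        for (int cp : samples) {
            System.out.printf("U+%04X title=%s upper=%s%n", cp, titleOf(cp), upperOf(cp));
        }
    }
}
```

For ß the title-case mapping leaves the character unchanged while the upper-case mapping yields "SS". For U+01C6 the two mappings give different code points (U+01C5 and U+01C4 respectively).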
Saw another instance of this with a different seed:
Moved to 24.02.
Just for reference, there are a number of characters that do not match between the CPU and the GPU. It is not too hard to do an exhaustive check in Scala.
This showed 265 characters where we got the wrong answer. The code point values for these are:
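An exhaustive scan of this kind can be sketched on the JVM alone. The following Java sketch (Java rather than the Scala mentioned above; the class and method names are hypothetical) does not involve the GPU, so it is not expected to reproduce the 265 mismatches. It only lists code points whose simple title-case mapping differs from their simple upper-case mapping, which are natural candidates for a capitalize routine that upper-cases instead of title-casing.

```java
import java.util.ArrayList;
import java.util.List;

class TitleCaseScan {
    // Collect every code point where the JDK's simple title-case mapping
    // differs from its simple upper-case mapping.
    public static List<Integer> titleDiffersFromUpper() {
        List<Integer> out = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.toTitleCase(cp) != Character.toUpperCase(cp)) {
                out.add(cp);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> diffs = titleDiffersFromUpper();
        // The count depends on the Unicode version bundled with the JDK.
        System.out.println(diffs.size() + " code points differ");
        for (int cp : diffs) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Characters such as ß, whose upper-case form only exists as a multi-character expansion, are not caught by this per-code-point comparison, so a full CPU-versus-GPU check would still need to run the actual initcap expression on both sides.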
Changing the seed to certain other values, such as 14, causes test_initcap to fail.
:log: