Describe the bug
I was recently testing some Spark cases and ran into failures related to patterns whose range quantifier has a lower bound of 0:
A{0,} for replaceRegexp NON_CAPTURE
A{0,5} for replaceRegexp NON_CAPTURE
[a0-9]{0,2} for replaceRegexp NON_CAPTURE
(?:ab){0,3} for containsRe NON_CAPTURE
These were for the Java APIs, but it should apply to Python too. The patch #16798 appears to have caused this somehow.
The differences in replace appear to show that it no longer honors the 0 in the range some of the time. For example, replacing the pattern A{0,} with PROD for an input of 'TEST A' now produces 'TEST PROD'. Before, it would match everywhere, including the empty string at each position, and produce 'PRODTPRODEPRODSPRODTPROD PROD PRODPROD'. I think that is an issue for Python too.
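As a point of comparison, here is a minimal sketch of the same replacement using Python's built-in re module. This is only the CPU reference behavior, not the GPU regex engine from the failing tests. Engines and Python versions differ in how they treat an empty match that directly follows a non-empty one, so the exact output is printed rather than assumed.

```python
import re

# Reference behavior from Python's standard `re` module (an assumption about
# what "behave like python" means here; this is not the GPU implementation).
pattern = "A{0,}"
text = "TEST A"

# A lower bound of 0 lets the pattern match the empty string, so a replacement
# is inserted before each non-matching character as well as in place of "A".
result = re.sub(pattern, "PROD", text)
print(result)
```

The key property is that the result is not 'TEST PROD': the empty matches in front of 'T', 'E', 'S', 'T' and the space are replaced as well.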
Steps/Code to reproduce bug
The tests failing in Spark are
FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_digit[DATAGEN_SEED=1728593263, TZ=UTC] - AssertionError: GPU and CPU string values are different at [0, 'regexp_repl...
FAILED ../../src/main/python/regexp_test.py::test_re_replace_repetition[DATAGEN_SEED=1728593263, TZ=UTC] - AssertionError: GPU and CPU string values are different at [0, 'regexp_repl...
FAILED ../../src/main/python/regexp_test.py::test_regexp_memory_ok[DATAGEN_SEED=1728593263, TZ=UTC, INJECT_OOM] - AssertionError: GPU and CPU boolean values are different at [0, 'RLIKE(a, (...
The examples above are cleaned-up versions of these tests.
Expected behavior
It should behave like Python or Java regular expressions.
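A rough way to generate reference results for the four patterns listed above is sketched below with Python's re module. The helper names reference_replace and reference_contains and the inputs 'xyz' and 'b' are hypothetical and chosen only for illustration; they do not come from the Spark tests.

```python
import re

def reference_replace(pattern, text, repl="PROD"):
    # Hypothetical helper: replace every match, including empty matches.
    return re.sub(pattern, repl, text)

def reference_contains(pattern, text):
    # Hypothetical helper: True if the pattern matches anywhere in the text.
    return re.search(pattern, text) is not None

replace_patterns = ["A{0,}", "A{0,5}", "[a0-9]{0,2}"]
for p in replace_patterns:
    print(p, "->", reference_replace(p, "TEST A"))

# With a lower bound of 0 the group may repeat zero times, so the pattern
# matches the empty string and therefore "contains" is true for any input.
print(reference_contains("(?:ab){0,3}", "xyz"))
```

Under these assumptions, a contains check on (?:ab){0,3} should be true for every input, and none of the replace patterns should leave the non-matching characters of 'TEST A' without a replacement in front of them.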