feat(expr): add regexp_split_to_array #12844

Merged: 5 commits into main from xzhseh/feat-regexp-split-to-array, Oct 17, 2023

Conversation

xzhseh
Contributor

@xzhseh xzhseh commented Oct 14, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Resolves #12509.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Support regexp_split_to_array function

@xzhseh xzhseh added the user-facing-changes Contains changes that are visible to users label Oct 14, 2023
@xzhseh xzhseh self-assigned this Oct 14, 2023
@codecov

codecov bot commented Oct 14, 2023

Codecov Report

Merging #12844 (e1f49fc) into main (fcc2469) will decrease coverage by 0.02%.
The diff coverage is 13.04%.

@@            Coverage Diff             @@
##             main   #12844      +/-   ##
==========================================
- Coverage   69.18%   69.17%   -0.02%     
==========================================
  Files        1489     1489              
  Lines      245832   245901      +69     
==========================================
+ Hits       170083   170091       +8     
- Misses      75749    75810      +61     
Flag Coverage Δ
rust 69.17% <13.04%> (-0.02%) ⬇️


Files Coverage Δ
src/frontend/src/binder/expr/function.rs 78.38% <100.00%> (+0.01%) ⬆️
src/frontend/src/expr/pure.rs 87.69% <ø> (ø)
src/expr/impl/src/scalar/regexp.rs 13.40% <11.76%> (-0.39%) ⬇️

... and 2 files with indirect coverage changes


@TennyZhuang
Contributor

TennyZhuang commented Oct 14, 2023

Can you enable the regression tests in

-- split string on regexp
--@ SELECT foo, length(foo) FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', $re$\s+$re$) AS foo;
--@ SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', $re$\s+$re$);
--@
--@ SELECT foo, length(foo) FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', $re$\s*$re$) AS foo;
--@ SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', $re$\s*$re$);
--@ SELECT foo, length(foo) FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '') AS foo;
--@ SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '');
--@ -- case insensitive
--@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'i') AS foo;
--@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'i');
--@ -- no match of pattern
--@ SELECT foo, length(foo) FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', 'nomatch') AS foo;
--@ SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', 'nomatch');
--@ -- some corner cases
--@ SELECT regexp_split_to_array('123456','1');
--@ SELECT regexp_split_to_array('123456','6');
--@ SELECT regexp_split_to_array('123456','.');
--@ SELECT regexp_split_to_array('123456','');
--@ SELECT regexp_split_to_array('123456','(?:)');
--@ SELECT regexp_split_to_array('1','');
--@ -- errors
--@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'zippy') AS foo;
--@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'iz');
--@ -- global option meaningless for regexp_split
--@ SELECT foo, length(foo) FROM regexp_split_to_table('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g') AS foo;
--@ SELECT regexp_split_to_array('thE QUick bROWn FOx jUMPs ovEr The lazy dOG', 'e', 'g');
? Just remove --@ here (See https://github.com/risingwavelabs/risingwave/blob/f2a3fd021059e680b35b24c63cff5f8dbe9f9d5f/src/tests/regress/README.md for details)

You only need to enable the tests that pass now. It is not necessary to support them all.

@TennyZhuang
Contributor

🤔

SELECT regexp_split_to_array('123456','1');
2023-10-16T08:36:56.624001Z ERROR risingwave_regress_test::schedule: Diff:
  regexp_split_to_array
 -----------------------
- {"",23456}
+ {23456}
 (1 row)

2023-10-16T08:36:56.624016Z ERROR risingwave_regress_test::schedule: query input:
SELECT regexp_split_to_array('123456','6');
2023-10-16T08:36:56.624032Z ERROR risingwave_regress_test::schedule: Diff:
  regexp_split_to_array
 -----------------------
- {12345,""}
+ {12345}
 (1 row)

2023-10-16T08:36:56.624044Z ERROR risingwave_regress_test::schedule: query input:
SELECT regexp_split_to_array('123456','.');
2023-10-16T08:36:56.624064Z ERROR risingwave_regress_test::schedule: Diff:
- regexp_split_to_array
-------------------------
- {"","","","","","",""}
+ regexp_split_to_array
+-----------------------
+ {}
 (1 row)

Signed-off-by: TennyZhuang <[email protected]>
@xzhseh
Contributor Author

xzhseh commented Oct 16, 2023

This happens because we currently drop the empty fields that a delimiter match produces (for example, when the match is at the very start or end of the string), whereas Postgres keeps them as empty strings.
I will follow up later to make this conform to Postgres's behavior.
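
For illustration only, here is a minimal Python sketch of the two behaviors seen in the diffs above. It is not the RisingWave implementation, which is in Rust in src/expr/impl/src/scalar/regexp.rs. The helper names `split_keeping_empty` and `split_dropping_empty` are hypothetical, and Python's `re.split` only stands in for the Postgres semantics on the simple, non-empty patterns '1', '6' and '.' used in the failing cases.

```python
import re


def split_keeping_empty(text: str, pattern: str) -> list[str]:
    """Split on a regex and keep empty fields (the expected Postgres output).

    Assumes the pattern has no capturing groups and never matches the
    empty string; re.split handles those cases differently.
    """
    return re.split(pattern, text)


def split_dropping_empty(text: str, pattern: str) -> list[str]:
    """Split on a regex and discard empty fields (the behavior in the diffs)."""
    return [field for field in re.split(pattern, text) if field != ""]


if __name__ == "__main__":
    for pattern in ("1", "6", "."):
        kept = split_keeping_empty("123456", pattern)
        dropped = split_dropping_empty("123456", pattern)
        print(f"pattern {pattern!r}: keep={kept} drop={dropped}")
```

With the pattern '.', every character is a delimiter, so keeping empty fields yields seven empty strings while dropping them yields an empty array, which matches the last diff.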

@TennyZhuang TennyZhuang enabled auto-merge October 17, 2023 02:42
@TennyZhuang TennyZhuang added this pull request to the merge queue Oct 17, 2023
Merged via the queue into main with commit 0f61b00 Oct 17, 2023
8 of 9 checks passed
@TennyZhuang TennyZhuang deleted the xzhseh/feat-regexp-split-to-array branch October 17, 2023 03:32
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unsupported function: "regexp_split_to_array"
2 participants