Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(expr): add json path functions #13568

Merged
merged 14 commits into from
Dec 5, 2023
Merged

feat(expr): add json path functions #13568

merged 14 commits into from
Dec 5, 2023

Conversation

wangrunji0408
Copy link
Contributor

@wangrunji0408 wangrunji0408 commented Nov 21, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

resolve #11372

This PR implements all jsonpath operators and functions:

jsonb @? jsonpath → boolean
jsonb @@ jsonpath → boolean
jsonb_path_exists ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → boolean
jsonb_path_match ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → boolean
jsonb_path_query ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → setof jsonb
jsonb_path_query_array ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → jsonb
jsonb_path_query_first ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → jsonb

This PR doesn't introduce the jsonpath data type. However, this does not affect the normal use of the above functions. All jsonpath arguments in the signature are actually varchar. If we have jsonpath type, there will be only one parsing during 'literal' :: jsonpath. Currently, parsing happens every time the eval is called (not every time evaluating a row). I think it's not a big problem.

The jsonpath parsing and query logic is separated into an independent crate: sql-json-path. Please also review it if you are interested. (After investigation, I found no existing crate that can meet our requirements. Most of them follow the IETF standard and only support serde-json.)

Almost all jsonpath syntax is supported, except .datetime() and some escaped string literal. I use Postgres regression test jsonb_jsonpath.sql to ensure correctness. Please check this file to see what is (not) supported yet. Detailed explanation: #13568 (comment)

This PR is based on #13545, which is needed to run regression test. After it merged, I will rebase this PR onto main.

References:

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Support jsonpath operators and functions:

jsonb @? jsonpath → boolean
jsonb @@ jsonpath → boolean
jsonb_path_exists ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → boolean
jsonb_path_match ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → boolean
jsonb_path_query ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → setof jsonb
jsonb_path_query_array ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → jsonb
jsonb_path_query_first ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → jsonb

See postgres docs and search "jsonpath" for more details. Note that we don't support jsonpath data type for now. All jsonpath arguments in the signature are actually varchar.

For the syntax of jsonpath, see 9.16.2. The SQL/JSON Path Language. Almost all features in the docs have been supported, except that:

  1. .datetime() and . datetime(template) method is not supported.
  2. Some implementations for the silent argument may be incorrect.
  3. Some implementations for regexp matching may be incorrect.


#[function(
"jsonb_path_exists(jsonb, varchar) -> boolean",
prebuild = "JsonPath::new($1).map_err(parse_error)?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@wangrunji0408 wangrunji0408 marked this pull request as ready for review November 23, 2023 08:13
@wangrunji0408
Copy link
Contributor Author

wangrunji0408 commented Nov 23, 2023

I have fixed regression tests as much as possible.
The remaining failing cases are as follows: (- is expected result, + is actual output)

 # wrong silent handling
 select jsonb_path_query('[1,"2",3]', '+$', '{}', true);
  jsonb_path_query 
 ------------------
- 1
-(1 row)
+(0 rows)

 # wrong silent handling
 select jsonb '["1",2,0,3]' @? '-$[*]';
  ?column? 
 ----------
- t
+ 
 (1 row)

 # double precision problem
 select jsonb_path_query('{"a": 2.5}', '-($.a * $.a).floor() % 4.3');
- jsonb_path_query 
-------------------
- -1.7
+  jsonb_path_query   
+---------------------
+ -1.7000000000000002
 (1 row)

 # I think both results are valid according to the specification. It depends on implementation.
 select jsonb_path_query('[[null, 1, "abc", "abcabc"]]', 'lax $ ? (@[*] starts with "abc")');
-      jsonb_path_query      
-----------------------------
- [null, 1, "abc", "abcabc"]
-(1 row)
+ jsonb_path_query 
+------------------
+(0 rows)

 # wrong regexp matching
 select jsonb_path_query('[null, 1, "a\b", "a\\b", "^a\\b$"]', 'lax $[*] ? (@ like_regex "a\\b" flag "")');
  jsonb_path_query 
 ------------------
+ "^a\\b$"
+ "a\\b"
  "a\b"
-(1 row)
+(3 rows)

 # wrong regexp matching
 select jsonb_path_query('[null, 1, "a\b", "a\\b", "^a\\b$"]', 'lax $[*] ? (@ like_regex "^a\\b$" flag "")');
  jsonb_path_query 
 ------------------
- "a\b"
-(1 row)
+(0 rows)

 # wrong silent handling
 SELECT jsonb_path_query_first('[{"a": 1}, {"a": 2}, {}]', 'strict $[*].a', '{}', true);
  jsonb_path_query_first 
 ------------------------
- 1
+ 
 (1 row)

As these are mostly corner cases, I think we can move on and leave them to be fixed later.

Base automatically changed from wrj/implicit-cast to main November 28, 2023 08:51
Copy link

codecov bot commented Nov 28, 2023

Codecov Report

Attention: 174 lines in your changes are missing coverage. Please review.

Comparison is base (a73d940) 68.19% compared to head (b96fe7e) 68.15%.

Files Patch % Lines
src/expr/impl/src/scalar/jsonb_path.rs 23.80% 144 Missing ⚠️
src/frontend/src/binder/expr/binary_op.rs 18.75% 13 Missing ⚠️
src/common/src/types/jsonb.rs 16.66% 5 Missing ⚠️
src/sqlparser/src/tokenizer.rs 0.00% 4 Missing ⚠️
src/frontend/src/expr/mod.rs 0.00% 3 Missing ⚠️
src/sqlparser/src/ast/operator.rs 0.00% 2 Missing ⚠️
src/sqlparser/src/parser.rs 33.33% 2 Missing ⚠️
src/expr/macro/src/gen.rs 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13568      +/-   ##
==========================================
- Coverage   68.19%   68.15%   -0.04%     
==========================================
  Files        1524     1525       +1     
  Lines      262232   262454     +222     
==========================================
+ Hits       178829   178879      +50     
- Misses      83403    83575     +172     
Flag Coverage Δ
rust 68.15% <24.67%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Comment on lines +147 to +150
let matched = path
.query::<ValueRef<'_>>(target.into())
.map_err(eval_error)?;
Ok(matched.into_iter().map(|json| json.into_owned().into()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to return an iterator for query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard in stable Rust as all query functions need to be written as generator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about those 3rd party libraries available under stable 🥵? Like https://github.com/Wolvereness/remit/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Will have a try later.

Copy link
Contributor

@TennyZhuang TennyZhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR and test suite LGTM. Although not all corner cases were covered, I think it’s still good enough to merge.

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Dec 5, 2023
Merged via the queue into main with commit e0cfb82 Dec 5, 2023
15 of 16 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/jsonb-path branch December 5, 2023 05:01
github-actions bot pushed a commit that referenced this pull request Dec 5, 2023
wangrunji0408 added a commit that referenced this pull request Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support JSON Path functions for JSONB
3 participants