Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(bench): add bench for json parser #9195

Merged
merged 3 commits into from
Apr 14, 2023
Merged

chore(bench): add bench for json parser #9195

merged 3 commits into from
Apr 14, 2023

Conversation

tabVersion
Copy link
Contributor

@tabVersion tabVersion commented Apr 14, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Checklist For Contributors

as title, the pr focuses on comparing cases that column names that do not match json keys

- [ ] I have written necessary rustdoc comments
- [ ] I have added necessary unit tests and integration tests
- [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934).
- [ ] I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)

  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Signed-off-by: tabVersion <[email protected]>
Signed-off-by: tabVersion <[email protected]>
@codecov
Copy link

codecov bot commented Apr 14, 2023

Codecov Report

Merging #9195 (6476bd2) into main (a2fe5aa) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #9195   +/-   ##
=======================================
  Coverage   70.81%   70.82%           
=======================================
  Files        1200     1200           
  Lines      200245   200245           
=======================================
+ Hits       141813   141822    +9     
+ Misses      58432    58423    -9     
Flag Coverage Δ
rust 70.82% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@kwannoel kwannoel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a starting point for benchmarking json parser. Thanks for this!

I'm curious about source::benches::json_parser.rs though. Is that benchmark still relevant then?

Signed-off-by: tabVersion <[email protected]>
@tabVersion
Copy link
Contributor Author

tabVersion commented Apr 14, 2023

LGTM as a starting point for benchmarking json parser. Thanks for this!

I'm curious about source::benches::json_parser.rs though. Is that benchmark still relevant then?

Yes, it is still ok to benchmark json parser perf for general purposes. In this pr, because I suspect the regression is introduced in #9086, it focuses on comparing cases that column names that do not match json keys.

@tabVersion tabVersion closed this Apr 14, 2023
@tabVersion tabVersion reopened this Apr 14, 2023
@tabVersion
Copy link
Contributor Author

run on local machine: Apple M1 silicon, 32G RAM

ParseMatchCase/32       time:   [673.13 ms 675.36 ms 677.62 ms]

ParseMatchCase/128      time:   [638.88 ms 644.43 ms 653.78 ms]

ParseMatchCase/512      time:   [646.35 ms 652.50 ms 660.39 ms]

ParseMatchCase/1024     time:   [638.15 ms 640.67 ms 643.56 ms]

ParseMatchCase/2048     time:   [632.13 ms 633.57 ms 635.16 ms]

ParseMatchCase/4096     time:   [626.47 ms 627.26 ms 628.06 ms]

ParseMisMatchCase/32    time:   [712.50 ms 715.36 ms 718.23 ms]

ParseMisMatchCase/128   time:   [681.59 ms 685.19 ms 689.48 ms]

ParseMisMatchCase/512   time:   [669.22 ms 670.75 ms 672.17 ms]

ParseMisMatchCase/1024  time:   [669.12 ms 670.78 ms 672.34 ms]

ParseMisMatchCase/2048  time:   [671.74 ms 679.59 ms 690.52 ms]

ParseMisMatchCase/4096  time:   [675.99 ms 682.49 ms 692.14 ms]

seems there gonna be an extra 40ms+(5%) if column names do not match json keys. maybe it can explain the observation in #9113

@tabVersion tabVersion enabled auto-merge April 14, 2023 11:42
@tabVersion tabVersion added this pull request to the merge queue Apr 14, 2023
Merged via the queue into main with commit de7b5d4 Apr 14, 2023
@tabVersion tabVersion deleted the tab/parser-bench branch April 14, 2023 12:13
@kwannoel
Copy link
Contributor

run on local machine: Apple M1 silicon, 32G RAM

ParseMatchCase/32       time:   [673.13 ms 675.36 ms 677.62 ms]

ParseMatchCase/128      time:   [638.88 ms 644.43 ms 653.78 ms]

ParseMatchCase/512      time:   [646.35 ms 652.50 ms 660.39 ms]

ParseMatchCase/1024     time:   [638.15 ms 640.67 ms 643.56 ms]

ParseMatchCase/2048     time:   [632.13 ms 633.57 ms 635.16 ms]

ParseMatchCase/4096     time:   [626.47 ms 627.26 ms 628.06 ms]

ParseMisMatchCase/32    time:   [712.50 ms 715.36 ms 718.23 ms]

ParseMisMatchCase/128   time:   [681.59 ms 685.19 ms 689.48 ms]

ParseMisMatchCase/512   time:   [669.22 ms 670.75 ms 672.17 ms]

ParseMisMatchCase/1024  time:   [669.12 ms 670.78 ms 672.34 ms]

ParseMisMatchCase/2048  time:   [671.74 ms 679.59 ms 690.52 ms]

ParseMisMatchCase/4096  time:   [675.99 ms 682.49 ms 692.14 ms]

seems there gonna be an extra 40ms+(5%) if column names do not match json keys. maybe it can explain the observation in #9113

Hmm not entirely, but can be a contributing factor. E2e was 6% but just json parser alone regression is like 60% (11%/18%=0.6), so 5% doesnt seem to explain it fully.

Flamegraph percentages probably not too reliable though, only rough indicators. We can probably can checkout the old main branch where 11% is observed and run the json parser micro benchmarks, get a more precise figure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants