Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize json parser #9113

Closed
kwannoel opened this issue Apr 12, 2023 · 11 comments
Closed

perf: optimize json parser #9113

kwannoel opened this issue Apr 12, 2023 · 11 comments
Assignees
Labels
help wanted Issues that need help from contributors no-issue-activity type/feature

Comments

@kwannoel
Copy link
Contributor

kwannoel commented Apr 12, 2023

For q17

Now:
Screenshot 2023-04-11 at 7 33 02 PM

Before:
Screenshot 2023-04-11 at 7 32 52 PM

  1. After investigation, we should setup some benchmarks for json parser which can catch this regression. (Or ignore this comment if they already do).
  2. Overall, source throughput increased to 244,000 from 225,000. As such, normalized regression may be around 6% instead of 7%.
  3. @tabVersion mentioned investigating fix(parser): fetch json keys regardless of cases #9086 may be helpful.

    @tabVersionI don’t know if it is caused by this pr fix(parser): fetch json keys regardless of cases #9086
    it adds a loop for all keys when parsing every row to match the column names and the json keys ignoring cases

Here's the flamegraph:
Screenshot 2023-04-12 at 12 55 22 PM

Interactive version

@kwannoel kwannoel assigned kwannoel and unassigned kwannoel Apr 12, 2023
@github-actions github-actions bot added this to the release-0.19 milestone Apr 12, 2023
@tabVersion
Copy link
Contributor

bench #9195

@lmatz
Copy link
Contributor

lmatz commented Apr 28, 2023

Went through the flame graphs generated in https://buildkite.com/risingwavelabs/main-cron/builds/458#0187c454-58bd-439a-bf57-6f774ea41b80

For queries q0,q1,q2,q3,q7,q8,q10,q14,q15,q17,q18,q20,q21,q22, json parser takes more time than other executors.

@kwannoel
Copy link
Contributor Author

Screenshot 2023-04-28 at 1 18 28 PM

Likely need to optimize append_datum_n

@kwannoel
Copy link
Contributor Author

Investigating it

@kwannoel kwannoel assigned kwannoel and unassigned tabVersion and adevday Apr 28, 2023
@st1page
Copy link
Contributor

st1page commented Apr 28, 2023

The SourceStreamChunkRowWriter here only append one row to the ArrayBuilderImpl and it will take dynamic dispatch for each row. Maybe we can append a batch of data each time?

@kwannoel
Copy link
Contributor Author

kwannoel commented Apr 28, 2023

Another potential cost

pub fn append_datum_n(&mut self, n: usize, datum: impl ToDatumRef) {
match datum.to_datum_ref() {
None => match self {
$( Self::$variant_name(inner) => inner.append_n(n, None), )*
}
Some(scalar_ref) => match (self, scalar_ref) {
$( (Self::$variant_name(inner), ScalarRefImpl::$variant_name(v)) => inner.append_n(n, Some(v)), )*
(this_builder, this_scalar_ref) => panic!(
"Failed to append datum, array builder type: {}, scalar type: {}",
this_builder.get_ident(),
this_scalar_ref.get_ident()
),
},
}
}

The inner match under the Some branch can also cost 3%?
Screenshot 2023-04-28 at 1 40 29 PM

@st1page
Copy link
Contributor

st1page commented Apr 28, 2023

}
Some(scalar_ref) => match (self, scalar_ref) {
$( (Self::$variant_name(inner), ScalarRefImpl::$variant_name(v)) => inner.append_n(n, Some(v)), )*
(this_builder, this_scalar_ref) => panic!(
"Failed to append datum, array builder type: {}, scalar type: {}",
this_builder.get_ident(),

The match here leads to too many branches(O(n^2)) do you have any idea? @BugenZhao

@kwannoel

This comment was marked as outdated.

@kwannoel

This comment was marked as outdated.

@kwannoel kwannoel added the help wanted Issues that need help from contributors label Apr 28, 2023
@kwannoel kwannoel modified the milestones: release-1.0, release-1.1 Jul 14, 2023
@kwannoel kwannoel closed this as not planned Won't fix, can't repro, duplicate, stale Aug 1, 2023
@kwannoel kwannoel reopened this Aug 1, 2023
@kwannoel kwannoel removed this from the release-1.1 milestone Aug 1, 2023
Copy link
Contributor

github-actions bot commented Jul 3, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

@st1page
Copy link
Contributor

st1page commented Jul 4, 2024

closed with #17146 and related PRs
cc @BugenZhao

@st1page st1page closed this as completed Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issues that need help from contributors no-issue-activity type/feature
Projects
None yet
Development

No branches or pull requests

5 participants