perf: optimize insert performance #16347

Open
lmatz opened this issue Apr 16, 2024 · 3 comments

lmatz commented Apr 16, 2024

We used to run sysbench on three 8C16G compute nodes (CN):
dashboard: http://metabase.risingwave-cloud.xyz/dashboard/278-sysbench-3cn-rw-qps
example: https://buildkite.com/risingwave-test/sysbench/builds/698#018e061f-1719-4907-9c2c-15bddb895327

Lately we have been running sysbench on a single 8C16G CN:
dashboard: http://metabase.risingwave-cloud.xyz/dashboard/1133-sysbench-1cn-rw-qps
example: https://buildkite.com/risingwave-test/sysbench/builds/746#018ee38f-d9de-4f54-b698-7e7518a4a829

In Sysbench, we have two insert benchmarks:

  1. OLTP-insert, which is included in the vanilla sysbench: https://github.com/risingwavelabs/sysbench/blob/master/src/lua/oltp_insert.lua
  2. Bulk-insert, which is added by us: https://github.com/risingwavelabs/sysbench/blob/master/src/lua/bulk_insert.lua

Users have reported that insert performance is not satisfactory. We want to improve it.

One observation is that the results for 3CN (128 sysbench threads), 1CN (128 sysbench threads), and 1CN (256 sysbench threads) do not differ much from each other.

Some other common observations:

  1. The frontend is CPU-intensive, i.e. 600+% CPU usage, but it still does not use all 8 CPUs.
  2. The CPU utilization of the compute node is low.
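
To put the first figure in perspective, here is a small arithmetic sketch. It assumes the usual convention that 100% means one fully busy core, which the issue does not state explicitly; the helper name `cpu_utilization` is made up for illustration.

```python
def cpu_utilization(percent_of_one_core: float, cores: int) -> float:
    """Fraction of total CPU capacity in use.

    Assumes 100% == one fully busy core (an assumption, not stated in the issue).
    """
    return percent_of_one_core / (100.0 * cores)


# 600% on an 8-core frontend node is three quarters of its capacity.
frontend_share = cpu_utilization(600, 8)
print(f"{frontend_share:.1%}")
```

Under that assumption the frontend is already using most of its node, while the compute node still has plenty of headroom.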

CPU profiling is enabled by default. The CPU flamegraphs are generated and uploaded by the Buildkite pipelines; see, e.g., the artifacts tab of https://buildkite.com/risingwave-test/sysbench/builds/746#018ee38f-d9de-4f54-b698-7e7518a4a829.

lmatz commented Apr 18, 2024

One example: https://buildkite.com/risingwave-test/sysbench/builds/755#018ef126-d6ee-41e0-a309-606a9c89119d

The CN and FN flamegraphs are under the artifacts tab.

This is the oltp-insert workload, with no checks in the MV executor (index disabled):
https://github.com/risingwavelabs/sysbench/blob/master/src/lua/oltp_insert.lua#L50-L61

Grafana: https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&from=1713443508000&to=1713443906000&var-namespace=sysbench-lmatz-test

The frontend is using close to 7 of its 8 CPUs.
Frontend CPU usage is much higher than compute node CPU usage.

We should probably optimize the frontend's CPU usage.

[Screenshot: SCR-20240418-tg5]
[Screenshot: SCR-20240418-tcd]

Considering that the insert statement is the same, I suppose FN does not need to spend so much time on gen_batch_plan_by_statement?
Need something like a plan cache? cc: @chenzl25

@chenzl25

> Considering that the insert statement is the same, I suppose FN does not need to spend so much time on gen_batch_plan_by_statement? Need something like a plan cache? cc: @chenzl25

Theoretically, yes. A plan cache can improve performance in this scenario, but it has shortcomings as well. First, it needs to normalize and parameterize the SQL, which introduces additional overhead. Second, from the optimizer's point of view, the actual parameter values are no longer visible during optimization, and supporting that would require a huge refactoring of RisingWave. Finally, with a plan cache in place, people might introduce new optimizations without considering how they affect optimization time. So I think we had better not introduce a plan cache.
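
For readers unfamiliar with the idea, here is a minimal sketch of what "normalize, parameterize, then cache" could look like. It is a toy illustration, not RisingWave code: `normalize`, `PlanCache`, and `toy_planner` are hypothetical names, the regex-based literal replacement is a simplification of what a real SQL parser would do, and the table and column names are only borrowed from the sysbench workload for flavor.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified literal matcher: quoted strings or standalone numbers.
# A real implementation would work on the parsed AST, not on raw text.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> Tuple[str, List[str]]:
    """Replace literals with $1, $2, ... and return them separately."""
    params: List[str] = []

    def repl(match: "re.Match[str]") -> str:
        params.append(match.group(0))
        return f"${len(params)}"

    return _LITERAL.sub(repl, sql), params


class PlanCache:
    """Caches one plan per normalized statement shape."""

    def __init__(self, planner: Callable[[str], str]) -> None:
        self._planner = planner
        self._plans: Dict[str, str] = {}
        self.misses = 0

    def plan(self, sql: str) -> Tuple[str, List[str]]:
        # Normalization runs on every statement, hit or miss:
        # this is the extra overhead mentioned above.
        key, params = normalize(sql)
        if key not in self._plans:
            self.misses += 1
            # The planner only sees placeholders, never the actual values.
            self._plans[key] = self._planner(key)
        return self._plans[key], params


def toy_planner(normalized_sql: str) -> str:
    """Stand-in for an expensive planning step (hypothetical)."""
    return f"BatchInsert<{normalized_sql}>"


cache = PlanCache(toy_planner)
for i in range(1000):
    cache.plan(
        f"INSERT INTO sbtest1 (id, k, c, pad) VALUES ({i}, {i * 7}, 'c{i}', 'pad')"
    )
print(cache.misses)  # the planner ran once for 1000 same-shaped inserts
```

The sketch shows both sides of the trade-off: planning happens once per statement shape, but normalization is paid on every statement and the planner no longer sees the concrete values.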

lmatz removed this from the release-1.9 milestone on May 14, 2024

github-actions bot commented Aug 1, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄
