nightly-20241106 sysbench perf degradation #19281
Comments
https://buildkite.com/risingwave-test/sysbench/builds/934#01930a58-97d3-44b1-888f-2d95513f874e The system hangs when we set max_prefetch_block_number = 0. This reproduces 100% of the time.
I investigated this across many test runs and found that the performance regression is related to cache misses. I concluded that the performance data of this test is unstable for the following reasons.
All of the above behaviours are unstable, and therefore the performance of this test is unstable. A fairer approach would be to fully compact the LSM tree before the read test and adopt a more reasonable prefetch strategy for different OLAP query modes. That said, the current test is useful for uncovering problems on the compaction/read path.
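As a general illustration of why the state of the LSM tree at read time can change point-lookup cost, here is a toy sketch in Python. It is not RisingWave code: `point_lookup`, `full_compaction`, and the list-of-dicts layout are hypothetical names and structures chosen for the example, and it models only the number of sorted runs probed, not caching or prefetch.

```python
def point_lookup(runs, key):
    """Probe sorted runs from newest to oldest.

    `runs` is a list of dicts, newest run first (a hypothetical stand-in
    for SST files). Returns (value, number_of_runs_probed).
    """
    probed = 0
    for run in runs:
        probed += 1
        if key in run:
            return run[key], probed
    return None, probed


def full_compaction(runs):
    """Merge all runs into a single run, keeping the newest value per key."""
    merged = {}
    for run in reversed(runs):  # oldest first, so newer values overwrite
        merged.update(run)
    return [merged]


# Three overlapping runs, newest first.
runs = [{"a": 3}, {"b": 2}, {"a": 1, "c": 1}]

value_before, probes_before = point_lookup(runs, "c")
value_after, probes_after = point_lookup(full_compaction(runs), "c")
```

In this toy model the same key costs three probes before compaction and one probe after, while the returned value is unchanged. A read test that starts at an arbitrary point in the compaction schedule would therefore see different costs from run to run.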
I think we can do both in the future if you can provide something like a warm-up mechanism in the OLTP test. We can have two test pipelines, one with warm-up enabled and one with it disabled. cc @lmatz
+1. IIUC, the current test is meant to catch regressions on random lookups, assuming the cache is warmed up, but that assumption no longer holds after the recent compaction strategy changes. I think we can fix the current test so that the assumption always holds by introducing cache warm-up. However, given that the current test unexpectedly helped us find some corner cases in the new compaction strategy, I think it is still valuable. For testing the new compaction strategy, do you think we should keep this test or write a new one? cc @Li0k
I think we can enhance the current pipeline with an optional warm-up so that the current test covers both cases. But you need to provide the warm-up mechanism first. I remember @Li0k said he can provide a SQL syntax for this.
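To make the warm-up discussion concrete, here is a minimal Python sketch of a random-lookup test with an optional warm-up pass. Everything in it is an assumption made for illustration: `LRUCache` and `run_lookup_test` are hypothetical names, the cache is a toy LRU sized to hold the whole key space, and none of it reflects RisingWave's actual block cache or the SQL syntax mentioned above.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU block cache that counts hits and misses (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[key] = True  # simulate loading the block
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used

    def reset_stats(self):
        self.hits = 0
        self.misses = 0


def run_lookup_test(num_keys, num_lookups, warm_up, seed=0):
    """Return the cache hit ratio of a random point-lookup workload."""
    rng = random.Random(seed)
    cache = LRUCache(capacity=num_keys)
    if warm_up:
        # Touch every key once, then discard the warm-up statistics.
        for key in range(num_keys):
            cache.get(key)
        cache.reset_stats()
    for _ in range(num_lookups):
        cache.get(rng.randrange(num_keys))
    return cache.hits / num_lookups


warm_ratio = run_lookup_test(1000, 2000, warm_up=True)
cold_ratio = run_lookup_test(1000, 2000, warm_up=False)
```

With warm-up enabled, every measured lookup hits the cache in this model, so the result no longer depends on what happened to be cached beforehand. With warm-up disabled, the measured hit ratio includes the cold misses. That matches the two-pipeline idea: the first run isolates lookup regressions, and the second keeps exposing compaction and read-path effects.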
Describe the bug
Buildkite Job
Grafana
Metabase Sysbench
Nexmark q105 also drops, but it has not been stable recently. Metabase Nexmark Q105
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
No response
The version of RisingWave
nightly-20241106
Additional context
The only pull request for nightly-20241106 is #19080 according to https://github.com/risingwavelabs/rw-commits-history?tab=readme-ov-file#nightly-20241106
@Li0k PTAL