14 nexmark benchmark skus performance degradation #13931

Closed
cyliu0 opened this issue Dec 12, 2023 · 13 comments

@cyliu0
Collaborator

cyliu0 commented Dec 12, 2023

Describe the bug

+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| BENCHMARK NAME                                                  | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| nexmark-q18-blackhole-medium-1cn                                |        16377 | Negative   | -23.47%             | -11.33%                     |
| nexmark-q8-blackhole-watermark-medium-1cn-affinity              |        16381 | Negative   | -23.73%             | -13.39%                     |
| nexmark-q19-blackhole-medium-1cn                                |        16382 | Negative   | -30.59%             | -25.84%                     |
| nexmark-q9-blackhole-watermark-medium-1cn-affinity              |        16386 | Negative   | -22.56%             | -13.23%                     |
| nexmark-q20-blackhole-medium-1cn                                |        16389 | Negative   | -19.53%             | -12.72%                     |
| nexmark-q18-blackhole-watermark-medium-1cn-affinity             |        16392 | Negative   | -20.35%             | -13.29%                     |
| nexmark-q101-blackhole-medium-1cn                               |        16394 | Negative   | -21.85%             | -14.95%                     |
| nexmark-q6-group-top1-blackhole-watermark-medium-1cn-affinity   |        16397 | Negative   | -23.22%             | -10.88%                     |
| nexmark-q5-many-windows-blackhole-watermark-medium-1cn-affinity |        16403 | Negative   | -38.80%             | -18.74%                     |
| nexmark-q103-blackhole-medium-1cn                               |        16404 | Negative   | -16.32%             | -11.45%                     |
| nexmark-q104-blackhole-medium-1cn                               |        16409 | Negative   | -23.97%             | -15.64%                     |
| nexmark-q5-many-windows-blackhole-medium-1cn                    |        16412 | Negative   | -49.41%             | -14.18%                     |
| nexmark-q105-blackhole-medium-1cn                               |        16413 | Negative   | -22.20%             | -14.24%                     |
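| nexmark-q5-many-windows-blackhole-medium-1cn-affinity           |        16414 | Negative   | -50.40%             | -15.86%                     |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+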
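
For triage, it can help to rank the affected benchmarks by the 10-day figure rather than by the all-time best, since a single outlier can skew the latter. The following is a minimal sketch, assuming the report keeps this pipe-delimited layout; `parse_rows` and the dictionary keys are hypothetical names chosen for illustration, not part of any RisingWave tooling. The sample uses three rows copied from the table above.

```python
def parse_rows(table_text):
    """Parse data rows of the ASCII report into dicts.

    Border lines (starting with '+') and the header row are skipped.
    Assumes five pipe-delimited columns, as in the report above.
    """
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border or blank line
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "BENCHMARK NAME":
            continue  # header or malformed row
        name, exec_id, status, vs_best, vs_10d = cells
        rows.append({
            "name": name,
            "execution_id": int(exec_id),
            "status": status,
            "vs_best": float(vs_best.rstrip("%")),
            "vs_last_10_days": float(vs_10d.rstrip("%")),
        })
    return rows


SAMPLE = """\
| BENCHMARK NAME | EXECUTION ID | STATUS | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
| nexmark-q18-blackhole-medium-1cn | 16377 | Negative | -23.47% | -11.33% |
| nexmark-q19-blackhole-medium-1cn | 16382 | Negative | -30.59% | -25.84% |
| nexmark-q5-many-windows-blackhole-medium-1cn-affinity | 16414 | Negative | -50.40% | -15.86% |
"""

rows = parse_rows(SAMPLE)
# Most negative 10-day fluctuation first.
worst_first = sorted(rows, key=lambda r: r["vs_last_10_days"])
for r in worst_first:
    print(f'{r["vs_last_10_days"]:7.2f}%  {r["name"]}')
```

In this sample q19 ranks first on the 10-day figure even though q5-many-windows has the largest drop relative to the best run.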

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

nightly-20231211

Additional context

Please find the commit history here: https://github.com/risingwavelabs/rw-commits-history#commit-history

52632ae #13926
ad2073f #13925
1d4cac8 #13921
2fffc13 #13558
7c77553 #13769
98b0ebd #13491
ccf8be5 #13870
7b410c0 #13853
c3e21a8 #13915
92f3b58 #13862
c8d351b #13909
fb1bf0a #13525
4d525e2 #13901
a3c71aa #13854
d270f10 #13692
ddbd74b #13877
5d7f327 #13849
e513e6b #13881

[screenshot: SCR-20231212-kzt]

cyliu0 added the type/bug and type/perf labels on Dec 12, 2023
github-actions bot added this to the release-1.6 milestone on Dec 12, 2023
@cyliu0
Collaborator Author

cyliu0 commented Dec 12, 2023

Also, the sysbench prepare job failed its completion check after 5 retries at 60-second intervals. I suspect insertion performance degraded on nightly-20231211 as well, because the prepare step finished on time with the same configuration on nightly-20231210.
https://buildkite.com/risingwave-test/sysbench/builds/620#018c5aaf-b799-4cd2-9e4e-d2b89cd4b7e0

Filed as a separate issue: #13932
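
For readers unfamiliar with this kind of completion check, here is a minimal sketch of a bounded polling loop. It assumes "5 retries with an interval of 60 seconds" means at most five checks with a 60-second sleep between them; the actual CI script may count differently. `wait_until_done` and `check` are hypothetical names, not taken from the sysbench pipeline.

```python
import time


def wait_until_done(check, retries=5, interval_s=60.0, sleep=time.sleep):
    """Poll `check()` up to `retries` times.

    Returns True as soon as `check()` is truthy, or False once all
    attempts are used up. Sleeps `interval_s` between attempts, but not
    after the last one. `sleep` is injectable so the loop can be tested
    without real waiting.
    """
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(interval_s)
    return False


# Example: a fake check that succeeds on the third attempt,
# with sleeping replaced by a recorder so nothing actually blocks.
attempts = []
slept = []


def fake_check():
    attempts.append(1)
    return len(attempts) >= 3


finished = wait_until_done(fake_check, sleep=slept.append)
print(finished, len(attempts), slept)
```

Under this assumption, a job that needs more than about four minutes after the first check is reported as failed, so a modest slowdown in insertion can flip the result from pass to fail.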

@cyliu0
Collaborator Author

cyliu0 commented Dec 12, 2023

In the backfill test, creating the watermark MV did not finish within the 1-hour timeout on nightly-20231211. https://buildkite.com/risingwave-test/backfill/builds/72#018c5b1d-515e-49ac-aedf-4da032ee503f

  | 2023-12-12 07:26:13 | create watermark mv for backfill table
  | 2023-12-12 07:26:13 | Timing is on.
  | 2023-12-12 08:26:11 | ❌ backfill test timeout

The same step (creating the watermark MV) succeeded on nightly-20231208 in under 2 minutes:

  | 2023-12-09 07:25:54 | create watermark mv for backfill table
  | 2023-12-09 07:25:54 | Timing is on.
  | 2023-12-09 07:27:35 | NOTICE:  Your session timezone is UTC. It was used in the interpretation of timestamps and dates in your query. If this is unintended, change your timezone to match that of your data's with `set timezone = [timezone]` or rewrite your query with an explicit timezone conversion, e.g. with `AT TIME ZONE`.
  | 2023-12-09 07:27:35 | CREATE_MATERIALIZED_VIEW
  | 2023-12-09 07:27:35 | Time: 103637.007 ms (01:43.637)

Filed as a separate issue: #13943
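
To put the two runs side by side, the durations can be computed directly from the log timestamps quoted above. This is only an arithmetic sketch; `elapsed` is a hypothetical helper, and the timestamps and the reported `Time:` value are copied from the logs in this comment.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(start, end):
    """Return the time between two log timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# nightly-20231211: step start until the timeout message.
failed = elapsed("2023-12-12 07:26:13", "2023-12-12 08:26:11")
# nightly-20231208: step start until CREATE_MATERIALIZED_VIEW was logged.
passed = elapsed("2023-12-09 07:25:54", "2023-12-09 07:27:35")
# Duration reported by psql for the passing run: 103637.007 ms.
reported = timedelta(seconds=103, microseconds=637_007)

print(failed)  # 0:59:58
print(passed)  # 0:01:41
print(failed.total_seconds() / reported.total_seconds())
```

The log timestamps have one-second resolution and come from the CI log rather than from psql, so the 101-second gap and the reported 103.6 seconds do not match exactly. Either way, the run that timed out had been waiting more than 30 times longer than the passing run needed.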

@lmatz
Contributor

lmatz commented Dec 12, 2023

From the commit history, it looks like only #13558 could plausibly affect performance. cc @Little-Wallace: do you think that PR could introduce such a big change?

@st1page
Contributor

st1page commented Dec 12, 2023

It looks like the memory usage of some component increased, so the executor cache evicts more.

[three images not preserved]

@st1page
Contributor

st1page commented Dec 12, 2023

> From the commit history, it looks like only #13558 could plausibly affect performance. cc @Little-Wallace: do you think that PR could introduce such a big change?

I ran the queries on a build from before #13558:
~~https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2660#018c5da3-654f-4867-ad3e-2931f0aa82c2~~ (wrong affinity)
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2662#018c5eba-37f7-469e-8d17-20f3ef206bb8

(The image was built in a separate branch to include the fix #13925.) https://buildkite.com/risingwavelabs/docker/builds/14875#018c5d85-6b70-4b1f-b4f7-8adc4b4c655a

@st1page
Contributor

st1page commented Dec 12, 2023

> | nexmark-q5-many-windows-blackhole-medium-1cn | 16412 | Negative | -49.41% | -14.18% |

This one is because of the revert in #13821.

[image not preserved]

@cyliu0
Collaborator Author

cyliu0 commented Dec 13, 2023

For nightly-20231212

+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| BENCHMARK NAME                                                  | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
| nexmark-q18-blackhole-medium-1cn                                |        16450 | Negative   | -24.54%             | -10.97%                     |
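| nexmark-q5-many-windows-blackhole-medium-1cn                    |        16488 | Negative   | -52.12%             | -16.54%                     |
+-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+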

@st1page
Contributor

st1page commented Dec 13, 2023

> From the commit history, it looks like only #13558 could plausibly affect performance. cc @Little-Wallace: do you think that PR could introduce such a big change?
>
> I ran the queries on a build from before #13558: ~~https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2660#018c5da3-654f-4867-ad3e-2931f0aa82c2~~ (wrong affinity) https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2662#018c5eba-37f7-469e-8d17-20f3ef206bb8
>
> (The image was built in a separate branch to include the fix #13925.) https://buildkite.com/risingwavelabs/docker/builds/14875#018c5d85-6b70-4b1f-b4f7-8adc4b4c655a

That build performs similarly to nightly-20231210.

@st1page
Contributor

st1page commented Dec 13, 2023

> For nightly-20231212
>
> +-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
> | BENCHMARK NAME                                                  | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
> +-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
> | nexmark-q18-blackhole-medium-1cn                                |        16450 | Negative   | -24.54%             | -10.97%                     |
> | nexmark-q5-many-windows-blackhole-medium-1cn                    |        16488 | Negative   | -52.12%             | -16.54%                     |
> +-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+

🤔 I reran nightly-20231211 to see whether the degradation can be reproduced:
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2670

@st1page
Contributor

st1page commented Dec 13, 2023

> For nightly-20231212
>
> +-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
> | BENCHMARK NAME                                                  | EXECUTION ID | STATUS     | FLUCTUATION OF BEST | FLUCTUATION OF LAST 10 DAYS |
> +-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
> | nexmark-q18-blackhole-medium-1cn                                |        16450 | Negative   | -24.54%             | -10.97%                     |
> | nexmark-q5-many-windows-blackhole-medium-1cn                    |        16488 | Negative   | -52.12%             | -16.54%                     |
> +-----------------------------------------------------------------+--------------+------------+---------------------+-----------------------------+
>
> 🤔 I reran nightly-20231211 to see whether the degradation can be reproduced: https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2670

In the rerun, q18 and q20 are good but q19 is bad 😇 So the issue cannot be reproduced reliably.
