Observable write stalls on high load #3085

Closed
v0y4g3r opened this issue Jan 3, 2024 · 0 comments · Fixed by #3114
Labels
A-storage Involves code in storage engines C-performance Category Performance

Contributor

v0y4g3r commented Jan 3, 2024

What type of bug is this?

Performance issue

What subsystems are affected?

Datanode

Minimal reproduce step

This can be reproduced by running the TSBS benchmark suite.

What did you expect to see?

There should be no peaks and valleys in requests handled per second.

What did you see instead?

When benchmarking on disks with average performance (such as AWS gp2/gp3 volumes with no extra IOPS budget), we observe noticeable write stalls in the metric mito_write_rows_total.


The stalls correlate in time with fsync operations in the WAL. Every time the WAL rotates, it allocates a new log file and fsyncs the previous log file to ensure durability, which causes high I/O utilization.

Some methods to mitigate these stalls:

  • Break the large fsync into more frequent, smaller fsyncs to amortize the cost.
  • Enable log recycling to reuse obsolete log files.
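
To illustrate the first idea, here is a minimal sketch of an append-only writer that fsyncs once a byte threshold is reached, instead of deferring all the work to rotation time. It is not the actual WAL code: the class name IncrementalSyncWriter, the sync_every_bytes parameter, and the sync_count counter are hypothetical and exist only for this example.

```python
import os
import tempfile


class IncrementalSyncWriter:
    """Append-only log writer that fsyncs every `sync_every_bytes` bytes.

    Hypothetical illustration only; not the WAL implementation in this issue.
    """

    def __init__(self, path, sync_every_bytes=1 << 20):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._sync_every_bytes = sync_every_bytes
        self._unsynced = 0    # bytes written since the last fsync
        self.sync_count = 0   # number of fsync calls issued so far

    def append(self, payload: bytes) -> None:
        # os.write may write fewer bytes than requested, so loop until done.
        offset = 0
        while offset < len(payload):
            offset += os.write(self._fd, payload[offset:])
        self._unsynced += len(payload)
        # Flush early so that no single fsync has to cover a whole log file.
        if self._unsynced >= self._sync_every_bytes:
            self.sync()

    def sync(self) -> None:
        if self._unsynced:
            os.fsync(self._fd)
            self.sync_count += 1
            self._unsynced = 0

    def close(self) -> None:
        # At rotation time only the small unsynced tail remains to be flushed.
        self.sync()
        os.close(self._fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    writer = IncrementalSyncWriter(path, sync_every_bytes=1024)
    for _ in range(10):
        writer.append(b"x" * 512)
    writer.close()
    print(writer.sync_count, os.path.getsize(path))
```

The threshold trades more fsync calls for a smaller amount of dirty data per call, so the right value would need to be measured on the target disk.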

What operating system did you use?

NA
