perf(storage): Improve data alignment for multi-table compaction groups #13037

Open
Li0k opened this issue Oct 24, 2023 · 4 comments

Li0k commented Oct 24, 2023

After #11826, we avoid splitting out creating tables into dedicated compaction groups.

Backfill snapshot reads can give tables a large write throughput during MV creation and cause excessive compaction groups to be created, even though the streaming throughput can be low after MV creation completes.

Currently, we have not implemented compaction group merge, so in the above scenario we may waste more IOPS. However, placing high write-throughput tables in the default compaction group does not let us utilize parallel base compaction to improve compaction efficiency, because the key ranges are not aligned.

To solve the data alignment problem, I propose a simple solution to improve compaction parallelism by performing some data alignment operations on the default compaction group, which may improve backfill performance, reduce the stacking of L0 sub-levels, and make compaction more efficient:

  1. In the default compaction group, track per-table write throughput during the creating phase (this logic already exists).
  2. Cut the data of high-throughput tables by table_id and vnode to achieve data alignment (like the dedicated compaction group); see the sketch after this list.
  3. After backfill completes, restore the default logic to reduce the IOPS of the default compaction group.
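
As a rough illustration of step 2, here is a minimal sketch (not the actual RisingWave implementation; the key layout, helper names, and the 64-vnode granularity are assumptions) of how an SST builder could decide to seal the current file on (table_id, vnode) boundaries for high-throughput creating tables, so that the output SSTs of the default compaction group stay key-range aligned:

```rust
use std::collections::HashSet;

/// Hypothetical alignment granularity: cut a new SST every 64 vnodes.
const VNODE_ALIGN_GRANULARITY: u16 = 64;

/// Assumed key layout: table_id (u32, big-endian) ++ vnode (u16, big-endian) ++ rest.
fn table_id(full_key: &[u8]) -> u32 {
    u32::from_be_bytes(full_key[0..4].try_into().unwrap())
}

fn vnode(full_key: &[u8]) -> u16 {
    u16::from_be_bytes(full_key[4..6].try_into().unwrap())
}

/// Returns true if the builder should seal the current SST before writing `next_key`:
/// always on a table_id boundary (like the dedicated compaction group), and additionally
/// on a vnode-bucket boundary for tables flagged as high-throughput during creation.
fn should_seal(last_key: &[u8], next_key: &[u8], high_throughput_tables: &HashSet<u32>) -> bool {
    let (last_table, next_table) = (table_id(last_key), table_id(next_key));
    if last_table != next_table {
        return true;
    }
    if high_throughput_tables.contains(&next_table) {
        return vnode(last_key) / VNODE_ALIGN_GRANULARITY != vnode(next_key) / VNODE_ALIGN_GRANULARITY;
    }
    false
}
```

With SSTs cut on these boundaries, base compaction can be split into independent sub-tasks per table/vnode range, which is what enables the higher parallelism measured below.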

hzxa21 commented Nov 8, 2023

#13075


Li0k commented Nov 13, 2023

Backfill Test

Resource

  • compute node = 8c_32g * 3
  • compactor node = 16c_4g * 1

Background

We test the behavior of compaction and backfill under different policies by creating MVs on a mirror cluster. The MVs contain multiple state tables, and a few of those state tables have high write throughput. We compare the old and new policies:

Result

CPU

(screenshots comparing main and branch omitted)

The branch's compactor CPU utilization has increased, which indirectly indicates an increase in parallelism.

Barrier Latency

(screenshots comparing main and branch omitted)

Read Duration - iter

(screenshots comparing main and branch omitted)

SSTable Count

(screenshots comparing main and branch omitted)

SSTable Size

(screenshots comparing main and branch omitted)

cg2 and cg3 have less stacked L0 and base-level data:

  • cg2: 75 GB vs. 65 GB
  • cg3: 32 GB vs. 24 GB

Compaction Skip Count

(screenshots comparing main and branch omitted)

cg2 and cg3 have fewer skip counts caused by pending files.

Compaction Task

(screenshots comparing main and branch omitted)

From the analysis of CompactTask's properties, we find that the branch's tasks can eliminate more sub_levels, the size of each task is controlled to around 2 GB, and the number of files is kept below 100. Therefore, we can maintain a stable running task count and improve compactor CPU utilization. A rough sketch of this sizing policy follows.
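
The following is a minimal sketch of the sizing behavior described above (the ~2 GB and 100-file thresholds are taken from this observation; the struct and function names are made up for illustration, not RisingWave's actual picker): a task picker greedily accumulates input SSTs until adding another file would exceed either limit.

```rust
/// Thresholds observed in the test above; treat them as illustrative constants.
const MAX_TASK_SIZE_BYTES: u64 = 2 * 1024 * 1024 * 1024; // roughly 2 GB per task
const MAX_TASK_FILE_COUNT: usize = 100;                   // roughly 100 files per task

#[derive(Clone)]
struct SstInfo {
    file_size: u64,
}

/// Greedily take candidate SSTs (already key-range aligned, so tasks over disjoint
/// ranges can run in parallel) until adding another file would exceed either limit.
fn pick_task_inputs(candidates: &[SstInfo]) -> Vec<SstInfo> {
    let mut picked = Vec::new();
    let mut total_size = 0u64;
    for sst in candidates {
        if picked.len() >= MAX_TASK_FILE_COUNT || total_size + sst.file_size > MAX_TASK_SIZE_BYTES {
            break;
        }
        total_size += sst.file_size;
        picked.push(sst.clone());
    }
    picked
}
```

Keeping each task bounded this way is what keeps the running task count stable while letting more tasks run concurrently.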

Compacting Task count

(screenshots comparing main and branch omitted)

It is intuitively obvious that the branch's base-level compaction tasks have higher parallelism.

LSM Compact Pending Bytes

(screenshots comparing main and branch omitted)

Conclusion

Data alignment does bring some compaction benefits: it improves compactor utilization and therefore alleviates data buildup in the LSM tree. However, in the current tests the backfill times are short, so there is no significant end-to-end time improvement, and barrier latency is somewhat jittery due to more frequent compactions.

Li0k modified the milestones: release-1.5, release-1.6 Dec 6, 2023
Li0k modified the milestones: release-1.6, release-1.8 Mar 6, 2024
Li0k modified the milestones: release-1.8, release-1.9 Apr 8, 2024

Li0k commented Apr 8, 2024

Related to #15291: we will introduce new strategies to perform data alignment and split.

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
