perf: table scan with too many delete tombstones is slow #12903

Open

BugenZhao opened this issue Oct 17, 2023 · 6 comments
BugenZhao (Member) commented Oct 17, 2023

To resolve the backfill live-lock described in #12680, we ensured in #12780 that backfill makes progress before getting canceled. However, the underlying performance issue is still not resolved. This may affect scan performance on tables with frequent deletes, like materialized views cleaned up by the temporal filter.

If I understand it correctly, all full scans on such tables will be affected, right? If we make specific fixes or improvements to Backfill, users may still find it confusing that batch scanning a temporal-filtered table is slow. This could be another reason to consider fixing it in a more general way...

Originally posted by @BugenZhao in #12680 (comment)

This can be easily reproduced:

dev=> create table t (v int);
CREATE_TABLE

Time: 27.679 ms
dev=> select count(*) from t;
 count 
-------
     0
(1 row)

Time: 4.860 ms

dev=> insert into t select generate_series(1, 10000000);
INSERT 0 10000000
Time: 40462.840 ms (00:40.463)
dev=> delete from t;
DELETE 10000000
Time: 12125.790 ms (00:12.126)

dev=> select count(*) from t;
 count 
-------
     0
(1 row)

Time: 2036.083 ms (00:02.036)

As expected, nearly all time is spent on rewinding the merge iterator.

[profiling screenshot: nearly all scan time is spent rewinding the merge iterator]
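To make the cause concrete, here is a minimal, self-contained Rust sketch (not RisingWave's actual iterator code; `UserIterator` and all other names are illustrative) of why the scan is slow: the user-facing iterator has to step over every delete tombstone before it can yield the next visible row, so a `count(*)` that returns 0 or 1 rows still visits millions of entries.

```rust
// Illustrative sketch only: a merged, key-sorted view of a table's entries,
// where deleted rows are represented by tombstones that must still be iterated.

#[derive(Clone)]
enum Value {
    Put(String),
    Delete, // tombstone left behind by `DELETE FROM t`
}

struct UserIterator {
    // Entries already merged and sorted by key, newest version wins.
    entries: Vec<(u64, Value)>,
    pos: usize,
}

impl UserIterator {
    fn new(entries: Vec<(u64, Value)>) -> Self {
        Self { entries, pos: 0 }
    }

    /// Returns the next *visible* row. Even if the answer is "no rows",
    /// this walks over every tombstone first, which is O(#tombstones).
    fn next_visible(&mut self) -> Option<(u64, String)> {
        while self.pos < self.entries.len() {
            let (key, value) = self.entries[self.pos].clone();
            self.pos += 1;
            match value {
                Value::Put(v) => return Some((key, v)),
                Value::Delete => continue, // skip tombstone, keep scanning
            }
        }
        None
    }
}

fn main() {
    // One million tombstones (the repro above deleted ten million rows),
    // followed by a single live row. Finding that row requires stepping
    // over every tombstone first.
    let mut entries: Vec<(u64, Value)> =
        (0..1_000_000u64).map(|k| (k, Value::Delete)).collect();
    entries.push((1_000_000, Value::Put("live".to_string())));

    let mut iter = UserIterator::new(entries);
    let mut visible = 0;
    while iter.next_visible().is_some() {
        visible += 1;
    }
    println!("visible rows: {visible}"); // prints 1
}
```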

Ideas given by @hzxa21:

Some ideas:

  • Storage exposes an "internal key" iterator to backfill so that backfill can use the tombstone key as the backfilled position. This ensures that backfill can make progress even when no data is returned.
  • Maintain table statistics to guide backfill. A simple statistic is the per-table min/max key. Backfill can use vnode | min_key as the start_key when it begins. This can solve a subset of the issues when the upstream table performs sequential deletes (e.g. TTL). (A sketch of this idea follows the list.)
  • Tune compaction to be more aggressive when the SST delete ratio is high. This can improve the situation but cannot solve it completely, since in order for compaction to clean up a tombstone key completely, a full compaction is needed. When the data volume is high, full compaction may not finish quickly even if compaction is triggered aggressively.
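As referenced in the second item, here is a hedged sketch of using a per-vnode min-key statistic as the backfill start key. All names (`TableStats`, `backfill_start_key`, the `VirtualNode` alias) are illustrative assumptions, not RisingWave's actual API:

```rust
use std::collections::HashMap;

type VirtualNode = u16;

/// Hypothetical per-table statistics maintained by the storage layer.
struct TableStats {
    /// Smallest live (non-deleted) user key observed for each vnode.
    min_key_per_vnode: HashMap<VirtualNode, Vec<u8>>,
}

impl TableStats {
    /// Compute the start key for a backfill scan on one vnode:
    /// `vnode (big-endian) | min_key`. If the min key is known, the scan can
    /// skip the prefix of tombstoned keys left behind by sequential deletes
    /// (e.g. TTL / temporal-filter cleanup).
    fn backfill_start_key(&self, vnode: VirtualNode) -> Vec<u8> {
        let mut key = vnode.to_be_bytes().to_vec();
        if let Some(min_key) = self.min_key_per_vnode.get(&vnode) {
            key.extend_from_slice(min_key);
        }
        // If no statistic is available, fall back to the vnode prefix alone,
        // i.e. scan the whole vnode as before.
        key
    }
}

fn main() {
    let stats = TableStats {
        min_key_per_vnode: HashMap::from([(3u16, vec![0x00, 0x98, 0x96, 0x81])]),
    };
    println!("{:02x?}", stats.backfill_start_key(3));
    println!("{:02x?}", stats.backfill_start_key(7)); // no stats: vnode prefix only
}
```

Since the fallback is simply the plain vnode prefix, this optimization would be purely additive and only helps when deletes are sequential.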
kwannoel (Contributor) commented Oct 17, 2023

Maintain table statistics to guide backfill. A simple statistic is the per-table min/max key. Backfill can use vnode | min_key as the start_key when it begins. This can solve a subset of the issues when the upstream table performs sequential deletes (e.g. TTL).

Is this generalizable to some sort of cache, such that we can handle tombstone ranges?
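One possible shape for such a cache, purely as an illustrative assumption (`TombstoneRangeCache` is not an existing RisingWave structure): keep non-overlapping key ranges known to contain only tombstones, and let the scan seek past a whole range instead of iterating through it.

```rust
/// Non-overlapping, sorted key ranges `[start, end)` known to be fully tombstoned.
struct TombstoneRangeCache {
    ranges: Vec<(Vec<u8>, Vec<u8>)>,
}

impl TombstoneRangeCache {
    /// If `key` falls inside a cached tombstone range, return the end of that
    /// range so the iterator can seek directly past it instead of stepping
    /// over every tombstone.
    fn skip_to(&self, key: &[u8]) -> Option<&[u8]> {
        self.ranges
            .iter()
            .find(|(start, end)| key >= start.as_slice() && key < end.as_slice())
            .map(|(_, end)| end.as_slice())
    }
}

fn main() {
    let cache = TombstoneRangeCache {
        ranges: vec![(vec![0x00], vec![0x80])],
    };
    // A scan positioned at 0x10 can jump straight to 0x80.
    assert_eq!(cache.skip_to(&[0x10]), Some(&[0x80u8][..]));
    // Keys outside the cached ranges are scanned normally.
    assert_eq!(cache.skip_to(&[0x90]), None);
    println!("ok");
}
```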

Tune compaction to be more aggressive when the SST delete ratio is high. This can improve the situation but cannot solve it completely, since in order for compaction to clean up a tombstone key completely, a full compaction is needed. When the data volume is high, full compaction may not finish quickly even if compaction is triggered aggressively.

I think @Li0k has worked on this.

kwannoel (Contributor) commented
Tune compaction to be more aggressive when the SST delete ratio is high. This can improve the situation but cannot solve it completely, since in order for compaction to clean up a tombstone key completely, a full compaction is needed. When the data volume is high, full compaction may not finish quickly even if compaction is triggered aggressively.

Can I confirm again, @Li0k? This is addressed by your PRs, right?

Li0k (Contributor) commented Oct 20, 2023

Tune compaction to be more aggressive when the SST delete ratio is high. This can improve the situation but cannot solve it completely, since in order for compaction to clean up a tombstone key completely, a full compaction is needed. When the data volume is high, full compaction may not finish quickly even if compaction is triggered aggressively.

Can I confirm again, @Li0k? This is addressed by your PRs, right?

Yeah, but it cannot solve the problem completely. If the tombstones are affecting performance, we can consider adjusting the frequency of the tombstone picker triggers.
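For illustration only, a rough sketch of what a delete-ratio-based trigger could look like; the struct, function, and threshold below are assumptions, not RisingWave's actual compaction picker or configuration:

```rust
/// Hypothetical per-SST summary exposed to a compaction picker.
struct SstSummary {
    total_entries: u64,
    delete_entries: u64,
}

/// Decide whether an SST should be picked for a tombstone-reclaiming compaction.
/// A lower `delete_ratio_threshold` means more frequent (more aggressive) picks.
fn should_pick_for_compaction(sst: &SstSummary, delete_ratio_threshold: f64) -> bool {
    if sst.total_entries == 0 {
        return false;
    }
    let delete_ratio = sst.delete_entries as f64 / sst.total_entries as f64;
    delete_ratio >= delete_ratio_threshold
}

fn main() {
    // After `DELETE FROM t`, the delete ratio is 1.0, so even a conservative
    // threshold picks this SST for compaction.
    let sst = SstSummary { total_entries: 10_000_000, delete_entries: 10_000_000 };
    println!("picked: {}", should_pick_for_compaction(&sst, 0.5));
}
```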

kwannoel (Contributor) commented
since in order for compaction to clean up a tombstone key completely, a full compaction is needed. When the data volume is high, full compaction may not finish quickly even if compaction is triggered aggressively.

Discussed further offline with @Li0k; this part is still under discussion.

Tune compaction to be more aggressive when the SST delete ratio is high.

This part is done: compaction will occur at 10-minute intervals. See #12776.

kwannoel (Contributor) commented
For testing compaction: https://github.com/risingwavelabs/kube-bench/issues/306

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

hzxa21 removed this from the release-1.7 milestone Oct 8, 2024