Add retry loop to deleting timeseries by name #6504
Merged
We currently support deleting timeseries by name in a schema upgrade. This deletion is implemented as a mutation in ClickHouse, which walks all affected data parts and deletes the relevant records in a merge operation. Mutations are asynchronous by default and run in a pool of background tasks, but with large tables each one can still take a while to complete, which blocks the server from queueing new deletion requests. This can lead to timeouts, as seen in #6501.
This should fix #6501, but I'm not certain of that because I don't have a good way to reproduce the bug. It seems likely that this is only seen when the database is already heavily loaded, as it might be when doing these mutations on large tables.
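For illustration only, here is a minimal sketch of the kind of retry loop this PR describes: reissue the deletion a bounded number of times when the request times out because the server is still busy with earlier mutations. The `Client`, `QueryError`, and `delete_timeseries_by_name` names are hypothetical stand-ins, not the actual types or API in this repository.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the real ClickHouse client used during a
/// schema upgrade; only the retry structure is of interest here.
struct Client;

#[derive(Debug)]
struct QueryError(String);

impl QueryError {
    /// Assumed helper: decide whether the failure looks like a timeout
    /// worth retrying, rather than a permanent error.
    fn is_timeout(&self) -> bool {
        self.0.contains("timeout")
    }
}

impl Client {
    /// Would issue the `ALTER TABLE ... DELETE WHERE timeseries_name = ...`
    /// statement, which ClickHouse runs as an asynchronous mutation.
    async fn delete_timeseries_by_name(&self, name: &str) -> Result<(), QueryError> {
        let _ = name;
        Ok(())
    }
}

/// Retry the deletion a bounded number of times, sleeping between attempts,
/// since the mutation queue may be backed up on a heavily loaded database.
async fn delete_with_retries(
    client: &Client,
    name: &str,
    max_attempts: usize,
    backoff: Duration,
) -> Result<(), QueryError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match client.delete_timeseries_by_name(name).await {
            Ok(()) => return Ok(()),
            Err(e) if e.is_timeout() && attempt < max_attempts => {
                // The server is likely still processing earlier mutations;
                // wait and try again rather than failing the schema upgrade.
                tokio::time::sleep(backoff).await;
            }
            Err(e) => return Err(e),
        }
    }
}
```

The choice of attempt count and backoff interval is a tuning question; the point is only that a transient timeout from a busy mutation queue is retried instead of immediately failing the upgrade.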