Reduce risk of losing data due to garbage collection or restarts #2252

Draft · wants to merge 8 commits into base: main
Conversation

@jbearer (Member) commented Nov 4, 2024

Partially addresses an issue we saw on water, where all DA nodes restarting at the same time can cause loss of data. The problematic sequence of events is:

  • DA nodes receive, store, and sign a DA proposal A, causing a DAC to be formed
  • DA nodes shut down
  • The corresponding proposal is decided
  • DA nodes start back up, and receive a decide event for a subsequent proposal B
  • DA nodes garbage collect all views <= B, including proposal A, which they never decided and thus never moved to archival storage

This change addresses this potential problem in two ways:

  1. All nodes (including non-DA nodes) garbage collect consensus storage only for views they have definitively processed, i.e. moved to archival storage. Instead of garbage collecting all views <= v upon seeing a decide event for leaf v, we only collect views that were part of a decide event we actually saw: views whose numbers fall within the range spanned by a chain of leaves with consecutive heights (see the sketch after this list). Views not encompassed by any decide event we processed are conservatively left alone (at least, not collected immediately). This prevents unintentional loss of data.
  2. We implement the query service fetching provider traits by retrieving data from consensus storage. This builds on the previous change: because undecided data is no longer lost, the database can automatically recover from gaps caused by missed decide events.
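To make the first point concrete, here is a minimal sketch of the chain-based GC decision. The `Leaf` type, its fields, and the helper name are hypothetical stand-ins, not the actual HotShot or query service types; the idea is simply that, given the leaves delivered by a decide event (sorted by height), we only collect the view ranges spanned by chains of consecutive heights.

```rust
/// Hypothetical, simplified stand-in for a decided leaf: just its block
/// height and the view in which it was proposed.
#[derive(Clone, Copy)]
struct Leaf {
    height: u64,
    view: u64,
}

/// Given the leaves from a decide event (sorted by height), return the
/// inclusive view ranges that are safe to garbage collect: only ranges
/// spanned by a chain of leaves with consecutive heights. Views outside
/// these ranges may belong to decides we missed, so we keep them.
fn gc_ranges(leaves: &[Leaf]) -> Vec<(u64, u64)> {
    let mut ranges = Vec::new();
    let mut iter = leaves.iter();
    let Some(first) = iter.next() else {
        return ranges;
    };
    let (mut start, mut end, mut prev_height) = (first.view, first.view, first.height);
    for leaf in iter {
        if leaf.height == prev_height + 1 {
            // Still in the same chain of consecutive heights; extend the range.
            end = leaf.view;
        } else {
            // Gap in heights: close out the current chain and start a new one.
            ranges.push((start, end));
            start = leaf.view;
            end = leaf.view;
        }
        prev_height = leaf.height;
    }
    ranges.push((start, end));
    ranges
}

fn main() {
    let leaves = [
        Leaf { height: 10, view: 20 },
        Leaf { height: 11, view: 21 },
        Leaf { height: 13, view: 25 },
    ];
    // Heights 10-11 form one chain, height 13 another; views 22-24 are kept
    // because we cannot tell whether they were part of a decide we missed.
    assert_eq!(gc_ranges(&leaves), vec![(20, 21), (25, 25)]);
}
```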

This PR:

  • When processing a decide event in persistent storage, look for chains of consecutive leaves, and only garbage collect views which fall within those chains
  • Separately, garbage collect any views over a certain age, to clean up undecided views that were never needed to recover archival storage
    • GC more aggressively when the space used by consensus storage exceeds a threshold, just like in the query service (a configuration sketch follows this list)
  • Implement query service provider traits based on consensus storage
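A rough sketch of the kind of configuration this implies, analogous to the query service's archival pruner settings. All names here are illustrative, not the actual options added by this PR:

```rust
use std::time::Duration;

/// Hypothetical configuration for consensus-storage garbage collection,
/// analogous in spirit to the archival pruner's settings.
struct ConsensusPruningConfig {
    /// Undecided data older than this is always eligible for deletion.
    target_retention: Duration,
    /// Undecided data younger than this is never deleted, even under
    /// storage pressure.
    minimum_retention: Duration,
    /// Fraction of the storage budget (e.g. 0.9 = 90%) above which we
    /// garbage collect more aggressively, down to the minimum retention.
    target_usage: f64,
    /// Total storage budget, in bytes, that consensus storage may use.
    storage_budget: u64,
}
```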

This PR does not:

Fully solve the restart issue. Specifically: the protocol guarantees that at least one DA node will store every DA proposal, and this change ensures every stored DA proposal will eventually make it to archival storage. We do not have the same guarantee for quorum proposals and certificates, because DA nodes are not required to form a QC. Thus, we can still have a situation where every DA node misses a quorum proposal or certificate; furthermore, the query service needs the quorum proposal (for the block header) before it can even try to fetch a missing DA proposal. That problem will be solved by having non-DA nodes run a version of the availability service that stores leaves but not payloads, which will be facilitated by SQLite support.

Previously, we would delete all data from consensus storage up to
the decided view, with each new decide. In the case where we miss a
decide event, this can cause us to delete the associated data (e.g.
DA, VID proposals) from a view that we never actually moved to
archival storage. This in turn makes it much harder to use any
guarantees from consensus about certain nodes possessing certain data
after a decide, because we are garbage collecting such data.

One particularly bad case is about DA nodes. Consensus guarantees
that some honest DA nodes will see every DA proposal. However, if
those nodes all restart at the same time shortly after seeing a DA
proposal, they may miss the corresponding quorum proposal and decide.
Then, when they restart, they may end up garbage collecting that DA
proposal before anyone realizes that it did get decided, and now no
one has the data.

The new technique is more conservative. We only garbage collect
specific views or ranges of views for which we know we have successfully
processed all decide events. Other data, whether it is for views that
were never decided or for views where we missed a decide (we cannot
immediately tell the difference), will be retained indefinitely. This
ensures we never lose data before it is archived, and allows us to
manually rebuild an incomplete archive after it is discovered.

It also enables us to implement a catchup data source that pulls
from this store of undecided, un-garbage-collected data, so that the
archive can automatically rebuild itself.
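As a rough illustration of that catchup direction, here is a sketch of a provider backed by consensus storage. The trait and type names below are placeholders, not the actual query service fetching provider API; the sketch only shows the shape of the idea: when the archive is missing data, ask consensus storage whether it still holds the corresponding (possibly undecided) proposal.

```rust
/// Hypothetical request and response types for fetching a missing payload;
/// the real query service defines its own request types.
struct PayloadRequest {
    height: u64,
}

struct Payload {
    bytes: Vec<u8>,
}

/// Placeholder for whatever handle gives read access to consensus storage.
trait ConsensusStorage {
    /// Look up a stored (possibly undecided) DA proposal, if we still have it.
    fn da_proposal_by_height(&self, height: u64) -> Option<Payload>;
}

/// Hypothetical provider trait: the archive asks providers to fill in
/// data it is missing.
trait Provider<Req> {
    type Response;
    fn fetch(&self, req: Req) -> Option<Self::Response>;
}

/// A provider backed by consensus storage: missing payloads can be
/// recovered from retained, un-garbage-collected proposals.
struct ConsensusStorageProvider<S> {
    storage: S,
}

impl<S: ConsensusStorage> Provider<PayloadRequest> for ConsensusStorageProvider<S> {
    type Response = Payload;

    fn fetch(&self, req: PayloadRequest) -> Option<Payload> {
        self.storage.da_proposal_by_height(req.height)
    }
}
```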

Of course, indefinitely retaining data which may not even have been
decided is undesirable. The next commit will add pruning, so that
all data is deleted after a certain number of views, even if we never
archived it. Then the guarantee will be: we can always recover a
complete archive, based on the guarantees of consensus, as long as
recovery completes within a certain (configurable) time period.

Add target and minimum retention periods and a target usage
threshold, like the archival
pruner. This allows us to take full advantage of the storage space
if we have it, keeping data around for longer, while still ensuring
we keep it around long *enough* even if we are low on space.
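A minimal sketch of the resulting pruning policy, using the same illustrative knobs as the configuration sketch above (again, the names and signature are hypothetical, not the actual implementation): data past the target retention is always eligible for deletion, and under storage pressure we prune down to the minimum retention, but never past it, since the recovery guarantee depends on that window.

```rust
use std::time::Duration;

/// Hypothetical pruning decision: should an undecided view of the given
/// age be garbage collected, given current storage usage?
fn should_prune(
    age: Duration,
    usage_bytes: u64,
    budget_bytes: u64,
    target_retention: Duration,
    minimum_retention: Duration,
    target_usage: f64,
) -> bool {
    // Data past the target retention is always fair game, decided or not.
    if age >= target_retention {
        return true;
    }
    // Under storage pressure, prune more aggressively, but never below the
    // minimum retention: that window is what recovery relies on.
    let over_budget = usage_bytes as f64 > target_usage * budget_bytes as f64;
    over_budget && age >= minimum_retention
}
```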