
Add pruner for finalized states #8379

Merged (30 commits) on Jul 10, 2024
Conversation

@gfukushima (Contributor) commented Jun 14, 2024

PR Description

This PR adds a state pruner that prunes finalized states, keeping only those from the number of epochs specified via the experimental flag --Xdata-storage-archive-finalized-states-retained.
Available in archive mode only.
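
For illustration only, a minimal sketch of the retention rule the description implies: a stored finalized state is kept only while its epoch is within the retained window behind the latest finalized epoch. The class and method names are invented and not part of the PR; treating a negative value as "pruning disabled" is an assumption based on the -1 default discussed later in the review.

public final class FinalizedStateRetentionSketch {

  static boolean shouldKeep(
      final long stateEpoch, final long latestFinalizedEpoch, final long retainedEpochs) {
    if (retainedEpochs < 0) {
      return true; // assumed to mean "retention not configured": keep everything
    }
    // keep only the most recent `retainedEpochs` epochs worth of finalized states
    return stateEpoch > latestFinalizedEpoch - retainedEpochs;
  }

  public static void main(final String[] args) {
    // e.g. with 64 retained epochs and finality at epoch 1000:
    System.out.println(shouldKeep(930, 1000, 64)); // false -> pruned
    System.out.println(shouldKeep(950, 1000, 64)); // true  -> kept
  }
}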

Fixed Issue(s)

Fixes #8331

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

@gfukushima changed the title from "State pruner" to "Add pruner for finalized states" on Jun 14, 2024
Comment on lines 120 to 123
LOG.info(
"Data storage mode: {}, Retained epochs {}",
config.getDataStorageMode(),
config.getRetainedEpochs());
Contributor

we probably want to be mindful of logs here... maybe either put this to debug, or only log this when we're in archive mode...
also probably want to think about this service WRT tree storage mode...

@gfukushima (Contributor, Author) commented Jun 21, 2024

Yes, these are some leftovers I missed from a test. I'm doing a cleanup on some of those at the moment; the ones that survive the cleanup will all be at debug level.

Contributor Author

changed this to be a more informative log but kept it at INFO level since it is only printed at startup, WDYT?

Comment on lines 41 to 43
public static final Duration DEFAULT_STATE_PRUNING_INTERVAL = Duration.ofMinutes(10);
public static final long DEFAULT_STORAGE_RETAINED_EPOCHS = -1;
public static final int DEFAULT_STATE_PRUNING_LIMIT = 3;
Contributor

i think if this runs during epoch transition it may get interesting; more of a general observation about all of these types of tasks, i guess... we probably want to build out a design at some point that gives us more confidence about which parts of an epoch these tasks should be allowed to run in...

Contributor Author

Tbh, this has been the most difficult part so far: finding/defining a good setup. I thought about something a little more dynamic, e.g. having the interval defined by epochs_retained and epoch_duration.
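
As an illustration of that dynamic-interval idea (a sketch only, not Teku code; the constants are mainnet values used just for the arithmetic):

import java.time.Duration;

public final class DynamicPruningIntervalSketch {
  static final int SECONDS_PER_SLOT = 12; // mainnet value, for illustration
  static final int SLOTS_PER_EPOCH = 32;  // mainnet value, for illustration

  // run the pruner once per retention window instead of every fixed 10 minutes
  static Duration pruningInterval(final long retainedEpochs) {
    return Duration.ofSeconds(retainedEpochs * SLOTS_PER_EPOCH * SECONDS_PER_SLOT);
  }

  public static void main(final String[] args) {
    // 64 retained epochs -> 64 * 32 * 12 s, roughly 6.8 hours
    System.out.println(pruningInterval(64)); // PT6H49M36S
  }
}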

Contributor

I agree with Paul that we could have a "no flight time window", which could be the epoch transition +/- 1 or 2 slots (i.e. from slot 31 to slot 2). The question will be how to handle tasks falling in that window:

  • skip them (pro: easy; con: no execution guarantee for a long time)
  • queue them to the end of the time window, i.e. slot 2 (pro: execution guarantee; con: multiple tasks can potentially run at the same time)

Just dumped some thoughts.
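
For illustration, a rough sketch of both options above, assuming a blackout window spanning the last slot of an epoch through slot 2 of the next; none of these names correspond to existing Teku classes.

public final class PruningBlackoutWindowSketch {
  private static final int SLOTS_PER_EPOCH = 32;
  private static final int WINDOW_START_SLOT = 31; // last slot of the epoch
  private static final int WINDOW_END_SLOT = 2;    // first slots of the next epoch

  static boolean isInBlackoutWindow(final long slot) {
    final long slotInEpoch = slot % SLOTS_PER_EPOCH;
    // the window wraps around the epoch boundary: slots 31, 0, 1, 2
    return slotInEpoch >= WINDOW_START_SLOT || slotInEpoch <= WINDOW_END_SLOT;
  }

  // Option 1: skip the run entirely when it falls inside the window.
  static boolean shouldRunNow(final long slot) {
    return !isInBlackoutWindow(slot);
  }

  // Option 2: defer the run to the first slot after the window.
  static long deferredSlot(final long slot) {
    long candidate = slot;
    while (isInBlackoutWindow(candidate)) {
      candidate++;
    }
    return candidate;
  }
}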

Contributor

Another option is to link these prunings explicitly to certain slots, so the frequency will be one epoch for all of them, and we will have implicit control over the epoch transition overlap (not guaranteed, but reasonable).
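
A sketch of this alternative, with task names and slot assignments invented purely for illustration:

public final class EpochAlignedPruningSketch {
  enum PruningTask {
    FINALIZED_STATES,
    OTHER_HISTORICAL_DATA
  }

  private static final int SLOTS_PER_EPOCH = 32;

  // each task runs exactly once per epoch, at its own mid-epoch slot
  static boolean shouldRun(final PruningTask task, final long slot) {
    final long slotInEpoch = slot % SLOTS_PER_EPOCH;
    return switch (task) {
      case FINALIZED_STATES -> slotInEpoch == 8;
      case OTHER_HISTORICAL_DATA -> slotInEpoch == 16;
    };
  }
}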

@gfukushima marked this pull request as ready for review on June 26, 2024 02:23
@gfukushima (Contributor, Author)

There is a question: should we allow users to configure --Xdata-storage-archive-finalized-states-retained to a value lower than --data-storage-archive-frequency? This would mean a finalized state might only exist in the DB for a short period of time.

Example:
--data-storage-archive-frequency = 2048 (default = 64 epochs)
--Xdata-storage-archive-finalized-states-retained = 32 (1 epoch)

You might save a finalized state for slot 10, and by the time slot 42 reaches finalization, the state for slot 10 will be pruned. The next finalized state to be stored in the DB will be at slot 2058 only.

@lucassaldanha requested a review from @tbenr on June 26, 2024 03:59
@tbenr (Contributor) commented Jul 5, 2024

The next finalized state to be stored in the DB will be at slot 2058 only.

I guess you intended 2048 here, otherwise I'm missing something in the reasoning :)

In general I think it is an edge case, and definitely a weird configuration to have. My mild preference would be to enforce retained to be greater than frequency.
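
A sketch of that enforcement as a config-time check, assuming both values have been normalised to slots and that a negative retention value means pruning is disabled; the class and method names are invented:

public final class RetentionConfigValidationSketch {

  static void validateRetention(final long retainedSlots, final long archiveFrequencySlots) {
    // a negative retention value is assumed to mean "pruning disabled"
    if (retainedSlots >= 0 && retainedSlots <= archiveFrequencySlots) {
      throw new IllegalArgumentException(
          "--Xdata-storage-archive-finalized-states-retained must be greater than "
              + "--data-storage-archive-frequency");
    }
  }
}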

@tbenr (Contributor) left a comment

LGTM. Some comments, and there is still a LOG level change requested by Paul.

@gfukushima (Contributor, Author)

Adding some numbers collected from instances before merging this PR:

DB size of different setups:

  • Holesky-01 (minimal): 72G teku/beacon/db
  • Holesky-archive: 212G teku/beacon/db
  • Holesky-archive-64epochs: 110G teku/beacon/db

@gfukushima enabled auto-merge (squash) on July 10, 2024 05:26
@gfukushima merged commit f0d8fbd into Consensys:master on Jul 10, 2024
16 checks passed
Development

Successfully merging this pull request may close these issues.

Allow pruning "archive" node