Bonsai archive feature #7475

Open
wants to merge 62 commits into
base: main

matthew1001
Contributor

@matthew1001 matthew1001 commented Aug 16, 2024

PR description

Introduces a new (experimental) "Bonsai Archive" DB mode which creates a full archive of the chain it syncs with. This allows JSON-RPC calls to be made with historic blocks as context, for example eth_getBalance to get the balance of an account at a historic block, or eth_call to simulate a transaction at a given block in history.

The PR is intended to provide part of the function currently offered by the (now deprecated) FOREST DB mode. Specifically it allows state to be queried at an arbitrary block in history, but does not currently offer eth_getProof for said state. A subsequent PR will implement eth_getProof for historic blocks.
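As an illustration of the kind of historic query described above, here is a minimal sketch of an eth_getBalance request that names a specific block. The class name, method names and the localhost URL in the usage note are assumptions made for this example; only the eth_getBalance method and its address/block parameters come from the JSON-RPC API itself.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative sketch only; this class is hypothetical and not part of the PR. */
final class HistoricBalanceRequestSketch {

  /** Builds the JSON-RPC body for eth_getBalance at a given block number. */
  static String body(final String address, final long blockNumber) {
    // JSON-RPC quantities are hex strings with a 0x prefix and no leading zeros.
    return String.format(
        "{\"jsonrpc\":\"2.0\",\"method\":\"eth_getBalance\",\"params\":[\"%s\",\"0x%x\"],\"id\":1}",
        address, blockNumber);
  }

  /** Wraps the body in an HTTP POST aimed at the node's RPC endpoint (not sent here). */
  static HttpRequest request(final String rpcUrl, final String address, final long blockNumber) {
    return HttpRequest.newBuilder(URI.create(rpcUrl))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body(address, blockNumber)))
        .build();
  }
}
```

For example, `HistoricBalanceRequestSketch.request("http://localhost:8545", "0x0e79065B5F11b5BD1e62B935A600976ffF3754B9", 37834)` builds a request for the example account at block 37834, assuming the node's HTTP RPC endpoint is enabled at that address.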

Summary of the overall design & changes

This PR builds on PR #5865 which proved the basic concept of archiving state in the Bonsai flat DB by suffixing entries with the block in which they were changed.

For example the state for account 0x0e79065B5F11b5BD1e62B935A600976ffF3754B9 at block 37834 is stored as

<account-hash><block-num-hex> = 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
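The key layout above can be sketched as follows: the 32-byte account hash followed by the block number as an 8-byte big-endian suffix (37834 is 0x93ca). The class and method names are hypothetical and chosen for this example; they are not the names used in the PR.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch only; names here are hypothetical, not Besu's. */
final class ArchiveKeySketch {

  /** Concatenates the account hash with the block number as an 8-byte big-endian suffix. */
  static byte[] archiveKey(final byte[] accountHash, final long blockNumber) {
    return ByteBuffer.allocate(accountHash.length + Long.BYTES)
        .put(accountHash)
        .putLong(blockNumber) // ByteBuffer writes big-endian by default
        .array();
  }

  /** Decodes a hex string (without 0x prefix) into bytes. */
  static byte[] fromHex(final String hex) {
    final byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }

  /** Encodes bytes as a lowercase hex string with a 0x prefix. */
  static String toHex(final byte[] bytes) {
    final StringBuilder sb = new StringBuilder("0x");
    for (final byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }
}
```

Because the suffix is fixed-width and big-endian, entries for the same account sort by block number under a bytewise key comparison.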

In order to minimise performance degradation over time, historic state and storage entries in the DB are "archived" by moving them into a separate DB segment.

Where account state is stored in segment ACCOUNT_INFO_STATE, state that has been archived is stored in ACCOUNT_INFO_STATE_ARCHIVE. Likewise where storage is held in segment ACCOUNT_STORAGE_STORAGE, archived storage entries are stored in ACCOUNT_STORAGE_ARCHIVE.

An example RocksDB query to retrieve the state of the example account above would be:

ldb --db=. get --column_family=ACCOUNT_INFO_STATE_ARCHIVE --key_hex --value_hex 0x9ab656e8fa2a1029964289c9a189083db258ca4b46ebaa374477e069b8f47dec00000000000093ca
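The query above uses the exact key for the block in which the account changed. For a block in which the account did not change, a read presumably has to find the most recent entry at or before that block. The sketch below models this with an in-memory TreeMap standing in for a DB segment; it is an assumption about the lookup pattern, not the PR's implementation, and all names are hypothetical. RocksDB iterators offer a comparable seek-for-previous operation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Illustrative sketch only; a sorted in-memory map stands in for a DB segment. */
final class NearestBeforeSketch {
  // Keys are <account-hash-hex><16-hex-digit block number>, so they sort by account, then block.
  private final TreeMap<String, String> segment = new TreeMap<>();

  /** Builds the suffixed key; assumes all account hashes have the same length. */
  static String key(final String accountHashHex, final long blockNumber) {
    return accountHashHex + String.format("%016x", blockNumber);
  }

  /** Records the account value as of the block in which it changed. */
  void put(final String accountHashHex, final long blockNumber, final String value) {
    segment.put(key(accountHashHex, blockNumber), value);
  }

  /** Returns the value written at the latest block <= blockNumber, if any. */
  Optional<String> getAt(final String accountHashHex, final long blockNumber) {
    final Map.Entry<String, String> entry = segment.floorEntry(key(accountHashHex, blockNumber));
    // The floor entry may belong to a different account that sorts earlier; reject it.
    if (entry == null || !entry.getKey().startsWith(accountHashHex)) {
      return Optional.empty();
    }
    return Optional.of(entry.getValue());
  }
}
```

With entries at blocks 10 and 20 for one account, a read at block 15 returns the block 10 value, and a read at block 9 returns nothing.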

Creating a Bonsai Archive node

The PR introduces an entirely new data storage format (as opposed to making it a configuration option of the existing BONSAI storage format).

To create a Bonsai Archive node, set --data-storage-format=x_bonsai_archive when creating it.

An existing FOREST or BONSAI node cannot be migrated to BONSAI_ARCHIVE mode.

Storage requirements

An archive node intrinsically requires more storage space than a non-archive node. Every state update is retained in the archive DB segments as outlined above. At the time this PR was raised, an archive node for the Holesky testnet requires approximately 160 GiB of storage.

Sync time

In order to create an archive of an entire chain, FULL sync mode must be used. This PR does not prevent SNAP syncing an archive node, but this will result in only a partial archive of the chain.

While the node is performing a FULL sync with the chain it is also migrating entries from the regular DB segments to the archive DB segments. Overall this increases the time to create the archive node. For a public chain this might require 1 week or more to complete syncing and archiving.

@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 2 times, most recently from 88f3968 to 7d4a524 Compare August 20, 2024 10:19
@matthew1001 matthew1001 changed the title Multi version flat db rebase Bonsai archive feature Sep 4, 2024
@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 7 times, most recently from 782ae60 to 5752732 Compare October 2, 2024 17:10
@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 4 times, most recently from 5b06b50 to dce531e Compare October 4, 2024 16:11
@matthew1001 matthew1001 marked this pull request as ready for review October 7, 2024 16:31
jframe and others added 15 commits October 8, 2024 08:38
Signed-off-by: Jason Frame
…se constructor that reuses worldStateStorage so that we don't lose values in the EvmToolSpecTests

Signed-off-by: Jason Frame
Signed-off-by: Matthew Whitehead
…d state, and freeze it

Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
…ten for blocks and move account state to new DB segment

Signed-off-by: Matthew Whitehead
…t block state has been frozen for

Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
…age from the freezer segment

Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
matthew1001 and others added 8 commits January 9, 2025 10:01
…ices/storage/rocksdb/configuration/BaseVersionedStorageFormat.java

Co-authored-by: Sally MacFarlane
Signed-off-by: Matt Whitehead
Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
Signed-off-by: Matthew Whitehead
@matthew1001
Contributor Author

matthew1001 commented Jan 14, 2025

@garyschulte @matkt I've added some commits that refactor the way archive world state and bonsai context work together. I think it's generally much cleaner now.

There are some small things I think that could still be improved. I've updated BonsaiContext to allow either block number or block hash to be provided - perhaps that should just be one or the other but I'm inclined to leave it as a pragmatic solution for now. But I'm generally pretty happy with the code structure now.

I also added a new class BonsaiArchiveWorldStateProvider which provides an override for getMutable(...) with archive-specific behaviour. In the case where the request is for world state <= 512 blocks ago, it calls the super version of the function which does a traditional bonsai rollback. But for older blocks it uses archive-specific logic to "rollback" to the chain head and set the block context to support flat DB requests at the historic block.
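The dispatch described in this comment can be sketched roughly as below. This is a simplified model of the decision only, assuming that "<= 512 blocks ago" is measured from the chain head; the class, enum and method names are hypothetical and do not reflect the actual getMutable(...) signature.

```java
/** Illustrative sketch only; models the choice of strategy, not the real provider class. */
final class ArchiveDispatchSketch {
  /** Number of recent blocks served by a traditional Bonsai rollback, per the comment above. */
  static final long ROLLBACK_WINDOW = 512;

  enum Strategy {
    BONSAI_ROLLBACK, // recent state: roll the world state back as regular Bonsai does
    ARCHIVE_FLAT_DB // older state: stay at chain head and set the block context for flat DB reads
  }

  /** Picks the strategy for a request at requestedBlock when the chain head is at chainHead. */
  static Strategy choose(final long chainHead, final long requestedBlock) {
    if (requestedBlock < 0 || requestedBlock > chainHead) {
      throw new IllegalArgumentException("requested block is outside the chain");
    }
    return chainHead - requestedBlock <= ROLLBACK_WINDOW
        ? Strategy.BONSAI_ROLLBACK
        : Strategy.ARCHIVE_FLAT_DB;
  }
}
```

With the head at block 1000, a request for block 488 (512 blocks back) would take the rollback path, while block 487 would take the archive path.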

@matthew1001 matthew1001 requested a review from matkt January 14, 2025 11:57
@matthew1001
Contributor Author

I've re-run ./gradlew acceptanceTestBftSoak which on this PR includes an archive node as 1 of the 4 QBFT nodes, and makes archive requests to the archive node at the end of the test. Test passed fine.

Signed-off-by: Matthew Whitehead
@garyschulte
Contributor

Is there any time pressure for this feature? With the latest refactor, I'd like to at minimum archive and non-archive sync a network that predates cancun (where we nerfed SELFDESTRUCT).

Ideally we would have time to full sync mainnet without archive to ensure no regression, but that is a months long process on commodity hardware.

Either way - I will keep 👀 on this PR and kick off regression tests when you signal that it is ready again with your latest commits.

@matthew1001
Contributor Author

I think almost all of the latest refactoring is isolated to archive-specific logic or classes. The clearStorage() implementation may be the one exception to that, which I'd be happy to revert if we want to merge the logic that we've tested with the full main merge. My concern with holding off too long is that the code-base is evolving and requires a fair amount of effort to keep rebasing/merging the PR with main. Unless we have big concerns that Bonsai (non-archive) has regressed I'd be keen to get this one merged sooner rather than later.

@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch from 1b145a3 to 3873f8e Compare January 16, 2025 11:23
// Update the block context before putting entries to storage via calculateRootHash()
// TODO - rename calculateRootHash() to be clearer that it updates state, it doesn't just
// calculate a hash
if (worldStateKeyValueStorage.getFlatDbStrategy() instanceof BonsaiArchiveFlatDbStrategy
Contributor

I'd prefer to set it unconditionally; it doesn't have a significant impact and it avoids an additional condition.

provideCachedWorldStorageManager(bonsaiCachedWorldStorageManager);
loadPersistedState(
new BonsaiWorldState(
this, worldStateKeyValueStorage, evmConfiguration, defaultWorldStateConfig));
}

@Override
public Optional<MutableWorldState> getMutable(
Contributor

This seems to be redundant? It's already in the DiffBasedWorldStateProvider class.

Contributor Author

Agreed - I suspect I needed it before the refactoring I did. I've removed it in the latest commit.

* Puts the account data for the given account hash and block context.
*/
@Override
public void putFlatAccount(
Contributor

Could we add the block number directly to the method parameters instead of passing a context? That way the implementation can decide whether to use it or not. Just a suggestion.

Contributor Author

The latest refactoring is cleaner in this regard I believe.

@@ -153,7 +155,7 @@ public synchronized Optional<MutableWorldState> getMutable(
return rollMutableStateToBlockHash(persistedState, blockHash);
}

- Optional<MutableWorldState> rollMutableStateToBlockHash(
+ protected Optional<MutableWorldState> rollMutableStateToBlockHash(
Contributor

I think we can go back to private

@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch 2 times, most recently from 877a5f3 to 7d42770 Compare January 20, 2025 17:52
@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch from 7d42770 to 633fad4 Compare January 20, 2025 17:52
Signed-off-by: Matthew Whitehead
@matthew1001 matthew1001 force-pushed the multi-version-flat-db-rebase branch from 7809755 to 3d8d8c7 Compare January 21, 2025 15:35
matthew1001 and others added 2 commits January 21, 2025 15:35
@matthew1001 matthew1001 requested a review from matkt January 22, 2025 09:08
5 participants