feat: database garbage collection #2638

Merged on Mar 22, 2023 (66 commits). Showing changes from 55 commits.

Commits
a50a0d3
feat: database garbage collection
hanabi1224 Mar 8, 2023
d270673
rolling db impl and tests
hanabi1224 Mar 8, 2023
f07e305
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 8, 2023
da3ce6a
fix lints
hanabi1224 Mar 8, 2023
621c11f
ri FileBacked change
hanabi1224 Mar 8, 2023
ae02c58
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 8, 2023
e87c67c
remove &mut self from pub APIs
hanabi1224 Mar 8, 2023
baf8d07
hook up proxy_db
hanabi1224 Mar 8, 2023
0a6f642
background gc task
hanabi1224 Mar 8, 2023
1a40602
cli command and CI
hanabi1224 Mar 9, 2023
ac96d4e
fix gc event
hanabi1224 Mar 9, 2023
5fc9f51
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 9, 2023
9d8eb59
buffered copy during GC
hanabi1224 Mar 9, 2023
08b0d66
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 9, 2023
ee240af
resolve review comments
hanabi1224 Mar 9, 2023
8bbf6db
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 9, 2023
f89a7a7
aggregated error
hanabi1224 Mar 9, 2023
307588d
fix(ci): move gc to before snapshot export
hanabi1224 Mar 9, 2023
fa06eec
reduce loop depth
hanabi1224 Mar 10, 2023
f202ca7
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 10, 2023
9edd0f4
resolve review comments
hanabi1224 Mar 10, 2023
d45fc0b
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 10, 2023
18003a7
changes as requested
hanabi1224 Mar 13, 2023
6e3339c
bypass GC at epoch 0
hanabi1224 Mar 13, 2023
9f60ea2
Bypass size checking during import
hanabi1224 Mar 13, 2023
6c930bb
use bounded channel to reduce mem usage during db re-index
hanabi1224 Mar 13, 2023
0181003
docs and fixes
hanabi1224 Mar 13, 2023
8b4fd5c
log total entries and total size
hanabi1224 Mar 13, 2023
7a4fda3
fix docs ci
hanabi1224 Mar 13, 2023
db49a6a
remove invalid options from `chain export`
hanabi1224 Mar 13, 2023
df4abb4
fix typos, add design goals and reasoning
hanabi1224 Mar 13, 2023
18c824a
wordsmith
hanabi1224 Mar 13, 2023
314f742
fix lint
hanabi1224 Mar 13, 2023
1a37851
update doc
hanabi1224 Mar 13, 2023
59901c0
fix doc and auto-gc trigger condition
hanabi1224 Mar 13, 2023
6829e7e
fix ut
hanabi1224 Mar 13, 2023
0bad463
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 15, 2023
6d35873
docs
hanabi1224 Mar 16, 2023
b351971
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 16, 2023
ed76a29
suppress clippy warnings
hanabi1224 Mar 16, 2023
fa7a145
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 16, 2023
849d0b1
Merge branch 'hm/db-gc' of github.com:ChainSafe/forest into hm/db-gc
hanabi1224 Mar 16, 2023
fc421d5
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 16, 2023
ce7f890
remove fn next_partition
hanabi1224 Mar 16, 2023
f90703c
fix clippy
hanabi1224 Mar 16, 2023
578ed1e
set visibility of next_current to pub(crate)
hanabi1224 Mar 16, 2023
f757e84
fix cold start issue for last_reachable_bytes
hanabi1224 Mar 16, 2023
c13ebb9
fix reachable_bytes calculation
hanabi1224 Mar 16, 2023
8fca01f
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 17, 2023
47be513
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 17, 2023
1ce7265
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 17, 2023
3ed8005
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 17, 2023
b7d4c8d
simplify open_db
hanabi1224 Mar 20, 2023
ee30872
change db_gc rpc access to write
hanabi1224 Mar 20, 2023
c341894
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 20, 2023
6896b53
Apply suggestions from code review
hanabi1224 Mar 21, 2023
eeaee7e
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 21, 2023
a66158d
const DB_KEY_SIZE
hanabi1224 Mar 21, 2023
350fd3f
revert snapshot import tests to using single thread
hanabi1224 Mar 21, 2023
44e079a
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 21, 2023
ae67da5
sample numbers
hanabi1224 Mar 21, 2023
e437e01
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 21, 2023
220a03c
Merge remote-tracking branch 'origin/main' into hm/db-gc
hanabi1224 Mar 22, 2023
2586fe8
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 22, 2023
a42ac9b
Merge branch 'main' into hm/db-gc
hanabi1224 Mar 22, 2023
2d077eb
FOREST_GC_TRIGGER_FACTOR env var
hanabi1224 Mar 22, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -21,6 +21,8 @@ Notable updates:
 - [forest daemon] Support for NV18.
 [#2558](https://github.com/ChainSafe/forest/pull/2558)
 [#2579](https://github.com/ChainSafe/forest/pull/2579)
+- [forest daemon] Automatic database garbage collection.
+[#2638](https://github.com/ChainSafe/forest/pull/2638)

### Changed

37 changes: 36 additions & 1 deletion Cargo.lock


3 changes: 2 additions & 1 deletion Cargo.toml
@@ -61,7 +61,7 @@ bls-signatures = { version = "0.12", default-features = false, features = ["blst
byteorder = "1.4.3"
bytes = "1.2"
cfg-if = "1"
-chrono = { version = "0.4", default-features = false, features = [] }
+chrono = { version = "0.4", default-features = false, features = ["clock"] }
cid = { version = "0.8", default-features = false, features = ["std"] }
clap = { version = "4.0", features = ["derive"] }
console-subscriber = { version = "0.1", features = ["parking_lot"] }
@@ -125,6 +125,7 @@ serde_ipld_dagcbor = "0.2"
serde_json = "1.0"
serde_repr = "0.1.8"
serde_with = { version = "2.0.1", features = ["chrono_0_4"] }
+serde_yaml = "0.9"
sha2 = { version = "0.10.5", default-features = false }
tempfile = "3.4"
thiserror = "1.0"
1 change: 0 additions & 1 deletion blockchain/chain/Cargo.toml
@@ -27,7 +27,6 @@ forest_metrics.workspace = true
forest_networks.workspace = true
forest_shim.workspace = true
forest_utils.workspace = true
-futures.workspace = true
fvm_ipld_amt.workspace = true
fvm_ipld_blockstore.workspace = true
fvm_ipld_car.workspace = true
76 changes: 6 additions & 70 deletions blockchain/chain/src/store/chain_store.rs
@@ -1,29 +1,25 @@
// Copyright 2019-2023 ChainSafe Systems
// SPDX-License-Identifier: Apache-2.0, MIT

-use std::{collections::VecDeque, num::NonZeroUsize, path::Path, sync::Arc, time::SystemTime};
+use std::{num::NonZeroUsize, path::Path, sync::Arc, time::SystemTime};

use ahash::{HashMap, HashMapExt, HashSet};
use anyhow::Result;
use async_stream::stream;
use bls_signatures::Serialize as SerializeBls;
-use cid::{
-multihash::{Code, Code::Blake2b256},
-Cid,
-};
+use cid::{multihash::Code::Blake2b256, Cid};
use digest::Digest;
use forest_beacon::{BeaconEntry, IGNORE_DRAND_VAR};
use forest_blocks::{Block, BlockHeader, FullTipset, Tipset, TipsetKeys, TxMeta};
use forest_encoding::de::DeserializeOwned;
use forest_interpreter::BlockMessages;
-use forest_ipld::{recurse_links_hash, CidHashSet};
+use forest_ipld::{should_save_block_to_snapshot, walk_snapshot};
use forest_libp2p_bitswap::{BitswapStoreRead, BitswapStoreReadWrite};
use forest_message::{ChainMessage, Message as MessageTrait, SignedMessage};
use forest_metrics::metrics;
use forest_networks::ChainConfig;
use forest_shim::{
address::Address,
-clock::EPOCHS_IN_DAY,
crypto::{Signature, SignatureType},
econ::TokenAmount,
executor::Receipt,
@@ -37,7 +33,6 @@ use forest_utils::{
},
io::Checksum,
};
-use futures::Future;
use fvm_ipld_amt::Amtv0 as Amt;
use fvm_ipld_blockstore::Blockstore;
use fvm_ipld_car::CarHeader;
@@ -563,23 +558,18 @@ where

// Walks over tipset and historical data, sending all blocks visited into the
// car writer.
-Self::walk_snapshot(tipset, recent_roots, |cid| {
+walk_snapshot(tipset, recent_roots, |cid| {
let tx_clone = tx.clone();
async move {
let block = self
.blockstore()
.get(&cid)?
.ok_or_else(|| Error::Other(format!("Cid {cid} not found in blockstore")))?;

-// Don't include identity CIDs.
-// We only include raw and dagcbor, for now.
-// Raw for "code" CIDs.
-if u64::from(Code::Identity) != cid.hash().code()
-&& (cid.codec() == fvm_shared::IPLD_RAW
-|| cid.codec() == fvm_ipld_encoding::DAG_CBOR)
-{
+if should_save_block_to_snapshot(&cid) {
tx_clone.send_async((cid, block.clone())).await?;
}

Ok(block)
}
})
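Review note on the hunk above: the body of `should_save_block_to_snapshot` is not shown in this view, so the following is only a minimal sketch of the inline condition this hunk removes, assuming the helper keeps the same rule. It works on plain `u64` codes rather than a `Cid`; the function name `should_save` and the constants are hypothetical stand-ins, with values taken from the multicodec table (identity hash `0x00`, raw `0x55`, DAG-CBOR `0x71`).

```rust
/// Multihash code of the identity hash (assumption: multicodec table value).
const IDENTITY_HASH_CODE: u64 = 0x00;
/// Multicodec code for raw binary blocks.
const IPLD_RAW: u64 = 0x55;
/// Multicodec code for DAG-CBOR blocks.
const DAG_CBOR: u64 = 0x71;

/// Hypothetical stand-in for the predicate: skip identity CIDs and keep
/// only raw and DAG-CBOR blocks, as the removed inline check did.
fn should_save(hash_code: u64, codec: u64) -> bool {
    hash_code != IDENTITY_HASH_CODE && (codec == IPLD_RAW || codec == DAG_CBOR)
}
```

For example, a Blake2b-256 (`0xb220`) DAG-CBOR block passes the check, while any identity-hashed block is skipped regardless of codec.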
@@ -605,60 +595,6 @@ where
let digest = writer.lock().await.get_mut().finalize();
Ok(digest)
}

-/// Walks over tipset and state data and loads all blocks not yet seen.
-/// This is tracked based on the callback function loading blocks.
-pub async fn walk_snapshot<F, T>(
-tipset: &Tipset,
-recent_roots: ChainEpoch,
-mut load_block: F,
-) -> Result<(), Error>
-where
-F: FnMut(Cid) -> T + Send,
-T: Future<Output = Result<Vec<u8>, anyhow::Error>> + Send,
-{
-let mut seen = CidHashSet::default();
-let mut blocks_to_walk: VecDeque<Cid> = tipset.cids().to_vec().into();
-let mut current_min_height = tipset.epoch();
-let incl_roots_epoch = tipset.epoch() - recent_roots;
-
-while let Some(next) = blocks_to_walk.pop_front() {
-if !seen.insert(&next) {
-continue;
-}
-
-let data = load_block(next).await?;
-
-let h = BlockHeader::unmarshal_cbor(&data)?;
-
-if current_min_height > h.epoch() {
-current_min_height = h.epoch();
-if current_min_height % EPOCHS_IN_DAY == 0 {
-info!(target: "chain_api", "export at: {}", current_min_height);
-}
-}
-
-if h.epoch() > incl_roots_epoch {
-recurse_links_hash(&mut seen, *h.messages(), &mut load_block).await?;
-}
-
-if h.epoch() > 0 {
-for p in h.parents().cids() {
-blocks_to_walk.push_back(*p);
-}
-} else {
-for p in h.parents().cids() {
-load_block(*p).await?;
-}
-}
-
-if h.epoch() == 0 || h.epoch() > incl_roots_epoch {
-recurse_links_hash(&mut seen, *h.state_root(), &mut load_block).await?;
-}
-}
-
-Ok(())
-}
}

pub(crate) type TipsetCache = Mutex<LruCache<TipsetKeys, Arc<Tipset>>>;
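Review note: judging by the new `forest_ipld` import, `walk_snapshot` now lives outside `chain_store.rs`; its new body is not shown in this view. The core of the removed version is a breadth-first walk over block headers with a seen-set, which can be sketched roughly as follows. Everything here is a simplification for illustration: `Header`, the `u32` block ids, and the `HashMap` store are hypothetical, the sketch is synchronous, and it omits the message and state-root recursion as well as the extra parent load at epoch 0.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical simplified header: an epoch plus the ids of its parents.
struct Header {
    epoch: i64,
    parents: Vec<u32>,
}

/// Visits headers breadth-first starting from `heads`, skipping ids that
/// were already seen, and returns the ids in visiting order.
fn walk(heads: &[u32], store: &HashMap<u32, Header>) -> Result<Vec<u32>, String> {
    let mut seen = HashSet::new();
    let mut queue: VecDeque<u32> = heads.iter().copied().collect();
    let mut visited = Vec::new();

    while let Some(next) = queue.pop_front() {
        // `insert` returns false when the id was already present.
        if !seen.insert(next) {
            continue;
        }
        let header = store
            .get(&next)
            .ok_or_else(|| format!("block {next} not found"))?;
        visited.push(next);

        // Only follow parents above the genesis epoch, so the walk stops there.
        if header.epoch > 0 {
            queue.extend(header.parents.iter().copied());
        }
    }
    Ok(visited)
}
```

The seen-set is what keeps the walk linear when several blocks at one epoch share the same parents: each shared parent is queued more than once but visited only once.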
1 change: 1 addition & 0 deletions forest/cli/Cargo.toml
@@ -13,6 +13,7 @@ anyhow.workspace = true
atty = "0.2"
base64.workspace = true
boa_engine = { version = "0.16.0", features = ["console"] }
+chrono.workspace = true
cid.workspace = true
clap.workspace = true
convert_case = "0.6.0"
28 changes: 23 additions & 5 deletions forest/cli/src/cli/db_cmd.rs
@@ -1,17 +1,21 @@
// Copyright 2019-2023 ChainSafe Systems
// SPDX-License-Identifier: Apache-2.0, MIT

+use chrono::Utc;
use clap::Subcommand;
use forest_cli_shared::{chain_path, cli::Config};
-use forest_db::db_engine::db_path;
+use forest_db::db_engine::db_root;
+use forest_rpc_client::db_ops::db_gc;
use log::error;

-use crate::cli::prompt_confirm;
+use crate::cli::{handle_rpc_err, prompt_confirm};

#[derive(Debug, Subcommand)]
pub enum DBCommands {
/// Show DB stats
Stats,
+/// Run DB garbage collection
+GC,
/// DB Clean up
Clean {
/// Answer yes to all forest-cli yes/no questions without prompting
@@ -21,19 +25,33 @@ pub enum DBCommands {
}

impl DBCommands {
-pub fn run(&self, config: &Config) -> anyhow::Result<()> {
+pub async fn run(&self, config: &Config) -> anyhow::Result<()> {
match self {
Self::Stats => {
use human_repr::HumanCount;

-let dir = db_path(&chain_path(config));
+let dir = db_root(&chain_path(config));
println!("Database path: {}", dir.display());
let size = fs_extra::dir::get_size(dir).unwrap_or_default();
println!("Database size: {}", size.human_count_bytes());
Ok(())
}
+Self::GC => {
+let start = Utc::now();
+
+db_gc((), &config.client.rpc_token)
+.await
+.map_err(handle_rpc_err)?;
+
+println!(
+"DB GC completed. took {}s",
+(Utc::now() - start).num_seconds()
+);
+
+Ok(())
+}
Self::Clean { force } => {
-let dir = db_path(&chain_path(config));
+let dir = db_root(&chain_path(config));
if !dir.is_dir() {
println!(
"Aborted. Database path {} is not a valid directory",
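Review note on the `GC` arm in `db_cmd.rs`: the PR measures the duration with `chrono::Utc::now()`, which reads the wall clock. As a possible alternative, and not what the PR does, the same measurement can be sketched with the monotonic `std::time::Instant`, which is unaffected by system clock adjustments during a long-running GC. The helpers `timed` and `report` are hypothetical, and the sketch is synchronous, whereas the real call awaits an RPC.

```rust
use std::time::{Duration, Instant};

/// Runs `op` and returns its result together with the elapsed time,
/// measured on the monotonic clock.
fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}

/// Formats the completion message in the same shape as the CLI output
/// in the diff above (whole seconds only).
fn report(elapsed: Duration) -> String {
    format!("DB GC completed. took {}s", elapsed.as_secs())
}
```

In an async context the same pattern applies by taking `Instant::now()` before the `.await` and calling `elapsed()` after it.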