
feat: database garbage collection #2638

Merged · 66 commits · Mar 22, 2023

Conversation

@hanabi1224 (Contributor) commented Mar 8, 2023:

Summary of changes

Changes introduced in this pull request:

  • Implement a DB wrapper as a drop-in replacement that can be garbage-collected.
  • Hook the DB wrapper into ChainStore.
  • Implement a garbage-collection function that can be scheduled as a background task.
  • Trigger the GC task with forest-cli.

(Blocked by #2635)
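The wrapper idea in the first bullet can be sketched as follows. This is a hypothetical, stdlib-only illustration of the rolling two-space design, not the actual Forest types: writes always land in the `current` space, reads fall back to the `old` space, and GC amounts to rotating the spaces once reachable data has been copied forward.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a rolling DB wrapper (not the Forest implementation).
struct RollingDb {
    current: HashMap<Vec<u8>, Vec<u8>>,
    old: HashMap<Vec<u8>, Vec<u8>>,
}

impl RollingDb {
    fn new() -> Self {
        Self { current: HashMap::new(), old: HashMap::new() }
    }

    // Writes always go to the newest space.
    fn write(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.current.insert(key, value);
    }

    // Reads check the newest space first, then the space awaiting collection,
    // so callers see a single flat key-value store.
    fn read(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.current.get(key).or_else(|| self.old.get(key))
    }

    // GC boils down to rotating the spaces: anything still reachable has been
    // copied into `current` beforehand, so dropping `old` reclaims space.
    fn rotate(&mut self) {
        self.old = std::mem::take(&mut self.current);
    }
}

fn main() {
    let mut db = RollingDb::new();
    db.write(b"a".to_vec(), b"1".to_vec());
    db.rotate(); // "a" moves to the old space but stays readable
    db.write(b"b".to_vec(), b"2".to_vec());
    assert!(db.read(b"a").is_some());
    assert!(db.read(b"b").is_some());
    db.rotate(); // "a" was never copied forward, so it is collected
    assert!(db.read(b"a").is_none());
    assert!(db.read(b"b").is_some());
    println!("ok");
}
```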

Reference issue to close (if applicable)

Closes #2292, #1708

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

forest/daemon/src/daemon.rs (outdated, resolved)
ipld/src/util.rs (outdated, resolved)
node/db/src/rolling/impls.rs (outdated, resolved)
node/db/src/rolling/impls.rs (outdated, resolved)
node/db/src/rolling/index.rs (outdated, resolved)
node/db/src/rolling/mod.rs (outdated, resolved)
@hanabi1224 hanabi1224 marked this pull request as ready for review March 9, 2023 16:09
node/db/src/rolling/impls.rs (outdated, resolved)
@lemmih (Contributor) commented Mar 17, 2023:

Ran benchmarks with the entire snapshot in the "old" partition and it doesn't affect performance in any way I can measure.

@lemmih (Contributor) left a comment:

Alright!

I'll add issues for the following improvements:

  • Load data directly into the old space rather than the new space.
  • Show progress during garbage collection (and chain exporting).
  • Possibly get rid of the Store trait altogether.
  • Yield during GC to allow Forest to shut down properly.

@LesnyRumcajs (Member) left a comment:

It'd be fantastic to make a blogpost about it :)

Comment on lines +164 to +180
let db_garbage_collector = {
let db = db.clone();
let chain_store = chain_store.clone();
let get_tipset = move || chain_store.heaviest_tipset().as_ref().clone();
Arc::new(DbGarbageCollector::new(db, get_tipset))
};

#[allow(clippy::redundant_async_block)]
services.spawn({
let db_garbage_collector = db_garbage_collector.clone();
async move { db_garbage_collector.collect_loop_passive().await }
});
#[allow(clippy::redundant_async_block)]
services.spawn({
let db_garbage_collector = db_garbage_collector.clone();
async move { db_garbage_collector.collect_loop_event().await }
});
Member:

It'd be nice to move this into a separate method. To my understanding, these calls generally depend on each other.

Contributor Author:

collect_loop_passive and collect_loop_event work independently; either or both can be disabled.
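That independence can be illustrated with a small sketch. This is hypothetical and uses std threads in place of tokio tasks: the passive loop and the event loop share only the collector itself, so either one can run (or be disabled) without the other.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

// Returns the total number of "collections" performed by the two loops.
fn run_loops() -> usize {
    let runs = Arc::new(AtomicUsize::new(0));

    // Passive loop: wakes up on a fixed interval and collects when a
    // condition holds (in Forest, roughly "DB size exceeds a threshold";
    // here, always, and only twice so the sketch terminates).
    let passive = {
        let runs = runs.clone();
        thread::spawn(move || {
            for _ in 0..2 {
                thread::sleep(Duration::from_millis(5));
                runs.fetch_add(1, Ordering::SeqCst); // "collect"
            }
        })
    };

    // Event loop: collects once per explicit trigger (in Forest, e.g. a
    // GC request issued via forest-cli).
    let (tx, rx) = mpsc::channel::<()>();
    let event = {
        let runs = runs.clone();
        thread::spawn(move || {
            while rx.recv().is_ok() {
                runs.fetch_add(1, Ordering::SeqCst); // "collect"
            }
        })
    };

    tx.send(()).unwrap();
    drop(tx); // closing the channel shuts the event loop down
    passive.join().unwrap();
    event.join().unwrap();
    runs.load(Ordering::SeqCst)
}

fn main() {
    // Two passive runs plus one event-triggered run.
    assert_eq!(run_loops(), 3);
    println!("ok");
}
```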

@@ -583,28 +600,28 @@ mod test {

use super::*;

- #[tokio::test]
+ #[tokio::test(flavor = "multi_thread")]
Member:

Why do we need it?

Contributor Author:

Snapshot import now spawns a task for writing to the DB, so a single thread becomes insufficient.

Contributor:

Why is the single threaded runtime insufficient? Surely it can run multiple tasks in parallel.

Contributor Author:

The test deadlocked, so I made the change.

Contributor:

I think we need to debug why that happened. We shouldn't be doing anything blocking.

Contributor Author:

I just reverted the multi-threading change and the deadlock is gone: 350fd3f

Member:

The last time I encountered a deadlock with tokio, it was because there were multiple runtimes. It was a bug. :)

node/db/src/rolling/gc.rs (outdated, resolved)
node/db/src/rolling/gc.rs (outdated, resolved)
//! used to speed up the database write operation
//!
//! ## Scheduling
//! 1. GC is triggered automatically when total DB size is greater than 2x of
Member:

It'd be nice to add an actual mainnet example.

Contributor Author:

Do you mean an example output in the log?

Member:

Just example numbers with mainnet.

Contributor Author:

Example numbers added

node/db/src/rolling/gc.rs (outdated, resolved)
.ok_or_else(|| anyhow::anyhow!("Cid {cid} not found in blockstore"))?;

let pair = (cid, block.clone());
// Key size is 32 bytes in paritydb
Member:

So, does this work at all with RocksDb? If not, how does Forest behave when RocksDb is used?

@hanabi1224 (Contributor Author) commented Mar 21, 2023:

All functionality works with RocksDB. However, I did not test the crash-resilience story for RocksDB, so it might not be able to recover when Forest is shut down improperly. I believe we had crash-resilience issues with RocksDB before, so this potential regression is not introduced by this change.

Member:

So you tested that the GC works with rocksdb? How long does it take? I'm asking because the key size value is hardcoded here.

@hanabi1224 (Contributor Author) commented Mar 21, 2023:

The time cost is similar. The key size is used for estimating the reachable data size, which is a trigger condition. However, since RocksDB is compressed while ParityDB is (mostly) not, the GC trigger interval for RocksDB tends to be longer.
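The estimation being discussed can be sketched roughly as follows. This is a hypothetical illustration (names are made up, not Forest's API): the reachable data size is approximated by summing value lengths plus a fixed per-record key overhead of 32 bytes, matching ParityDB's hashed key size.

```rust
// ParityDB hashes every key to a fixed 32-byte value, so the key contribution
// per record can be treated as a constant when estimating reachable data size.
const ESTIMATED_KEY_SIZE: usize = 32;

// Hypothetical helper: sum estimated on-disk size over reachable blocks.
fn estimate_reachable_size(blocks: &[&[u8]]) -> usize {
    blocks.iter().map(|block| ESTIMATED_KEY_SIZE + block.len()).sum()
}

fn main() {
    let blocks: [&[u8]; 2] = [b"header", b"state-root"];
    // (32 + 6) + (32 + 10) = 80 bytes
    assert_eq!(estimate_reachable_size(&blocks), 80);
    println!("ok");
}
```

With a compressed backend like RocksDB the on-disk size is smaller than this estimate, which is consistent with the longer trigger interval mentioned above.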

Comment on lines +8 to +12
//! This module contains a concurrent, semi-space garbage collector. The garbage
//! collector is guaranteed to be non-blocking and can be expected to run with a
//! fixed memory overhead and require disk space proportional to the size of the
//! reachable graph. For example, if the size of the reachable graph is 100GiB,
//! expect this garbage collector to use 3x100GiB = 300GiB of storage.
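The copy step of such a semi-space collector can be sketched in a few lines. This is a hypothetical, stdlib-only illustration, not Forest's code: walk the graph from the roots, copy every reachable record into a fresh space, and drop the old space afterwards. The 3x storage figure follows because the old space can grow to roughly twice the reachable size before GC triggers, and the copy adds up to one more reachable-size's worth.

```rust
use std::collections::HashMap;

// Hypothetical record type: opaque data plus keys of child records,
// analogous to CID links inside an IPLD block.
#[derive(Clone)]
struct Record {
    data: Vec<u8>,
    links: Vec<u64>,
}

// Copy everything reachable from `roots` out of `from` into a fresh space.
fn copy_reachable(roots: &[u64], from: &HashMap<u64, Record>) -> HashMap<u64, Record> {
    let mut to = HashMap::new();
    let mut stack: Vec<u64> = roots.to_vec();
    while let Some(key) = stack.pop() {
        if to.contains_key(&key) {
            continue; // already copied; graphs may share subtrees
        }
        if let Some(record) = from.get(&key) {
            stack.extend(&record.links);
            to.insert(key, record.clone());
        }
    }
    to
}

fn main() {
    let mut db = HashMap::new();
    db.insert(1, Record { data: vec![0; 10], links: vec![2] });
    db.insert(2, Record { data: vec![0; 10], links: vec![] });
    db.insert(3, Record { data: vec![0; 10], links: vec![] }); // unreachable
    let live = copy_reachable(&[1], &db);
    // Record 3 is not copied; dropping `db` reclaims its space.
    assert_eq!(live.len(), 2);
    println!("live records: {}", live.len());
}
```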
Member:

Before merging this, we should update the specs of our DO droplets.

Contributor Author:

Agreed, 320 GB is not sufficient if Docker and Rust toolchains need to be installed.

Contributor Author:

But we could mount a ~500 GB volume on the fly and symlink the data folder.

}
}
self.put_many_keyed(buffer)?;
info!(
Member:

Should this be debug instead? Also, I'm not sure calling timers by default makes sense in a performance-sensitive method.

Contributor Author:

This method is called either during snapshot import or during database garbage collection, which takes 1-2 h on a laptop or 3-5 h on a DO droplet. The info log should never flood the screen, but it's quite useful for collecting metrics. I'd prefer to keep it at info level so that it's available in the log by default. What do you think?
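The kind of progress line being discussed can be sketched as follows. This is hypothetical (using `eprintln!` in place of the log crate's `info!`, with made-up wording): one line per flushed batch, so a multi-hour import or GC emits only a handful of lines while still recording throughput.

```rust
use std::time::Instant;

// Flush a batch of records and report size and throughput once per flush.
// Returns the number of bytes written, for the caller's bookkeeping.
fn flush_batch(buffer: &[Vec<u8>]) -> usize {
    let start = Instant::now();
    let bytes: usize = buffer.iter().map(|b| b.len()).sum();
    // ... write `buffer` to the database here ...
    let secs = start.elapsed().as_secs_f64().max(1e-9); // avoid divide-by-zero
    eprintln!(
        "flushed {} records, {} bytes, {:.1} MiB/s",
        buffer.len(),
        bytes,
        bytes as f64 / (1024.0 * 1024.0) / secs,
    );
    bytes
}

fn main() {
    let batch = vec![vec![0u8; 1024]; 4];
    assert_eq!(flush_batch(&batch), 4096);
}
```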

Member:

If it's helpful, perhaps we should have a proper Prometheus metric, like average speed or something similar?

Contributor Author:

In this case, I think a log of individual runs is more helpful than aggregated metrics for understanding the details and nature of the database, and for troubleshooting. How about keeping it as is, given that it only emits 2-3 lines every 1.5-2 days?

Member:

Ok

utils/forest_utils/src/db/mod.rs (outdated, resolved)
Successfully merging this pull request may close these issues:

  • Add garbage collection to Forest
  • Spike: garbage collection in Forest

3 participants