Skip to content

Commit

Permalink
Purge the query cache to prevent GC livelocks (#7370)
Browse files Browse the repository at this point in the history
When the GC needs to reclaim memory but has already exhausted all the
data from the store, then purge the query cache directly.
The complete rationale for this patch is there:
#7369 (comment).

Before:

![image](https://github.com/user-attachments/assets/0ee825d1-11a5-43ad-b4ac-4e9ee6134108)

After:

![image](https://github.com/user-attachments/assets/6bb83281-94e7-42ca-8cd6-5502bdefe47b)


- Fixes #7369

### Checklist
* [x] I have read and agree to [Contributor
Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and
the [Code of
Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
* Using examples from latest `main` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7370?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
* Using full set of examples from `nightly` build:
[rerun.io/viewer](https://rerun.io/viewer/pr/7370?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their
usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release
checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] If have noted any breaking changes to the log API in
`CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7370)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot
full-check`.
  • Loading branch information
teh-cmc authored Sep 9, 2024
1 parent 45da3a1 commit a150b3c
Show file tree
Hide file tree
Showing 2 changed files with 34 additions and 3 deletions.
15 changes: 12 additions & 3 deletions crates/store/re_entity_db/src/entity_db.rs
Original file line number Diff line number Diff line change
Expand Up @@ -386,14 +386,21 @@ impl EntityDb {
re_tracing::profile_function!();

assert!((0.0..=1.0).contains(&fraction_to_purge));
self.gc(&GarbageCollectionOptions {
if !self.gc(&GarbageCollectionOptions {
target: GarbageCollectionTarget::DropAtLeastFraction(fraction_to_purge as _),
protect_latest: 1,
time_budget: DEFAULT_GC_TIME_BUDGET,
});
}) {
// If we weren't able to collect any data, then we need to GC the cache itself in order
// to regain some space.
// See <https://github.com/rerun-io/rerun/issues/7369#issuecomment-2335164098> for the
// complete rationale.
self.query_caches.purge_fraction_of_ram(fraction_to_purge);
}
}

pub fn gc(&mut self, gc_options: &GarbageCollectionOptions) {
/// Returns `true` if anything at all was actually GC'd.
pub fn gc(&mut self, gc_options: &GarbageCollectionOptions) -> bool {
re_tracing::profile_function!();

let (store_events, stats_diff) = self.data_store.gc(gc_options);
Expand All @@ -405,6 +412,8 @@ impl EntityDb {
);

self.on_store_deletions(&store_events);

!store_events.is_empty()
}

/// Unconditionally drops all the data for a given [`EntityPath`] .
Expand Down
22 changes: 22 additions & 0 deletions crates/store/re_query/src/latest_at.rs
Original file line number Diff line number Diff line change
Expand Up @@ -171,6 +171,28 @@ impl Caches {

results
}

/// Free up some RAM by forgetting the older parts of all timelines.
pub fn purge_fraction_of_ram(&mut self, fraction_to_purge: f32) {
re_tracing::profile_function!();

let mut caches = self.latest_at_per_cache_key.write();
for (_key, cache) in caches.iter_mut() {
let mut cache = cache.write();

let split_point =
(cache.per_query_time.len().saturating_sub(1) as f32 * fraction_to_purge) as usize;

if let Some(split_time) = cache.per_query_time.keys().nth(split_point).copied() {
// NOTE: By not clearing the pending invalidations set, we risk invalidating a
// future result that need not be invalidated.
// That is a much better outcome that the opposite though: not invalidating a
// future result that in fact should have been.
// See `handle_pending_invalidation` for more information.
cache.per_query_time = cache.per_query_time.split_off(&split_time);
}
}
}
}

// --- Results ---
Expand Down

0 comments on commit a150b3c

Please sign in to comment.