Promises: bigger-than-RAM #5247

Closed
emilk opened this issue Feb 21, 2024 · 2 comments
Labels
enhancement New feature or request ⛃ re_datastore affects the datastore itself 🎄 tracking issue issue that tracks a bunch of subissues

Comments

@emilk
Member

emilk commented Feb 21, 2024

Goals

Support some forms of “bigger-than-RAM” recordings, as soon as possible

Background

Small-index vs Big-index

Table index: row ids and time points.

Does the table index fit in RAM?

Hypothesis: most “bigger-than-RAM” problems have smallish indices.

Big index

Example: 100GB of scalar plots

We need a hierarchical index file on disk, with seeking, plus store-subscribers that are aware of it, etc. Difficult!

Small index

Example: thousands of uncompressed 4k images, or big point clouds, meshes, …

We “just” need to figure out how to load blobs from disk on-demand. Easier!

Promises: a solution to small-index

We replace large blobs with promises that refer to the external data.

A promise could be a file path with optional byte offset, a URL, …

When a query results in a Promise, we (try to) resolve it.

Example: we go through a huge MCAP file and log it to Rerun, but replace big blobs by a Promise referring to a byte-offset in the MCAP.
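
A minimal sketch of that replacement step, not the actual Rerun implementation. The `Promise` class, the `promisify` helper, the size threshold, and the `?bytes=<offset>,<length>` query layout are all assumptions made for illustration; the issue only says that a promise could be a file path with an optional byte offset.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union

# Hypothetical threshold: blobs at least this large are replaced by a promise.
PROMISE_THRESHOLD_BYTES = 1024


@dataclass(frozen=True)
class Promise:
    """A reference to external data, stored as a single URI string."""

    uri: str


def promisify(
    path: str, records: Iterable[tuple[str, int, bytes]]
) -> Iterator[tuple[str, Union[bytes, Promise]]]:
    """Yield (entity, payload) pairs, swapping big blobs for promises.

    Each input record is (entity, byte_offset_in_file, blob).
    """
    for entity, offset, blob in records:
        if len(blob) >= PROMISE_THRESHOLD_BYTES:
            # Assumed URI layout: file://<path>?bytes=<offset>,<length>
            yield entity, Promise(f"file://{path}?bytes={offset},{len(blob)}")
        else:
            # Small payloads stay inline in the recording.
            yield entity, blob


records = [
    ("camera/image", 4096, b"\x00" * 2048),  # large: becomes a promise
    ("imu/accel", 6144, b"\x01" * 12),       # small: stays inline
]
logged = list(promisify("/data/huge.mcap", records))
```

Only the small index (entity paths plus promise URIs) has to stay in RAM; the large payload is read from the original file when a query actually needs it.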

User stories

  • Logging a file reference
    • rr.log("image", rr.Image(data=rr.Promise.file_path("foo.jpg")))
  • VRS
    • file://recording.vrs?stream=video&time=42
  • Log a video file
    for i, frame in enumerate(video):
        rr.set_time_point("frame", i)
        rr.log("video", rr.Image(data=rr.Promise.file_path(f"foo.mp4?frame={i}")))

Design

A Promise is a datatype, which can be used for any component.
So a component.Point3D can be represented by a datatype.Promise.
A promise contains a single URI string.

A promise resolves to some IPC Arrow data (or an error, or pending).

The promise is resolved late, after primary caches, close to the UI/visualizer.

/// The data of a component. Is `ComponentResult` a better name?
enum ComponentResult<'data, T> {
    /// The entity doesn't have this component
    None,
    
    /// Wait for it - it is being loaded in the background
    Pending,
    
    /// Failed to load.
    Error(String),
    
    /// The data is decoded and ready.
    /// A slice into the secondary promise cache (if it was a promise)
    Data(&'data [T]),
}

impl<'data, T> ComponentResult<'data, T> {
    fn map() -> …
}
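
To make the four states concrete, here is a rough Python analogue of the enum above. It is a sketch only: the class names mirror the Rust variants (`NoData` stands in for `None`), and the behaviour of `map_result` is an assumption modelled on `Option::map`, since the issue leaves the signature of `map` open.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Sequence, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class NoData:
    """The entity doesn't have this component."""


@dataclass(frozen=True)
class Pending:
    """The data is being loaded in the background."""


@dataclass(frozen=True)
class Error:
    """Loading failed."""

    message: str


@dataclass(frozen=True)
class Data(Generic[T]):
    """The data is decoded and ready."""

    values: Sequence[T]


ComponentResult = Union[NoData, Pending, Error, Data[T]]


def map_result(
    result: "ComponentResult[T]", f: Callable[[Sequence[T]], Sequence[U]]
) -> "ComponentResult[U]":
    """Apply `f` to ready data; pass every other state through unchanged."""
    if isinstance(result, Data):
        return Data(f(result.values))
    return result


doubled = map_result(Data([1, 2, 3]), lambda xs: [x * 2 for x in xs])
still_pending = map_result(Pending(), lambda xs: xs)
still_failed = map_result(Error("file not found"), lambda xs: xs)
```

A visualizer written against such a type can transform data when it is ready and otherwise forward `Pending` or `Error` to the UI without special-casing.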

MVP

  • log huge files, index them after, then open the small index
  • Shortcomings:
    • Some stalling when time-scrubbing
    • No web support
    • Local files only

Steps

  • Add a PromiseCache returning ComponentResult<'a, T>
  • entity_iterator should either
    • return a MaybePromise<T> for each component (leaving it to the user to resolve)
    • or a ComponentResult<'a, T> for each component
  • Put datatype-name in the meta-data of each DataCell
  • Built-in resolver for file://…?bytes=…
    • Immediate, fseek
    • IPC Arrow data at a byte offset, or ArrowMsg at offset + index in it
  • rerun index huge.rrd > indexed.rrd
    • creates an “indexed” version of the rrd, which replaces components with promises and puts the raw blobs elsewhere in the file
    • two files as alternative, but single file preferred
    • “self” uri, for referring to the same file
  • gc PromiseCache
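
Two of the steps above, the immediate `file://` resolver and a garbage-collected `PromiseCache`, can be sketched together as follows. This is an illustration under stated assumptions, not the planned design: the `?bytes=<offset>,<length>` query layout, the least-recently-used eviction policy, and the byte budget are hypothetical, and the sketch returns raw bytes rather than decoded IPC Arrow data.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path
from urllib.parse import parse_qs, urlparse
from urllib.request import url2pathname


def resolve_file_promise(uri: str) -> bytes:
    """Resolve a file:// promise immediately by seeking to a byte range.

    Assumed (hypothetical) query layout: ?bytes=<offset>,<length>
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    offset, length = (int(x) for x in parse_qs(parsed.query)["bytes"][0].split(","))
    with open(url2pathname(parsed.path), "rb") as f:
        f.seek(offset)
        return f.read(length)


class PromiseCache:
    """Caches resolved blobs by URI and evicts the least recently used ones."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._blobs: OrderedDict[str, bytes] = OrderedDict()

    def get(self, uri: str) -> bytes:
        blob = self._blobs.get(uri)
        if blob is None:
            blob = resolve_file_promise(uri)
            self._blobs[uri] = blob
        self._blobs.move_to_end(uri)  # mark as most recently used
        self.gc()
        return blob

    def gc(self) -> None:
        """Drop the oldest entries until the cache fits the byte budget."""
        total = sum(len(b) for b in self._blobs.values())
        while total > self.max_bytes and self._blobs:
            _, evicted = self._blobs.popitem(last=False)
            total -= len(evicted)

    def cached_uris(self) -> list[str]:
        return list(self._blobs)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "blobs.bin"
    path.write_bytes(b"AAAABBBBCCCC")
    base = path.as_uri()

    cache = PromiseCache(max_bytes=8)
    first = cache.get(f"{base}?bytes=0,4")
    second = cache.get(f"{base}?bytes=4,4")
    third = cache.get(f"{base}?bytes=8,4")  # pushes the first blob out
    remaining = cache.cached_uris()
```

With an 8-byte budget and three 4-byte blobs, the third lookup evicts the first entry, so a later query for it would hit the disk again. That is the source of the time-scrubbing stalls listed under the MVP shortcomings.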

Post-MVP

Latency-aware

  • Start using ComponentResult in visualizers
  • make resolver async
  • Some latency resolver strategy
    • experiment with simulated latency etc.
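
One way to experiment with an async resolver and simulated latency is sketched below. Everything here is hypothetical: the `AsyncResolver` class, its `poll` method, the `PENDING` sentinel, and the fake payload are illustration only. The idea is that a visualizer polls every frame and receives a pending marker until the background load completes.

```python
import asyncio

PENDING = object()  # sentinel returned while a load is still in flight


class AsyncResolver:
    """Resolves promises in background tasks, with artificial latency."""

    def __init__(self, simulated_latency_s: float) -> None:
        self.simulated_latency_s = simulated_latency_s
        self._tasks: dict[str, asyncio.Task] = {}

    async def _fetch(self, uri: str) -> bytes:
        # Stand-in for a slow disk or network read.
        await asyncio.sleep(self.simulated_latency_s)
        return f"data for {uri}".encode()

    def poll(self, uri: str):
        """Non-blocking: start the load if needed, return PENDING or the bytes."""
        task = self._tasks.get(uri)
        if task is None:
            task = asyncio.get_running_loop().create_task(self._fetch(uri))
            self._tasks[uri] = task
        if not task.done():
            return PENDING
        return task.result()


async def demo():
    resolver = AsyncResolver(simulated_latency_s=0.01)
    uri = "file:///tmp/example.bin?bytes=0,4"  # hypothetical URI

    first = resolver.poll(uri)  # the load has only just been scheduled
    frames = 0
    # Poll once per simulated "frame" until the data arrives.
    while (result := resolver.poll(uri)) is PENDING:
        frames += 1
        await asyncio.sleep(0.001)
    return first, result, frames


first_poll, final_result, frames_waited = asyncio.run(demo())
```

Varying `simulated_latency_s` shows how many frames a visualizer would spend in the pending state, which is the kind of measurement a latency strategy would need.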

Promise resolvers

  • Custom HTTP(S) resolver
  • VRS resolver

SDK-aware

Each of these adds additional abilities:

  • Auto-promisify sink in the SDK
  • log promise components directly: rr.log("mypoints", rr.Promise(Position3D.name, uri))
  • Support promises for all archetypes
    • Rust: replace Option<Vec<Position3D>> with MaybePromise<Vec<Position3D>>
    • Python: isinstance
    • C++: enhance or wrap Collection type
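
For the Python bullet, an `isinstance` check could look roughly like this. It is a sketch with hypothetical names (`Promise`, `MaybePromise`, `positions_or_resolve`) and does not use the real `rr` API; the resolver is passed in as a plain function so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Position3D = tuple[float, float, float]


@dataclass(frozen=True)
class Promise:
    """A reference to external data, stored as a single URI string."""

    uri: str


# Either the positions themselves or a promise that resolves to them.
MaybePromise = Union[Promise, Sequence[Position3D]]


def positions_or_resolve(
    value: MaybePromise,
    resolve: Callable[[str], Sequence[Position3D]],
) -> Sequence[Position3D]:
    """Return inline positions as-is; resolve promised ones through `resolve`."""
    if isinstance(value, Promise):
        return resolve(value.uri)
    return value


def fake_resolver(uri: str) -> Sequence[Position3D]:
    # Stand-in for a real resolver; always returns one point.
    return [(1.0, 2.0, 3.0)]


inline = positions_or_resolve([(0.0, 0.0, 0.0)], fake_resolver)
promised = positions_or_resolve(
    Promise("file:///data/points.bin?bytes=0,24"), fake_resolver
)
```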
@emilk emilk added enhancement New feature or request 🎄 tracking issue issue that tracks a bunch of subissues ⛃ re_datastore affects the datastore itself labels Feb 21, 2024
@andresv

andresv commented Mar 13, 2024

Making video frame seeking really fast would be huge. My use case is hours of video and telemetry recordings from UAVs: I would like to dump all of them into Rerun and then step frame by frame, backward and forward, somewhere in the middle of the video to study frames and the corresponding telemetry.

@teh-cmc
Member

teh-cmc commented Nov 26, 2024

Fine-grained promises (i.e. at the cell/componentbatch level) have proven to be the wrong tool for the job: we've since removed all remaining traces of them.

External data references, promises and larger-than-RAM storage can all be solved much more efficiently using a combination of our new storage node and Chunk-level processors.

See #8221 for more information about Chunk processors.

@teh-cmc teh-cmc closed this as not planned Nov 26, 2024