Design for "Video Frame References" #7272

Closed
jleibs opened this issue Aug 26, 2024 · 2 comments
Labels
🪵 Log & send APIs Affects the user-facing API for all languages 🍏 primitives Relating to Rerun primitives 🎞️ video

Comments

@jleibs
Member

jleibs commented Aug 26, 2024

Context

The original idea was that we could use the delta time between when the video is logged and the current time on the timeline to determine which frame is presented. While this works in theory, in practice there are several related issues:

  • How time points are mapped to the current timeline selection leaves a lot of room for confusion. It requires assorted options, such as a per-timeline scale factor, that are easy to get wrong.
  • Because we don't know the duration of the video, the timeline just shows it as a single event at the start time. However, it would still be nice for users to be able to see the actual time points where the different frames occur.
  • When querying data back from Rerun, it would also be nice to be able to reference a video frame for alignment with other data in the same joined row.
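To make the first issue concrete, here is a minimal sketch of the original delta-time mapping, assuming a constant frame rate and a single per-timeline scale factor that converts timeline units to seconds. `frame_index_from_delta` and its parameters are hypothetical, not Rerun API. Forgetting to set the scale on a millisecond timeline silently picks a frame 1000 times too far into the video:

```python
from typing import Optional


def frame_index_from_delta(timeline_time: float, log_time: float, fps: float,
                           scale: float = 1.0) -> Optional[int]:
    """Hypothetical sketch of the original delta-time approach.

    timeline_time: current position on the selected timeline.
    log_time: position on the same timeline where the video was logged.
    scale: per-timeline factor converting timeline units to seconds.
    """
    delta_seconds = (timeline_time - log_time) * scale
    if delta_seconds < 0:
        return None  # the video has not started yet
    return int(delta_seconds * fps)  # assumes a constant frame rate


# Timeline in seconds: 0.5 s after logging at 30 fps -> frame 15.
print(frame_index_from_delta(10.5, 10.0, fps=30.0))
# Timeline in milliseconds with the scale left at 1.0 -> frame 15000 instead of 15.
print(frame_index_from_delta(10_500, 10_000, fps=30.0))
```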

Possible solutions

We suspect that we want some way of converting a video into a set of events that all reference the same video payload at log time. However, there are a number of choices for how we approach this.

The Hugging Face LeRobot dataset format, for example, encodes video in a column as Struct<uri: String, pts: Timestamp>.
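A minimal sketch of such a column, assuming `pts` holds the frame's presentation time within the video in seconds. The names `VideoFrameRef` and `frame_ref_column`, the URI, and the `joint_angle` data are hypothetical. Every row points at the same encoded video, so the column can sit next to other per-row data in a joined table:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoFrameRef:
    """Hypothetical row type mirroring Struct<uri: String, pts: Timestamp>."""
    uri: str    # location of the encoded video, shared by every row
    pts: float  # presentation timestamp of the frame within that video (seconds, assumed)


def frame_ref_column(uri: str, pts_list: List[float]) -> List[VideoFrameRef]:
    """One reference per frame, all pointing at the same video payload."""
    return [VideoFrameRef(uri=uri, pts=pts) for pts in pts_list]


# A joined table: each row pairs a frame reference with other data at the same time.
refs = frame_ref_column("videos/episode_0.mp4", [0.0, 1 / 30, 2 / 30])
rows = [{"frame": ref, "joint_angle": angle} for ref, angle in zip(refs, [0.10, 0.12, 0.15])]
```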

We could do something similar, but we still have many choices for what kind of reference to support:

  • An implicit component on the same entity (e.g. LatestVideo)?
  • A concrete entity + row?
  • A generic side-band "blob asset"?

Additionally, there are open questions about how the column of video-frame-reference data is generated:

  • We could just make users do this themselves. If they encoded the video, they should already know the PTS timestamps. This would certainly be the easiest to implement, but may be a heavy lift for users.
  • We could provide Rust code that parses the MP4 container and does this for the user via something like log_video, or that returns the list of PTS timestamps to user code.
  • We could still think of this as some kind of "conversion" process that happens downstream of logging and creates a new derived column from the original data.
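For the first option, a user who encoded a constant-frame-rate video can derive the timestamps directly. A minimal sketch, assuming PTS values in nanoseconds starting at `start_pts_ns` and frames listed in display order; `constant_rate_pts_ns` is a hypothetical helper. Real containers can have variable frame rates or non-zero start offsets, which is part of the case for the second option of parsing the container:

```python
from fractions import Fraction
from typing import List


def constant_rate_pts_ns(num_frames: int, fps: Fraction, start_pts_ns: int = 0) -> List[int]:
    """PTS (nanoseconds) of each frame in a constant-frame-rate video, in display order."""
    # Exact rational duration, e.g. 1e9 * 1001/30000 ns per frame at 29.97 fps.
    frame_duration_ns = Fraction(1_000_000_000) / fps
    # Compute each timestamp from the frame index (not by accumulating) so rounding never drifts.
    return [start_pts_ns + round(i * frame_duration_ns) for i in range(num_frames)]


print(constant_rate_pts_ns(4, Fraction(30)))           # [0, 33333333, 66666667, 100000000]
print(constant_rate_pts_ns(2, Fraction(30000, 1001)))  # [0, 33366667]
```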
@jleibs jleibs added 🍏 primitives Relating to Rerun primitives 🪵 Log & send APIs Affects the user-facing API for all languages labels Aug 26, 2024
@emilk
Member

emilk commented Sep 3, 2024

Data model

/// Maps a time point in a video to one or more Rerun timelines.
// TODO: a better name (we're referencing a video _time_, not a frame)
component struct VideoFrame {
    /// References the closest video frame to this time.
    /// (closest-to instead of latest-at to forgive rounding errors)
    video_time: i64,
    
    time_mode: VideoTimeMode,
    
    /// Reference to an entity with a `VideoAsset`
    /// (OR to an entity with a BlobURI in it, for extra indirection)
    // null = self
    video: Option<EntityPath>,
}

enum VideoTimeMode {
    PtsNanoseconds, // Overkill accuracy for future proofing

    // Save for future: FrameNr
}
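A minimal Python mirror of the proposed component, plus the "closest-to" lookup its doc comment describes, assuming the video's frame PTS values (in nanoseconds) are known and sorted. All names are hypothetical. The usage shows why closest-to is more forgiving than latest-at: a time rounded to 33 ms (33_000_000 ns) lands just before the second frame's PTS of 33_333_333 ns at 30 fps, so latest-at would pick frame 0 while closest-to picks frame 1:

```python
import bisect
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class VideoTimeMode(Enum):
    PTS_NANOSECONDS = "PtsNanoseconds"


@dataclass(frozen=True)
class VideoFrame:
    video_time: int
    time_mode: VideoTimeMode = VideoTimeMode.PTS_NANOSECONDS
    video: Optional[str] = None  # entity path with a VideoAsset; None = this entity


def closest_frame_index(frame_pts_ns: List[int], video_time: int) -> int:
    """Index of the frame whose PTS is closest to video_time; ties go to the earlier frame."""
    if not frame_pts_ns:
        raise ValueError("video has no frames")
    i = bisect.bisect_left(frame_pts_ns, video_time)  # first frame with PTS >= video_time
    if i == 0:
        return 0
    if i == len(frame_pts_ns):
        return len(frame_pts_ns) - 1
    before, after = frame_pts_ns[i - 1], frame_pts_ns[i]
    return i - 1 if video_time - before <= after - video_time else i


pts = [0, 33_333_333, 66_666_667, 100_000_000]  # 30 fps
frame = VideoFrame(video_time=33 * 1_000_000)   # a time rounded to 33 ms
print(closest_frame_index(pts, frame.video_time))  # 1 (latest-at would give 0)
```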

SDK

Our logging SDK should have a helper for generating all the VideoFrames for a video.

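from pathlib import Path

import rerun as rr

# VideoAsset, VideoFrame, send_video_frames, and this send_columns call shape are the proposal's sketch.
mp4_blob = Path("foo.mp4").read_bytes()

rr.log("video", VideoAsset(content=mp4_blob))

# Be explicit (each capture_time is 10.0 s plus the frame's offset into the video):
rr.send_columns(
    "video",
    [VideoFrame(ms=0), VideoFrame(ms=16), VideoFrame(ms=33), VideoFrame(ms=50)],
    capture_time=[10.0, 10.016, 10.033, 10.050],
)

# Or use our helper function:
rr.send_video_frames(mp4_blob, capture_time=10.0)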
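A sketch of what a helper like `send_video_frames` might compute before sending, assuming it has already extracted each frame's PTS (in nanoseconds) from the MP4 container; that parsing step is omitted, and `video_frame_columns` is hypothetical. It returns the two parallel columns from the explicit example, where each capture time is the start time plus the frame's offset into the video:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VideoFrame:
    """Hypothetical mirror of the proposed component."""
    video_time: int  # frame PTS in nanoseconds


def video_frame_columns(frame_pts_ns: List[int],
                        capture_time: float) -> Tuple[List[float], List[VideoFrame]]:
    """Build the parallel (capture_time, VideoFrame) columns for one video.

    frame_pts_ns: each frame's PTS, assumed already read from the MP4 container.
    capture_time: timeline time (seconds) corresponding to PTS 0.
    """
    times = [capture_time + pts / 1e9 for pts in frame_pts_ns]
    frames = [VideoFrame(video_time=pts) for pts in frame_pts_ns]
    return times, frames


times, frames = video_frame_columns([0, 16_000_000, 33_000_000, 50_000_000], capture_time=10.0)
```

Keeping the two columns the same length, one row per frame, is what lets the timeline show every frame as its own event instead of a single event at the start time.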

Viewer

Our VideoVisualizer queries for VideoFrame.
We then follow each VideoFrame to its video via another latest-at query on the referenced entity path.

A VideoAsset without VideoFrame will not show up in any space view.

Selecting a video should ideally show that video in the selection view (with independent playback).
It should also be able to show the actual frame timestamps in the video.
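The two-step lookup can be sketched with a toy in-memory store, assuming plain latest-at semantics on a single float timeline. `TinyStore` and `resolve_video_frame` are hypothetical, not the viewer's actual query API. A `VideoFrame` with `video=None` resolves against its own entity, and an entity that only has a `VideoAsset` yields nothing:

```python
import bisect
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VideoFrame:
    video_time: int              # PTS in nanoseconds
    video: Optional[str] = None  # entity path holding the VideoAsset; None = same entity


class TinyStore:
    """Toy store: for each (entity, component), values kept sorted by timeline time."""

    def __init__(self) -> None:
        self._columns: Dict[Tuple[str, str], Tuple[List[float], List[Any]]] = {}

    def log(self, entity: str, component: str, time: float, value: Any) -> None:
        times, values = self._columns.setdefault((entity, component), ([], []))
        i = bisect.bisect_right(times, time)  # keep times sorted
        times.insert(i, time)
        values.insert(i, value)

    def latest_at(self, entity: str, component: str, time: float) -> Optional[Any]:
        """Most recent value at or before `time`, or None."""
        times, values = self._columns.get((entity, component), ([], []))
        i = bisect.bisect_right(times, time)
        return values[i - 1] if i > 0 else None


def resolve_video_frame(store: TinyStore, entity: str, time: float) -> Optional[Tuple[Any, int]]:
    """Return (video asset, video_time) to display for `entity` at `time`, if any."""
    frame = store.latest_at(entity, "VideoFrame", time)
    if frame is None:
        return None  # a VideoAsset without VideoFrame is not shown
    target = frame.video if frame.video is not None else entity
    asset = store.latest_at(target, "VideoAsset", time)
    if asset is None:
        return None
    return asset, frame.video_time


store = TinyStore()
store.log("video", "VideoAsset", 10.0, b"<mp4 bytes>")
store.log("video", "VideoFrame", 10.0, VideoFrame(0))
store.log("video", "VideoFrame", 10.016, VideoFrame(16_000_000))
store.log("camera/frames", "VideoFrame", 10.0, VideoFrame(0, video="video"))
store.log("other", "VideoAsset", 10.0, b"<mp4 bytes>")
```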

@emilk
Member

emilk commented Sep 6, 2024

Will be implemented in #7368


4 participants
@emilk @jprochazk @jleibs and others