
Introduce VideoFrameReference archetype and base video visualization on it #7396

Merged · 12 commits · Sep 11, 2024
1 change: 1 addition & 0 deletions crates/store/re_data_loader/Cargo.toml
@@ -36,6 +36,7 @@ re_video.workspace = true

ahash.workspace = true
anyhow.workspace = true
arrow2.workspace = true
image.workspace = true
once_cell.workspace = true
parking_lot.workspace = true
184 changes: 106 additions & 78 deletions crates/store/re_data_loader/src/loader_archetype.rs
@@ -1,7 +1,13 @@
use re_chunk::{Chunk, RowId};
use re_log_types::NonMinI64;
use re_log_types::{EntityPath, TimeInt, TimePoint};
use re_types::components::MediaType;
use re_types::archetypes::VideoFrameReference;
use re_types::Archetype;
use re_types::{components::MediaType, ComponentBatch};

use arrow2::array::{
ListArray as ArrowListArray, NullArray as ArrowNullArray, PrimitiveArray as ArrowPrimitiveArray,
};
use arrow2::Either;

use crate::{DataLoader, DataLoaderError, LoadedData};

@@ -164,47 +170,6 @@ fn load_image(
Ok(rows.into_iter())
}

/// TODO(#7272): fix this
/// Used to expand the timeline when logging a video, so that the video can be played back.
#[derive(Clone, Copy)]
struct VideoTick(re_types::datatypes::Float64);

impl re_types::AsComponents for VideoTick {
fn as_component_batches(&self) -> Vec<re_types::MaybeOwnedComponentBatch<'_>> {
vec![re_types::NamedIndicatorComponent("VideoTick".into()).to_batch()]
}
}

impl re_types::Loggable for VideoTick {
type Name = re_types::ComponentName;

fn name() -> Self::Name {
"rerun.components.VideoTick".into()
}

fn arrow_datatype() -> re_chunk::external::arrow2::datatypes::DataType {
re_types::datatypes::Float64::arrow_datatype()
}

fn to_arrow_opt<'a>(
data: impl IntoIterator<Item = Option<impl Into<std::borrow::Cow<'a, Self>>>>,
) -> re_types::SerializationResult<Box<dyn re_chunk::external::arrow2::array::Array>>
where
Self: 'a,
{
re_types::datatypes::Float64::to_arrow_opt(
data.into_iter()
.map(|datum| datum.map(|datum| datum.into().0)),
)
}
}

impl re_types::SizeBytes for VideoTick {
fn heap_size_bytes(&self) -> u64 {
0
}
}

#[derive(Clone, Copy)]
struct ExperimentalFeature;

@@ -252,51 +217,114 @@ fn load_video(
) -> Result<impl ExactSizeIterator<Item = Chunk>, DataLoaderError> {
re_tracing::profile_function!();

timepoint.insert(
re_log_types::Timeline::new_temporal("video"),
re_log_types::TimeInt::new_temporal(0),
);
let video_timeline = re_log_types::Timeline::new_temporal("video");
timepoint.insert(video_timeline, re_log_types::TimeInt::new_temporal(0));

let media_type = MediaType::guess_from_path(filepath);

let duration_s = match media_type.as_ref().map(|v| v.as_str()) {
Some("video/mp4") => re_video::load_mp4(&contents)
.ok()
.map(|v| v.duration.as_f64() / 1_000.0),
_ => None,
}
.unwrap_or(100.0)
.ceil() as i64;
// TODO(andreas): Video frame reference generation should be available as a utility from the SDK.

let mut rows = vec![Chunk::builder(entity_path.clone())
let video = if media_type.as_ref().map(|v| v.as_str()) == Some("video/mp4") {
match re_video::load_mp4(&contents) {
Ok(video) => Some(video),
Err(err) => {
re_log::warn!("Failed to load video asset {filepath:?}: {err}");
None
}
}
} else {
re_log::warn!("Video asset {filepath:?} has an unsupported container format.");
None
};

// Log video frame references on the `video` timeline.
let video_frame_reference_chunk = if let Some(video) = video {
let first_timestamp = video
.segments
.first()
.map_or(0, |segment| segment.timestamp.as_nanoseconds());

// Time column.
let is_sorted = Some(true);
let time_column_times =
ArrowPrimitiveArray::<i64>::from_values(video.segments.iter().flat_map(|segment| {
segment
.samples
.iter()
.map(|s| s.timestamp.as_nanoseconds() - first_timestamp)
}));

let time_column = re_chunk::TimeColumn::new(is_sorted, video_timeline, time_column_times);

// VideoTimestamp component column.
let video_timestamps = video
.segments
.iter()
.flat_map(|segment| {
segment.samples.iter().map(|s| {
// TODO(andreas): Use sample indices instead of timestamps once possible.
re_types::components::VideoTimestamp::new_nanoseconds(
s.timestamp.as_nanoseconds(),
)
})
})
.collect::<Vec<_>>();
let video_timestamp_batch = &video_timestamps as &dyn ComponentBatch;
let video_timestamp_list_array = video_timestamp_batch
.to_arrow_list_array()
.map_err(re_chunk::ChunkError::from)?;

// Indicator column.
let video_frame_reference_indicator_datatype = arrow2::datatypes::DataType::Null;
let video_frame_reference_indicator_list_array = ArrowListArray::<i32>::try_new(
ArrowListArray::<i32>::default_datatype(
video_frame_reference_indicator_datatype.clone(),
),
video_timestamp_list_array.offsets().clone(),
Box::new(ArrowNullArray::new(
video_frame_reference_indicator_datatype,
video_timestamps.len(),
)),
None,
)
.map_err(re_chunk::ChunkError::from)?;

Some(Chunk::from_auto_row_ids(
re_chunk::ChunkId::new(),
entity_path.clone(),
std::iter::once((video_timeline, time_column)).collect(),
[
(
VideoFrameReference::indicator().name(),
video_frame_reference_indicator_list_array,
),
(video_timestamp_batch.name(), video_timestamp_list_array),
]
.into_iter()
.collect(),
)?)
} else {
None
};

// Put video asset into its own chunk since it can be fairly large.
let video_asset_chunk = Chunk::builder(entity_path.clone())
.with_archetype(
RowId::new(),
timepoint.clone(),
&re_types::archetypes::AssetVideo::from_file_contents(contents, media_type),
&re_types::archetypes::AssetVideo::from_file_contents(contents, media_type.clone()),
)
.with_component_batch(RowId::new(), timepoint.clone(), &ExperimentalFeature)
.build()?];

for i in 0..duration_s {
// We need some breadcrumbs of timepoints because the video doesn't have a duration yet.
// TODO(#7272): fix this
timepoint.insert(
re_log_types::Timeline::new_temporal("video"),
re_log_types::TimeInt::from_seconds(NonMinI64::new(i).expect("i > i64::MIN")),
);

rows.push(
Chunk::builder(entity_path.clone())
.with_component_batch(
RowId::new(),
timepoint.clone(),
&VideoTick(re_types::datatypes::Float64(i as f64)),
)
.build()?,
);
.build()?;

if let Some(video_frame_reference_chunk) = video_frame_reference_chunk {
Ok(Either::Left(
[video_asset_chunk, video_frame_reference_chunk].into_iter(),
))
} else {
// Still log the video asset, but don't include video frames.
Ok(Either::Right(std::iter::once(video_asset_chunk)))
}

Ok(rows.into_iter())
}

fn load_mesh(
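For reference, the time-column logic in the new `load_video` above flattens every sample timestamp across all segments into a single column, offset so the first segment starts at zero. A minimal standalone sketch of that computation, using hypothetical `Segment`/`Sample` stand-ins for the `re_video` types (assumed shapes, not the real API):

```rust
// Hypothetical stand-ins for re_video's segment/sample types (assumed shapes).
struct Sample {
    timestamp_ns: i64,
}
struct Segment {
    timestamp_ns: i64,
    samples: Vec<Sample>,
}

/// Flattens all sample timestamps, expressed relative to the first segment's
/// start, mirroring how the loader builds the `video` timeline's time column.
fn relative_sample_times_ns(segments: &[Segment]) -> Vec<i64> {
    let first = segments.first().map_or(0, |s| s.timestamp_ns);
    segments
        .iter()
        .flat_map(|seg| seg.samples.iter().map(move |s| s.timestamp_ns - first))
        .collect()
}

fn main() {
    let segments = vec![
        Segment {
            timestamp_ns: 1_000,
            samples: vec![Sample { timestamp_ns: 1_000 }, Sample { timestamp_ns: 1_500 }],
        },
        Segment {
            timestamp_ns: 2_000,
            samples: vec![Sample { timestamp_ns: 2_000 }],
        },
    ];
    assert_eq!(relative_sample_times_ns(&segments), vec![0, 500, 1_000]);
}
```

Since sample timestamps increase monotonically within and across segments, the resulting column is already sorted, which is why the loader can pass `is_sorted = Some(true)` when constructing the `TimeColumn`.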
3 changes: 2 additions & 1 deletion crates/store/re_types/definitions/rerun/archetypes.fbs


@@ -1,10 +1,13 @@
namespace rerun.archetypes;

/// A video file.
/// A video binary.
///
/// NOTE: Videos can only be viewed in the Rerun web viewer.
/// Only MP4 and AV1 is currently supported, and not in all browsers.
/// Only MP4 containers with a limited number of codecs are currently supported, and not in all browsers.
/// Follow <https://github.com/rerun-io/rerun/issues/7298> for updates on the native support.
///
/// In order to display a video, you need to log a [archetypes.VideoFrameReference] for each frame.
// TODO(andreas): More docs and examples on how to use this.
Member: Let's point these TODOs to the correct issue.

table AssetVideo (
"attr.rerun.experimental"
) {
@@ -0,0 +1,27 @@
namespace rerun.archetypes;

/// References a single video frame.
///
/// Used to display video frames from a [archetypes.AssetVideo].
// TODO(andreas): More docs and examples on how to use this.
table VideoFrameReference (
"attr.rerun.experimental"
){
// --- Required ---

/// References the closest video frame to this time.
///
/// Note that this uses the closest video frame instead of the latest at this timestamp
/// in order to be more forgiving of rounding errors.
// TODO(andreas): Once this can also be a frame index, point out that this is an accurate measure.
timestamp: rerun.components.VideoTimestamp ("attr.rerun.component_required", required, order: 1000);

// --- Optional ---

/// Optional reference to an entity with a [archetypes.AssetVideo].
///
/// If none is specified, the video is assumed to be at the same entity.
/// Note that blueprint overrides on the referenced video will be ignored regardless,
/// as this is always interpreted as a reference to the data store.
video_reference: rerun.components.EntityPath ("attr.rerun.component_optional", nullable, order: 2000);
}
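The "closest frame" semantics described for `timestamp` above are worth spelling out: with latest-at lookup, a query that lands just below a frame's time because of rounding snaps a whole frame back, while closest-frame lookup recovers the intended frame. An illustrative standalone sketch (not Rerun's actual lookup code):

```rust
/// Latest-at semantics: the most recent frame at or before the query time.
fn latest_at(frame_times: &[i64], query: i64) -> Option<i64> {
    frame_times.iter().copied().filter(|&t| t <= query).max()
}

/// Closest-frame semantics: the frame whose timestamp is nearest to the query.
fn closest(frame_times: &[i64], query: i64) -> Option<i64> {
    frame_times.iter().copied().min_by_key(|&t| (t - query).abs())
}

fn main() {
    let frame_times = [0, 33, 66, 100]; // e.g. ~30 fps frame timestamps in ms
    // Query 32: one unit below the intended frame at 33 (a rounding error).
    assert_eq!(latest_at(&frame_times, 32), Some(0)); // snaps a full frame back
    assert_eq!(closest(&frame_times, 32), Some(33)); // forgiving of the error
}
```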
2 changes: 2 additions & 0 deletions crates/store/re_types/definitions/rerun/components.fbs


12 changes: 12 additions & 0 deletions crates/store/re_types/definitions/rerun/components/entity_path.fbs
@@ -0,0 +1,12 @@
namespace rerun.components;

/// A path to an entity, usually to reference some data that is part of the target entity.
table EntityPath (
"attr.arrow.transparent",
"attr.python.aliases": "str",
"attr.python.array_aliases": "str, Sequence[str]",
"attr.rust.derive": "Default, PartialEq, Eq, PartialOrd, Ord",
"attr.rust.repr": "transparent"
) {
value: rerun.datatypes.EntityPath (order: 100);
}
@@ -0,0 +1,11 @@

namespace rerun.components;

/// Timestamp inside a [archetypes.AssetVideo].
struct VideoTimestamp (
"attr.rust.derive": "Copy, PartialEq, Eq, Default",
"attr.rust.repr": "transparent",
"attr.rerun.experimental"
) {
timestamp: rerun.datatypes.VideoTimestamp (order: 100);
}
1 change: 1 addition & 0 deletions crates/store/re_types/definitions/rerun/datatypes.fbs


@@ -0,0 +1,24 @@
namespace rerun.datatypes;

/// Specifies how to interpret the `video_time` field of a [datatypes.VideoTimestamp].
enum VideoTimeMode: ubyte{
/// Invalid value. Won't show up in generated types.
Invalid = 0,

/// Presentation timestamp in nanoseconds since the beginning of the video.
Nanoseconds = 1 (default),

// Future values: FrameNr
}

/// Timestamp inside a [archetypes.AssetVideo].
struct VideoTimestamp (
Comment on lines +14 to +15

Member: I would still love to eventually have time_mode = FrameNumber. So calling this a Timestamp will create a bit of a strange naming pattern.

Member (Author): Coincidentally, I brought this very thing up on Slack, expressing a similar (but not very deep) concern. @emilk pointed out that frame numbers can be seen as a form of timestamp unit in this context. Very open to other suggestions; not sure on this one 🤷

https://rerunio.slack.com/archives/C041NHU952S/p1725981749912279?thread_ts=1725981199.569259&cid=C041NHU952S

Member: Some ideas from ChatGPT:

VideoPosition: Emphasizes that the struct represents a position in the video, which could be either in frames or seconds.
VideoMarker: Suggests a point in the video, adaptable to frames or time.
TimeOrFrame: Makes it clear that the struct can hold either a time value or a frame number.
FrameOrTimeStamp: Explicitly names the two modes.
VideoCoordinate: Represents a general position in time (seconds or frames) as coordinates often imply different dimensions.
MediaTimeUnit: Indicates the struct holds a unit of time in different formats, either frames or seconds.
PlaybackPosition: Focuses on the position in playback, where both time and frame counts are relevant.
TimelineIndex: Suggests an index along a timeline that can be expressed in either frames or time.

"attr.rust.derive": "Copy, PartialEq, Eq",
"attr.rerun.experimental"
) {
/// Timestamp value, type defined by `time_mode`.
video_time: long (order: 100);

/// How to interpret `video_time`.
time_mode: VideoTimeMode (order: 200);
}
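To make the pairing of `video_time` and `time_mode` concrete, here is a hedged Rust sketch (the type and field names mirror the .fbs definition above; the `as_seconds` helper is hypothetical, and only the `Nanoseconds` mode exists so far):

```rust
// Mirrors the .fbs datatype above; `as_seconds` is a hypothetical helper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VideoTimeMode {
    /// Presentation timestamp in nanoseconds since the beginning of the video.
    Nanoseconds,
    // Future values: FrameNr
}

#[derive(Debug, Clone, Copy)]
struct VideoTimestamp {
    /// Timestamp value, interpreted according to `time_mode`.
    video_time: i64,
    time_mode: VideoTimeMode,
}

impl VideoTimestamp {
    fn as_seconds(&self) -> f64 {
        match self.time_mode {
            VideoTimeMode::Nanoseconds => self.video_time as f64 / 1e9,
        }
    }
}

fn main() {
    let ts = VideoTimestamp {
        video_time: 1_500_000_000,
        time_mode: VideoTimeMode::Nanoseconds,
    };
    assert_eq!(ts.as_seconds(), 1.5);
}
```

A future `FrameNr` mode would add another match arm rather than a new field, which is the point of carrying `time_mode` alongside the raw value.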
1 change: 1 addition & 0 deletions crates/store/re_types/src/archetypes/.gitattributes


8 changes: 5 additions & 3 deletions crates/store/re_types/src/archetypes/asset_video.rs


2 changes: 2 additions & 0 deletions crates/store/re_types/src/archetypes/mod.rs

