The aim of this proposal is to define browser-native handling of MPEG-DASH timed metadata events, which today is handled at the web application layer. MPEG DASH emsg
events are included in MPEG CMAF, which has emerged as the common media delivery format in HLS and MPEG-DASH.
The current approach for handling in-band event information, implemented by libraries such as dash.js and hls.js, is to parse the media segments in JavaScript to extract the events and construct VTTCue
objects.
On resource constrained devices such as smart TVs and streaming sticks, this leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread (although we note the proposal to make Media Source Extensions available to Worker contexts). There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.
Avoiding parsing in JavaScript is also important for low latency video streaming applications, where minimizing the time taken to pass media content through to the media element's playback buffer is essential.
Instead of using VTTCue
, a separate proposal introduces DataCue
as a more appropriate cue API for timed metadata. See the DataCue explainer for details.
Many of the use cases are described in the DataCue explainer.
A media content provider wants to allow insertion of content, such as personalised video, local news, or advertisements, into a video media stream that contains the main program content. To achieve this, timed metadata is used to describe the points on the media timeline, known as splice points, where switching playback to inserted content is possible.
SCTE 35 defines a data cue format for describing such insertion points. Use of these cues in MPEG-DASH streams is described in SCTE 214-1, SCTE 214-2, and SCTE 214-3. Use in HLS streams is described in SCTE-35 section 12.2.
MPEG-DASH defines several control messages for media streaming clients (e.g., libraries such as dash.js). Control messages exist for several scenarios, such as:
- The media player should refresh or update its copy of the manifest document (MPD)
- The media player should make an HTTP request to a given URL for analytics purposes
- The media presentation will end at a time earlier than expected
These messages may be carried as in-band emsg
events in the media container files.
The proposed API is based on the existing text track support in HTML and the proposed DataCue
API.
TODO: Add API summary
As new emsg
event types can be introduced from time to time, we propose to expose the raw binary emsg
data for applications to parse. This avoids the need for browsers to natively understand the structure of the event messages.
We will need to specify how to extract in-band timed metadata from the media container, and the structure in which the data is exposed via the DataCue
interface. There are a couple of options for how to do this:
-
We could update the existing Sourcing In-band Media Resource Tracks from Media Containers into HTML spec.
-
We could produce a new set of specifications, following a registry approach with one specification per media format that describes the timed metadata details for that format, similar to the Media Source Extensions Byte Stream Format Registry. This could be based on Sourcing In-band Media Resource Tracks from Media Containers into HTML.
TODO: Needs updating: show how to subscribe to specific event streams, show how to set the dispatch mode.
This example shows how to add a cuechange
handler that can be used to receive media-timed data and event cues.
const video = document.getElementById('video');
video.textTracks.addEventListener('addtrack', (event) => {
const textTrack = event.track;
if (textTrack.kind === 'metadata') {
textTrack.mode = 'hidden';
// See cueChangeHandler examples below
textTrack.addEventListener('cuechange', cueChangeHandler);
}
});
const cueChangeHandler = (event) => {
const metadataTrack = event.target;
const activeCues = metadataTrack.activeCues;
for (let i = 0; i < activeCues.length; i++) {
const cue = activeCues[i];
// The UA delivers parsed message data for this message type
if (cue.type === 'urn:mpeg:dash:event:callback:2015' &&
cue.value.emsgValue === '1') {
const url = cue.value.data;
fetch(url).then(() => { console.log('Callback completed'); });
}
}
};
This example shows how a web application can handle SCTE 35 cues, both in the case where the cues are parsed by the browser implementation, and where parsed by the web application.
const cueChangeHandler = (event) => {
const metadataTrack = event.target;
const activeCues = metadataTrack.activeCues;
for (let i = 0; i < activeCues.length; i++) {
const cue = activeCues[i];
if (cue.type === 'urn:scte:scte35:2013:bin') {
// Parse the SCTE-35 message payload.
// parseSCTE35Data() is similar to Comcast's scte35.js library,
// adapted to take an ArrayBuffer as input.
// https://github.com/Comcast/scte35-js/blob/master/src/scte35.ts
const scte35Message = parseSCTE35Data(cue.value.data);
console.log(cue.startTime, cue.endTime, scte35Message.tableId, scte35Message.spliceCommandType);
}
}
};
This example shows how a web application can use the proposed new addcue
event to attach enter
and exit
handlers to each cue on the metadata track.
// video.currentTime has reached the cue start time
// through normal playback progression
const cueEnterHandler = (event) => {
const cue = event.target;
console.log('cueEnter', cue.startTime, cue.endTime);
};
// video.currentTime has reached the cue end time
// through normal playback progression
const cueExitHandler = (event) => {
const cue = event.target;
console.log('cueExit', cue.startTime, cue.endTime);
};
// A cue has been parsed from the media container
const addCueHandler = (event) => {
const cue = event.cue;
// Attach enter/exit event handlers
cue.onenter = cueEnterhandler;
cue.onexit = cueExitHandler;
};
const video = document.getElementById('video');
video.textTracks.addEventListener('addtrack', (event) => {
const textTrack = event.track;
if (textTrack.kind === 'metadata') {
textTrack.mode = 'hidden';
textTrack.addEventListener('addcue', addCueHandler);
}
});
TODO
This explainer is based on content from a Note written by the W3C Media and Entertainment Interest Group, and from a number of associated discussions, including the TPAC breakout session on video metadata cues. It is also closely related to the DASH-IF DASH Player's Application Events and Timed Metadata Processing Models and APIs document.