
Is MediaStreamTrackProcessor for audio necessary? #29

Open
youennf opened this issue Apr 30, 2021 · 19 comments

Comments


youennf commented Apr 30, 2021

Extracting this discussion from #4, since it was not fully discussed there.
The use cases for MediaStreamTrackProcessor for audio are unclear, given that its functionality largely overlaps with what WebAudio can do, and WebAudio is already widely deployed in all major browsers.


youennf commented Apr 30, 2021

CC @padenot


guidou commented May 1, 2021

The fact that there is overlap does not mean that we should not support it. After all, for video there is overlap with existing features as well. Also, while there is overlap, the MediaStreamTrackProcessor model is quite different from the AudioWorklet model.
The question is whether the MediaStreamTrackProcessor model is a better fit in some cases. I'll reach out to audio developers to get more feedback, but some things that have been mentioned are:

  1. access to the original timestamps of the audio source
  2. better WebCodecs integration
  3. there are use cases that do not fit naturally with the clock-based synchronous processing model of AudioWorklet (e.g., applications with high CPU requirements but without strong latency requirements). The MediaStreamTrackProcessor model might be a better match in these cases.
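For illustration, a minimal sketch of the pull-based model under discussion, assuming the MediaStreamTrackProcessor shape proposed here, where audio frames arrive as WebCodecs AudioData (the `process()` helper is a hypothetical application function):

```js
// In a worker (inside an async function): pull AudioData chunks on demand.
// `audioTrack` is assumed to have been obtained or transferred beforehand.
const processor = new MediaStreamTrackProcessor({ track: audioTrack });
const reader = processor.readable.getReader();

while (true) {
  const { value: audioData, done } = await reader.read();
  if (done) break;
  // Point 1 above: audioData.timestamp carries the source's capture timestamp.
  // Point 2 above: audioData can be handed straight to a WebCodecs AudioEncoder.
  process(audioData); // hypothetical application processing, at its own pace
  audioData.close();
}
```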


aboba commented May 1, 2021

@guidou Quite a few of the WebCodecs Origin Trial participants are using it primarily for audio. Among game developers using WebCodecs for both audio and video, symmetry is an important aspect (e.g. using WebCodecs decode as an MSE substitute).


guidou commented May 1, 2021

I fully agree that symmetry is also an important benefit for developers.


dogben commented May 3, 2021

One use case to consider is https://ai.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html

@alvestrand

@youennf are you referring to https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet (not yet in Safari) or to https://developer.mozilla.org/en-US/docs/Web/API/ScriptProcessorNode (available in all browsers, but deprecated)?


youennf commented May 3, 2021

are you referring to https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet (not yet in Safari) or to

I am referring to AudioWorklet, which is available in Safari.


padenot commented May 3, 2021

The PR to adjust the compatibility data has just been merged and it appears MDN is slightly out of date: https://github.com/mdn/browser-compat-data/pull/10129/files#r621975812


youennf commented May 4, 2021

One use case to consider is https://ai.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html

This is indeed a good use case. It seems covered AFAIK by getUserMedia+MediaStreamAudioSourceNode+AudioWorklet.
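For reference, a minimal sketch of that AudioWorklet pipeline (the module path and processor name are placeholders):

```js
// Main thread (inside an async function): route the microphone into an AudioWorklet.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext();
await audioContext.audioWorklet.addModule('analysis-processor.js'); // placeholder path
const source = new MediaStreamAudioSourceNode(audioContext, { mediaStream: stream });
const worklet = new AudioWorkletNode(audioContext, 'analysis-processor');
source.connect(worklet);
```

```js
// analysis-processor.js — runs on the real-time audio thread in 128-frame quanta.
class AnalysisProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const input = inputs[0]; // array of Float32Array channels, 128 samples each
    // ... analyze or forward the samples here, e.g. via this.port.postMessage ...
    return true; // keep the processor alive
  }
}
registerProcessor('analysis-processor', AnalysisProcessor);
```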

Quite a few of the WebCodecs Origin Trial participants are using it primarily for audio.

Can you clarify which WebCodecs origin trial API they are primarily using for audio? Is it MediaStreamTrackProcessor?


youennf commented May 4, 2021

I'll reach out to audio developers to get more feedback, but some things that have been mentioned are:

Thanks @guidou, this is helpful for identifying the shortcomings of AudioWorklet.
Based on that, we should indeed either improve WebAudio support (including the API) or envision alternatives.

What was asked for in the past is a pros-and-cons comparison of AudioWorklet vs. an audio MediaStreamTrackProcessor.
So far, it seems that MediaStreamTrackProcessor could be shimmed with AudioWorklet.


youennf commented May 4, 2021

I fully agree that symmetry is an important benefit too for developers.

The WebAudio API is very different from a rendering API like Canvas/OffscreenCanvas, and for good reasons: it was designed to solve a specific problem in the best possible way.

By trying to build a single API for both audio and video, we miss the opportunity to build the best API dedicated to video.
Symmetry is not always a good friend.


youennf commented May 4, 2021

There are some known advantages of using AudioWorklet over MediaStreamTrackProcessor.
With AudioWorklet, an application is able to implement its own buffering strategy and choose the best way to present data for processing.

For instance, an application might want to start by processing 10 ms chunks and buffer 5 of them.
At some point though, to cope with networking, the application will switch to 50 ms chunks and increase buffering to 5 chunks of 50 ms.

This is not easily doable with MediaStreamTrackProcessor: maxBufferSize is fixed at construction time, and the audio frame size is not under the application's control.
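A rough sketch of the chunking half of such a strategy inside an AudioWorkletProcessor (mono, single input; the buffering of N chunks would sit on the receiving side; names and the message shape are illustrative):

```js
// chunking-processor.js — re-chunk the fixed 128-frame render quanta into
// application-chosen chunk sizes.
class ChunkingProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.setChunkSeconds(0.01); // start with 10 ms chunks
    // The application can switch at runtime, e.g. node.port.postMessage({ chunkSeconds: 0.05 }).
    this.port.onmessage = ({ data }) => this.setChunkSeconds(data.chunkSeconds);
  }
  setChunkSeconds(seconds) {
    this.chunkFrames = Math.round(sampleRate * seconds); // sampleRate is a worklet global
    this.buffer = new Float32Array(this.chunkFrames);
    this.written = 0; // any partially filled chunk is discarded on resize
  }
  process(inputs) {
    const channel = inputs[0][0];
    if (!channel) return true;
    for (const sample of channel) {
      this.buffer[this.written++] = sample;
      if (this.written === this.chunkFrames) {
        this.port.postMessage(this.buffer.slice(0)); // deliver a full chunk to the app
        this.written = 0;
      }
    }
    return true;
  }
}
registerProcessor('chunking-processor', ChunkingProcessor);
```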


dogben commented May 4, 2021

One use case to consider is https://ai.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html

This is indeed a good use case. It seems covered AFAIK by getUserMedia+MediaStreamAudioSourceNode+AudioWorklet.

Apologies if I'm missing something obvious, but it doesn't seem possible to process both the audio and video inputs in an AudioWorklet. Nor does it seem possible for the audio data to be obtained outside of the AudioWorklet so that the audio and video can be processed together in a regular worker.


youennf commented May 4, 2021

We can share the audio data with a regular worker through SharedArrayBuffer if possible, and postMessage otherwise. I was referring to the audio part of the use case; I agree the video part deserves a better API than canvas.
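A sketch of the postMessage route, for illustration: wire the worklet's port to the regular worker through a MessageChannel so audio chunks don't have to hop through the main thread (the worker script name is hypothetical, and `workletNode` is assumed to be an existing AudioWorkletNode):

```js
// Main thread: connect the AudioWorklet to the worker that also handles the video.
const worker = new Worker('combined-processing.js'); // hypothetical worker script
const { port1, port2 } = new MessageChannel();
workletNode.port.postMessage({ audioPort: port1 }, [port1]);
worker.postMessage({ audioPort: port2 }, [port2]);

// In the AudioWorkletProcessor, keep the port received in onmessage and post
// each chunk to it:  audioPort.postMessage(chunk, [chunk.buffer]);
// In the worker, audioPort.onmessage delivers the audio to combine with the video.
```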


guidou commented Jun 18, 2021

I think the question of whether something is necessary is the wrong one to ask, since arguably, nothing is necessary.
For example, using getUserMedia+MediaStreamAudioSourceNode+AudioWorklet plus some video processing API (such as MediaStreamTrackProcessor/MediaStreamTrackGenerator) in this context would be a lot more difficult than having a symmetric API for audio and video.
For starters, SharedArrayBuffer requires cross-origin isolation. Setting up MediaStreamAudioSourceNode+AudioWorklet on one hand and video processing somewhere else, using completely different APIs with different programming models, adds even more friction.
Moreover, the unique advantages offered by AudioWorklet (e.g., real-time thread) do not apply to this specific use case.

I think this shows that there is real value in adding an audio version of the same API used for video.
Keeping the bug open to continue the discussion.

@alvestrand

It's possible to add controls for sample size and buffer size to MediaStreamTrackProcessor if that's a requested feature.
It isn't part of the minimal surface, but where to put the controls is obvious; raw audio data is easy to re-chunk.
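For illustration, re-chunking on the application side might look roughly like this, assuming audio frames arrive as WebCodecs AudioData (the chunk size and names are arbitrary):

```js
// Accumulate AudioData frames into fixed-size Float32 chunks (first channel only).
const CHUNK_FRAMES = 480; // e.g. 10 ms at 48 kHz
let pending = new Float32Array(CHUNK_FRAMES);
let filled = 0;

function rechunk(audioData, emitChunk) {
  const samples = new Float32Array(audioData.numberOfFrames);
  audioData.copyTo(samples, { planeIndex: 0, format: 'f32-planar' });
  audioData.close();
  for (const sample of samples) {
    pending[filled++] = sample;
    if (filled === CHUNK_FRAMES) {
      emitChunk(pending);
      pending = new Float32Array(CHUNK_FRAMES);
      filled = 0;
    }
  }
}
```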

@chrisguttandin

It was mentioned in this thread that a MediaStreamTrackProcessor for audio is necessary to synchronize audio and video when using WebCodecs.

But unless I missed something, it's probably still hard to accurately encode a MediaStream with WebCodecs even though there is a MediaStreamTrackProcessor for audio.

I tried to record the MediaStream coming from the user's mic and camera in Chrome v105. It was obtained in the simplest way:

```js
const mediaStream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true
})
```

I then used a MediaStreamTrackProcessor for each MediaStreamTrack to get the AudioData and VideoFrame objects, respectively. However, the timestamp of the video seems to start at 0, whereas the timestamp of the audio starts at some arbitrary value.
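Roughly what that looks like (Chrome's MediaStreamTrackProcessor, inside an async function; error handling omitted):

```js
// Read the first frame from each track and compare their timestamps.
const [audioTrack] = mediaStream.getAudioTracks();
const [videoTrack] = mediaStream.getVideoTracks();
const audioReader = new MediaStreamTrackProcessor({ track: audioTrack }).readable.getReader();
const videoReader = new MediaStreamTrackProcessor({ track: videoTrack }).readable.getReader();

const { value: firstAudio } = await audioReader.read();
const { value: firstVideo } = await videoReader.read();

// The two timestamps come from clocks with different origins, so their
// difference alone does not say how the streams line up in real time.
console.log(firstAudio.timestamp, firstVideo.timestamp);
firstAudio.close();
firstVideo.close();
```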

I think this is all fine according to the spec, but it doesn't really help to synchronize the audio with the video. If I want to start the recording at a given point in time, which VideoFrame and which AudioData are the first ones I should pass to the encoder?

It would be nice to have a way of knowing the offset between the two timestamps. An API which says that AudioData.timestamp === 62169.819898 and VideoFrame.timestamp === 0.566633 represent the same point in time would be really helpful.

Also I guess this all becomes very tricky when the recording is long enough for the two streams to drift apart.


guidou commented Jan 30, 2024

FWIW, in Chrome, MSTP for audio is used 3X more than MSTP for video nowadays.


mehagar commented Mar 7, 2024

At Zoom we're currently using MediaStreamTrackProcessor for video, and WebAudio for audio (very similar to this pattern: https://developer.chrome.com/blog/audio-worklet-design-pattern#webaudio_powerhouse_audio_worklet_and_sharedarraybuffer).

It works, but there's a lot of complexity that comes with WebAudio and SharedArrayBuffers, and with handling the case where SharedArrayBuffer is not available. Having a MediaStreamTrackProcessor for audio would certainly simplify things.
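A sketch of the capability check that forces the two code paths mentioned above:

```js
// SharedArrayBuffer is only usable in cross-origin isolated contexts.
const canUseSharedArrayBuffer =
  typeof SharedArrayBuffer !== 'undefined' && self.crossOriginIsolated === true;

if (canUseSharedArrayBuffer) {
  // Fast path: a ring buffer shared between the AudioWorklet and a worker.
} else {
  // Fallback: copy audio out of the worklet with postMessage, at extra cost.
}
```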
