Is MediaStreamTrackProcessor for audio necessary? #29
CC @padenot |
The fact that there is overlap does not mean that we should not support it. After all, for video there is overlap with existing features as well. Also, while there is overlap, the MediaStreamTrackProcessor model is quite different from the AudioWorklet model. |
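A rough sketch of that difference (illustrative only, not code from the thread; the function, class, and processor names are placeholders, and the two snippets run in different contexts):

```js
// MediaStreamTrackProcessor model: the application pulls AudioData chunks
// from a ReadableStream at its own pace (e.g. in a worker).
async function pullAudio(audioTrack) {
  const processor = new MediaStreamTrackProcessor({ track: audioTrack });
  const reader = processor.readable.getReader();
  for (;;) {
    const { value: audioData, done } = await reader.read();
    if (done) break;
    // e.g. inspect audioData.timestamp, audioData.numberOfFrames, copyTo(...)
    audioData.close();
  }
}

// AudioWorklet model: the audio engine pushes fixed 128-frame render quanta
// into process(), which runs on the real-time rendering thread (this class
// lives in a separate module loaded via audioContext.audioWorklet.addModule).
class TapProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channels = inputs[0]; // one Float32Array per channel for this quantum
    // ... copy or forward the samples
    return true; // keep the processor alive
  }
}
registerProcessor('tap-processor', TapProcessor);
```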
@guidou Quite a few of the WebCodecs Origin Trial participants are using it primarily for audio. Among game developers using WebCodecs for both audio and video, symmetry is an important aspect (e.g. using WebCodecs decode as an MSE substitute). |
I fully agree that symmetry is an important benefit for developers too. |
One use case to consider is https://ai.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html |
@youennf are you referring to https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet (not yet in Safari) or to https://developer.mozilla.org/en-US/docs/Web/API/ScriptProcessorNode (available in all browsers, but deprecated)? |
I am referring to AudioWorklet, which is available in Safari. |
The PR to adjust the compatibility data has just been merged and it appears MDN is slightly out of date: https://github.com/mdn/browser-compat-data/pull/10129/files#r621975812 |
This is indeed a good use case. It seems covered AFAIK by getUserMedia+MediaStreamAudioSourceNode+AudioWorklet.
Can you clarify which WebCodecs Origin Trial API they are primarily using for audio? Is it MediaStreamTrackProcessor? |
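For reference, a minimal sketch of the getUserMedia + MediaStreamAudioSourceNode + AudioWorklet pipeline mentioned above (the module path and processor name are placeholders):

```js
async function setUpAudioProcessing() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext();
  // 'analysis-processor.js' is assumed to register an AudioWorkletProcessor
  // under the name 'analysis-processor'.
  await audioContext.audioWorklet.addModule('analysis-processor.js');

  const source = new MediaStreamAudioSourceNode(audioContext, { mediaStream: stream });
  const workletNode = new AudioWorkletNode(audioContext, 'analysis-processor');
  source.connect(workletNode);
  // The processor now receives the microphone samples in its process() callback.
}
```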
Thanks @guidou, this is helpful to identify the shortcomings of AudioWorklet. What was asked for in the past is a pros-and-cons comparison of AudioWorklet vs. an audio MediaStreamTrackProcessor. |
The WebAudio API is very different from rendering APIs like Canvas/OffscreenCanvas, and for good reasons: it was designed to solve a specific problem in the best possible way. By trying to build a single API for both audio and video, we miss the opportunity to build the best API dedicated to video. |
There are some known advantages of using AudioWorklet over MediaStreamTrackProcessor. For instance, an application might want to process 10 ms chunks and buffer 5 of these chunks. This is not easily doable with MediaStreamTrackProcessor: maxBufferSize is fixed at construction time and the audio frame size is not under the application's control. |
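As a rough sketch of the buffering pattern described above, an AudioWorkletProcessor can accumulate the fixed 128-frame render quanta into 10 ms chunks and keep the last 5 of them (mono input and a plain array as the buffer are assumed here for brevity):

```js
class ChunkingProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    // 10 ms of frames at the context rate (sampleRate is a global inside
    // AudioWorkletGlobalScope), e.g. 480 frames at 48 kHz.
    this.chunkFrames = Math.round(sampleRate * 0.01);
    this.current = new Float32Array(this.chunkFrames);
    this.written = 0;
    this.chunks = []; // keep at most the 5 most recent complete chunks
  }
  process(inputs) {
    const input = inputs[0][0]; // channel 0 of the first input (mono assumed)
    if (!input) return true;
    let offset = 0;
    while (offset < input.length) {
      const n = Math.min(input.length - offset, this.chunkFrames - this.written);
      this.current.set(input.subarray(offset, offset + n), this.written);
      this.written += n;
      offset += n;
      if (this.written === this.chunkFrames) {
        this.chunks.push(this.current);
        if (this.chunks.length > 5) this.chunks.shift();
        this.current = new Float32Array(this.chunkFrames);
        this.written = 0;
      }
    }
    return true;
  }
}
registerProcessor('chunking-processor', ChunkingProcessor);
```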
Apologies if I'm missing something obvious, but it doesn't seem possible to process both the audio and video inputs in an AudioWorklet. Nor does it seem possible for the audio data to be obtained outside of the AudioWorklet so that the audio and video can be processed together in a regular worker. |
We can share the audio data to a regular worker through SharedArrayBuffer if possible, postMessage otherwise. I was referring to the audio part of the use case, I agree the video part deserves a better API than canvas. |
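A minimal sketch of that fallback, assuming a SharedArrayBuffer (available when the page is cross-origin isolated) is passed in via processorOptions; ring-buffer synchronization is omitted, and the postMessage path here goes through the AudioWorkletNode's port, which the main thread would relay to the worker (a MessagePort could also be transferred in instead):

```js
class ForwardingProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super();
    // SharedArrayBuffer provided by the page when available, otherwise null.
    const sab = options.processorOptions && options.processorOptions.sharedBuffer;
    this.view = sab ? new Float32Array(sab) : null;
    this.writeIndex = 0;
  }
  process(inputs) {
    const input = inputs[0][0]; // first channel of the first input
    if (!input) return true;
    if (this.view) {
      // Shared-memory path: write the quantum into a wrapping shared buffer
      // (no reader/writer synchronization shown).
      for (let i = 0; i < input.length; i++) {
        this.view[(this.writeIndex + i) % this.view.length] = input[i];
      }
      this.writeIndex = (this.writeIndex + input.length) % this.view.length;
    } else {
      // Fallback path: post a copy of the quantum; the receiver relays it on.
      this.port.postMessage(input.slice(0));
    }
    return true;
  }
}
registerProcessor('forwarding-processor', ForwardingProcessor);
```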
I think the question of whether something is necessary is the wrong one to ask, since arguably, nothing is necessary. I think this shows that there is real value in adding an audio version of the same API used for video. |
It's possible to add controls for sample size and buffer size to MediaStreamTrackProcessor if that's a requested feature. |
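Purely as a hypothetical sketch of what such controls might look like (maxBufferSize exists in the spec today; the frame-size option below does not and is only an illustration of the suggestion above):

```js
const processor = new MediaStreamTrackProcessor({
  track: audioTrack,
  maxBufferSize: 5, // existing option: how many chunks may be queued
  frameSize: 480    // hypothetical option: request 10 ms AudioData chunks at 48 kHz
});
```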
It was mentioned in this thread that a […]. But in case I didn't miss anything, it's probably still hard to accurately encode a […]. I tried to record the stream obtained like this:

```js
const mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: true,
  video: true
})
```

I then used a […]. I think this is all fine according to the spec, but it doesn't really help to synchronize the audio with the video. If I want to start the recording at a given point in time, which […]? It would be nice to have a way of knowing the offset between the two timestamps. I think some API which says […]. Also, I guess this all becomes very tricky when the recording is long enough for the two streams to drift apart. |
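To make the timestamp question concrete, here is a sketch of what the comparison could look like if an audio MediaStreamTrackProcessor were available alongside the video one (illustrative only; in practice the first chunks of each track need not be aligned):

```js
const [audioTrack] = mediaStream.getAudioTracks();
const [videoTrack] = mediaStream.getVideoTracks();

const audioReader = new MediaStreamTrackProcessor({ track: audioTrack }).readable.getReader();
const videoReader = new MediaStreamTrackProcessor({ track: videoTrack }).readable.getReader();

// AudioData.timestamp and VideoFrame.timestamp are both expressed in
// microseconds, so chunks from the two tracks can be compared on one clock.
const [{ value: audioData }, { value: videoFrame }] = await Promise.all([
  audioReader.read(),
  videoReader.read()
]);
console.log('audio/video offset (µs):', videoFrame.timestamp - audioData.timestamp);
audioData.close();
videoFrame.close();
```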
FWIW, in Chrome, MSTP for audio is used 3X more than MSTP for video nowadays. |
At Zoom we're currently using MediaStreamTrackProcessor for video, and WebAudio for audio (very similar to this pattern: https://developer.chrome.com/blog/audio-worklet-design-pattern#webaudio_powerhouse_audio_worklet_and_sharedarraybuffer). It works, but there's a lot of complexity that comes with WebAudio and SharedArrayBuffers, and handling the case when SharedArrayBuffer is not available. Having MediaStreamTrackProcessor for audio would certainly simplify things. |
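A sketch of the simplification described here, assuming the processor's readable stream is transferred to a worker (whether the processor itself can instead be constructed in the worker depends on the implementation):

```js
// worker.js — consume audio as a plain ReadableStream of AudioData,
// with no AudioContext, AudioWorklet, or SharedArrayBuffer plumbing.
self.onmessage = async ({ data: { readable } }) => {
  const reader = readable.getReader();
  for (;;) {
    const { value: audioData, done } = await reader.read();
    if (done) break;
    // Copy the first plane out (channel 0 for planar formats) and hand it
    // to the encoder / processing pipeline.
    const bytes = new ArrayBuffer(audioData.allocationSize({ planeIndex: 0 }));
    audioData.copyTo(bytes, { planeIndex: 0 });
    // ... process(bytes)
    audioData.close();
  }
};
```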
Extracting this discussion from #4, since it was not fully discussed there.
The use cases for MediaStreamTrackProcessor for audio are unclear, given that its functionality largely overlaps with what WebAudio can do, and WebAudio is already widely deployed in all major browsers.