
Consider adding onVoiceActivity event on MediaStreamTrack for audio #145

Closed
taste1981 opened this issue May 30, 2024 · 3 comments · Fixed by w3c/mediasession#333


@taste1981

Browsers may apply voice activity detection (VAD) to an audio track as a prelude to subsequent audio processing (noise suppression, echo cancellation, etc.).

Today, without this capability exposed through MediaStreamTrack, a video conferencing application that wants to detect voice activity while the user has "muted" their microphone from the browser UI typically sets up a Web Audio worklet to perform ASR, and then hints to the user to unmute in order to be heard by others.

This amounts to a double VAD and introduces unnecessary overhead, since the audio worklet is invoked at a high frequency (typically every 10 ms) and at a relatively high priority.
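
For illustration, a rough sketch of that workaround as it exists today: an AudioWorklet that computes per-quantum energy and flags likely speech. The processor name, threshold, and message shape are made up here, and a real application would use a proper VAD model rather than an RMS threshold. Note this only helps while the application still receives live audio frames (i.e. an application-level "mute"), not with an actual capture mute.

```ts
// Sketch of the current workaround: energy-based "VAD" in an AudioWorklet.
// "naive-vad", the 0.02 threshold, and the message shape are placeholders.
const processorSource = `
  class NaiveVadProcessor extends AudioWorkletProcessor {
    process(inputs) {
      const channel = inputs[0] && inputs[0][0];
      if (channel) {
        let sum = 0;
        for (let i = 0; i < channel.length; i++) sum += channel[i] * channel[i];
        const rms = Math.sqrt(sum / channel.length);
        this.port.postMessage({ speaking: rms > 0.02 }); // runs for every 128-sample quantum
      }
      return true; // keep the processor alive
    }
  }
  registerProcessor("naive-vad", NaiveVadProcessor);
`;

async function watchForSpeechWhileMuted(stream: MediaStream, onSpeaking: () => void) {
  const ctx = new AudioContext();
  const moduleUrl = URL.createObjectURL(
    new Blob([processorSource], { type: "application/javascript" })
  );
  await ctx.audioWorklet.addModule(moduleUrl);

  const vadNode = new AudioWorkletNode(ctx, "naive-vad");
  vadNode.port.onmessage = (e) => {
    if (e.data.speaking) onSpeaking(); // e.g. show a "you are muted" hint
  };
  ctx.createMediaStreamSource(stream).connect(vadNode);
}
```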

Could we consider adding the ability to query VAD support as a read-only capability, and to listen for an onVoiceActivity event when it is enabled?

This is similar to wiring the Web Speech API to MediaStreamTrack; however, the purpose here is not speech recognition (STT), but just VAD.
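
Purely as a hypothetical sketch of the shape being asked for (none of these names are specified or implemented anywhere): query the capability, then listen for the event.

```ts
// Hypothetical API shape only; "voiceActivityDetection" and "voiceactivity" are placeholder names.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();

// 1. Query whether the UA can report voice activity (read-only capability).
const caps = track.getCapabilities() as MediaTrackCapabilities & {
  voiceActivityDetection?: boolean[];
};

if (caps.voiceActivityDetection?.includes(true)) {
  // 2. Listen for voice activity, e.g. to prompt the user to unmute so they can be heard.
  track.addEventListener("voiceactivity", () => {
    if (track.muted) showUnmuteHint();
  });
}

declare function showUnmuteHint(): void; // placeholder for the application's own UI
```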

@youennf
Contributor

youennf commented May 30, 2024

A few thoughts.

  1. We are making progress on actual capture mute (and not the enabled=false alternative).
    Being able to let the web application know that the user is talking even if the capture track is muted is a nice addition to the platform.
    An event seems good to me here.

  2. Knowing the audio level and/or whether there is voice activity might be a good idea.
    This might for instance be used to properly fill things like the audio level/voice activity RTP header extension.

Maybe this could be exposed as part of MediaStreamTrackAudioStats (a rough sketch follows at the end of this comment).

  3. Should capabilities/settings/constraints be used to let the web app signal to the UA its interest in getting that data?
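
On the MediaStreamTrackAudioStats idea, a hedged sketch of what reading such data might look like. MediaStreamTrack.stats is itself only a mediacapture-extensions proposal, and the voiceActivity / audioLevel members below are purely hypothetical additions floated in this thread:

```ts
// Purely hypothetical: assumes MediaStreamTrackAudioStats grew voiceActivity / audioLevel members.
// Nothing here is shipped; the .stats accessor is only assumed for illustration.
function pollVoiceActivity(track: MediaStreamTrack): void {
  setInterval(() => {
    const stats = (track as any).stats; // however MediaStreamTrackAudioStats ends up being exposed
    if (stats && "voiceActivity" in stats) {
      // A sender could use this to populate the audio level / voice activity RTP header extension.
      console.log("voice activity:", stats.voiceActivity, "audio level:", stats.audioLevel);
    }
  }, 1000);
}
```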

@taste1981
Author

Should capabilities/settings/constraints be used to let the web app signal to the UA its interest in getting that data?

Yes. VAD should be opted into by the web app before the UA starts to emit the event.
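
A hedged sketch of what such an opt-in could look like via constraints; the voiceActivityDetection constraint and the voiceactivity event are made-up names for illustration:

```ts
// Hypothetical opt-in; "voiceActivityDetection" is not a real constraint, hence the casts.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { voiceActivityDetection: true } as any,
});
const [track] = stream.getAudioTracks();

// Or opted into later on an existing track:
await track.applyConstraints({ voiceActivityDetection: true } as any);

// Only after the opt-in would the UA start firing the (hypothetical) event.
track.addEventListener("voiceactivity", () => {
  // e.g. hint the user to unmute
});
```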

@dontcallmedom-bot

This issue had an associated resolution in WebRTC June 18 2024 meeting – 18 June 2024 (Issue #145 Consider adding onVoiceActivity event on MediaStreamTrack for audio):

RESOLUTION: proceed with a pull request for the 1st use case, and open a separate issue for optimizing audio processing
