
Consider adding onVoiceActivity event on MediaStreamTrack for audio #145

Closed
taste1981 opened this issue May 30, 2024 · 3 comments · Fixed by w3c/mediasession#333


@taste1981

Browsers may apply voice activity detection (VAD) to an audio track as a prelude to subsequent audio processing (noise suppression, echo cancellation, etc.).

Today, without this capability exposed through MediaStreamTrack, a video conferencing application that wants to detect voice activity while the user has "muted" their microphone from the browser UI typically sets up a Web Audio worklet to perform ASR, and then hints to the user to unmute in order to be heard by others.

This amounts to a double VAD and introduces unnecessary overhead, since the audio worklet is invoked at a high frequency (typically every 10 ms) and at a relatively high priority.
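
For illustration, a rough sketch of that workaround as it exists today: an AudioWorklet that computes per-quantum energy and flags likely speech. The processor name, threshold, and message shape are made up here, and a real application would use a proper VAD model rather than an RMS threshold. Note this only helps while the application still receives live audio frames (i.e. an application-level "mute"), not with an actual capture mute.

```ts
// Sketch of the current workaround: energy-based "VAD" in an AudioWorklet.
// "naive-vad", the 0.02 threshold, and the message shape are placeholders.
const processorSource = `
  class NaiveVadProcessor extends AudioWorkletProcessor {
    process(inputs) {
      const channel = inputs[0] && inputs[0][0];
      if (channel) {
        let sum = 0;
        for (let i = 0; i < channel.length; i++) sum += channel[i] * channel[i];
        const rms = Math.sqrt(sum / channel.length);
        this.port.postMessage({ speaking: rms > 0.02 }); // runs for every 128-sample quantum
      }
      return true; // keep the processor alive
    }
  }
  registerProcessor("naive-vad", NaiveVadProcessor);
`;

async function watchForSpeechWhileMuted(stream: MediaStream, onSpeaking: () => void) {
  const ctx = new AudioContext();
  const moduleUrl = URL.createObjectURL(
    new Blob([processorSource], { type: "application/javascript" })
  );
  await ctx.audioWorklet.addModule(moduleUrl);

  const vadNode = new AudioWorkletNode(ctx, "naive-vad");
  vadNode.port.onmessage = (e) => {
    if (e.data.speaking) onSpeaking(); // e.g. show a "you are muted" hint
  };
  ctx.createMediaStreamSource(stream).connect(vadNode);
}
```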

Could we consider adding the ability to query VAD support as a read-only capability, and to listen for an onVoiceActivity event when it is enabled?

This is similar to wiring the Web Speech API to MediaStreamTrack; however, the purpose here is not speech recognition (STT), but just VAD.
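
Purely as a hypothetical sketch of the shape being asked for (none of these names are specified or implemented anywhere): query the capability, then listen for the event.

```ts
// Hypothetical API shape only; "voiceActivityDetection" and "voiceactivity" are placeholder names.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const [track] = stream.getAudioTracks();

// 1. Query whether the UA can report voice activity (read-only capability).
const caps = track.getCapabilities() as MediaTrackCapabilities & {
  voiceActivityDetection?: boolean[];
};

if (caps.voiceActivityDetection?.includes(true)) {
  // 2. Listen for voice activity, e.g. to prompt the user to unmute so they can be heard.
  track.addEventListener("voiceactivity", () => {
    if (track.muted) showUnmuteHint();
  });
}

declare function showUnmuteHint(): void; // placeholder for the application's own UI
```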

@youennf
Contributor

youennf commented May 30, 2024

A few thoughts.

  1. We are making progress on actual capture mute (and not the enabled=false alternative).
    Being able to let the web application know that the user is talking even if the capture track is muted is a nice addition to the platform.
    An event seems good to me here.

  2. Knowing the audio level and/or whether there is voice activity might be a good idea.
    This might for instance be used to properly fill things like the audio level/voice activity RTP header extension.

Maybe this could be exposed as part of MediaStreamTrackAudioStats (a rough sketch follows at the end of this comment).

  3. Should capabilities/settings/constraints be used to let the web app signal to the UA its interest in getting that data?
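
On the MediaStreamTrackAudioStats idea, a hedged sketch of what reading such data might look like. MediaStreamTrack.stats is itself only a mediacapture-extensions proposal, and the voiceActivity / audioLevel members below are purely hypothetical additions floated in this thread:

```ts
// Purely hypothetical: assumes MediaStreamTrackAudioStats grew voiceActivity / audioLevel members.
// Nothing here is shipped; the .stats accessor is only assumed for illustration.
function pollVoiceActivity(track: MediaStreamTrack): void {
  setInterval(() => {
    const stats = (track as any).stats; // however MediaStreamTrackAudioStats ends up being exposed
    if (stats && "voiceActivity" in stats) {
      // A sender could use this to populate the audio level / voice activity RTP header extension.
      console.log("voice activity:", stats.voiceActivity, "audio level:", stats.audioLevel);
    }
  }, 1000);
}
```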

@taste1981
Author

Should capabilities/settings/constraints be used to let the web app signal to the UA its interest in getting that data?

Yes. VAD should be opted into by the web app before the UA starts to emit the event.
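
A hedged sketch of what such an opt-in could look like via constraints; the voiceActivityDetection constraint and the voiceactivity event are made-up names for illustration:

```ts
// Hypothetical opt-in; "voiceActivityDetection" is not a real constraint, hence the casts.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: { voiceActivityDetection: true } as any,
});
const [track] = stream.getAudioTracks();

// Or opted into later on an existing track:
await track.applyConstraints({ voiceActivityDetection: true } as any);

// Only after the opt-in would the UA start firing the (hypothetical) event.
track.addEventListener("voiceactivity", () => {
  // e.g. hint the user to unmute
});
```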

@dontcallmedom-bot

This issue had an associated resolution in WebRTC June 18 2024 meeting – 18 June 2024 (Issue #145 Consider adding onVoiceActivity event on MediaStreamTrack for audio):

RESOLUTION: proceed with a pull request for the 1st use case, and open a separate issue for optimizing audio processing
