Add MediaStreamTrack voice activity detection support. #153
Conversation
This change adds support for the voice activity detection (VAD) feature for audio MediaStreamTracks. It is only enabled when the voiceActivityDetection constraint is set to true. With the voiceactivitydetected event, web applications can show a notification when the user is speaking while the audio track is muted.
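The proposed usage can be sketched as follows. The voiceActivityDetection constraint and voiceactivitydetected event names come from this PR; the helper function and the notification callback are hypothetical app code, not part of the proposal.

```javascript
// Sketch of the API proposed in this PR: request VAD via the
// voiceActivityDetection constraint, then listen for the
// voiceactivitydetected event on the audio track.
// watchVoiceActivity and onVoiceWhileMuted are hypothetical app code.
function watchVoiceActivity(track, onVoiceWhileMuted) {
  track.addEventListener("voiceactivitydetected", () => {
    // The event can fire while the track is muted, which is the
    // "you are talking but muted" notification use case.
    if (track.muted) onVoiceWhileMuted();
  });
}

// In a browser context (sketch):
// const stream = await navigator.mediaDevices.getUserMedia({
//   audio: { voiceActivityDetection: true },
// });
// watchVoiceActivity(stream.getAudioTracks()[0], showUnmuteNotification);
```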
I wonder whether this should actually be at the MediaStreamTrack level. Given this event would fire when the track is muted, the goal would be to unmute the track, which would be done via the MediaSession API. Moving this API to MediaSession makes some sense. Maybe all we need is a new MediaSession
I'm wondering when this action should be triggered if voiceActivity is moved from MediaStreamTrack to MediaSession.
1 and 2 may have privacy issues, because users may not want applications to know their behavior before granting the "microphone" permission. With the current AudioWorklet approach, applications are already able to know which track has voice activity. I personally believe applications only want to detect voice activity for a microphone whose MediaStreamTrack is created and muted, but I'm not sure whether any application applies VAD to other audio tracks.
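For context, the AudioWorklet approach mentioned above amounts to computing something like the following per audio frame. This is a minimal energy-threshold sketch of what such a processor might run today; real VADs are considerably more sophisticated, and the threshold value is an arbitrary illustration.

```javascript
// Minimal energy-threshold voice activity check of the kind an
// AudioWorklet processor could apply to each 128-sample render quantum.
// The 0.01 threshold is an arbitrary illustrative value.
function hasVoiceActivity(samples, threshold = 0.01) {
  // Root-mean-square energy of one audio frame.
  let sum = 0;
  for (const s of samples) sum += s * s;
  const rms = Math.sqrt(sum / samples.length);
  return rms > threshold;
}
```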
The privacy story should be the same whatever the API shape. If we want to support multi-microphone cases, a deviceId could be exposed within MediaSessionActionDetails.
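A MediaSession-shaped version of this API might look something like the sketch below. Neither the "voiceactivity" action name nor the deviceId field is settled by this thread; both are assumptions for illustration, and unmuteMicrophone is hypothetical app code.

```javascript
// Hypothetical sketch of a MediaSession-based shape for this feature.
// The "voiceactivity" action name and details.deviceId are assumptions,
// not part of any shipped spec.
function registerVoiceActivityHandler(mediaSession, unmuteMicrophone) {
  mediaSession.setActionHandler("voiceactivity", (details) => {
    // In a multi-microphone setup, details.deviceId (as suggested above)
    // would identify which microphone detected speech.
    unmuteMicrophone(details.deviceId);
  });
}

// In a browser context (sketch):
// registerVoiceActivityHandler(navigator.mediaSession, unmuteMicrophone);
```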
Agreed for the scope of this specific API.
Moving this to MediaSession makes sense to me as well.
@steimelchrome FYI
@jianjunz, would you be OK drafting a PR on the MediaSession WG?
Since this is intended to help the user unmute via the unmute button in the app, which would be done via MediaSession, it makes sense for this notification to come via MediaSession.
I do not think there is any sense in moving this to MediaSession. There are far more use cases for voice activity detection beyond letting the user know that they may be muted. A couple of use cases I would implement immediately if this API were available:
These use cases and others like them rely on voice activity detection firing on the track. Besides, even if it were moved to MediaSession, choosing the right capture track to trigger on is not possible at the user-agent level. It's not uncommon to have several capture tracks. The relevant captured track might even be "remote" (think of cases where a second local device/screen/camera/mic is set up, connected via WebRTC but right there in the room). Only the application truly knows what is what.
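The composition argument above can be sketched as follows: a track-level event attaches to any mix of local and remote audio tracks, which a session-level API could not target. This assumes the track-level event proposed in this PR; the helper and onVoice callback are hypothetical app code.

```javascript
// Sketch: only the application knows which capture tracks matter, so a
// track-level event composes naturally across several tracks, local or
// remote. Assumes the voiceactivitydetected event proposed in this PR;
// watchAllAudioTracks and onVoice are hypothetical app code.
function watchAllAudioTracks(tracks, onVoice) {
  for (const track of tracks) {
    track.addEventListener("voiceactivitydetected", () => onVoice(track));
  }
}

// In a browser context (sketch), remote tracks would come from an
// RTCPeerConnection:
// const remote = pc.getReceivers()
//   .map((r) => r.track)
//   .filter((t) => t.kind === "audio");
// watchAllAudioTracks([...localTracks, ...remote], handleVoice);
```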
This was discussed during the WebRTC WG meeting, and we think there are two use cases which deserve two different solutions. The first use case is allowing the user to unmute when talking while muted. This PR is about that specific issue, and moving it to MediaSession seems good. The second use case, which you seem more interested in, is exposing whether a live, unmuted track contains voice.
Sure, I'll create a PR on the MediaSession WG. Thanks.
Closing this one, as it has moved to the MediaSession spec (pr333).
Fixes #145.