Add voiceactivity action. (w3c#333)
* Add voiceactivity action.

This change adds support for the voice activity detection (VAD) feature
for microphones. It allows an application to show a notification when the
user is speaking but the MediaStreamTrack is muted.

* Add an example and restrict the audio source to be a microphone.

* Call setMicrophoneActive(true) if the user agrees to unmute.

* Add a note for voice activity explanation.

* Address comments.

* Update the condition under which the UA may ignore voiceactivity.

MediaStreamTrack.muted is a readonly attribute. Replace it with enabled,
which can be set by the application.

* Add some wording for live microphone tracks and some minor fixes.
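
For illustration only, here is a minimal sketch of how a page might react to the new action. It is not part of this commit. The helper name registerVoiceActivityPrompt and the showUnmutePrompt callback are hypothetical, and the sketch assumes the application mutes by setting MediaStreamTrack.enabled to false, as the commit message suggests. The session and track are passed in as parameters so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper (not defined by the spec): registers a "voiceactivity"
// handler that offers to unmute when speech is detected on a disabled track.
//   mediaSession     - an object shaped like navigator.mediaSession
//   track            - a microphone MediaStreamTrack (or a stand-in)
//   showUnmutePrompt - hypothetical UI hook; it receives a callback to invoke
//                      if the user agrees to unmute
function registerVoiceActivityPrompt(mediaSession, track, showUnmutePrompt) {
  mediaSession.setActionHandler("voiceactivity", () => {
    // Ignore the action if the track has ended or audio is already flowing.
    if (track.readyState !== "live" || track.enabled) {
      return;
    }
    showUnmutePrompt(() => {
      // Re-enable the track and tell the user agent the microphone is active.
      track.enabled = true;
      mediaSession.setMicrophoneActive(true);
    });
  });
}
```

In a page, this could be called with navigator.mediaSession and the first audio track of a stream obtained from navigator.mediaDevices.getUserMedia({audio: true}). Because the action only signals the start of voice activity, the sketch does nothing when speech ends.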
jianjunz authored Jul 18, 2024
1 parent 16336bf commit 0f6e693
Showing 1 changed file with 50 additions and 1 deletion.
51 changes: 50 additions & 1 deletion index.bs
@@ -412,6 +412,11 @@ platform UI or media keys, thereby improving the user experience.
the action's intent is to open the media session in a
picture-in-picture window.
</li>
<li>
<dfn enum-value for=MediaSessionAction>voiceactivity</dfn>:
the action's intent is to notify the web page that voice activity has
been detected by the microphone.
</li>
</ul>
</p>

@@ -541,6 +546,33 @@ platform UI or media keys, thereby improving the user experience.
{{MediaSessionActionHandler}} before running, as different tasks, the
steps defined to [$set a track's muted state$].
</p>
<p>
The {{MediaSessionAction/voiceactivity}} action source MUST always have a
target whose document MUST always have {{MediaStreamTrackState/live}}
microphone {{MediaStreamTrack}}s. A user agent MUST invoke the
{{MediaSessionActionHandler}} for {{MediaSessionAction/voiceactivity}}
only when voice activity is detected from a microphone with one or more
{{MediaStreamTrackState/live}} {{MediaStreamTrack}}s. A user agent MAY
ignore voice activity if the microphone is not muted and all
{{MediaStreamTrack}}s associated with the microphone are
{{MediaStreamTrack/enabled}}. It is RECOMMENDED for user agents to set a
minimal interval between invocations of the {{MediaSessionActionHandler}}
for {{MediaSessionAction/voiceactivity}} based on privacy and power
efficiency policies.
</p>

<p class=note>
{{MediaSessionAction/voiceactivity}} only indicates the start of voice
activity. Applications may display a notification if the user is speaking
while the {{MediaStreamTrack}} is muted, or start an {{AudioWorklet}} for
audio processing. No action is defined for the end of voice activity.
Unlike other actions which are explicitly triggered by the user,
{{MediaSessionAction/voiceactivity}} also depends on the voice activity
detection algorithm of the user agent or the system. For privacy and power
efficiency concerns, the web page may not be notified if voice activity
ends and restarts soon after the last {{MediaSessionAction/voiceactivity}}
action.
</p>

<p class=note>
A page should only register a {{MediaSessionActionHandler}} for a <a>media
@@ -716,7 +748,8 @@ enum MediaSessionAction {
"hangup",
"previousslide",
"nextslide",
"enterpictureinpicture"
"enterpictureinpicture",
"voiceactivity"
};

callback MediaSessionActionHandler = undefined(MediaSessionActionDetails details);
@@ -1496,6 +1529,7 @@ parameter whose dictionary type is:
<li>{{MediaSessionActionDetails}} for {{MediaSessionAction/nextslide}}.</li>
<li>{{MediaSessionActionDetails}} for
{{MediaSessionAction/enterpictureinpicture}}.</li>
<li>{{MediaSessionActionDetails}} for {{MediaSessionAction/voiceactivity}}.</li>
</ul>

The <dfn dict-member for="MediaSessionActionDetails">action</dfn>
@@ -1807,6 +1841,21 @@ media session</a>.
</pre>
</div>

<div class="example" id="example-enterpictureinpicture">
Handling voice activity:
<pre class="lang-javascript">
// Create a MediaStream with audio enabled.
const stream = await navigator.mediaDevices.getUserMedia({audio:true});
const track = stream.getAudioTracks()[0];
navigator.mediaSession.setActionHandler("voiceactivity", function() {
  if (track.muted) {
    // Show unmute notification. If user allows to unmute, call
    // setMicrophoneActive(true) to unmute.
  }
});
</pre>
</div>

<h2 id="acknowledgments" class="no-num">Acknowledgments</h2>

The editors would like to thank Paul Adenot, Jake Archibald, Tab Atkins,