Add voiceactivity action. #333
Changes from 4 commits
298c998
35afdfa
2bb2843
7240fbe
6b22a34
f7ebd3d
be55bde
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -412,6 +412,11 @@ platform UI or media keys, thereby improving the user experience. | |||||||||||||
the action's intent is to open the media session in a | ||||||||||||||
picture-in-picture window. | ||||||||||||||
</li> | ||||||||||||||
<li> | ||||||||||||||
<dfn enum-value for=MediaSessionAction>voiceactivity</dfn>: | ||||||||||||||
the action's intent is to notify the web page that a voice activity | ||||||||||||||
has been detected by the microphone. | ||||||||||||||
</li> | ||||||||||||||
</ul> | ||||||||||||||
</p> | ||||||||||||||
|
||||||||||||||
|
@@ -541,6 +546,30 @@ platform UI or media keys, thereby improving the user experience. | |||||||||||||
{{MediaSessionActionHandler}} before running, as different tasks, the | ||||||||||||||
steps defined to [$set a track's muted state$]. | ||||||||||||||
</p> | ||||||||||||||
<p> | ||||||||||||||
A user agent MUST invoke the {{MediaSessionActionHandler}} for | ||||||||||||||
{{MediaSessionAction/voiceactivity}} only when voice activity is detected | ||||||||||||||
from a microphone with one or more live {{MediaStreamTrack}}s. A user | ||||||||||||||
If "live" here is the same as {{MediaStreamTrackState/live}}, we could link them:
Suggested change
Done.
||||||||||||||
agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if none of | ||||||||||||||
the {{MediaStreamTrack}}s associated with the source are | ||||||||||||||
{{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a | ||||||||||||||
minimum interval between invocations of the {{MediaSessionActionHandler}} for | ||||||||||||||
I suggest clarifying "minimal interval": is this a time delay after voice activity is detected before the action handler is invoked? How long does voice activity need to be present? And does voice activity that comes and goes cause multiple invocations? (Not suggesting we spec these things, just clarify what we're recommending user agents consider.)
It's intended to be a minimal interval between two
It actually depends on the voice activity detection (VAD) algorithm. Sometimes a VAD algorithm may even treat background noise as voice activity. Given the use case we want to target (an unmute-microphone notification), it is recommended not to invoke this action handler too frequently. A new note section has been added with some background about this action.
Suggested change
Done.
||||||||||||||
{{MediaSessionAction/voiceactivity}} based on privacy and power efficiency | ||||||||||||||
policies. | ||||||||||||||
</p> | ||||||||||||||
|
||||||||||||||
<p class=note> | ||||||||||||||
{{MediaSessionAction/voiceactivity}} only indicates the start of a voice | ||||||||||||||
Suggested change
Done.
||||||||||||||
activity. An application may display a notification if the user is speaking | ||||||||||||||
Suggested change
Done.
||||||||||||||
while the {{MediaStreamTrack}} is muted, or start an {{AudioWorklet}} for | ||||||||||||||
audio processing. No action is defined for the end of a voice activity. | ||||||||||||||
Suggested change
Done.
||||||||||||||
Unlike other actions, which are explicitly triggered by the user, | ||||||||||||||
Suggested change
Done.
||||||||||||||
{{MediaSessionAction/voiceactivity}} also depends on the voice activity | ||||||||||||||
detection algorithm of the user agent or the system. For privacy and power | ||||||||||||||
efficiency reasons, the web page may not be notified if a second voice | ||||||||||||||
activity starts soon after the last {{MediaSessionAction/voiceactivity}} | ||||||||||||||
action. | ||||||||||||||
Suggested change
Done.
||||||||||||||
</p> | ||||||||||||||
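The recommended interval between invocations can be pictured as a simple gate on the user agent side. The following is only an illustrative model, not spec text or any browser's implementation: the name `createVoiceActivityGate` and the 5000 ms value are assumptions chosen for the sketch, and a real user agent would derive the interval from its own privacy and power policies.

```javascript
// Hypothetical model of a user agent's rate limiter for "voiceactivity".
// Neither the function name nor the interval comes from the spec.
function createVoiceActivityGate(minIntervalMs) {
  let lastInvokedMs = -Infinity;

  // Called with the time (in ms) at which the start of voice activity was
  // detected. Returns true if the action handler may be invoked now.
  return function shouldInvoke(nowMs) {
    if (nowMs - lastInvokedMs < minIntervalMs) {
      return false; // Too soon after the previous invocation: drop it.
    }
    lastInvokedMs = nowMs;
    return true;
  };
}

// Example: voice activity starts detected at these timestamps (ms).
const gate = createVoiceActivityGate(5000); // Arbitrary example interval.
const results = [0, 1200, 4999, 5000, 9000, 10000].map((t) => gate(t));
console.log(results);
```

With this model, a page sees at most one invocation per interval, which is why the note warns that a second voice activity shortly after the first may go unreported.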
|
||||||||||||||
<p class=note> | ||||||||||||||
A page should only register a {{MediaSessionActionHandler}} for a <a>media | ||||||||||||||
|
@@ -716,7 +745,8 @@ enum MediaSessionAction { | |||||||||||||
"hangup", | ||||||||||||||
"previousslide", | ||||||||||||||
"nextslide", | ||||||||||||||
"enterpictureinpicture" | ||||||||||||||
"enterpictureinpicture", | ||||||||||||||
"voiceactivity" | ||||||||||||||
}; | ||||||||||||||
|
||||||||||||||
callback MediaSessionActionHandler = undefined(MediaSessionActionDetails details); | ||||||||||||||
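Because `"voiceactivity"` is a new enum value, user agents that do not know it will reject it: passing an unrecognized `MediaSessionAction` to `setActionHandler()` throws a `TypeError`. A page can therefore feature-detect the action with a small wrapper. This is a sketch; `trySetActionHandler` is a hypothetical helper name, not part of the API.

```javascript
// Hypothetical helper: registers `handler` for `action` and reports whether
// the user agent accepted the action name.
function trySetActionHandler(mediaSession, action, handler) {
  try {
    mediaSession.setActionHandler(action, handler);
    return true;
  } catch (error) {
    // An unknown MediaSessionAction enum value raises a TypeError.
    if (error instanceof TypeError) {
      return false;
    }
    throw error; // Anything else is unexpected, so let it propagate.
  }
}

// Usage in a page. The guard keeps the snippet harmless where the
// Media Session API is unavailable.
if (typeof navigator !== "undefined" && "mediaSession" in navigator) {
  const supported = trySetActionHandler(
    navigator.mediaSession,
    "voiceactivity",
    () => console.log("Voice activity detected"),
  );
  console.log("voiceactivity supported:", supported);
}
```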
|
@@ -1496,6 +1526,7 @@ parameter whose dictionary type is: | |||||||||||||
<li>{{MediaSessionActionDetails}} for {{MediaSessionAction/nextslide}}.</li> | ||||||||||||||
<li>{{MediaSessionActionDetails}} for | ||||||||||||||
{{MediaSessionAction/enterpictureinpicture}}.</li> | ||||||||||||||
<li>{{MediaSessionActionDetails}} for {{MediaSessionAction/voiceactivity}}.</li> | ||||||||||||||
</ul> | ||||||||||||||
|
||||||||||||||
The <dfn dict-member for="MediaSessionActionDetails">action</dfn> | ||||||||||||||
Maybe we could add an example that links voice activity with displaying a UI that can execute setMicrophoneActive(true).
Done.
||||||||||||||
|
@@ -1807,6 +1838,21 @@ media session</a>. | |||||||||||||
</pre> | ||||||||||||||
</div> | ||||||||||||||
|
||||||||||||||
<div class="example" id="example-voiceactivity"> | ||||||||||||||
Handling voice activity: | ||||||||||||||
<pre class="lang-javascript"> | ||||||||||||||
// Create a MediaStream with audio enabled. | ||||||||||||||
const stream = await navigator.mediaDevices.getUserMedia({audio:true}); | ||||||||||||||
const track = stream.getAudioTracks()[0]; | ||||||||||||||
navigator.mediaSession.setActionHandler("voiceactivity", function() { | ||||||||||||||
if (track.muted) { | ||||||||||||||
// Show an unmute notification. If the user agrees to unmute, call | ||||||||||||||
// navigator.mediaSession.setMicrophoneActive(true). | ||||||||||||||
} | ||||||||||||||
}); | ||||||||||||||
</pre> | ||||||||||||||
</div> | ||||||||||||||
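The example above leaves the notification logic as a comment. One possible way to fill it in is sketched below, with the UI and the unmute request passed in as callbacks so the handler itself stays independent of any page. All names here (`createVoiceActivityHandler`, `showUnmutePrompt`, `requestUnmute`) are hypothetical and not part of the proposed spec text.

```javascript
// Hypothetical factory for a "voiceactivity" handler.
// - track: the microphone MediaStreamTrack (only `muted` is read).
// - showUnmutePrompt(done): app-provided UI; calls done(true) if the user
//   agrees to unmute, done(false) otherwise.
// - requestUnmute(): app-provided; in a page this would typically call
//   navigator.mediaSession.setMicrophoneActive(true).
function createVoiceActivityHandler({ track, showUnmutePrompt, requestUnmute }) {
  let promptVisible = false;

  // Returns true if a prompt was shown for this invocation.
  return function onVoiceActivity() {
    // Nothing to do if the user is not muted, or a prompt is already open.
    if (!track.muted || promptVisible) {
      return false;
    }
    promptVisible = true;
    showUnmutePrompt((accepted) => {
      promptVisible = false;
      if (accepted) {
        requestUnmute();
      }
    });
    return true;
  };
}

// Possible wiring in a page (not executed here):
// navigator.mediaSession.setActionHandler(
//   "voiceactivity",
//   createVoiceActivityHandler({ track, showUnmutePrompt, requestUnmute }),
// );
```

Tracking `promptVisible` in the handler avoids stacking several prompts if the user agent invokes the action again while the first prompt is still open.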
|
||||||||||||||
<h2 id="acknowledgments" class="no-num">Acknowledgments</h2> | ||||||||||||||
|
||||||||||||||
The editors would like to thank Paul Adenot, Jake Archibald, Tab Atkins, | ||||||||||||||
|