Add voiceactivity action. #333

Merged: 7 commits, merged on Jul 18, 2024.
Diff view: changes from 1 commit.
index.bs: 20 changes (19 additions, 1 deletion)
@@ -412,6 +412,11 @@ platform UI or media keys, thereby improving the user experience.
the action's intent is to open the media session in a
picture-in-picture window.
</li>
<li>
<dfn enum-value for=MediaSessionAction>voiceactivity</dfn>:
the action's intent is to notify the action handler that a voice
activity is started.
Review comment (Member): Could the following be a bit clearer?

Suggested change:
- the action's intent is to notify the action handler that a voice
- activity is started.
+ the action's intent is to notify the web page that voice
+ activity has been detected by the microphone.

Reply (Member Author): Added a note to make it clearer.

</li>
</ul>
</p>

@@ -541,6 +546,17 @@ platform UI or media keys, thereby improving the user experience.
{{MediaSessionActionHandler}} before running, as different tasks, the
steps defined to [$set a track's muted state$].
</p>
<p>
A user agent MUST invoke {{MediaSessionActionHandler}} for
Review comment (Member): Suggested change:
- A user agent MUST invoke {{MediaSessionActionHandler}} for
+ A user agent MUST invoke the {{MediaSessionActionHandler}} for

Reply (Member Author): Done.

{{MediaSessionAction/voiceactivity}} only when the voice activity is
Review comment (Member): Suggested change:
- {{MediaSessionAction/voiceactivity}} only when the voice activity is
+ {{MediaSessionAction/voiceactivity}} only when voice activity is

Reply (Member Author): Done.

detected from a source with one or more live {{MediaStreamTrack}}s. A user
Review comment (Contributor): We probably want to state that this is restricted to microphone tracks.

Reply (Member Author): Changed "source" to "microphone".

agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all
{{MediaStreamTrack}}s associated with the source are not
{{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a
minimal interval for invoking {{MediaSessionActionHandler}} for
Review comment (Member, @chrisn, Jul 1, 2024): I suggest clarifying "minimal interval" - is this a time delay after voice activity is detected before the action handler is invoked? And how long does voice activity need to be present for? And does voice activity that comes and goes cause multiple invocations? (Not suggesting we spec these things, just clarify what we're recommending user agents to consider)

Reply (Member Author):

> I suggest clarifying "minimal interval" - is this a time delay after voice activity is detected before the action handler is invoked?

It's intended to be a minimal interval between two voiceactivity actions (or events? "event" sounds more accurate here).

> And how long does voice activity need to be present for? And does voice activity that comes and goes cause multiple invocations?

It depends on the voice activity detection (VAD) algorithm. Some VAD algorithms may even classify background noise as voice activity. Given the use case we want to target (an "unmute microphone" notification), we recommend not invoking this action handler too frequently.

I've added a new note section with some background on this action.

Review comment (Member): Suggested change:
- minimal interval for invoking {{MediaSessionActionHandler}} for
+ minimal interval between invocations of the {{MediaSessionActionHandler}} for

Reply (Member Author): Done.

{{MediaSessionAction/voiceactivity}} based on privacy and power efficiency
policies.
</p>
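As a rough illustration of the three conditions in the paragraph above, the following TypeScript sketch models a user agent's dispatch decision. It is not how any browser implements this: `MicTrackState`, `VoiceActivityDispatcher`, and `onVoiceDetected` are hypothetical names, a voice activity detector is assumed to exist elsewhere and call `onVoiceDetected`, and the interval is supplied by the caller because the paragraph leaves it to privacy and power-efficiency policies.

```typescript
// Hypothetical, simplified view of a microphone MediaStreamTrack's state.
// Real MediaStreamTracks expose readyState and muted attributes of the same names.
interface MicTrackState {
  readyState: "live" | "ended";
  muted: boolean;
}

// Hypothetical UA-side helper; this is NOT a web-exposed API.
class VoiceActivityDispatcher {
  private lastInvocationMs = Number.NEGATIVE_INFINITY;
  private readonly minIntervalMs: number;
  private readonly invokeHandler: () => void;

  constructor(minIntervalMs: number, invokeHandler: () => void) {
    this.minIntervalMs = minIntervalMs;
    this.invokeHandler = invokeHandler;
  }

  // Called by an (assumed) VAD pipeline whenever it detects voice on the
  // microphone. `tracks` are the tracks sourced from that microphone.
  // Returns true if the page's voiceactivity handler was invoked.
  onVoiceDetected(tracks: readonly MicTrackState[], nowMs: number): boolean {
    const liveTracks = tracks.filter((t) => t.readyState === "live");
    // MUST: only invoke when the microphone has one or more live tracks.
    if (liveTracks.length === 0) return false;
    // MAY: ignore the action when none of the live tracks is muted,
    // since the page is already receiving audio.
    if (liveTracks.every((t) => !t.muted)) return false;
    // RECOMMENDED: enforce a minimal interval between invocations.
    if (nowMs - this.lastInvocationMs < this.minIntervalMs) return false;
    this.lastInvocationMs = nowMs;
    this.invokeHandler();
    return true;
  }
}
```

The sketch takes the optional MAY branch (skipping the action while no live track is muted); a user agent could also choose to invoke the handler in that case. A caller would pass a monotonic clock reading, such as `performance.now()`, as `nowMs`, so that voice activity that comes and goes within the interval produces a single invocation.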

<p class=note>
A page should only register a {{MediaSessionActionHandler}} for a <a>media
@@ -716,7 +732,8 @@ enum MediaSessionAction {
"hangup",
"previousslide",
"nextslide",
"enterpictureinpicture"
"enterpictureinpicture",
"voiceactivity"
};

callback MediaSessionActionHandler = undefined(MediaSessionActionDetails details);
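Since `voiceactivity` is a new member of the `MediaSessionAction` enum, pages running in user agents that predate it need a way to detect support. Under Web IDL's enumeration conversion, passing a string that is not an enum value to `setActionHandler()` throws a `TypeError`, so wrapping the call in `try`/`catch` works as a feature check. The TypeScript sketch below assumes a minimal structural type, `ActionHandlerTarget` (a hypothetical name), rather than the DOM library's `MediaSession` type, whose action union may not list `voiceactivity` yet; in a browser you would pass `navigator.mediaSession`, with a cast if the type checker requires it.

```typescript
// Minimal structural type for the part of MediaSession used here
// (hypothetical name; navigator.mediaSession has this shape in browsers).
interface ActionHandlerTarget {
  setActionHandler(
    action: string,
    handler: ((details: { action: string }) => void) | null,
  ): void;
}

// Registers `handler` for `action` and reports whether the action is supported.
// Per Web IDL, an unknown enum value makes setActionHandler() throw TypeError.
function trySetActionHandler(
  session: ActionHandlerTarget,
  action: string,
  handler: (details: { action: string }) => void,
): boolean {
  try {
    session.setActionHandler(action, handler);
    return true;
  } catch (err) {
    if (err instanceof TypeError) return false; // action not supported
    throw err; // any other failure is unexpected: surface it
  }
}
```

Only `TypeError` is treated as "unsupported"; any other exception is rethrown so that genuine bugs are not silently hidden behind a feature check.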
@@ -1496,6 +1513,7 @@ parameter whose dictionary type is:
<li>{{MediaSessionActionDetails}} for {{MediaSessionAction/nextslide}}.</li>
<li>{{MediaSessionActionDetails}} for
{{MediaSessionAction/enterpictureinpicture}}.</li>
<li>{{MediaSessionActionDetails}} for {{MediaSessionAction/voiceactivity}}.</li>
</ul>

The <dfn dict-member for="MediaSessionActionDetails">action</dfn>
Review comment (Contributor): Maybe we could add an example that links voice activity with displaying a UI that can execute setMicrophoneActive(true).

Reply (Member Author): Done.
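One way to link voice activity to an unmute UI, as requested in this thread, is sketched below in TypeScript. It is illustrative only and not necessarily the example added in this PR. `MicrophoneSession` is a hypothetical structural type covering the two `MediaSession` methods used (`setActionHandler()` and `setMicrophoneActive()`), and `isPageMuted` and `showUnmutePrompt` are hypothetical page-supplied callbacks. The sketch wraps `setMicrophoneActive()` in `Promise.resolve()` so that it works whether the method returns a promise or `undefined`.

```typescript
// Minimal structural type for the MediaSession members used below
// (hypothetical name; in a browser, pass navigator.mediaSession).
interface MicrophoneSession {
  setActionHandler(
    action: "voiceactivity",
    handler: ((details: { action: string }) => void) | null,
  ): void;
  // Assumed to return either a promise or nothing.
  setMicrophoneActive(active: boolean): Promise<void> | void;
}

// Hypothetical UI hook: shows an "unmute?" prompt and calls onConfirm
// only if the user accepts.
type ShowUnmutePrompt = (onConfirm: () => void) => void;

function registerUnmutePrompt(
  session: MicrophoneSession,
  isPageMuted: () => boolean,
  showUnmutePrompt: ShowUnmutePrompt,
): void {
  session.setActionHandler("voiceactivity", () => {
    // The UA reported voice on the microphone; only prompt if the page
    // currently has the microphone muted.
    if (!isPageMuted()) return;
    showUnmutePrompt(() => {
      // Unmute only in response to the user's explicit confirmation.
      Promise.resolve(session.setMicrophoneActive(true)).catch((err: unknown) => {
        console.warn("setMicrophoneActive(true) failed:", err);
      });
    });
  });
}
```

Unmuting is deferred to the user's confirmation rather than done directly in the handler, which fits the use case described in the review thread: voice activity detection can misfire on background noise, so the page only offers to unmute.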
