From 298c998973d6708d983105816f80188edd9dd94c Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Fri, 28 Jun 2024 10:45:15 +0800
Subject: [PATCH 1/7] Add voiceactivity action.

This change adds support for the voice activity detection (VAD) feature
for microphones. It allows applications to show a notification when the
user is speaking but the MediaStreamTrack is muted.
---
 index.bs | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/index.bs b/index.bs
index c140f86..975f22f 100644
--- a/index.bs
+++ b/index.bs
@@ -412,6 +412,11 @@ platform UI or media keys, thereby improving the user experience.
 the action's intent is to open the media session in a picture-in-picture
 window.
+
  • + voiceactivity: + the action's intent is to notify the action handler that a voice + activity is started. +
  • @@ -541,6 +546,17 @@ platform UI or media keys, thereby improving the user experience. {{MediaSessionActionHandler}} before running, as different tasks, the steps defined to [$set a track's muted state$].

    +

    + A user agent MUST invoke {{MediaSessionActionHandler}} for + {{MediaSessionAction/voiceactivity}} only when the voice activity is + detected from a source with one or more live {{MediaStreamTrack}}s. A user + agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all + {{MediaStreamTrack}}s associated with the source are not + {{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a + minimal interval for invoking {{MediaSessionActionHandler}} for + {{MediaSessionAction/voiceactivity}} based on privacy and power efficiency + policies. +

    A page should only register a {{MediaSessionActionHandler}} for a media @@ -716,7 +732,8 @@ enum MediaSessionAction { "hangup", "previousslide", "nextslide", - "enterpictureinpicture" + "enterpictureinpicture", + "voiceactivity" }; callback MediaSessionActionHandler = undefined(MediaSessionActionDetails details); @@ -1496,6 +1513,7 @@ parameter whose dictionary type is:

  • {{MediaSessionActionDetails}} for {{MediaSessionAction/nextslide}}.
  • {{MediaSessionActionDetails}} for {{MediaSessionAction/enterpictureinpicture}}.
  • +
  • {{MediaSessionActionDetails}} for {{MediaSessionAction/voiceactivity}}.
  • The action

From 35afdfa7e63082c5e729c4773550601d68ee8649 Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Fri, 28 Jun 2024 16:51:07 +0800
Subject: [PATCH 2/7] Add an example and restrict audio source to be microphone.
---
 index.bs | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/index.bs b/index.bs
index 975f22f..cae290a 100644
--- a/index.bs
+++ b/index.bs
@@ -549,8 +549,8 @@ platform UI or media keys, thereby improving the user experience.

 A user agent MUST invoke {{MediaSessionActionHandler}} for
 {{MediaSessionAction/voiceactivity}} only when the voice activity is
- detected from a source with one or more live {{MediaStreamTrack}}s. A user
- agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all
+ detected from a microphone with one or more live {{MediaStreamTrack}}s. A
+ user agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all
 {{MediaStreamTrack}}s associated with the source are not
 {{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a
 minimal interval for invoking {{MediaSessionActionHandler}} for
@@ -1825,6 +1825,20 @@ media session.
+

    + Handling voice activity: +
    +    // Create a MediaStream with audio enabled.
    +    const stream = await navigator.mediaDevices.getUserMedia({audio:true});
    +    const track = stream.getAudioTracks()[0];
    +    navigator.mediaSession.setActionHandler("voiceactivity", function() {
    +      if (track.muted) {
    +        // Show unmute notification.
    +      }
    +    });
    +  
    +
    +

    Acknowledgments

 The editors would like to thank Paul Adenot, Jake Archibald, Tab Atkins,

From 2bb28431e209bcd39f9b3172a04d3888ecdc0bc8 Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Fri, 28 Jun 2024 16:54:18 +0800
Subject: [PATCH 3/7] Call setMicrophoneActive(true) if the user agrees to unmute.
---
 index.bs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/index.bs b/index.bs
index cae290a..bedd997 100644
--- a/index.bs
+++ b/index.bs
@@ -1833,7 +1833,8 @@ media session.
     const track = stream.getAudioTracks()[0];
     navigator.mediaSession.setActionHandler("voiceactivity", function() {
       if (track.muted) {
-        // Show unmute notification.
+        // Show unmute notification. If the user agrees to unmute, call
+        // setMicrophoneActive(true).
       }
     });

From 7240fbeda25af65e613137db12d9935f4c1c110d Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Tue, 2 Jul 2024 13:59:52 +0800
Subject: [PATCH 4/7] Add a note for voice activity explanation.
---
 index.bs | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/index.bs b/index.bs
index bedd997..d75727c 100644
--- a/index.bs
+++ b/index.bs
@@ -414,8 +414,8 @@ platform UI or media keys, thereby improving the user experience.
  • voiceactivity: - the action's intent is to notify the action handler that a voice - activity is started. + the action's intent is to notify the web page that a voice activity + has been detected by the microphone.
  • @@ -547,10 +547,10 @@ platform UI or media keys, thereby improving the user experience. steps defined to [$set a track's muted state$].

- A user agent MUST invoke {{MediaSessionActionHandler}} for
- {{MediaSessionAction/voiceactivity}} only when the voice activity is
- detected from a microphone with one or more live {{MediaStreamTrack}}s. A
- user agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all
+ A user agent MUST invoke the {{MediaSessionActionHandler}} for
+ {{MediaSessionAction/voiceactivity}} only when voice activity is detected
+ from a microphone with one or more live {{MediaStreamTrack}}s. A user
+ agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all
 {{MediaStreamTrack}}s associated with the source are not
 {{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a
 minimal interval for invoking {{MediaSessionActionHandler}} for
@@ -558,6 +558,19 @@ platform UI or media keys, thereby improving the user experience.
 policies.

    +

+ {{MediaSessionAction/voiceactivity}} only indicates the start of a voice
+ activity. Application may display a notification if the user is speaking
+ while the {{MediaStreamTrack}} is muted, or start an {{AudioWorklet}} for
+ audio processing. No action is defined for the end of a voice activity.
+ Unlike other actions which are explicitely triggered by the user,
+ {{MediaSessionAction/voiceactivity}} also depends on the voice activity
+ detection algorithm of the user agent or the system. For privacy and power
+ efficiency concern, web page may not be notified if the second voice
+ activity started soon after last {{MediaSessionAction/voiceactivity}}
+ action.
+

    +

 A page should only register a {{MediaSessionActionHandler}} for a media
 session action when it can handle the action given that the user agent

From 6b22a34e75a3dfe2fa37a48ae288ccb92e51aad1 Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Fri, 12 Jul 2024 10:54:12 +0800
Subject: [PATCH 5/7] Address comments.
---
 index.bs | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/index.bs b/index.bs
index d75727c..aaa994a 100644
--- a/index.bs
+++ b/index.bs
@@ -414,8 +414,8 @@ platform UI or media keys, thereby improving the user experience.

  • voiceactivity: - the action's intent is to notify the web page that a voice activity - has been detected by the microphone. + the action's intent is to notify the web page that voice activity has + been detected by the microphone.
  • @@ -553,21 +553,21 @@ platform UI or media keys, thereby improving the user experience. agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all {{MediaStreamTrack}}s associated with the source are not {{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a - minimal interval for invoking {{MediaSessionActionHandler}} for - {{MediaSessionAction/voiceactivity}} based on privacy and power efficiency - policies. + minimal interval between invocations of the {{MediaSessionActionHandler}} + for {{MediaSessionAction/voiceactivity}} based on privacy and power + efficiency policies.

- {{MediaSessionAction/voiceactivity}} only indicates the start of a voice
- activity. Application may display a notification if the user is speaking
+ {{MediaSessionAction/voiceactivity}} only indicates the start of voice
+ activity. Applications may display a notification if the user is speaking
 while the {{MediaStreamTrack}} is muted, or start an {{AudioWorklet}} for
- audio processing. No action is defined for the end of a voice activity.
- Unlike other actions which are explicitely triggered by the user,
+ audio processing. No action is defined for the end of voice activity.
+ Unlike other actions which are explicitly triggered by the user,
 {{MediaSessionAction/voiceactivity}} also depends on the voice activity
 detection algorithm of the user agent or the system. For privacy and power
- efficiency concern, web page may not be notified if the second voice
- activity started soon after last {{MediaSessionAction/voiceactivity}}
+ efficiency concerns, the web page may not be notified if voice activity
+ ends and restarts soon after the last {{MediaSessionAction/voiceactivity}}
 action.

From f7ebd3daa613777accc22a2da03d509254fbd313 Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Fri, 12 Jul 2024 10:58:16 +0800
Subject: [PATCH 6/7] Update the condition when UA may ignore voiceactivity.

MediaStreamTrack.muted is a readonly attribute. Replace it with enabled,
which can be set by the application.
---
 index.bs | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/index.bs b/index.bs
index aaa994a..878933b 100644
--- a/index.bs
+++ b/index.bs
@@ -550,12 +550,12 @@ platform UI or media keys, thereby improving the user experience.
 A user agent MUST invoke the {{MediaSessionActionHandler}} for
 {{MediaSessionAction/voiceactivity}} only when voice activity is detected
 from a microphone with one or more live {{MediaStreamTrack}}s. A user
- agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if all
- {{MediaStreamTrack}}s associated with the source are not
- {{MediaStreamTrack/muted}}. It is RECOMMENDED for user agents to set a
- minimal interval between invocations of the {{MediaSessionActionHandler}}
- for {{MediaSessionAction/voiceactivity}} based on privacy and power
- efficiency policies.
+ agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if
+ microphone is not muted and all {{MediaStreamTrack}}s associated with the
+ source are {{MediaStreamTrack/enabled}}. It is RECOMMENDED for user agents
+ to set a minimal interval between invocations of the
+ {{MediaSessionActionHandler}} for {{MediaSessionAction/voiceactivity}}
+ based on privacy and power efficiency policies.

From be55bde86158db0fd1daf243aeeb4b2813be87a1 Mon Sep 17 00:00:00 2001
From: Jianjun Zhu
Date: Mon, 15 Jul 2024 10:28:29 +0800
Subject: [PATCH 7/7] Add some wording for live microphone tracks and some minor fixes.
---
 index.bs | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/index.bs b/index.bs
index 878933b..6468441 100644
--- a/index.bs
+++ b/index.bs
@@ -547,15 +547,18 @@ platform UI or media keys, thereby improving the user experience.
 steps defined to [$set a track's muted state$].

- A user agent MUST invoke the {{MediaSessionActionHandler}} for
- {{MediaSessionAction/voiceactivity}} only when voice activity is detected
- from a microphone with one or more live {{MediaStreamTrack}}s. A user
- agent MAY ignore a {{MediaSessionAction/voiceactivity}} action if
- microphone is not muted and all {{MediaStreamTrack}}s associated with the
- source are {{MediaStreamTrack/enabled}}. It is RECOMMENDED for user agents
- to set a minimal interval between invocations of the
+ The {{MediaSessionAction/voiceactivity}} action source MUST always have a
+ target whose document MUST always have {{MediaStreamTrackState/live}}
+ microphone {{MediaStreamTrack}}s. A user agent MUST invoke the
 {{MediaSessionActionHandler}} for {{MediaSessionAction/voiceactivity}}
- based on privacy and power efficiency policies.
+ only when voice activity is detected from a microphone with one or more
+ {{MediaStreamTrackState/live}} {{MediaStreamTrack}}s. A user agent MAY
+ ignore voice activity if the microphone is not muted and all
+ {{MediaStreamTrack}}s associated with the microphone are
+ {{MediaStreamTrack/enabled}}. It is RECOMMENDED for user agents to set a
+ minimal interval between invocations of the {{MediaSessionActionHandler}}
+ for {{MediaSessionAction/voiceactivity}} based on privacy and power
+ efficiency policies.
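
Reviewer sketch (not part of the patch): the final wording in PATCH 7/7 combines three conditions, namely at least one live microphone track, an optional skip when nothing is muted or disabled, and a minimal interval between handler invocations. The JavaScript below is one possible model of that decision, written only to make the rules concrete. The names `createVoiceActivityGate` and `shouldInvoke`, the plain-object track shape, and the 5000 ms default are all assumptions for illustration; the patch leaves the interval to user agent policy and does not define any such function. Restricting the enabled check to live tracks is also an interpretation.

```javascript
// Hypothetical model of the user agent side decision described in PATCH 7/7.
// Nothing here is a real browser API; tracks are plain objects with the
// MediaStreamTrack-like fields `readyState` and `enabled`.
const ASSUMED_MIN_INTERVAL_MS = 5000; // assumed value, not taken from the spec

function createVoiceActivityGate(minIntervalMs = ASSUMED_MIN_INTERVAL_MS) {
  let lastInvocationMs = -Infinity;

  // Returns true when the voiceactivity handler would be invoked.
  return function shouldInvoke({ nowMs, microphoneMuted, tracks }) {
    // MUST: only microphones with one or more live tracks qualify.
    const liveTracks = tracks.filter((t) => t.readyState === "live");
    if (liveTracks.length === 0) return false;

    // MAY: this model takes the option to ignore voice activity when the
    // microphone is not muted and every live track is enabled.
    const allEnabled = liveTracks.every((t) => t.enabled);
    if (!microphoneMuted && allEnabled) return false;

    // RECOMMENDED: keep a minimal interval between invocations.
    if (nowMs - lastInvocationMs < minIntervalMs) return false;

    lastInvocationMs = nowMs;
    return true;
  };
}
```

With a gate created as `createVoiceActivityGate(1000)`, a disabled live track produces an invocation at `nowMs: 0`, is suppressed at `nowMs: 500`, and produces another invocation at `nowMs: 1500`. Passing the clock in as `nowMs` keeps the sketch deterministic; a real implementation would read a monotonic clock instead.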