diff --git a/index.bs b/index.bs
index c140f86..6468441 100644
--- a/index.bs
+++ b/index.bs
@@ -412,6 +412,11 @@ platform UI or media keys, thereby improving the user experience.
 the action's intent is to open the media session in a picture-in-picture window.
+
  • + voiceactivity:
    + the action's intent is to notify the web page that voice activity has
    + been detected by the microphone.
    +
  •
@@ -541,6 +546,33 @@ platform UI or media keys, thereby improving the user experience.
    {{MediaSessionActionHandler}} before running, as different tasks, the steps defined to [$set a track's muted state$].

    +

    + The {{MediaSessionAction/voiceactivity}} action source MUST always have a
    + target whose document MUST always have {{MediaStreamTrackState/live}}
    + microphone {{MediaStreamTrack}}s. A user agent MUST invoke the
    + {{MediaSessionActionHandler}} for {{MediaSessionAction/voiceactivity}}
    + only when voice activity is detected from a microphone with one or more
    + {{MediaStreamTrackState/live}} {{MediaStreamTrack}}s. A user agent MAY
    + ignore voice activity if the microphone is not muted and all
    + {{MediaStreamTrack}}s associated with the microphone are
    + {{MediaStreamTrack/enabled}}. It is RECOMMENDED for user agents to set a
    + minimal interval between invocations of the {{MediaSessionActionHandler}}
    + for {{MediaSessionAction/voiceactivity}} based on privacy and power
    + efficiency policies.
    +
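
A minimal sketch, outside the patch itself, of how the MUST / MAY / RECOMMENDED conditions in the paragraph above could combine into a single user-agent decision. The function name, the `mic` object shape, and the millisecond parameters are all hypothetical and exist only for illustration; a real user agent is free to structure this differently, and the "all tracks enabled" check reflects one reading of the text.

```javascript
// Hypothetical model of a user agent's decision, not an actual browser API.
// `mic` is an illustrative object: { muted: boolean, tracks: [{ readyState, enabled }] }.
function shouldInvokeVoiceActivity(mic, nowMs, lastInvokedMs, minIntervalMs) {
  // MUST: only invoke for a microphone with one or more live tracks.
  const hasLiveTrack = mic.tracks.some((t) => t.readyState === "live");
  if (!hasLiveTrack) return false;

  // MAY: this sketch chooses to ignore voice activity when the microphone is
  // not muted and every associated track is enabled.
  const allEnabled = mic.tracks.every((t) => t.enabled);
  if (!mic.muted && allEnabled) return false;

  // RECOMMENDED: enforce a minimal interval between invocations.
  // lastInvokedMs is null when the handler has never been invoked.
  if (lastInvokedMs !== null && nowMs - lastInvokedMs < minIntervalMs) {
    return false;
  }
  return true;
}
```

The interval check is kept last so that an ignored detection never needs a timestamp; the value of `minIntervalMs` is left to the privacy and power policies the text mentions.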

    + +

    + {{MediaSessionAction/voiceactivity}} only indicates the start of voice
    + activity. Applications may display a notification if the user is speaking
    + while the {{MediaStreamTrack}} is muted, or start an {{AudioWorklet}} for
    + audio processing. No action is defined for the end of voice activity.
    + Unlike other actions which are explicitly triggered by the user,
    + {{MediaSessionAction/voiceactivity}} also depends on the voice activity
    + detection algorithm of the user agent or the system. For privacy and power
    + efficiency concerns, the web page may not be notified if voice activity
    + ends and restarts soon after the last {{MediaSessionAction/voiceactivity}}
    + action.
    +
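
Because the note above defines no action for the end of voice activity, a page that shows a "you are muted" reminder has to decide on its own when to hide it. The following is a rough sketch, outside the patch itself, of one way to do that with a timer that restarts on every `voiceactivity` action. `createVoiceActivityReminder`, `show`, `hide`, `hideAfterMs`, and the injectable `setTimer` / `clearTimer` are hypothetical names introduced for illustration; only `track.muted` and `navigator.mediaSession.setActionHandler` come from the platform.

```javascript
// Hypothetical helper: returns a handler suitable for the "voiceactivity"
// action. It shows a reminder while the track is muted and hides it once no
// further action has arrived for `hideAfterMs` milliseconds.
function createVoiceActivityReminder({
  track,
  show,
  hide,
  hideAfterMs = 3000,
  setTimer = setTimeout,     // injectable so the logic can be tested
  clearTimer = clearTimeout,
}) {
  let timer = null;
  return function onVoiceActivity() {
    // Nothing to remind the user about if the microphone is not muted.
    if (!track.muted) return;
    if (timer === null) {
      show();                // first activity since the reminder was hidden
    } else {
      clearTimer(timer);     // reminder already visible: restart the countdown
    }
    timer = setTimer(() => {
      timer = null;
      hide();
    }, hideAfterMs);
  };
}

// Possible use in a page, assuming `track`, `showBanner`, and `hideBanner`
// are defined by the application:
//   navigator.mediaSession.setActionHandler(
//     "voiceactivity",
//     createVoiceActivityReminder({ track, show: showBanner, hide: hideBanner }));
```

Since the user agent may throttle invocations, `hideAfterMs` is only a guess at a comfortable display time and does not track the actual end of speech.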

    A page should only register a {{MediaSessionActionHandler}} for a media
@@ -716,7 +748,8 @@ enum MediaSessionAction {
   "hangup",
   "previousslide",
   "nextslide",
-  "enterpictureinpicture"
+  "enterpictureinpicture",
+  "voiceactivity"
 };

 callback MediaSessionActionHandler = undefined(MediaSessionActionDetails details);
@@ -1496,6 +1529,7 @@ parameter whose dictionary type is:

  • {{MediaSessionActionDetails}} for {{MediaSessionAction/nextslide}}.
  • {{MediaSessionActionDetails}} for {{MediaSessionAction/enterpictureinpicture}}.
  • +
  • {{MediaSessionActionDetails}} for {{MediaSessionAction/voiceactivity}}.
  • The action
@@ -1807,6 +1841,21 @@ media session
    . +
    + Handling voice activity:
    +
    +    // Create a MediaStream with audio enabled.
    +    const stream = await navigator.mediaDevices.getUserMedia({audio:true});
    +    const track = stream.getAudioTracks()[0];
    +    navigator.mediaSession.setActionHandler("voiceactivity", function() {
    +      if (track.muted) {
    +        // Show an unmute notification. If the user agrees to unmute, call
    +        // navigator.mediaSession.setMicrophoneActive(true).
    +      }
    +    });
    +  
    +
    +

    Acknowledgments

    The editors would like to thank Paul Adenot, Jake Archibald, Tab Atkins,