
<pre class="metadata">
Title: Media Session
Repository: w3c/mediasession
Status: ED
ED: https://w3c.github.io/mediasession/
TR: https://www.w3.org/TR/mediasession/
Shortname: mediasession
Level: None
Editor: Tommy Steimel, w3cid 135774, Google Inc., steimel@google.com
Editor: Youenn Fablet, w3cid 96458, Apple Inc., youenn@apple.com
Former Editor: Mounir Lamouri, w3cid 45389, Google Inc., mlamouri@google.com
Former Editor: Becca Hughes, w3cid 103353, Google Inc., beccahughes@google.com
Former Editor: Zhiqiang Zhang, Google Inc., zqzhang@google.com
Former Editor: Rich Tibbett, Opera, richt@opera.com

Markup Shorthands: markdown yes
Group: mediawg
Logo: https://resources.whatwg.org/logo-mediasession.svg
Abstract: This specification enables web developers to show customized media
Abstract: metadata on platform UI, customize available platform media
Abstract: controls, and access platform media keys such as hardware keys found
Abstract: on keyboards, headsets, remote controls, and software keys found in
Abstract: notification areas and on lock screens of mobile devices.
!Version History: <a href="https://github.com/w3c/mediasession/commits">https://github.com/w3c/mediasession/commits</a>
Ignored Vars: context, media, session
</pre>

<style>
  /* https://github.com/tabatkins/bikeshed/issues/485 */
  .example .self-link { display: none; }
</style>

<style>
table {
  border-collapse: collapse;
  border-left-style: hidden;
  border-right-style: hidden;
  text-align: left;
}
table caption {
  font-weight: bold;
  padding: 3px;
  text-align: left;
}
table td, table th {
  border: 1px solid black;
  padding: 3px;
}
</style>

<pre class="link-defaults">
spec:html; type:element; text:link
spec:webidl; type:interface; text:object
</pre>

<pre class="anchors">
urlPrefix: https://html.spec.whatwg.org/multipage/; spec: HTML
    type: dfn
        urlPrefix: webappapis.html
            text: entry settings object
        urlPrefix: interaction.html
            text: activation notification
</pre>

<h2 id="introduction">Introduction</h2>

<em>This section is non-normative.</em>

Media is used extensively today, and the Web is one of the primary means of
consuming media content. Many platforms can display media metadata, such as
title, artist, album and album art on various UI elements such as notifications,
media control center, device lockscreen, and wearable devices. This
specification aims to enable web pages to specify the media metadata to be
displayed in platform UI, and respond to media controls that may come from
platform UI or media keys, thereby improving the user experience.

<section>
  <h2 id='security-privacy-considerations'>Security and Privacy
  Considerations</h2>

  <em>This section is non-normative.</em>

  <p>
    The API introduced in this specification has very low impact with regard to
    security and privacy. Part of the API allows a website to expose metadata
    that can be used by the user agent. The user agent needs to use this data
    with care. Another part of the API allows a website to receive commands
    from the user via buttons or other forms of control, which might sometimes
    introduce a new input layer between the user and the website.
  </p>

  <section>
    <h3 id='user-interface-guidelines'>User interface guidelines</h3>

    <p>
      The {{MediaMetadata}} introduced in this specification allows a website to
      offer more information with regard to what is being played. The user
      agent is meant to use this information in any UI related to media
      playback, either internal to the user agent or within the platform.
    </p>

    <p>
      The {{MediaMetadata}} is expected to be used in the context of media
      playback, which makes spoofing harder. However, because the
      {{MediaMetadata}} has text fields and image fields, a malicious website
      could try to spoof another website's identity. It is recommended that the
      user agent offer a way to find the origin, or clearly expose the origin,
      of the website from which the metadata comes.
    </p>

    <p>
      If a user agent offers a mechanism to go back to a website from a UI
      element created based on the {{MediaMetadata}}, it is recommended that the
      action not be noticeable by the website, thus reducing the chances of
      spoofing.
    </p>

    <p>
      In general, all security and privacy considerations related to the display
      of notifications from a website should apply here. It is worth noting that
      the {{MediaMetadata}} offers less customization than regular web
      notifications, and is thus harder to use for spoofing.
    </p>
  </section>

  <section>
    <h3 id='incognito-mode-privacy'>Incognito mode</h3>

    <p>
      For privacy purposes, when in incognito mode, the user agent should be
      careful when sharing the information from {{MediaMetadata}} with the
      system and make sure it will not be used in a way that would harm the
      user. Displaying this information in a way that is very visible would be
      against the user's intent of browsing in incognito mode. When available,
      the UI elements should be advertised as private to the platform.
    </p>
  </section>

  <section>
    <h3 id='media-session-actions-privacy'>Media Session Actions</h3>

    <p>
      <a>Media session actions</a> expose a new input layer to the web platform.
      User agents should make sure users are aware that their actions might be
      routed to the website with the <a>active media session</a>, especially
      when the actions come from a headset or other remote device. It is
      recommended that the user agent follow the platform conventions when
      listening to these inputs in order to aid the user's understanding.
    </p>
  </section>
</section>

<section>
  <h2 id='model'>Model</h2>

  <section>
    <h3 id='playback-state-model'>Playback State</h3>

    <p>
      In order to make {{MediaSessionAction/play}} and
      {{MediaSessionAction/pause}} actions work properly, the user agent SHOULD
      be able to determine if a [=/browsing context=] of the <a>active media
      session</a> is playing media or not, which is called the <dfn>guessed
      playback state</dfn>. The RECOMMENDED way for determining the <a>guessed
      playback state</a> is to monitor the media elements whose node document's
      [=Document/browsing context=] is the [=/browsing context=]. The
      [=/browsing context=]'s <a>guessed playback state</a> is
      {{MediaSessionPlaybackState/"playing"}} if any of them is [=media
      element/potentially playing=] and not [=media element/muted=], and is
      {{MediaSessionPlaybackState/"paused"}} otherwise. Other information SHOULD
      also be considered, such as WebAudio and plugins.
    </p>

    <p>
      The {{MediaSession/playbackState}} attribute specifies the <a>declared
      playback state</a> from the [=/browsing context=]. The state is combined
      with the <a>guessed playback state</a> to compute the
      <dfn>actual playback state</dfn>, which is a finalized state and will be
      used for {{MediaSessionAction/play}} and {{MediaSessionAction/pause}}
      actions.
    </p>

    <p>
      The <a>actual playback state</a> is computed in the following way:
      <ul>
        <li>
          If the <a>declared playback state</a> is
          {{MediaSessionPlaybackState/playing}}, return
          {{MediaSessionPlaybackState/playing}}.
        </li>
        <li>
          Otherwise, return the <a>guessed playback state</a>.
        </li>
      </ul>
    </p>
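<div class="example">
<p>
  The following non-normative sketch illustrates the two computations above.
  The function names and the plain-object model of a media element (with
  <code>potentiallyPlaying</code> and <code>muted</code> properties) are
  assumptions made for illustration only; a user agent may also take other
  sources, such as WebAudio, into account.
</p>

```javascript
// Hypothetical model: each entry describes one media element in the
// browsing context using only the two facts the recommended heuristic needs.
function guessPlaybackState(mediaElements) {
  const audible = mediaElements.some(
    (element) => element.potentiallyPlaying && !element.muted
  );
  return audible ? "playing" : "paused";
}

// Combines the declared state (the playbackState attribute) with the guess.
function actualPlaybackState(declared, guessed) {
  // A declared "playing" always wins; any other declared value defers
  // to the guessed playback state.
  return declared === "playing" ? "playing" : guessed;
}
```

<p>
  Note that, under these steps, a declared state of <code>"paused"</code> does
  not override a guessed state of <code>"playing"</code>.
</p>
</div>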

    <p class=note>
      The {{MediaSession/playbackState}} attribute can be useful when the page
      needs to run some preparation steps while the media is paused, but wants
      those steps to be interruptible by the {{MediaSessionAction/pause}}
      action. See <a href="#example-set-playbackState">Setting playbackState</a>
      for an example.
    </p>

    <p>
      When the <a>actual playback state</a> of the <a>active media session</a>
      changes, the user agent MUST run the <a>media session actions update
      algorithm</a>.
    </p>
  </section>

  <section>
    <h3 id="media-session-routing">Routing</h3>

    There could be multiple {{MediaSession}} objects existing at the same time
    since the user agent could have multiple tabs, each tab could contain a
    [=browsing context/top-level traversable=] and descendant [=/navigables=],
    and each [=/navigable=] could have a {{MediaSession}} object.

    The user agent MUST select at most one of the {{MediaSession}} objects to
    present to the user, which is called the <dfn>active media session</dfn>.
    The <a>active media session</a> may be null. The selection is up to the user
    agent and SHOULD be based on preferred user experience. Note that the
    {{MediaSession/playbackState}} attribute MUST NOT affect media session
    routing. It only takes effect for the <a>active media session</a>.

    It is RECOMMENDED that the user agent selects the <a>active media
    session</a> by managing <a>audio focus</a>. A tab or [=Window/browsing
    context=] is said to have <dfn>audio focus</dfn> if it is currently playing
    audio or the user expects to control the media in it. The AudioFocus API
    targets this area and could be used once it's finished.

    Whenever the <a>active media session</a> is changed, the user agent MUST run
    the <a>media session actions update algorithm</a> and the <a>update metadata
    algorithm</a>.
  </section>

  <section>
    <h3 id='metadata'>Metadata</h3>

    The media metadata for the <a>active media session</a> MAY be displayed in
    the platform UI depending on platform conventions. Whenever the <a>active
    media session</a> changes or the {{MediaSession/metadata}} of the
    <a>active media session</a> is set, the user agent MUST run the <dfn>update
    metadata algorithm</dfn>. The steps are as follows:

    <ol>
      <li>
        If the <a>active media session</a> is null, unset the media metadata
        presented to the platform, and terminate these steps.
      </li>
      <li>
        If the {{MediaSession/metadata}} of the
        <a>active media session</a> is an <a>empty metadata</a>, unset the media
        metadata presented to the platform, and terminate these steps.
      </li>
      <li>
        Update the media metadata presented to the platform to match the
        {{MediaSession/metadata}} for the
        <a>active media session</a>.
      </li>
      <li>
        If the user agent wants to display an [=MediaMetadata/artwork image=],
        it is RECOMMENDED to run the <a>fetch image algorithm</a>.
      </li>
    </ol>
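<div class="example">
<p>
  The first three steps above can be sketched as follows. This is a
  non-normative illustration: the <code>platform</code> adapter with its
  <code>clear()</code> and <code>show()</code> methods is hypothetical, and the
  emptiness check assumes that a metadata object is empty when it is null or
  when its title, artist, album and artwork list are all empty. Artwork
  fetching (the last step) is left out.
</p>

```javascript
// Assumption: "empty" means null, or no text fields and no artwork.
function isEmptyMetadata(metadata) {
  if (metadata === null) return true;
  return (
    metadata.title === "" &&
    metadata.artist === "" &&
    metadata.album === "" &&
    metadata.artwork.length === 0
  );
}

// `platform` is a hypothetical adapter exposing clear() and show(metadata).
function updateMetadata(activeSession, platform) {
  // Steps 1 and 2: nothing to present, so unset what the platform shows.
  if (activeSession === null || isEmptyMetadata(activeSession.metadata)) {
    platform.clear();
    return;
  }
  // Step 3: mirror the active session's metadata to the platform.
  platform.show(activeSession.metadata);
}
```
</div>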

    The RECOMMENDED <dfn>fetch image algorithm</dfn> is as follows:

    <ol>
      <!-- XXX https://www.w3.org/Bugs/Public/show_bug.cgi?id=24055 -->
      <li>
        If there are other <a>fetch image algorithms</a> running, cancel
        existing algorithm execution instances.
      </li>
      <li>
        If <var>metadata</var>'s {{MediaMetadata/artwork}} of the <a>active
        media session</a> is empty, then terminate these steps.
      </li>
      <li>
        If the platform supports displaying media artwork, select a
        <dfn>preferred artwork image</dfn> from <var>metadata</var>'s
        {{MediaMetadata/artwork}} of the <a>active media session</a>.
      </li>
      <li>
        [=Fetch=] the <a>preferred artwork image</a>'s {{MediaImage/src}}.

        Then, <a>in parallel</a>:

        <ol>
          <li>
            Wait for the [=/response=].
          </li>
          <li>
            If the [=/response=]'s [=response/type=] is
            {{ResponseType/"default"}}, attempt to decode the resource as an
            image.
          </li>
          <li>
            If the image format is supported, use the image as the artwork for
            display in platform UI. Otherwise the <a>fetch image algorithm</a>
            fails and terminates.
          </li>
        </ol>
      </li>
    </ol>

    If no images are fetched in the <a>fetch image algorithm</a>, the user agent
    MAY have fallback behavior such as displaying a default image as artwork.
  </section>

  <section>
    <h3 id="actions-model">Actions</h3>

    <p>
      A <dfn>media session action</dfn> is an action that the page can handle in
      order for the user to interact with the {{MediaSession}}. For example, a
      page can handle some actions that will then be triggered when the user
      presses buttons from a headset or other remote device.
    </p>

    <p>
      A <dfn>media session action source</dfn> is a source that might produce a
      <a>media session action</a>. Such a source can be the platform or the UI
      surfaces created by the user agent.
    </p>
    <p>
      A <a>media session action source</a> has an optional
      <dfn for="media session action source">target</dfn> which should be the
      recipient of any <a>media session action</a> created by the
      <a>media session action source</a>. If a <a>media session action
      source</a>'s
      <a for="media session action source">target</a> is `null`, the <a>active
      media session</a> is the recipient of all
      <a>media session action source</a>'s actions.
    </p>

    <p>
      A <a>media session action</a> is represented by a {{MediaSessionAction}}
      which can have one of the following values:
      <ul>
        <li>
          <dfn enum-value for=MediaSessionAction>play</dfn>: the action's intent
          is to resume the playback.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>pause</dfn>: the action's
          intent is to pause the currently active playback.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>seekbackward</dfn>: the
          action's intent is to move the playback time backward by a short
          period (e.g. a few seconds).
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>seekforward</dfn>: the action's
          intent is to move the playback time forward by a short period (e.g. a
          few seconds).
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>previoustrack</dfn>: the
          action's intent is to either start the current playback from the
          beginning if the playback has a notion of beginning, or move to the
          previous item in the playlist if the playback has a notion of
          playlist.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>nexttrack</dfn>: the action's
          intent is to move the playback to the next item in the playlist if
          the playback has a notion of playlist.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>skipad</dfn>: the action's
          intent is to skip the advertisement that is currently playing.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>stop</dfn>: the action's intent
          is to stop the playback and clear the state if appropriate.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>seekto</dfn>: the action's
          intent is to move the playback time to a specific time.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>togglemicrophone</dfn>: the
          action's intent is to mute or unmute the user's microphone.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>togglecamera</dfn>: the
          action's intent is to turn the user's active camera on or off.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>togglescreenshare</dfn>: the
          action's intent is to turn the user's active screenshare on or off.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>hangup</dfn>: the action's
          intent is to end a call.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>previousslide</dfn>: the
          action's intent is to go back to the previous slide when presenting
          slides.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>nextslide</dfn>: the action's
          intent is to go to the next slide when presenting slides.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>enterpictureinpicture</dfn>:
          the action's intent is to open the media session in a
          picture-in-picture window.
        </li>
        <li>
          <dfn enum-value for=MediaSessionAction>voiceactivity</dfn>: the
          action's intent is to notify the web page that voice activity has been
          detected by the microphone.
        </li>
      </ul>
    </p>

    <p>
      All {{MediaSession}}s have a map of <dfn>supported media session
      actions</dfn> whose keys are <a>media session actions</a> and whose
      values are {{MediaSessionActionHandler}}s.
    </p>

    <p>
      When the <dfn>update action handler algorithm</dfn> on a given
      {{MediaSession}} with <var>action</var> and <var>handler</var> parameters
      is invoked, the user agent MUST run the following steps:
      <ol>
        <li>
          If <var>handler</var> is `null`, remove <var>action</var>
          from the <a>supported media session actions</a> for {{MediaSession}}
          and abort these steps.
        </li>
        <li>
          Add <var>action</var> to the <a>supported media session actions</a>
          for {{MediaSession}} and associate to it the <var>handler</var>.
        </li>
      </ol>
    </p>
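<div class="example">
<p>
  As a non-normative sketch, the algorithm above amounts to a set-or-delete
  operation on a map. The function name and the use of a JavaScript
  <code>Map</code> to model the <a>supported media session actions</a> are
  assumptions made for illustration.
</p>

```javascript
// `supportedActions` models the session's map of action name to handler.
function updateActionHandler(supportedActions, action, handler) {
  if (handler === null) {
    // Step 1: a null handler removes the action from the map.
    supportedActions.delete(action);
    return;
  }
  // Step 2: otherwise add the action, or replace its existing handler.
  supportedActions.set(action, handler);
}
```
</div>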

    <p>
      When the <a>supported media session actions</a> are changed, the user
      agent SHOULD run the <a>media session actions update algorithm</a>. The
      user agent MAY <a>queue a task</a> in order to run the <a>media session
      actions update algorithm</a> in order to avoid UI flickering when multiple
      actions are modified in the same event loop.
    </p>

    <p>
      When the user agent is notified by a <a>media session action source</a>
      named <var>source</var> that a
      <a>media session action</a> named <var>action</var> has been triggered,
      the user agent MUST <a>queue a task</a>, using the [=user interaction task
      source=], to run the following
      <dfn>handle media session action</dfn> steps:
      <ol>
        <li>
          Let <var>session</var> be <var>source</var>'s <a for="media session
          action source">target</a>.
        </li>
        <li>
          If <var>session</var> is `null`, set <var>session</var> to the
          <a>active media session</a>.
        </li>
        <li>
          If <var>session</var> is `null`, abort these steps.
        </li>
        <li>
          Let <var>actions</var> be <var>session</var>'s
          <a>supported media session actions</a>.
        </li>
        <li>
          If <var>actions</var> does not contain the key <var>action</var>,
          abort these steps.
        </li>
        <li>
          Let <var>handler</var> be the {{MediaSessionActionHandler}} associated
          with the key <var>action</var> in <var>actions</var>.
        </li>
        <li>
          Run <var>handler</var> with the <var>details</var> parameter set to:
          {{MediaSessionActionDetails}}.
        </li>
        <li>
          Run the <a>activation notification</a> steps in the [=/browsing
          context=] associated with <var>session</var>.
        </li>
      </ol>
    </p>
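<div class="example">
<p>
  The <a>handle media session action</a> steps could be modeled roughly as
  below. This is a non-normative sketch: the object shapes
  (<code>source.target</code>, <code>session.supportedActions</code>), the
  <code>notifyActivation</code> callback and the boolean return value are all
  hypothetical, the details object carries only the action name, and task
  queuing is omitted.
</p>

```javascript
function handleMediaSessionAction(source, activeSession, action, notifyActivation) {
  // Steps 1 and 2: prefer the source's target, else the active session.
  let session = source.target ?? null;
  if (session === null) session = activeSession;
  // Step 3: no session to deliver the action to.
  if (session === null) return false;
  // Steps 4 and 5: the action must have a registered handler.
  const actions = session.supportedActions;
  if (!actions.has(action)) return false;
  // Steps 6 and 7: invoke the handler with a details object.
  const handler = actions.get(action);
  handler({ action });
  // Step 8: let the embedder run activation notification for the session.
  notifyActivation(session);
  return true;
}
```
</div>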

    <p>
      When the user agent receives a joint command for <a enum-value
      for=MediaSessionAction>play</a> and <a enum-value
      for=MediaSessionAction>pause</a>, such as a headset button click, it MUST
      <a>queue a task</a>, using the [=user interaction task source=], to run
      the following steps:
      <ol>
        <li>
          If the <a>active media session</a> is `null`, abort these steps.
        </li>
        <li>
          Let <var>action</var> be a <a>media session action</a>.
        </li>
        <li>
          If the <a>actual playback state</a> of the <a>active media session</a>
          is <a enum-value for="MediaSessionPlaybackState">playing</a>, set
          <var>action</var> to <a enum-value for=MediaSessionAction>pause</a>.
        </li>
        <li>
          Otherwise, set <var>action</var> to <a enum-value
          for=MediaSessionAction>play</a>.
        </li>
        <li>
          Run the <a>handle media session action</a> steps with
          <var>action</var>.
        </li>
      </ol>
    </p>
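<div class="example">
<p>
  A minimal, non-normative sketch of how a joint play/pause command might be
  resolved to a single action. The function name and the
  <code>actualPlaybackState</code> property on the session object are
  assumptions made for illustration.
</p>

```javascript
// Returns the action to dispatch, or null when there is no active session.
function resolveJointPlayPause(activeSession) {
  if (activeSession === null) return null;
  // Playing sessions get paused; anything else gets resumed.
  return activeSession.actualPlaybackState === "playing" ? "pause" : "play";
}
```
</div>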

    <p>
      It is RECOMMENDED for user agents to implement a default handler for the
      <a enum-value for=MediaSessionAction>play</a> and <a enum-value
      for=MediaSessionAction>pause</a> <a>media session actions</a> if none was
      provided for the <a>active media session</a>.
    </p>

    <p>
      A user agent MAY implement a default handler for the <a enum-value
      for=MediaSessionAction>togglemicrophone</a>, <a enum-value
      for=MediaSessionAction>togglecamera</a>, <a enum-value
      for=MediaSessionAction>togglescreenshare</a>, or <a enum-value
      for=MediaSessionAction>hangup</a> <a>media session actions</a> if none was
      provided for the <a>active media session</a>.
    </p>
    <p>
      A user agent MAY expose microphone, camera, and screenshare state to web
      pages via {{MediaStreamTrack}}'s {{MediaStreamTrack/muted}} attribute in
      addition to {{MediaSessionAction/togglemicrophone}},
      {{MediaSessionAction/togglecamera}} or
      {{MediaSessionAction/togglescreenshare}} [=media session action=]. In that
      case, the user agent MUST execute the corresponding
      {{MediaSessionActionHandler}} before running, as different tasks, the
      steps defined to [$set a track's muted state$].
    </p>
    <p>
      The {{MediaSessionAction/voiceactivity}} action source MUST always have a
      target whose document MUST always have {{MediaStreamTrackState/live}}
      microphone {{MediaStreamTrack}}s. A user agent MUST invoke the
      {{MediaSessionActionHandler}} for {{MediaSessionAction/voiceactivity}}
      only when voice activity is detected from a microphone with one or more
      {{MediaStreamTrackState/live}} {{MediaStreamTrack}}s. A user agent MAY
      ignore voice activity if the microphone is not muted and all
      {{MediaStreamTrack}}s associated with the microphone are
      {{MediaStreamTrack/enabled}}. It is RECOMMENDED for user agents to set a
      minimal interval between invocations of the {{MediaSessionActionHandler}}
      for {{MediaSessionAction/voiceactivity}} based on privacy and power
      efficiency policies.
    </p>

    <p class=note>
      {{MediaSessionAction/voiceactivity}} only indicates the start of voice
      activity. Applications may display a notification if the user is speaking
      while the {{MediaStreamTrack}} is muted, or start an {{AudioWorklet}} for
      audio processing. No action is defined for the end of voice activity.
      Unlike other actions which are explicitly triggered by the user,
      {{MediaSessionAction/voiceactivity}} also depends on the voice activity
      detection algorithm of the user agent or the system. For privacy and power
      efficiency concerns, the web page may not be notified if voice activity
      ends and restarts soon after the last {{MediaSessionAction/voiceactivity}}
      action.
    </p>

    <p class=note>
      A page should only register a {{MediaSessionActionHandler}} for a <a>media
      session action</a> when it can handle the action given that the user agent
      will list this as a <a>supported media session action</a> and update the
      <a>media session action sources</a>.
    </p>

    <p>
      When the <dfn>media session actions update algorithm</dfn> is invoked, the
      user agent MUST run the following steps:
      <ol>
        <li>
          Let <var>available actions</var> be an array of <a>media session
          actions</a>.
        </li>
        <li>
          If the <a>active media session</a> is null, set <var>available
          actions</var> to the empty array.
        </li>
        <li>
          Otherwise, set the <var>available actions</var> to the list of keys
          available in the <a>active media session</a>'s <a>supported media
          session actions</a>.
        </li>
        <li>
          For each <a>media session action source</a> <var>source</var>, run the
          following substeps:
          <ol>
            <li>
              Optionally, if the <a>active media session</a> is not null:
              <ol>
                <li>
                  If the <a>active media session</a>'s <a>actual playback
                  state</a> is <a enum-value
                  for="MediaSessionPlaybackState">playing</a>, remove <a
                  enum-value for=MediaSessionAction>play</a> from <var>available
                  actions</var>.
                </li>
                <li>
                  Otherwise, remove <a enum-value
                  for=MediaSessionAction>pause</a> from <var>available
                  actions</var>.
                </li>
              </ol>
            </li>
            <li>
              If the <var>source</var> is a UI element created by the user
              agent, it MAY remove some elements from <var>available
              actions</var> if there are too many of them compared to the
              available space.
            </li>
            <li>
              Notify the <var>source</var> with the updated list of
              <var>available actions</var>.
            </li>
          </ol>
        </li>
      </ol>
    </p>
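<div class="example">
<p>
  The list computed for a single <a>media session action source</a> might look
  like the following non-normative sketch. It always applies the optional
  play/pause filtering, and it models the "too many actions" case with a
  hypothetical <code>maxActions</code> limit that simply truncates the list;
  a real user agent is free to choose which actions to drop.
</p>

```javascript
function availableActionsFor(activeSession, { maxActions = Infinity } = {}) {
  // No active session: nothing is available.
  if (activeSession === null) return [];
  // Start from the keys of the supported media session actions.
  let available = Array.from(activeSession.supportedActions.keys());
  // Optional step: hide whichever of play/pause does not apply right now.
  const redundant =
    activeSession.actualPlaybackState === "playing" ? "play" : "pause";
  available = available.filter((action) => action !== redundant);
  // Assumption: a space-constrained UI keeps only the first few actions.
  return available.slice(0, maxActions);
}
```
</div>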
  </section>

  <section>
    <h3 id='position-state-sec'>Position State</h3>

    <p>
      A user agent MAY display the <a>current playback position</a> and
      <a>duration</a>
      of a media session in the platform UI depending on platform conventions.
      The
      <dfn>position state</dfn> is the combination of the following:
      <ul>
        <li>
          The <dfn>duration</dfn> of the media in seconds.
        </li>
        <li>
          The <dfn>playback rate</dfn> of the media. It is a coefficient.
        </li>
        <li>
          The <dfn>last reported playback position</dfn> of the media. This is
          the playback position of the media in seconds when the <a>position
          state</a>
          was created.
        </li>
      </ul>
    </p>

    <p>
      The <a>position state</a> is represented by a {{MediaPositionState}} which
      MUST always be stored with the <dfn>last position updated time</dfn>. This
      is the time the <a>position state</a> was last updated in seconds.
    </p>

    <p>
      The RECOMMENDED way to determine the <a>position state</a> is to monitor
      the media elements whose node document's browsing context is the
      [=/browsing context=].
    </p>

    <p>
      The <dfn>actual playback rate</dfn> is a coefficient computed in the
      following way:
      <ul>
        <li>
          If the <a>actual playback state</a> is <a enum-value
          for="MediaSessionPlaybackState">paused</a>, then return zero.
        </li>
        <li>
          Return <a>playback rate</a>.
        </li>
      </ul>
    </p>

    <p>
      The <dfn>current playback position</dfn> in seconds is computed in the
      following way:
      <ul>
        <li>
          Set <var>time elapsed</var> to the system time in seconds minus the
          <a>last position updated time</a>.
        </li>
        <li>
          Multiply <var>time elapsed</var> by <a>actual playback rate</a>.
        </li>
        <li>
          Set <var>position</var> to <var>time elapsed</var> added to
          <a>last reported playback position</a>.
        </li>
        <li>
          If <var>position</var> is less than zero, return zero.
        </li>
        <li>
          If <var>position</var> is greater than <a>duration</a>, return
          <a>duration</a>.
        </li>
        <li>
          Return <var>position</var>.
        </li>
      </ul>
    </p>
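<div class="example">
<p>
  The two computations above can be sketched as follows. This is a
  non-normative illustration; the function names and parameter layout are
  assumptions, and all times are plain numbers in seconds supplied by the
  caller.
</p>

```javascript
// A paused session does not advance, whatever its nominal rate is.
function actualPlaybackRate(actualPlaybackState, playbackRate) {
  return actualPlaybackState === "paused" ? 0 : playbackRate;
}

// positionState: { duration, playbackRate, position } where `position`
// is the last reported playback position.
function currentPlaybackPosition(positionState, lastUpdatedTime, now, actualPlaybackState) {
  const rate = actualPlaybackRate(actualPlaybackState, positionState.playbackRate);
  // Scale wall-clock time elapsed since the last update by the rate.
  const elapsed = (now - lastUpdatedTime) * rate;
  const position = positionState.position + elapsed;
  // Clamp the result to the range [0, duration].
  if (position < 0) return 0;
  if (position > positionState.duration) return positionState.duration;
  return position;
}
```

<p>
  For instance, with a last reported position of 10 seconds, a playback rate of
  2 and 5 seconds elapsed since the last update, the current playback position
  is 20 seconds while playing and stays at 10 seconds while paused.
</p>
</div>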

  </section>

</section>

<h2 id="the-mediasession-interface">The {{MediaSession}} interface</h2>

<pre class="idl">
[Exposed=Window]
partial interface Navigator {
  [SameObject] readonly attribute MediaSession mediaSession;
};

enum MediaSessionPlaybackState {
  "none",
  "paused",
  "playing"
};

enum MediaSessionAction {
  "play",
  "pause",
  "seekbackward",
  "seekforward",
  "previoustrack",
  "nexttrack",
  "skipad",
  "stop",
  "seekto",
  "togglemicrophone",
  "togglecamera",
  "togglescreenshare",
  "hangup",
  "previousslide",
  "nextslide",
  "enterpictureinpicture",
  "voiceactivity"
};

callback MediaSessionActionHandler = undefined(MediaSessionActionDetails details);

[Exposed=Window]
interface MediaSession {
  attribute MediaMetadata? metadata;

  attribute MediaSessionPlaybackState playbackState;

  undefined setActionHandler(MediaSessionAction action, MediaSessionActionHandler? handler);

  undefined setPositionState(optional MediaPositionState state = {});

  Promise&lt;undefined&gt; setMicrophoneActive(boolean active);

  Promise&lt;undefined&gt; setCameraActive(boolean active);

  Promise&lt;undefined&gt; setScreenshareActive(boolean active);
};
</pre>

<p>
  A {{MediaSession}} object represents a media session for a given document and
  allows a document to communicate to the user agent some information about the
  playback and how to handle it.
</p>

<p>
  A {{MediaSession}} has an associated <dfn for="MediaSession">metadata</dfn>
  object represented by a {{MediaMetadata}}. It is initially `null`.
</p>

<p>
  The <dfn attribute for="Navigator"><code>mediaSession</code></dfn> attribute
  MUST return the {{MediaSession}} instance associated with the {{Navigator}}
  object.
</p>

<p>
  The <dfn attribute for="MediaSession"><code>metadata</code></dfn> attribute
  reflects the {{MediaSession}}'s {{MediaSession/metadata}}. On getting, it MUST
  return the {{MediaSession}}'s {{MediaSession/metadata}}. On setting, it MUST
  run the following steps with <var>value</var> being the new value being set:
  <ol>
    <li>
      If the {{MediaSession}}'s {{MediaSession/metadata}} is not `null`, set its
      [=MediaMetadata/media session=] to `null`.
    </li>
    <li>
      Set the {{MediaSession}}'s {{MediaSession/metadata}} to
      <var>value</var>.
    </li>
    <li>
      If the {{MediaSession}}'s {{MediaSession/metadata}} is not `null`, set its
      [=MediaMetadata/media session=] to the current {{MediaSession}}.
    </li>
    <li>
      <a>In parallel</a>, run the <a>update metadata algorithm</a>.
    </li>
  </ol>
</p>

<p>
  The <dfn attribute for="MediaSession"><code>playbackState</code></dfn>
  attribute represents the <dfn>declared playback state</dfn> of the <a>media
  session</a>, by which the session declares whether its [=/browsing context=]
  is playing media or not. The initial value is <a enum-value
  for="MediaSessionPlaybackState">none</a>. On setting, the user agent MUST set
  the IDL attribute to the new value if it is a valid
  {{MediaSessionPlaybackState}} value. On getting, the user agent MUST return
  the last valid value that was set. The {{MediaSession/playbackState}}
  attribute is a hint for the user agent to determine whether the [=/browsing
  context=] is playing or paused.
</p>

<p class=note>
  Setting {{MediaSession/playbackState}} may cause the <a>actual playback
  state</a> to change and run the <a>media session actions update algorithm</a>.
</p>

<p>
  The {{MediaSessionPlaybackState}} enum is used to indicate whether a
  [=/browsing context=] is playing media or not. The values are described as
  follows:

  <ul>
    <li>
      <dfn enum-value for="MediaSessionPlaybackState">none</dfn> means the
      [=/browsing context=] does not specify whether it's playing or paused. It
      can only be used in the {{MediaSession/playbackState}} attribute.
    </li>
    <li>
      <dfn enum-value for="MediaSessionPlaybackState">playing</dfn> means the
      [=/browsing context=] is currently playing media and it can be paused.
    </li>
    <li>
      <dfn enum-value for="MediaSessionPlaybackState">paused</dfn> means the
      [=/browsing context=] has paused media and it can be resumed.
    </li>
  </ul>
</p>

<p>
  The <dfn method for=MediaSession>setActionHandler(action, handler)</dfn>
  method, when invoked, MUST run the <a>update action handler algorithm</a> with
  <var>action</var> and <var>handler</var> on the {{MediaSession}}.
</p>

<p>
  The <dfn method for=MediaSession>setPositionState(|state|)</dfn> method, when
  invoked, MUST perform the following steps:

  <ul>
    <li>
      If <var>state</var> is an empty dictionary, clear the <a>position
      state</a>
      and abort these steps.
    </li>
    <li>
      If <var>state</var>'s <a dict-member for="MediaPositionState">duration</a>
      is not present, throw a <a exception>TypeError</a>.
    </li>
    <li>
      If <var>state</var>'s {{MediaPositionState/duration}} is negative or
      <code>NaN</code>, throw a <a exception>TypeError</a>.
    </li>
    <li>
      If <var>state</var>'s {{MediaPositionState/position}} is not present, set
      it to zero.
    </li>
    <li>
      If <var>state</var>'s <a dict-member for="MediaPositionState">position</a>
      is negative or greater than <a dict-member
      for="MediaPositionState">duration</a>, throw a
      <a exception>TypeError</a>.
    </li>
    <li>
      If <var>state</var>'s <a dict-member
      for="MediaPositionState">playbackRate</a> is not present, set it to 1.0.
    </li>
    <li>
      If <var>state</var>'s {{MediaPositionState/playbackRate}} is zero, throw a
      <a exception>TypeError</a>.
    </li>
    <li>
      Update the <a>position state</a> and <a>last position updated time</a>.
    </li>
  </ul>
</p>
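<div class="example">
The validation steps above can be sketched as a plain function. This is a
non-normative illustration: `normalizePositionState` is a hypothetical helper
name, not part of the API, and it assumes members that are absent are
`undefined`.

```javascript
// Hypothetical helper mirroring the setPositionState() validation steps.
// Returns null when the position state should be cleared, otherwise a
// dictionary with every member filled in.
function normalizePositionState(state) {
  const dict = state ?? {};

  // An empty dictionary clears the position state.
  if (Object.keys(dict).length === 0) return null;

  const duration = dict.duration;
  if (duration === undefined) throw new TypeError("duration is required");
  if (Number.isNaN(duration)) throw new TypeError("duration must not be NaN");
  if (0 > duration) throw new TypeError("duration must not be negative");

  // A missing position defaults to zero.
  const position = dict.position === undefined ? 0 : dict.position;
  if (0 > position) throw new TypeError("position must not be negative");
  if (position > duration) throw new TypeError("position exceeds duration");

  // A missing playbackRate defaults to 1.0 and must never be zero.
  const playbackRate = dict.playbackRate === undefined ? 1.0 : dict.playbackRate;
  if (playbackRate === 0) throw new TypeError("playbackRate must not be zero");

  return { duration, playbackRate, position };
}
```

A page could run such a check before calling
{{MediaSession/setPositionState()}} to catch invalid values early.
</div>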

<p>
  The <dfn method for=MediaSession>setMicrophoneActive(active)</dfn> method
  indicates to the user agent the microphone capture state desired by the page
  (e.g. if the microphone is considered "inactive" by the page since it is no
  longer sending audio through a call, the page can invoke
  <code>setMicrophoneActive(false)</code>). When invoked, it MUST perform the
  following steps:
  <ol>
    <li>
      Let <var>document</var> be [=this=]'s [=relevant global object=]'s
      [=associated Document=].
    </li>
    <li>
      Let <var>captureKind</var> be "microphone".
    </li>
    <li>
      Return the result of running the [=update capture state algorithm=] with
      <var>document</var>, <var>active</var> and <var>captureKind</var>.
    </li>
  </ol>
</p>
<p>
  Similarly, the <dfn method for=MediaSession>setCameraActive(active)</dfn>
  method indicates to the user agent the camera capture state desired by the
  page. When invoked, it MUST perform the following steps:
  <ol>
    <li>
      Let <var>document</var> be [=this=]'s [=relevant global object=]'s
      [=associated Document=].
    </li>
    <li>
      Let <var>captureKind</var> be "camera".
    </li>
    <li>
      Return the result of running the [=update capture state algorithm=] with
      <var>document</var>, <var>active</var> and <var>captureKind</var>.
    </li>
  </ol>
</p>
<p>
  Similarly, the <dfn method for=MediaSession>setScreenshareActive(active)</dfn>
  method indicates to the user agent the screenshare capture state desired by
  the page. When invoked, it MUST perform the following steps:
  <ol>
    <li>
      Let <var>document</var> be [=this=]'s [=relevant global object=]'s
      [=associated Document=].
    </li>
    <li>
      Let <var>captureKind</var> be "screenshare".
    </li>
    <li>
      Return the result of running the [=update capture state algorithm=] with
      <var>document</var>, <var>active</var> and <var>captureKind</var>.
    </li>
  </ol>
</p>
<p>
  The <dfn>update capture state algorithm</dfn>, when invoked with
  <var>document</var>, <var>active</var> and <var>captureKind</var>, MUST
  perform the following steps:
  <ol>
    <li>
      If <var>document</var> is not [=fully active=], return [=a promise
      rejected with=] <a exception>InvalidStateError</a>.
    </li>
    <li>
      If <var>active</var> is <code>true</code> and <var>document</var>'s
      [=Document/visibility state=] is not "visible", the user agent MAY return
      [=a promise rejected with=] <a exception>InvalidStateError</a>.
    </li>
    <li>
      Let <var>p</var> be a new promise.
    </li>
    <li>
      <a>In parallel</a>, run the following steps:
      <ol>
        <li>
          Let <var>applyPausePolicy</var> be <code>true</code> if the user agent
          implements a policy of <dfn>pausing all input sources</dfn> of type
          <var>captureKind</var> in response to UI and <code>false</code>
          otherwise.
        </li>
        <li>
          If <var>applyPausePolicy</var> is <code>true</code>, run the following
          substeps:
          <ol>
            <li>
              Let <var>currentlyActive</var> be <code>false</code> if the user
              agent is currently [=pausing all input sources=] of type
              <var>captureKind</var>
              and <code>true</code> otherwise.
            </li>
            <li>
              If <var>active</var> is <var>currentlyActive</var>, resolve
              <var>p</var> with <code>undefined</code> and abort these steps.
            </li>
            <li>
              If <var>active</var> is <code>true</code>, the user agent MAY wait
              to proceed, for instance to prompt the user.
            </li>
            <li>
              If the user agent denies the request to update the capture state,
              reject <var>p</var> with a <a exception>NotAllowedError</a> and
              abort these steps.
            </li>
          </ol>
        </li>
        <li>
          Update the user agent capture state UI according to
          <var>captureKind</var>
          and <var>active</var>.
        </li>
        <li>
           <a>Queue a task</a> using the [=user interaction task source=]
           to resolve <var>p</var> with <code>undefined</code>.
        </li>
        <li>
          If <var>applyPausePolicy</var> is <code>true</code>, run the following
          substeps:
          <ol>
            <li>
              Let <var>newMutedState</var> be <code>true</code> if
              <var>active</var> is <code>false</code> and <code>false</code>
              otherwise.
            </li>
            <li>
              For each {{MediaStreamTrack}} whose source is of type
              <var>captureKind</var>,
              <a>queue a task</a> using the [=user interaction task source=]
              to [$set a track's muted state$] to <var>newMutedState</var>.
            </li>
          </ol>
        </li>
      </ol>
    </li>
    <li>
      Return <var>p</var>.
    </li>
  </ol>
</p>
<p class=note>
  The <a>setMicrophoneActive(active)</a>, <a>setCameraActive(active)</a>
  and <a>setScreenshareActive(active)</a> methods can reject based on user agent
  specific heuristics. This might in particular happen when the web page asks to
  activate (unmute) the microphone, camera or screenshare. The user agent could
  decide to require [=transient activation=] in that case. It might also require
  user input through a prompt to make the actual decision.
</p>
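<div class="example">
Because these methods return a promise that can be rejected, a page may want to
handle the rejection explicitly. The following non-normative sketch assumes a
hypothetical helper, `requestMicrophoneActive`, that receives the
{{MediaSession}} object as a parameter so that it can also be exercised with a
test double.

```javascript
// Hypothetical helper: asks the user agent to update the microphone capture
// state and reports whether the request was accepted.
async function requestMicrophoneActive(mediaSession, active) {
  try {
    await mediaSession.setMicrophoneActive(active);
    return { ok: true };
  } catch (error) {
    // InvalidStateError: the document is not fully active, or the user agent
    // chose to reject because the document is not visible.
    // NotAllowedError: the user agent denied the request.
    return { ok: false, reason: error.name };
  }
}
```

A page would typically call
`requestMicrophoneActive(navigator.mediaSession, true)` from a user gesture
handler, since the user agent could require [=transient activation=].
</div>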

<p>
  The user agent MAY display UI which invokes handlers for
  <a>media session actions</a>.
</p>

<h2 id="the-mediametadata-interface">The {{MediaMetadata}} interface</h2>

<pre class="idl">

[Exposed=Window]
interface MediaMetadata {
  constructor(optional MediaMetadataInit init = {});
  attribute DOMString title;
  attribute DOMString artist;
  attribute DOMString album;
  attribute FrozenArray&lt;object> artwork;
  [SameObject] readonly attribute FrozenArray&lt;ChapterInformation> chapterInfo;
};

dictionary MediaMetadataInit {
  DOMString title = "";
  DOMString artist = "";
  DOMString album = "";
  sequence&lt;MediaImage> artwork = [];
  sequence&lt;ChapterInformationInit> chapterInfo = [];
};
</pre>

<p>
  A {{MediaMetadata}} object is a representation of the metadata associated with
  a {{MediaSession}} that can be used by user agents to provide customized user
  interface.
</p>

<p>
  A {{MediaMetadata}} can have an associated <dfn for="MediaMetadata">media
  session</dfn>.
</p>

<p>
  A {{MediaMetadata}} has an associated <dfn for="MediaMetadata">title</dfn>,
  <dfn for="MediaMetadata">artist</dfn> and <dfn for="MediaMetadata">album</dfn>
  each of which is a DOMString.
</p>

<p>
  A {{MediaMetadata}} has an associated sequence of <dfn
  for="MediaMetadata">artwork images</dfn>, which is a sequence of type
  {{MediaImage}}. A {{MediaMetadata}} also has an associated <dfn
  for="MediaMetadata">converted artwork images</dfn>, which is initially
  <code>undefined</code>.
</p>

<p>
  A {{MediaMetadata}} has an associated list of <dfn for="MediaMetadata">
  chapter information</dfn>.
</p>

<p>
  A {{MediaMetadata}} is said to be an <dfn>empty metadata</dfn> if it is equal
  to `null` or all the following conditions are true:
  <ul>
    <li>Its <a for=MediaMetadata>title</a> is the empty string.</li>
    <li>Its <a for=MediaMetadata>artist</a> is the empty string.</li>
    <li>Its <a for=MediaMetadata>album</a> is the empty string.</li>
    <li>Its <a for=MediaMetadata title='artwork image'>artwork images</a> length
    is <code>0</code>.</li>
    <li>Its <a for=MediaMetadata>chapter information</a> length is
    <code>0</code>.</li>
  </ul>
</p>
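<div class="example">
The conditions above can be sketched as a predicate. This is a non-normative
illustration; `isEmptyMetadata` is a hypothetical name, and the sketch assumes
an object shaped like {{MediaMetadata}} (or `null`).

```javascript
// Hypothetical predicate: true when the metadata is null or every field is
// empty, following the list of conditions above.
function isEmptyMetadata(metadata) {
  if (metadata === null) return true;
  return [
    metadata.title === "",
    metadata.artist === "",
    metadata.album === "",
    metadata.artwork.length === 0,
    metadata.chapterInfo.length === 0,
  ].every(Boolean);
}
```
</div>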

<p>
  The <dfn constructor for="MediaMetadata">MediaMetadata(<var>init</var>)</dfn>
  constructor, when invoked, MUST run the following steps:

  <ol>
    <li>
      Let <var>metadata</var> be a new {{MediaMetadata}} object.
    </li>
    <li>
      Set <var>metadata</var>'s {{MediaMetadata/title}} to <var>init</var>'s
      {{MediaMetadataInit/title}}.
    </li>
    <li>
      Set <var>metadata</var>'s {{MediaMetadata/artist}} to <var>init</var>'s
      {{MediaMetadataInit/artist}}.
    </li>
    <li>
      Set <var>metadata</var>'s {{MediaMetadata/album}} to
      <var>init</var>'s {{MediaMetadataInit/album}}.
    </li>
    <li>
      Run the <a>convert artwork algorithm</a> with <var>init</var>'s
      {{MediaMetadataInit/artwork}} as <var>input</var> and, if it succeeds,
      set <var>metadata</var>'s <a for="MediaMetadata">artwork images</a> to
      the result.
    </li>
    <li>
      Let <var>chapters</var> be an empty list of type {{ChapterInformation}}.
    </li>
    <li>
      For each <var>entry</var> in <var>init</var>'s
      {{MediaMetadataInit/chapterInfo}}, [=create a ChapterInformation=] from
      <var>entry</var> and append it to
      <var>chapters</var>.
    </li>
    <li>
      Set <var>metadata</var>'s <a for="MediaMetadata">chapter information</a>
      to the result of [=Create a frozen array|creating a frozen array=] from
      <var>chapters</var>.
    </li>
    <li>
      Return <var>metadata</var>.
    </li>
  </ol>
</p>

When the <dfn>convert artwork algorithm</dfn> is invoked with an
<var>input</var> parameter, which is a sequence of type {{MediaImage}}, the
user agent MUST run the following steps:
<ol>
  <li>
    Let <var>output</var> be an empty list of type {{MediaImage}}.
  </li>
  <li>
    For each <var>entry</var> in <var>input</var> (which is a {{MediaImage}}
    list), perform the following steps:
    <ol>
      <li>
        Let <var>image</var> be a new {{MediaImage}}.
      </li>
      <li>Let <var>baseURL</var> be the API base URL specified by the <a>entry
      settings object</a>. </li>
      <li>
        <a lt="url parser">Parse</a> <var>entry</var>'s {{MediaImage/src}} using
        <var>baseURL</var>. If it does not return failure, set
        <var>image</var>'s {{MediaImage/src}} to the return value. Otherwise,
        throw a <a exception>TypeError</a> and abort these steps.
      </li>
      <li>
        Set <var>image</var>'s {{MediaImage/sizes}} to <var>entry</var>'s
        {{MediaImage/sizes}}.
      </li>
      <li>
        Set <var>image</var>'s {{MediaImage/type}} to <var>entry</var>'s
        {{MediaImage/type}}.
      </li>
      <li>
        Append <var>image</var> to the <var>output</var>.
      </li>
    </ol>
  </li>
  <li>
    Return <var>output</var> as result.
  </li>
</ol>
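
<div class="example">
The algorithm above can be approximated in script with the `URL` constructor,
which throws a `TypeError` when parsing fails. This is a non-normative sketch:
`convertArtwork` is a hypothetical name and the base URL is passed explicitly
instead of being taken from the <a>entry settings object</a>.

```javascript
// Hypothetical sketch of the convert artwork algorithm: resolves each image
// source against a base URL and copies the remaining members.
function convertArtwork(input, baseURL) {
  const output = [];
  for (const entry of input) {
    // new URL() throws a TypeError if the source cannot be parsed.
    const src = new URL(entry.src, baseURL).href;
    output.push({
      src,
      sizes: entry.sizes === undefined ? "" : entry.sizes,
      type: entry.type === undefined ? "" : entry.type,
    });
  }
  return output;
}
```

For instance, a relative source such as `"cover.jpg"` is returned as an absolute
URL resolved against the base.
</div>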

<p>
  The <dfn attribute for="MediaMetadata">title</dfn> attribute reflects the
  {{MediaMetadata}}'s <a for=MediaMetadata>title</a>. On getting, it MUST return
  the {{MediaMetadata}}'s <a for=MediaMetadata>title</a>. On setting, it MUST
  set the {{MediaMetadata}}'s <a for=MediaMetadata>title</a> to the given value.
</p>

<p>
  The <dfn attribute for="MediaMetadata">artist</dfn> attribute reflects the
  {{MediaMetadata}}'s <a for=MediaMetadata>artist</a>. On getting, it MUST
  return the {{MediaMetadata}}'s <a for=MediaMetadata>artist</a>. On setting, it
  MUST set the {{MediaMetadata}}'s <a for=MediaMetadata>artist</a>
  to the given value.
</p>

<p>
  The <dfn attribute for="MediaMetadata">album</dfn> attribute reflects the
  {{MediaMetadata}}'s <a for=MediaMetadata>album</a>. On getting, it MUST return
  the {{MediaMetadata}}'s <a for=MediaMetadata>album</a>. On setting, it MUST
  set the {{MediaMetadata}}'s <a for=MediaMetadata>album</a> to the given value.
</p>

<p>
  The <dfn attribute for="MediaMetadata">artwork</dfn>
  attribute reflects the {{MediaMetadata}}'s <a for="MediaMetadata">artwork
  images</a>. On getting, it MUST run the following steps:
  <ol>
    <li>
      If the {{MediaMetadata}}'s <a>converted artwork images</a> is
      <code>undefined</code>, run the following steps:
      <ol>
        <li>
          Let <var>frozenArtwork</var> be a JavaScript Array value.
        </li>
        <li>
          For each <var>entry</var> in the {{MediaMetadata}}'s <a
          for="MediaMetadata">artwork images</a>, perform the following steps:
          <ol>
            <li>
              Let <var>image</var> be the result of [=converted to a JavaScript
              value|converting to a JavaScript object=] <var>entry</var>.
            </li>
            <li>
              Perform [=!=] <a
              abstract-op>SetIntegrityLevel</a>(<var>image</var>,
              "<code>frozen</code>"), to prevent accidental mutation by scripts.
            </li>
            <li>
              Push <var>image</var> to <var>frozenArtwork</var>.
            </li>
          </ol>
        </li>
        <li>
          Perform [=!=] <a
          abstract-op>SetIntegrityLevel</a>(<var>frozenArtwork</var>,
          "<code>frozen</code>").
        </li>
        <li>
          Set the {{MediaMetadata}}'s <a>converted artwork images</a> to
          <var>frozenArtwork</var>.
        </li>
      </ol>
    </li>
    <li>
      Return the {{MediaMetadata}}'s <a>converted artwork images</a>.
    </li>
  </ol>
  On setting, it MUST run the following steps with <var>value</var> being the
  new value being set:
  <ol>
    <li>
      Let <var>convertedArtwork</var> be the result of [=converted to an IDL
      value|converting=] <var>value</var> to a sequence of type {{MediaImage}}.
    </li>
    <li>
      Run the <a>convert artwork algorithm</a> with
      <var>convertedArtwork</var> and, if it succeeds, set the
      {{MediaMetadata}}'s <a for="MediaMetadata">artwork images</a> to the
      result.
    </li>
    <li>
      Set the {{MediaMetadata}}'s <a>converted artwork images</a> to
      <code>undefined</code>.
    </li>
  </ol>
</p>
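<div class="example">
The getter and setter above amount to a lazily built, cached frozen copy that
is invalidated on every assignment. The following non-normative sketch models
that behavior with a hypothetical `ArtworkHolder` class; it omits the URL
conversion performed by the <a>convert artwork algorithm</a>.

```javascript
// Hypothetical model of the artwork attribute's caching behavior.
class ArtworkHolder {
  #images = [];           // stands in for the "artwork images" list
  #converted = undefined; // stands in for "converted artwork images"

  get artwork() {
    if (this.#converted === undefined) {
      // Freeze each image, then the array, to prevent accidental mutation.
      const frozen = this.#images.map((image) => Object.freeze({ ...image }));
      this.#converted = Object.freeze(frozen);
    }
    return this.#converted;
  }

  set artwork(value) {
    this.#images = Array.from(value, (image) => ({ ...image }));
    // Drop the cached copy so that the next read rebuilds it.
    this.#converted = undefined;
  }
}
```

Reading the attribute twice returns the same frozen array, while assigning a
new value causes the next read to return a different one.
</div>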

<p>
  When {{MediaMetadata}}'s <a for=MediaMetadata>title</a>, <a
  for=MediaMetadata>artist</a>, <a for=MediaMetadata>album</a> or <a
  for=MediaMetadata>artwork images</a> are modified, the user agent MUST run the
  following steps:
  <ol>
    <li>
      If the instance has no associated [=MediaMetadata/media session=], abort
      these steps.
    </li>
    <li>
      Otherwise, <a>queue a task</a> to run the following substeps:
      <ol>
        <li>
          If the instance no longer has an associated <a for=MediaMetadata>media
          session</a>, abort these steps.
        </li>
        <li>
          Otherwise, <a>in parallel</a>, run the <a>update metadata
          algorithm</a>.
        </li>
      </ol>
    </li>
  </ol>
</p>

<h2 id="the-chapterinformation-interface">The {{ChapterInformation}}
interface</h2>

<pre class="idl">
[Exposed=Window]
interface ChapterInformation {
  readonly attribute DOMString title;
  readonly attribute double startTime;
  [SameObject] readonly attribute FrozenArray&lt;MediaImage> artwork;
};

dictionary ChapterInformationInit {
  DOMString title = "";
  double startTime = 0;
  sequence&lt;MediaImage> artwork = [];
};

</pre>

<p>
  A {{ChapterInformation}} object is a representation of metadata for an
  individual chapter, such as the title of the section, its timestamp, and
  screenshot image data of this section, that can be used by user agents to
  provide a customized user interface.
</p>

<p>
  A {{ChapterInformation}} can have an associated <dfn for="ChapterInformation">
  media metadata</dfn>.
</p>

<p>
  A {{ChapterInformation}} has an associated <dfn
  for="ChapterInformation">title</dfn>
  which is a DOMString.
</p>

<p>
  A {{ChapterInformation}} has an associated <dfn for="ChapterInformation">
  startTime</dfn>, which is a double.
</p>

<p>
  A {{ChapterInformation}} has an associated list of <dfn
  for="ChapterInformation">
  artwork images</dfn>.
</p>

<p>
  To <dfn>create a {{ChapterInformation}}</dfn> with <var>init</var>, run the
  following steps:

  <ol>
    <li>
      Let <var>chapterInfo</var> be a new {{ChapterInformation}} object.
    </li>
    <li>
      Set <var>chapterInfo</var>'s {{ChapterInformation/title}} to
      <var>init</var>'s {{ChapterInformationInit/title}}.
    </li>
    <li>
      Set <var>chapterInfo</var>'s {{ChapterInformation/startTime}} to
      <var>init</var>'s {{ChapterInformationInit/startTime}}. If the <a
      for=ChapterInformation>startTime</a> is negative or greater than
      [=duration=], throw a <a exception>TypeError</a>.
    </li>
    <li>
      Let <var>artwork</var> be the result of running the
      <a>convert artwork algorithm</a> with <var>init</var>'s
      {{ChapterInformationInit/artwork}} as <var>input</var>.
    </li>
    <li>
      Set <var>chapterInfo</var>'s <a for="ChapterInformation">artwork
      images</a> to the result of [=Create a frozen array|creating a frozen
      array=] from <var>artwork</var>.
    </li>
    <li>
      Return <var>chapterInfo</var>.
    </li>
  </ol>
</p>

<p>
  The <dfn attribute for="ChapterInformation">title</dfn> attribute reflects the
  {{ChapterInformation}}'s <a for=ChapterInformation>title</a>. On getting, it
  MUST return the {{ChapterInformation}}'s <a for=ChapterInformation>title</a>.
</p>

<p>
  The <dfn attribute for="ChapterInformation">startTime</dfn> attribute reflects
  the {{ChapterInformation}}'s <a for=ChapterInformation>startTime</a> in
  seconds. On getting, it MUST return the {{ChapterInformation}}'s <a
  for=ChapterInformation>startTime</a>.
</p>

<p>
  The <dfn attribute for="ChapterInformation">artwork</dfn>
  attribute reflects the {{ChapterInformation}}'s <a
  for="ChapterInformation">artwork images</a>. On getting, it MUST return the
  {{ChapterInformation}}'s <a for=ChapterInformation>
  artwork images</a>.
</p>

<h2 id="the-mediaimage-dictionary">The {{MediaImage}} dictionary</h2>

<pre class="idl">

dictionary MediaImage {
  required USVString src;
  DOMString sizes = "";
  DOMString type = "";
};
</pre>

<p class="informative">The {{MediaImage}} dictionary members are inspired by
{{ImageResource}} in [[IMAGE-RESOURCE]].</p>

The <dfn dict-member for="MediaImage">src</dfn> <a>dictionary member</a> is used
to specify the {{MediaImage}} object's <dfn attribute
for="MediaImage">source</dfn>. It is a URL from which the user agent can fetch
the image's data.

The <dfn dict-member for="MediaImage">sizes</dfn> <a>dictionary member</a> is
used to specify the {{MediaImage}} object's {{MediaImage/sizes}}. It follows the
definition of the <{link/sizes}> attribute of the HTML <{link}> element: a string
consisting of an [=unordered set of unique space-separated tokens=] which are
[=ASCII case-insensitive=] that represents the dimensions of an image. Each
keyword is either an [=ASCII case-insensitive=] match for the string "any", or a
value that consists of two valid non-negative integers that do not have a
leading U+0030 DIGIT ZERO (0) character and that are separated by a single
U+0078 LATIN SMALL LETTER X or U+0058 LATIN CAPITAL LETTER X character. The
keywords represent icon sizes in raw pixels (as opposed to CSS pixels). When
multiple image objects are available, a user agent MAY use the value to decide
which icon is most suitable for a display context (and ignore any that are
inappropriate). The parsing steps for the {{MediaImage/sizes}} attribute MUST
follow <a attribute for="HTMLLinkElement" lt="sizes">the parsing steps for HTML
<code>link</code> element <code>sizes</code> attribute</a>.
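
<div class="example">
The token grammar described above can be sketched as follows. This is a
non-normative approximation, not the HTML parsing steps themselves:
`parseSizes` is a hypothetical name, and the sketch silently ignores tokens
that match neither form and does not remove duplicates.

```javascript
// Hypothetical parser for a sizes string such as "128x128 256x256 any".
function parseSizes(sizes) {
  const result = [];
  // Tokens are separated by ASCII whitespace.
  for (const token of sizes.split(/[ \t\n\f\r]+/).filter(Boolean)) {
    if (token.toLowerCase() === "any") {
      result.push("any");
      continue;
    }
    // Two non-negative integers without a leading zero, separated by x or X.
    const match = /^([1-9][0-9]*)[xX]([1-9][0-9]*)$/.exec(token);
    if (match) {
      result.push({ width: Number(match[1]), height: Number(match[2]) });
    }
  }
  return result;
}
```
</div>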

The <dfn dict-member for="MediaImage">type</dfn> <a>dictionary member</a> is
used to specify the {{MediaImage}} object's <a>MIME type</a>. It is a hint as to
the media type of the image. The purpose of this member is to allow a user
agent to ignore images of media types it does not support.

<h2 id="the-mediapositionstate-dictionary">The {{MediaPositionState}}
dictionary</h2>

<pre class="idl">

dictionary MediaPositionState {
  unrestricted double duration;
  double playbackRate;
  double position;
};
</pre>

The {{MediaPositionState}} dictionary is a representation of the current
playback position associated with a {{MediaSession}} that can be used by user
agents to provide a user interface that displays the current playback position
and duration.

The <dfn dict-member for="MediaPositionState">duration</dfn> <a>dictionary
member</a>
is used to specify the <a>duration</a> in seconds. It should always be positive
and positive infinity can be used to indicate media without a defined end such
as live playback.

The <dfn dict-member for="MediaPositionState">playbackRate</dfn> <a>dictionary
member</a>
is used to specify the <a>playback rate</a>. It can be positive to represent
forward playback or negative to represent backwards playback. It should not be
zero.

The <dfn dict-member for="MediaPositionState">position</dfn> <a>dictionary
member</a>
is used to specify the <a>last reported playback position</a> in seconds. It
should always be positive.
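
<div class="example">
One way a user interface could combine these members is to extrapolate the
position shown between two updates. The following non-normative sketch assumes
linear extrapolation from the last reported position, clamped to the range
from zero to the duration; `estimatePosition` and its timestamp parameters are
hypothetical.

```javascript
// Hypothetical estimate of the position to display, in seconds.
// lastUpdatedMs and nowMs are timestamps in milliseconds.
function estimatePosition(state, lastUpdatedMs, nowMs, isPlaying) {
  // While paused, the last reported position is still accurate.
  if (!isPlaying) return state.position;

  const elapsedSeconds = (nowMs - lastUpdatedMs) / 1000;
  const estimate = state.position + elapsedSeconds * state.playbackRate;

  // Keep the estimate within the media timeline.
  return Math.min(Math.max(estimate, 0), state.duration);
}
```

A negative playback rate moves the estimate backwards, and an infinite
duration leaves the upper bound open.
</div>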

<h2 id="the-mediasessionactiondetails-dictionary">The
{{MediaSessionActionDetails}} dictionary</h2>

<pre class="idl">

dictionary MediaSessionActionDetails {
  required MediaSessionAction action;
  double seekOffset;
  double seekTime;
  boolean fastSeek;
  boolean isActivating;
};

</pre>

The {{MediaSessionActionHandler}} MUST be run with the <var>details</var>
parameter whose dictionary type is {{MediaSessionActionDetails}}.

The <dfn dict-member for="MediaSessionActionDetails">action</dfn> <a>dictionary
member</a>
is used to specify the <a>media session action</a> that the
{{MediaSessionActionHandler}} is associated with.

The <dfn dict-member for="MediaSessionActionDetails">seekOffset</dfn>
<a>dictionary member</a> MAY be provided when the <a>media session action</a>
is {{MediaSessionAction/seekbackward}} or {{MediaSessionAction/seekforward}}. It
is the time in seconds to move the playback time by. If present, it should
always be positive. If it is not provided then the site should choose a sensible
time (e.g. a few seconds).
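
<div class="example">
A handler for relative seeking might compute its target as shown below. This is
a non-normative sketch: `seekTarget` is a hypothetical helper and the 10 second
fallback is an assumed value chosen by the site.

```javascript
// Assumed fallback, in seconds, used when the user agent gives no offset.
const DEFAULT_SEEK_OFFSET = 10;

// Hypothetical helper: returns the new playback time for a seekbackward or
// seekforward action, clamped to the media timeline.
function seekTarget(currentTime, duration, details) {
  const offset =
    details.seekOffset === undefined ? DEFAULT_SEEK_OFFSET : details.seekOffset;
  const direction = details.action === "seekbackward" ? -1 : 1;
  return Math.min(Math.max(currentTime + direction * offset, 0), duration);
}
```

The same function can serve both actions, because the
{{MediaSessionActionDetails/action}} member identifies which one was invoked.
</div>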

When the <a>media session action</a> is {{MediaSessionAction/seekto}}:
<ul>
  <li>
    The <dfn dict-member for="MediaSessionActionDetails">seekTime</dfn>
    <a>dictionary member</a> MUST be provided and is the time in seconds to move
    the playback time to.
  </li>

  <li>
    The <dfn dict-member for="MediaSessionActionDetails">fastSeek</dfn>
    <a>dictionary member</a> MAY be provided and will be true if the [=media
    session action|action=] is being called multiple times as part of a sequence
    and this is not the last call in that sequence.
  </li>
</ul>
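
<div class="example">
A `seekto` handler can use the `fastSeek` member to trade precision for speed
while a sequence of seeks is in progress. The following non-normative sketch
assumes a hypothetical `handleSeekTo` helper and a media element that might not
implement the `fastSeek()` method.

```javascript
// Hypothetical seekto handler body. `media` is a media element or any object
// exposing currentTime and, optionally, fastSeek().
function handleSeekTo(media, details) {
  if (details.fastSeek === true) {
    if (typeof media.fastSeek === "function") {
      // More seeks will follow, so an approximate position is acceptable.
      media.fastSeek(details.seekTime);
      return;
    }
  }
  // Last seek of the sequence, or fastSeek() is unavailable: seek precisely.
  media.currentTime = details.seekTime;
}
```
</div>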

The <dfn dict-member for="MediaSessionActionDetails">isActivating</dfn>
<a>dictionary member</a> will be <code>false</code> if the user agent is about
to [=pausing all input sources|pause all input sources=] related to the capture
[=media session action|action=] and <code>true</code> otherwise. This
<a>dictionary member</a> MUST be present if the user agent implements a policy
of [=pausing all input sources=] and the <a>media session action</a>
is {{MediaSessionAction/togglecamera}}, {{MediaSessionAction/togglemicrophone}}
or {{MediaSessionAction/togglescreenshare}}.
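
<div class="example">
A capture action handler can prefer `isActivating` when it is present and fall
back to toggling its own state otherwise. This is a non-normative sketch;
`nextCaptureState` is a hypothetical helper.

```javascript
// Hypothetical helper: decides the capture state a handler should apply.
function nextCaptureState(currentlyActive, details) {
  // When the user agent reports its intent, follow it.
  if (typeof details.isActivating === "boolean") return details.isActivating;
  // Otherwise, treat the action as a plain toggle.
  return !currentlyActive;
}
```

The result can then be applied to the page's tracks and reported back through
the corresponding method, for example
{{MediaSession/setMicrophoneActive(active)}}.
</div>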

<h2 id="permissions-policy">Permissions Policy Integration</h2>

This specification defines a [=policy-controlled feature=] identified by the
string "mediasession". Its [=default allowlist=] is [=default allowlist/*=].

A document's <a>permissions policy</a> determines whether any content in that
document is allowed to use the MediaSession API. If disabled in the document,
the User Agent MUST NOT select the document's media session as the <a>active
media session</a>.

<h2 id="examples">Examples</h2>

<em>This section is non-normative.</em>

<div class="example" id="example-setting-metadata">
  Setting <a for=MediaSession>metadata</a>:

  <pre class="lang-javascript">
    navigator.mediaSession.metadata = new MediaMetadata({
      title: "Episode Title",
      artist: "Podcast Host",
      album: "Podcast Title",
      artwork: [{src: "podcast.jpg"}],
      chapterInfo: [
        {title: "Chapter 1", startTime: 0, artwork: [{src: "chapter1.jpg"}]},
        {title: "Chapter 2", startTime: 120, artwork: [{src: "chapter2.jpg"}]}
      ]
    });
  </pre>

  Alternatively, providing multiple <a for="MediaMetadata" title="artwork
  image">artwork images</a> in the metadata lets the user agent select the
  artwork image that best fits each display purpose and screen (the same
  applies to the artwork in {{MediaMetadata/chapterInfo}}):

  <pre class="lang-javascript">
    navigator.mediaSession.metadata = new MediaMetadata({
      title: "Episode Title",
      artist: "Podcast Host",
      album: "Podcast Title",
      artwork: [
        {src: "podcast.jpg", sizes: "128x128", type: "image/jpeg"},
        {src: "podcast_hd.jpg", sizes: "256x256"},
        {src: "podcast_xhd.jpg", sizes: "1024x1024", type: "image/jpeg"},
        {src: "podcast.png", sizes: "128x128", type: "image/png"},
        {src: "podcast_hd.png", sizes: "256x256", type: "image/png"},
        {src: "podcast.ico", sizes: "128x128 256x256", type: "image/x-icon"}
      ],
      chapterInfo: [
        {title: "Chapter 1", startTime: 0, artwork: [
           {src: "chapter1_a.jpg", sizes: "128x128", type: "image/jpeg"},
           {src: "chapter1_b.png", sizes: "256x256", type: "image/png"}
         ]},
        {title: "Chapter 2", startTime: 120, artwork: [
           {src: "chapter2_a.jpg", sizes: "128x128", type: "image/jpeg"},
           {src: "chapter2_b.png", sizes: "256x256", type: "image/png"}
         ]}
      ]
    });
  </pre>

  For example, if the user agent wants to use an image as an icon, it may choose
  `"podcast.jpg"` or `"podcast.png"` for a low-pixel-density screen, and
  `"podcast_hd.jpg"` or `"podcast_hd.png"` for a high-pixel-density screen. If
  the user agent wants to use an image as a lock screen background,
  `"podcast_xhd.jpg"` will be preferred.

</div>

<div class="example" id="example-changing-metadata">
  Changing [=MediaSession/metadata=]:

  For playlists or chapters of an audio book, multiple [=media elements=] can
  share a single <a>media session</a>.

  <pre class="lang-javascript">
    var audio1 = document.createElement("audio");
    audio1.src = "chapter1.mp3";

    var audio2 = document.createElement("audio");
    audio2.src = "chapter2.mp3";

    audio1.play();
    audio1.addEventListener("ended", function() {
      audio2.play();
    });
  </pre>

  Because the session is shared, the metadata must be updated to reflect what is
  currently playing.

  <pre class="lang-javascript">
    function updateMetadata(event) {
      navigator.mediaSession.metadata = new MediaMetadata({
        title: event.target == audio1 ? "Chapter 1" : "Chapter 2",
        artist: "An Author",
        album: "A Book",
        artwork: [{src: "cover.jpg"}]
      });
    }

    audio1.addEventListener("play", updateMetadata);
    audio2.addEventListener("play", updateMetadata);
  </pre>
</div>

<div class="example" id="example-media-session-actions">
  Handling <a>media session actions</a>:
  <pre class="lang-javascript">
    var tracks = ["chapter1.mp3", "chapter2.mp3", "chapter3.mp3"];
    var trackId = 0;

    var audio = document.createElement("audio");
    audio.src = tracks[trackId];

    function updatePlayingMedia() {
      audio.src = tracks[trackId];
      // Update metadata (omitted)
    }

    navigator.mediaSession.setActionHandler("previoustrack", function() {
      trackId = (trackId + tracks.length - 1) % tracks.length;
      updatePlayingMedia();
    });

    navigator.mediaSession.setActionHandler("nexttrack", function() {
      trackId = (trackId + 1) % tracks.length;
      updatePlayingMedia();
    });

    navigator.mediaSession.setActionHandler("seekto", function(details) {
      audio.currentTime = details.seekTime;
    });
  </pre>
</div>

<div class="example" id="example-set-playbackState">
  Setting {{MediaSession/playbackState}}:

  When a page pauses its media and plays a third-party ad in an iframe, the UA
  might consider the session as "not playing". However, the page wants to allow
  the user to pause the ad playback and cancel the pending playback after the ad
  finishes.

  <pre class="lang-javascript">
    var adFrame;
    var audio = document.createElement("audio");
    audio.src = "foo.mp3";

    function resetActionHandlers() {
      navigator.mediaSession.setActionHandler("play", _ => audio.play());
      navigator.mediaSession.setActionHandler("pause", _ => audio.pause());
    }

    resetActionHandlers();

    // This method will be called when the page wants to play some ad.
    function pauseAudioAndPlayAd() {
      audio.pause();
      navigator.mediaSession.playbackState = "playing";
      setUpAdFrame();
      // A target origin is needed for the message to reach a cross-origin frame.
      adFrame.contentWindow.postMessage("play_ad", "https://example.com");
      navigator.mediaSession.setActionHandler("pause", pauseAd);
    }

    function pauseAd() {
      adFrame.contentWindow.postMessage("pause_ad", "https://example.com");
      navigator.mediaSession.playbackState = "paused";
      navigator.mediaSession.setActionHandler("play", resumeAd);
    }

    function resumeAd() {
      adFrame.contentWindow.postMessage("resume_ad", "https://example.com");
      navigator.mediaSession.playbackState = "playing";
      navigator.mediaSession.setActionHandler("pause", pauseAd);
    }

    window.onmessage = function(e) {
      if (e.data === "ad finished") {
        removeAdFrame();
        navigator.mediaSession.playbackState = "none";
        resetActionHandlers();
      }
    }

    function setUpAdFrame() {
      adFrame = document.createElement("iframe");
      adFrame.src = "https://example.com/ad-iframe.html";
      document.body.appendChild(adFrame);
    }

    function removeAdFrame() {
      adFrame.remove();
    }
  </pre>
</div>

<div class="example" id="example-media-position-state">
  Setting <a>position state</a>:
  <pre class="lang-javascript">
    // Media is loaded, set the duration.
    navigator.mediaSession.setPositionState({
      duration: 60
    });

    // Media starts playing at the beginning.
    navigator.mediaSession.playbackState = "playing";

    // Media starts playing at 2x 10 seconds in.
    navigator.mediaSession.setPositionState({
      duration: 60,
      playbackRate: 2,
      position: 10
    });

    // Media is paused.
    navigator.mediaSession.playbackState = "paused";

    // Media is reset.
    navigator.mediaSession.setPositionState(null);
  </pre>
</div>
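
The values passed to `setPositionState()` usually come from a media element. The following is a minimal sketch rather than part of the API: `positionStateFrom()` and `syncPositionState()` are hypothetical helper names, and the checks assume that the user agent rejects a negative or NaN duration, a position outside the range from zero to the duration, and a zero playback rate.

```javascript
// Hypothetical helper: builds a position state dictionary from any object
// exposing duration, currentTime and playbackRate (e.g. an HTMLMediaElement).
// Returns null when no valid state can be derived.
function positionStateFrom(media) {
  var duration = media.duration;
  // duration is NaN until metadata is loaded; negative values are invalid.
  if (typeof duration !== "number" || Number.isNaN(duration) || duration < 0) {
    return null;
  }
  // A playback rate of zero is assumed to be rejected, so report nothing.
  if (media.playbackRate === 0) {
    return null;
  }
  // Clamp the position into [0, duration].
  var position = Math.min(Math.max(media.currentTime, 0), duration);
  return {
    duration: duration,
    playbackRate: media.playbackRate,
    position: position
  };
}

// Hypothetical helper: pushes the derived state to a MediaSession-like
// object. Passing null resets the position state, as in the example above.
function syncPositionState(session, media) {
  var state = positionStateFrom(media);
  session.setPositionState(state);
  return state;
}
```

A page could call `syncPositionState(navigator.mediaSession, audio)` whenever the element's duration, playback rate, or position changes discontinuously, for instance after a seek.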

<div class="example" id="example-microphone-camera-hangup">
  Using video conferencing actions:
  <pre class="lang-javascript">
    var isMicrophoneActive = false;
    var isCameraActive = false;

    navigator.mediaSession.setMicrophoneActive(isMicrophoneActive);
    navigator.mediaSession.setCameraActive(isCameraActive);

    navigator.mediaSession.setActionHandler("togglemicrophone", function() {
      if (isMicrophoneActive) {
        // Mute the microphone. Implementation omitted.
      } else {
        // Unmute the microphone. Implementation omitted.
      }
      isMicrophoneActive = !isMicrophoneActive;
      navigator.mediaSession.setMicrophoneActive(isMicrophoneActive);
    });

    navigator.mediaSession.setActionHandler("togglecamera", function() {
      if (isCameraActive) {
        // Disable the camera. Implementation omitted.
      } else {
        // Enable the camera. Implementation omitted.
      }
      isCameraActive = !isCameraActive;
      navigator.mediaSession.setCameraActive(isCameraActive);
    });

    navigator.mediaSession.setActionHandler("hangup", function() {
      // End the call. Implementation omitted.
    });
  </pre>
</div>

<div class="example" id="example-presenting-slide-actions">
  Handling slide presentation actions:
  <pre class="lang-javascript">
    var currentSlideIndex = 0;

    navigator.mediaSession.setActionHandler("previousslide", function() {
      currentSlideIndex--;
      // Set current slide. Implementation omitted.
    });

    navigator.mediaSession.setActionHandler("nextslide", function() {
      currentSlideIndex++;
      // Set current slide. Implementation omitted.
    });
  </pre>
</div>
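
The handlers above do not keep the index within the bounds of the slide deck. The following is a minimal sketch that clamps it, assuming the page knows how many slides it has; `registerSlideHandlers`, `slideCount` and `showSlide` are hypothetical names and not part of this specification.

```javascript
// Hypothetical helper: registers slide handlers on a MediaSession-like
// object and keeps the current index within [0, slideCount - 1].
// showSlide(index) is a page-provided callback that renders the slide.
function registerSlideHandlers(session, slideCount, showSlide) {
  var currentSlideIndex = 0;

  function goTo(index) {
    // Nothing to navigate when the deck is empty.
    if (slideCount < 1) {
      return;
    }
    var clamped = Math.min(Math.max(index, 0), slideCount - 1);
    // Only re-render when the index actually changes.
    if (clamped !== currentSlideIndex) {
      currentSlideIndex = clamped;
      showSlide(currentSlideIndex);
    }
  }

  session.setActionHandler("previousslide", function() {
    goTo(currentSlideIndex - 1);
  });
  session.setActionHandler("nextslide", function() {
    goTo(currentSlideIndex + 1);
  });

  // Exposed so the page can read the index, e.g. for its own UI.
  return function getCurrentSlideIndex() {
    return currentSlideIndex;
  };
}
```

A page would call it once, for instance as `registerSlideHandlers(navigator.mediaSession, slides.length, renderSlide)`, where `slides` and `renderSlide` are again page-defined.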

<div class="example" id="example-enterpictureinpicture">
  Handling picture-in-picture:
  <pre class="lang-javascript">
    navigator.mediaSession.setActionHandler("enterpictureinpicture", function() {
      remoteVideo.requestPictureInPicture();
    });
  </pre>
</div>
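
Not every user agent supports every action, and picture-in-picture itself may be unavailable. The following is a minimal sketch of defensive registration, assuming that an unsupported action name makes `setActionHandler()` throw; `trySetActionHandler` and `registerPictureInPicture` are hypothetical helpers, not part of this specification.

```javascript
// Hypothetical helper: returns true if the handler was registered, false if
// the session rejected the action (e.g. because it is not supported).
function trySetActionHandler(session, action, handler) {
  try {
    session.setActionHandler(action, handler);
    return true;
  } catch (error) {
    return false;
  }
}

// Hypothetical helper: registers the picture-in-picture handler only when
// the video element exposes requestPictureInPicture().
function registerPictureInPicture(session, video) {
  if (typeof video.requestPictureInPicture !== "function") {
    return false;
  }
  return trySetActionHandler(session, "enterpictureinpicture", function() {
    // requestPictureInPicture() returns a promise; report rejections
    // instead of leaving them unhandled.
    video.requestPictureInPicture().catch(function(error) {
      console.warn("Could not enter picture-in-picture:", error);
    });
  });
}
```

With the video element from the example above, this would be used as `registerPictureInPicture(navigator.mediaSession, remoteVideo)`.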

<div class="example" id="example-voiceactivity">
  Handling voice activity:
  <pre class="lang-javascript">
    // Create a MediaStream with audio enabled.
    const stream = await navigator.mediaDevices.getUserMedia({audio:true});
    const track = stream.getAudioTracks()[0];
    navigator.mediaSession.setActionHandler("voiceactivity", function() {
      if (track.muted) {
        // Show an unmute notification. If the user agrees to unmute, call
        // setMicrophoneActive(true) to unmute.
      }
    });
  </pre>
</div>

<h2 id="acknowledgments" class="no-num">Acknowledgments</h2>

The editors would like to thank Paul Adenot, Jake Archibald, Tab Atkins,
Jonathan Bailey, François Beaufort, Marcos Caceres, Domenic Denicola, Ralph
Giles, Anne van Kesteren, Tobie Langel, Michael Mahemoff, Jer Noble, Elliott
Sprehn, Chris Wilson, and Jörn Zaefferer for their participation in technical
discussions that ultimately made this specification possible.

Special thanks go to Philip Jägenstedt and David Vest for their help in
designing every aspect of media sessions and for their seemingly infinite
patience in working through the initial design issues; Jer Noble for his help in
building a model that also works well within the iOS audio focus model; and
Mounir Lamouri and Anton Vayvod for their early involvement, feedback, and
support in making this specification happen.