<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta name="generator" content="HTML Tidy for HTML5 for Linux/x86 version 5.3.9">
<meta charset="utf-8">
<link href="webrtc.css" rel="stylesheet">
<title>WebRTC Next Version Use Cases</title>
<script class="remove" src="respec-w3c-common.js" type="text/javascript"></script>
<script src="respec-config.js" class="remove"></script>
</head>
<body>
<section id="abstract">
<p>This document describes a set of use cases motivating the development of
WebRTC Next Version (WebRTC-NV), as well as the requirements derived from
those use cases.</p>
</section>
<section id="sotd"></section>
<section id="overview*">
<h2>Scope and Motivation</h2>
<p>To motivate the development of WebRTC 1.0, the IETF RTCWEB WG developed [[RFC7478]].
This document describes use cases motivating the development of
"WebRTC Next Version" (WebRTC-NV), and the requirements deriving from those use cases.
The use cases fall into one of two categories:
enhancements to use cases already covered in [[RFC7478]], and
new use cases currently not implementable in WebRTC 1.0.</p>
<section id="conformance">
</section>
</section>
<section id="existingusecases*">
<h2>Existing Use Cases</h2>
<p>The use cases in this section improve upon use cases described in [[RFC7478]].</p>
<section id="multipartygame*">
<h3>Multiparty online game with voice communications</h3>
<p>[[RFC7478]] Section 2.3.12 describes a use case involving a multiparty online game with voice communications.
In these scenarios, reducing time to join the game and receive media is important. To minimize this,
ICE enhancements are desirable, such as
the ability to control candidate gathering and pruning. Also, “parallel forking” minimizes conference
establishment time by allowing a participant to broadcast an Offer to a “room” abstraction (maintained on a
server), with other room participants responding directly to the Offerer, avoiding a separate discovery step.
It is desirable to allow media to be received from responders before the initiator receives an answer.
Also, the ability to impose a bandwidth limit across all mesh endpoints limits the build-up of queues that can
affect audio quality or perceived latency in the game. Supporting this enhancement adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N01 The user agent can control candidate gathering
and pruning, limiting the networks on which
candidates are gathered, the types of candidates,
etc.
N02 The user agent must be able to support parallel
forking, including reuse of local ICE candidates
and local certificates with multiple Answerers.
N03 The user agent must be able to impose a bandwidth
limit across mesh endpoints.
N04 Early media must be supported.
</pre>
<p>Experience: This use case has been implemented by a gaming service utilizing [[ORTC]].</p>
References:
<ol>
<li><a href="https://github.com/w3c/ortc/issues/54">ORTC Issue 54</a></li>
<li><a href="https://github.com/w3c/ortc/issues/603">ORTC Issue 603</a></li>
</ol>
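<p>The following sketch is illustrative only and assumes the ORTC-style <code>RTCIceGatherer</code>; the
option values and the <code>signalToRoom()</code> helper are hypothetical, and stand in for N01 and N02.</p>
<pre class="highlight">
// Illustrative only: constrain gathering to relay candidates from one
// configured TURN server, reducing gathering time and pruning work (N01).
const gatherer = new RTCIceGatherer({
  gatherPolicy: "relay",
  iceServers: [{
    urls: ["turn:turn.example.net"],
    username: "user",
    credential: "secret"
  }]
});
gatherer.onlocalcandidate = (event) => {
  // Each candidate is posted to the "room" server, which fans it out to
  // the other participants (parallel forking, N02).
  signalToRoom({ candidate: event.candidate });   // hypothetical helper
};
</pre>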
</section>
<section id="mobility*">
<h3>Mobility</h3>
<p>[[RFC7478]] Section 2.3.6 describes a simple communications service where the user changes access network
during the session. This use case is enhanced by being able to re-route media over an alternate path
(potentially taking network cost into account) without the need for signaling.</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N05 The ICE agent must be able to maintain multiple
candidate pairs and move traffic between them.
N06 The ICE agent must be able to take the network
cost into account when considering re-routing.
</pre>
References:
<ol>
<li><a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0079.html">Mailing list proposal</a></li>
<li><a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0019.html">Mailing list proposal</a></li>
<li><a href="https://github.com/w3c/ortc/issues/583">ORTC Issue 583</a></li>
</ol>
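<p>With the API in [[WEBRTC10]] an application can only observe the selected candidate pair, as in the
illustrative sketch below (<code>pc</code> is assumed to be an existing RTCPeerConnection); moving traffic
to an alternate pair (N05) or weighing network cost (N06) would require new API surface.</p>
<pre class="highlight">
// Illustrative only: watch which candidate pair the ICE agent is using.
const iceTransport = pc.getSenders()[0].transport.iceTransport;
iceTransport.onselectedcandidatepairchange = () => {
  const pair = iceTransport.getSelectedCandidatePair();
  console.log("Sending via", pair.local.type, "->", pair.remote.type);
};
</pre>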
</section>
<section id="videoconferencing*">
<h3>Video Conferencing with a Central Server</h3>
<p>[[RFC7478]] Section 2.4.3.1 describes a use case involving Multiparty Video Communications with
a central conferencing server. In such a use case, clients with disparate capabilities such as
differing bandwidth availability,
screen size and maximum displayable frame rate may participate in the same conference.
In such a situation it is advantageous to support Scalable Video Coding (SVC).
Encoding with temporal scalability is supported by several browsers today and is utilized by most
centralized conferencing services.</p>
<p>It is expected that spatial scalability (supported by VP9 and AV1) will become more popular with time.
In this use case, if the desired video codec is known beforehand and participants are muted by default
(as in a very large meeting), it is desirable to allow new participants to start receiving immediately,
without negotiation. Supporting this enhancement adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N07 The user agent must be able to encode and decode
video utilizing temporal scalability and (if
supported by the chosen codec) spatial scalability.
N08 A user agent can receive audio/video without a
corresponding sender.
N09 It is possible to select the sending and/or
receiving codec without negotiation.
N10 The user agent must be able to control
robustness (RTX, RED, FEC) applied to individual
simulcast and SVC layers.
</pre>
<p>This use case has been implemented by conferencing services utilizing [[ORTC]],
as well as proprietary additions to [[WEBRTC10]].</p>
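<p>As a non-normative sketch, the <code>scalabilityMode</code> encoding parameter proposed in the WebRTC-SVC
extension is one way to express N07; the mode string is an example, and <code>pc</code> and
<code>videoTrack</code> are assumed to exist.</p>
<pre class="highlight">
// Illustrative only: ask for three spatial and three temporal layers,
// subject to the negotiated codec supporting spatial scalability.
pc.addTransceiver(videoTrack, {
  direction: "sendonly",
  sendEncodings: [{ scalabilityMode: "L3T3" }]
});
</pre>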
</section>
</section>
<section id="newusecases*">
<h2>New Use Cases</h2>
<p>Several new use cases relate to scenarios that cannot be supported in [[WEBRTC10]].</p>
<section id="filesharing*">
<h3>File Sharing</h3>
<p>Participants in a mesh exchange large files without disruption to audio/video sessions.
It is also possible for a participant to send a large file to a user who is not currently online.
Supporting this use case adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N11 It must be possible for the user agent to
transfer large files as a single message.
N12 The application must be able to signal backpressure
(flow control) when receiving data. It must also
receive a backpressure signal when sending data.
N13 It must be possible for the user agent to transfer
data utilizing a congestion control algorithm
that does not compete aggressively with
audio/video communications.
N14 It must be possible for the file exchange to
be supported by servers as well as user agents.
N15 It must be possible to support data exchange
in a worker.
</pre>
References:
<ol>
<li><a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0079.html">Mailing list discussion</a></li>
<li><a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0082.html">Mailing list discussion</a></li>
</ol>
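<p>A minimal, non-normative sketch of sender-side backpressure (part of N12) with the existing data channel
API; the chunk size and high-water mark are arbitrary, and the need to chunk the file at all is part of what
N11 seeks to remove.</p>
<pre class="highlight">
// Illustrative only: send a file in chunks, pausing while the channel's
// send buffer is full and resuming on "bufferedamountlow".
async function sendFile(channel, file) {
  const CHUNK_SIZE = 16 * 1024;           // 16 KiB per message
  const HIGH_WATER_MARK = 1024 * 1024;    // pause above ~1 MiB buffered
  channel.bufferedAmountLowThreshold = HIGH_WATER_MARK / 2;
  for (let offset = 0; offset &lt; file.size; offset += CHUNK_SIZE) {
    if (channel.bufferedAmount > HIGH_WATER_MARK) {
      await new Promise(resolve =>
        channel.addEventListener("bufferedamountlow", resolve, { once: true }));
    }
    channel.send(await file.slice(offset, offset + CHUNK_SIZE).arrayBuffer());
  }
}
</pre>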
</section>
<section id="iot*">
<h3>Internet of Things</h3>
<p>An IoT sensor maintains a long-term connection and seeks to minimize power consumption.
Some of the sensor’s data may need to be sent reliably and in order, while other sensors may
provide data that can be sent unreliably and unordered, or in a partially reliable manner.
This use case adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N16 The application must be able to minimize ICE
connectivity checks.
N17 The application must be able to control aspects
of the data transport (e.g. set the SCTP
heartbeat interval or turn it off), RTO values,
etc.
N18 It must be possible to send arbitrary data
reliably, unreliably, or partially reliably with
a specific maximum number of retransmissions
or a specific maximum timeout.
N19 It must be possible to send arbitrary data
ordered or unordered.
</pre>
Reference:
<a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0079.html">Mailing list discussion</a>
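<p>The delivery modes in N18 and N19 have a rough counterpart in the [[WEBRTC10]] data channel options, as
the illustrative sketch below shows (<code>pc</code> is assumed); control over SCTP heartbeats and RTO values
(N17) has no equivalent today.</p>
<pre class="highlight">
// Illustrative only: channels with different delivery guarantees.
const control = pc.createDataChannel("control");   // reliable, ordered (default)
const telemetry = pc.createDataChannel("telemetry", {
  ordered: false,          // out-of-order delivery acceptable
  maxRetransmits: 0        // send once, never retransmit
});
const readings = pc.createDataChannel("readings", {
  maxPacketLifeTime: 5000  // partially reliable: retransmit for at most 5 s
});
</pre>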
</section>
<section id="funnyhats*">
<h3>Funny Hats</h3>
<p>A communications service that manipulates captured media prior to encoding and after
decoding to provide effects including:</p>
<ol>
<li>Funny hats</li>
<li>Background removal or blurring</li>
<li>In-browser compositing</li>
<li>Voice effects</li>
<li>Stress detection</li>
</ol>
<p>This use case adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N20 The application must be able to obtain raw media
from the capture device.
N21 The application must be able to insert processed
frames into the outgoing media path.
N22 The application must be able to obtain decoded
media from the remote party.
N23 It must be possible to efficiently share media
between the main thread and worker threads.
N24 It must be possible to do efficient media
manipulation in worker threads.
</pre>
References:
<ol>
<li><a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0037.html">Mailing list discussion</a></li>
<li><a href="https://lists.w3.org/Archives/Public/public-webrtc/2018Jun/0006.html">Mailing list discussion</a></li>
<li><a href="https://ai.googleblog.com/2016/11/enhance-raisr-sharp-images-with-machine.html">Sharper Image Research</a></li>
</ol>
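<p>The sketch below is illustrative only; it approximates the outgoing half of this use case with today's API
by routing raw frames through a canvas before they reach the encoder. <code>drawFunnyHat()</code> is a
hypothetical per-frame effect, and the main-thread copies involved are exactly what N23 and N24 aim to avoid.</p>
<pre class="highlight">
// Illustrative only: manipulate camera frames on a canvas, then send the
// canvas capture instead of the raw camera track.
async function startProcessedCapture(pc) {
  const camera = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.createElement("video");
  video.srcObject = camera;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = 640;
  canvas.height = 480;
  const ctx = canvas.getContext("2d");

  (function processFrame() {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    drawFunnyHat(ctx);               // hypothetical per-frame effect
    requestAnimationFrame(processFrame);
  })();

  const processedTrack = canvas.captureStream(30).getVideoTracks()[0];
  pc.addTrack(processedTrack, camera);
}
</pre>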
</section>
<section id="machinelearning*">
<h3>Machine Learning</h3>
<p>In a web game called “NameTheBird.com” participants use their devices to provide audio and video observations of
birds to the service along with identifications for training purposes, allowing the service to identify birds from
the provided audio and video and return this information to the users in real time.</p>
<p>The web application has a site-specific, federated-learning-based classifier for contextual object detection,
user intent prediction and media manipulation, allowing it to augment the streams it receives and inject identifying
or other supplemental information into the streams sent or received.</p>
<p>The shared classification models are trained on the birds found by the participants and are based on the feedback
of the participants. Each client’s updates to the model are up-streamed to a shared model server that pushes
updates of the global model back to the clients.</p>
<p>Implementation outline:</p>
<ol>
<li>Originating media (raw) streams are cloned for inference and training purposes, denoted “inference stream” and
“training stream”, with the inference stream also being the media stream shared with peer(s). The cloning can occur
any time during a session.</li>
<li>Inference stream: A web-site-specific classifier acts on the raw inference stream, with the result used to
guide a custom encoder in the sender device and to send metadata to the server and peer devices outside the media stream.
The encoder adds the appropriate augmentation, e.g. a “name this bird” sign hovering over the enlarged bird in the case of video
enrichment, or enhanced bird song in the case of audio.</li>
<li>Training stream: The model in training classifies the raw data and evaluates the classification using user feedback,
said feedback loop being web-site specific. The evaluation may be “online” or “offline”, offline meaning the training
is done at a later stage on the recorded encoded media set.</li>
<li>Both the inference and training streams may use payload protection, depending on the trust model for the compute
resources of the application’s optional intermediary server side.</li>
<li>Both the inference and training streams use a transport object for communicating with peers or servers; in some
cases the communication can use a site-specific QUIC-based transport solution, in others an RTP-based one.</li>
</ol>
<p>This use case adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N20 The application must be able to obtain raw media
from the capture device.
N21 The application must be able to insert processed
frames into the outgoing media path.
N22 The application must be able to obtain decoded
media from the remote party.
N23 It must be possible to efficiently share media
between the main thread and worker threads.
N24 It must be possible to do efficient media
manipulation in worker threads.
</pre>
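<p>A minimal, non-normative sketch of step 1 of the outline above: cloning the captured tracks into an
inference stream shared with peers and a training stream kept for the model. <code>pc</code> is an assumed
RTCPeerConnection and <code>trainModel()</code> a hypothetical application function.</p>
<pre class="highlight">
// Illustrative only: clone the raw capture into inference and training streams.
async function startBirdCapture(pc) {
  const raw = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

  // Inference stream: shared with peers over the RTCPeerConnection.
  raw.getTracks().forEach(track => pc.addTrack(track, raw));

  // Training stream: an independent clone fed to the site-specific model.
  const trainingStream = new MediaStream(raw.getTracks().map(track => track.clone()));
  trainModel(trainingStream);        // hypothetical application function
}
</pre>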
</section>
<section id="vr*">
<h3>Virtual Reality Gaming</h3>
<p>A virtual reality gaming service utilizing a centralized conferencing server wants
to synchronize data with media, using an existing Selective Forwarding Unit (SFU)
to distribute the data. This use case adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N25 The user agent must be able to send data synchronized
with audio and video.
</pre>
Reference:
<a href="https://lists.w3.org/Archives/Public/public-webrtc/2018May/0063.html">Mailing list discussion</a>
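<p>[[WEBRTC10]] offers no primitive for synchronizing data with media; the sketch below is only an
application-level approximation of N25 in which game-state messages carry a timestamp and are held until the
receiver's estimated media time catches up. <code>dataChannel</code>, <code>applyGameState()</code> and the
media-time estimate are all hypothetical.</p>
<pre class="highlight">
// Illustrative only: timestamp game-state updates and release them in
// step with an application-derived media clock.
function sendGameState(channel, state) {
  channel.send(JSON.stringify({ t: performance.now(), state }));
}

const pending = [];
dataChannel.onmessage = (event) => pending.push(JSON.parse(event.data));

function applyDueUpdates(estimatedMediaTime) {
  while (pending.length && pending[0].t &lt;= estimatedMediaTime) {
    applyGameState(pending.shift().state);   // hypothetical renderer hook
  }
}
</pre>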
</section>
<section id="securecommunications*">
<h3>Secure Communications</h3>
<p>A service that supports end-to-end confidentiality of communications in mesh and
centralized conferencing scenarios. While the service provides the JavaScript
application, which is considered trusted, backend services such as the SFU are considered
untrusted. This use case adds the following requirements:</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N26 The application must be able to enable end-to-end
payload confidentiality and integrity protection.
N27 The browser must be able to obtain e2e keying
material so as to enable content to be
rendered.
N28 TBD: restrictions on the application so as to
prevent unauthorized recording of the session.
</pre>
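<p>Purely as an illustration of N26 and N27, and independent of any hook into the media pipeline (which
[[WEBRTC10]] does not provide), the sketch below encrypts a payload with a key that backend services never
see; distributing that key between the trusted endpoints is left to the application.</p>
<pre class="highlight">
// Illustrative only: protect an encoded payload end to end with Web Crypto.
// How such a transform attaches to the sending pipeline is the gap these
// requirements describe.
async function makeFrameCipher() {
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 128 }, false, ["encrypt", "decrypt"]);
  return {
    async protect(payload) {          // payload: ArrayBuffer
      const iv = crypto.getRandomValues(new Uint8Array(12));
      const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, payload);
      return { iv, ciphertext };
    },
    async unprotect({ iv, ciphertext }) {
      return crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
    }
  };
}
</pre>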
</section>
<section id="requirements*">
<h3>Requirements</h3>
<p>This section lists the requirements arising from
the use cases catalogued in this document.</p>
<pre class="highlight">
----------------------------------------------------------------
REQ-ID DESCRIPTION
----------------------------------------------------------------
N01 The user agent can control candidate gathering
and pruning, limiting the networks on which
candidates are gathered, the types of candidates,
etc.
N02 The user agent must be able to support parallel
forking, including reuse of local ICE candidates
and local certificates with multiple Answerers.
N03 The user agent must be able to impose a bandwidth
limit across mesh endpoints.
N04 Early media must be supported.
N05 The ICE agent must be able to maintain multiple
candidate pairs and move traffic between them.
N06 The ICE agent must be able to take the network
cost into account when considering re-routing.
N07 The user agent must be able to encode and decode
video utilizing temporal scalability and (if
supported by the chosen codec) spatial scalability.
N08 A user agent can receive audio/video without a
corresponding sender.
N09 It is possible to select the sending and/or
receiving codec without negotiation.
N10 The user agent must be able to control
robustness (RTX, RED, FEC) applied to individual
simulcast and SVC layers.
N11 It must be possible for the user agent to
transfer large files as a single message.
N12 The application must be able to signal backpressure
(flow control) when receiving data. It must also
receive a backpressure signal when sending data.
N13 It must be possible for the user agent to transfer
data utilizing a congestion control algorithm
that does not compete aggressively with
audio/video communications.
N14 It must be possible for the file exchange to
be supported by servers as well as user agents.
N15 It must be possible to support data exchange
in a worker.
N16 The application must be able to minimize ICE
connectivity checks.
N17 The application must be able to control aspects
of the data transport (e.g. set the SCTP
heartbeat interval or turn it off), RTO values,
etc.
N18 It must be possible to send arbitrary data
reliably, unreliably, or partially reliably with
a specific maximum number of retransmissions
or a specific maximum timeout.
N19 It must be possible to send arbitrary data
ordered or unordered.
N20 The application must be able to obtain raw media
from the capture device.
N21 The application must be able to insert processed
frames into the outgoing media path.
N22 The application must be able to obtain decoded
media from the remote party.
N23 It must be possible to efficiently share media
between the main thread and worker threads.
N24 It must be possible to do efficient media
manipulation in worker threads.
N25 The user agent must be able to send data synchronized
with audio and video.
N26 The application must be able to enable end-to-end
payload confidentiality and integrity protection.
N27 The browser must be able to obtain e2e keying
material so as to enable content to be
rendered.
N28 TBD: restrictions on the application so as to
prevent unauthorized recording of the session.
</pre>
</section>
</section>