Should WebCodecs be exposed in Window environments? #211
@chcunningham, can you describe the use cases you think will benefit from using WebCodecs in Window?
In the previous issue I mentioned the use cases as:
And I later clarified:
I used the word "boutique" to suggest that such uses need not fit into one of our well-known categories. The web is a vibrant, surprising place (and my own creativity is pretty limited). Can we agree that such uses will exist and that they may meet one or both of the scenarios I gave above? I didn't intend to say MSE low latency is in this camp. MSE low latency has lots of codec I/O, and many sites in that category will have a busy main thread. Let me grab a few other snips from that issue since we're moving here...
I see it as just a control thread; to me, it's fine. I don't expect folks to do processing on the media, but I do expect developers to use this API in conjunction with the Web Audio API, WebGL or Canvas.
I do not have any issue with Window being the control thread. The current proposal is to surface media data in the control thread, which somehow conflates the control and media threads. With @chcunningham, we agreed that using the main thread as the output data thread is a potential issue:
If we look at the WebCodecs API surface, it is easy to write shims that expose codec APIs to window environments.
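For illustration, a minimal sketch of such a shim, assuming WebCodecs were worker-only. The file name decoder-worker.js and the message shapes are invented for this example, and it assumes EncodedVideoChunk is structured-cloneable and VideoFrame is transferable:

```js
// window side: a proxy object that forwards codec calls to a worker and
// hands decoded frames back to the page. All names are illustrative.
class WindowVideoDecoder {
  constructor({ output, error }) {
    this.worker = new Worker('decoder-worker.js');
    this.worker.onmessage = ({ data }) => {
      if (data.type === 'output') output(data.frame);       // transferred VideoFrame
      else if (data.type === 'error') error(data.message);
    };
  }
  configure(config) { this.worker.postMessage({ type: 'configure', config }); }
  decode(chunk) { this.worker.postMessage({ type: 'decode', chunk }); }
  close() { this.worker.terminate(); }
}

// decoder-worker.js: runs the real VideoDecoder off the main thread.
let decoder;
self.onmessage = ({ data }) => {
  if (data.type === 'configure') {
    decoder = new VideoDecoder({
      output: (frame) => self.postMessage({ type: 'output', frame }, [frame]),
      error: (e) => self.postMessage({ type: 'error', message: e.message }),
    });
    decoder.configure(data.config);
  } else if (data.type === 'decode') {
    decoder.decode(data.chunk);
  }
};
```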
The goal of the control vs codec thread separation was to ensure that implementers don't perform actual encoding/decoding work on the main thread. We can maintain that separation while still receiving outputs on the main thread. The actual threads used under the hood aren't what we intend to describe. For example, in Chromium the VideoToolbox APIs are invoked in a completely different sandboxed process from where the web page is rendered. And, in that separate process, we expect that actually many threads are used.
I don't agree that the main thread inherently creates a memory issue. The nuance from the other issue is important. There, you wrote:
I agree* with the above statement, irrespective of whether the thread is the main window thread or the main dedicated-worker thread. * nit: depending on the implementation, it may be more of a performance problem than a memory problem. For Chromium, I think we have a finite pool of frames to allocate to a camera stream. If users fail to release (close()) the frames back to us, the stream simply stalls.
Do you agree though that, on a worker thread, spinning the controlling thread is most certainly a bug that the web application can (and should) fix? But that this is not the case on the main thread (i.e. some code outside the control of the web application may randomly spin the web app's controlling thread)? So far, no use case has been brought that justifies taking this risk.
That is a very good point we should discuss.
Generally yes.
I replied to this point in the previous issue.
To emphasize, I expect some sites will offer very focused experiences, free of third-party scripts, ads, etc., and for which the window main thread is plenty available.
I offered scenarios in the comments above. In these scenarios, there is no memory/perf risk. Why is this not sufficient? Maybe a concrete example would help. Imagine a simple site that streams a feed from a single security camera. This need not be a flashy big-name site. It might even be someone's hobby project. The UI for this site is just a box with the video feed. The site has no ads, no third-party scripts. Its main thread is doing very little. There is ample room for the main thread to manage fetching, codec I/O, and rendering of the video.
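A minimal sketch of that site's main-thread pipeline, under stated assumptions: a VP8 feed, and a hypothetical fetchCameraChunks() helper that fetches and demuxes the feed into EncodedVideoChunks:

```js
const canvas = document.querySelector('canvas');
const ctx = canvas.getContext('2d');

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0, canvas.width, canvas.height); // render the frame
    frame.close(); // return the frame to the UA's pool promptly
  },
  error: (e) => console.error(e),
});
decoder.configure({ codec: 'vp8' });

async function run() {
  // fetchCameraChunks() is hypothetical: fetch + demux into EncodedVideoChunks.
  for await (const chunk of fetchCameraChunks('/camera-feed')) {
    decoder.decode(chunk);
  }
}
run();
```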
This is not really a use case for exposing to the main thread; this is more a scenario where the issues I am pointing out may not happen (but see below). First, the JS shim works equally well and does not bring any major drawback, AFAIK. Do you agree? Also, if the web site wants frame access, it is probably to do some fancy processing on each frame.
Why not use a MediaStreamTrack created directly by the UA from the decoder output, then?
Are you suggesting restricting exposure of WebCodecs in window environments to only those safe cases? In any case, let's say that, as a user, I send an email containing that website's URL to a friend, and the friend clicks the link from a web mail client (say Gmail). Depending on the web mail client, the website, and the UA, the website may be opened in the same process as the web mail, or in a process with other pages. This might be especially true on low-end devices, as a UA decision to save memory. To reliably decide whether to use a worker or not, a web developer will need to understand a lot of things and do extensive research. In practice, it will be difficult to get any guarantee across user agents, OSes, and devices. Exposing WebCodecs solely to workers is a good hint to web developers that they should do their processing in workers.
I disagree with the premise that it's only correct to use WebCodecs in a worker even when the main thread is contended. Offloading WebCodecs use can improve latency, but you can still get full WebCodecs throughput when controlling it from a contended thread. Low-latency use cases are not the only use cases that WebCodecs is intended for (if they were, we wouldn't have an input queue). WebCodecs allows for low-latency use, but I do not think we should require apps to use it that way.
Having to have a worker context around to be able to use the Chrome JS Console to experiment with WebCodecs would substantially frustrate my learning and debugging. And I'm an expert at this!
Main thread contention across sites is, to me, a UA quality issue, and it has been substantially improved in recent history by the widespread adoption of site isolation. Blink is currently experimenting with multiple render threads in a single process, which has the potential to resolve the remaining cases.
I think it's generally understood that the most direct solution to main thread contention is worker offload, so any sites that bother to collect performance metrics won't have any confusion here.
For my projects, I can't think of a use case where I wouldn't use WebCodecs from Window. I'm manipulating audio/video, with the video and canvas objects actually displayed on the page. Shuffling all that data off to a worker thread, just to then have it shuffled off to a codec on some other user-agent internal process or thread seems like unnecessary overhead, and is definitely a hassle for the developer. I wholeheartedly agree with @chcunningham that there will be other use cases not imagined here.
Unless TAG or another standards group has concluded that certain classes of APIs must be limited to worker contexts, I think the phrasing of the initial question is inverted. I.e., we should instead be discussing why we wouldn't expose this on window. We shouldn't apply restrictions without reason. The only reason I can think of is that we want to limit the ability of users to shoot themselves in the foot under certain specific low-latency scenarios, while the reasons against are numerous -- especially the pain it would cause for common use cases and the impact on first-frame latency.
The driving use cases I heard are low-latency MSE and WebRTC-like stacks.
This seems like a usability issue. I fear that the same principle will end up with pages using WebCodecs on the main thread when they should not. Most of the WebCodecs examples I have seen are main-thread only, even though they deal with realtime data.
These are all good points that might solve the issues I described. It is great to see this coming. It is always easy to extend an API to the Window environment in the future.
That is interesting. Can you provide pointers to your applications? In WebRTC, we went with a model where APIs are Window-only (or mostly) but do not give the lowest granularity.
The low-latency scenario is one such example. Again, I am hopeful this can be solved.
I agree buffer pools are an issue, but I feel that's orthogonal to window vs worker for a few reasons:
I have trouble following this logic. I don't agree that limiting to a worker covers the main use cases. We'll certainly query all the developers in our origin trial for feedback, though. Limiting to a worker will be detrimental to high-frame-rate rendering (transferControlToOffscreen would help this -- but is Chromium-only) and to time to first frame. Especially for single-frame media, such a limit would dominate the total cost.
Youenn said:
Pool exhaustion seems most likely to be caused by a memory leak (e.g. VideoFrame.close() not being called), possibly in conjunction with use of a downstream API for rendering (e.g. Canvas, MSTGenerator, WebGL or WebGPU). It seems that this problem can occur regardless of whether WebCodecs or other related APIs run in a Worker. So to address pool-exhaustion concerns, you'd probably want to require related APIs to automatically free buffers.
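To make the pool-exhaustion point concrete, a contrast (assuming a 2D canvas context named ctx, as in the sketch further up; the same discipline applies in a window or a worker):

```js
// Leaky: decoded frames are never closed, so the UA's finite frame pool
// drains and a camera-backed stream can stall.
const leaky = new VideoDecoder({
  output: (frame) => ctx.drawImage(frame, 0, 0), // missing frame.close()
  error: console.error,
});

// Well-behaved: close() returns each frame to the pool once it is consumed.
const wellBehaved = new VideoDecoder({
  output: (frame) => { ctx.drawImage(frame, 0, 0); frame.close(); },
  error: console.error,
});
```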
We sent an email to all participants in the Chrome origin trial. On the question of "should we keep WebCodecs in Window scope", the tally was 10 in favor, 6 neutral, 1 ambivalent, 1 opposed. Overall I think this shows a compelling case for maintaining window-exposed interfaces. Breakdown below. Click the summaries below to see reply excerpts. The opposed response argues for forcing developers into a pattern that frees the main thread. @willmorgan of iproov.com wrote:
But many apps found no performance reason to use workers, and highlight that workers create additional complexity. @koush of vysor.io wrote:
@etiennealbert of jitter.video wrote:
@AshleyScirra of scirra.com wrote:
@BenV of bash.video wrote:
@bonmotbot of Google Docs wrote:
Also, some apps desire to use WC in combination with other APIs that are only Window-exposed. Mentioned examples include: Canvas, RTCDataChannel, WebAudio, input (e.g. touch) events. Forcing apps to use DedicatedWorkers adds complexity to code that needs other Window-only APIs.
From the performance angle, Canvas is of particular note. It is the most common path for apps to paint VideoFrames. OffscreenCanvas is not yet shipped in Safari and Firefox, which means no way to paint directly from a worker. OffscreenCanvas may eventually ship everywhere, but its absence now adds complexity to using WebCodecs from workers and removes the theoretical performance benefit. @surma of squoosh.app (Google) wrote:
Aside: the use of Canvas above is not unique to ImageDecoder. @AshleyScirra of scirra.com wrote:
@jamespearce2006 of grassvalley.com wrote:
Finally, even for apps that use WC in workers, Window interfaces are useful for synchronous feature detection. @BenV of bash.video wrote:
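As an aside, a sketch of the synchronous-detection point above: presence can be checked synchronously only in scopes where the interface is exposed, while capability checks are asynchronous either way (the codec string is just an example):

```js
// Synchronous presence check: only works where VideoDecoder is exposed.
const hasWebCodecs = typeof VideoDecoder !== 'undefined';

// Capability checks are asynchronous regardless of scope.
if (hasWebCodecs) {
  VideoDecoder.isConfigSupported({ codec: 'avc1.42E01E' })
    .then(({ supported }) => console.log('H.264 decode supported:', supported));
}
```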
There was some interesting feedback at the last W3C WebRTC WG meeting, in particular on the 'many apps found no performance reason' point. @aboba, would it be possible to have the web developer feedback here?
This is a fair concern. Do we know which APIs are missing in workers?
More and more feature detection is done asynchronously, for instance listing existing codec capabilities.
Oh, I now see the list. About RTCDataChannel: it is now exposed in workers in Safari, and there is a PR for it.
OffscreenCanvas is not widely supported, so it can only be used as a progressive enhancement. But I envision Web Codecs to also be useful with other, main-thread-only APIs like the CSS Paint API or WebUSB.
@youennf, each of those statements in my above comment is a clickable zippy that expands to show the developer feedback. LMK if this isn't what you meant.
@youennf, does the web developer feedback above persuade you to maintain window exposure for WebCodecs?
@surma That's how Vysor already works today (https://app.vysor.io/).
The above two descriptions of media application developers appear to be in opposition. I think the latter is correct. Yes, media libraries are going to have to think about threading. Yes, the developer experience is going to be harder with workers than with exposure on the main thread.

The problem is that none of those arguments seem contained to non-realtime use cases. Instead, they highlight the path of least resistance. This makes me more concerned, not less, that if we expose to both main thread and worker, then some realtime applications may never be written correctly (on a worker). And a chorus of end users will blame individual browsers for the sub-par experience.

Even well-read and well-guided media application developers have bosses. Designing an app's media threading model to use workers may not be something a few such devs will succeed at pushing for on their own (because of short-term costs). We have an opportunity to help them push for doing it right, and to help end users have better experiences, by making the right option the default option. This is what we're here to do.
One of the main arguments I've heard for deferring decision is a lack of use cases that require Window exposure. As somebody in the discussion said, it's likely any use case could be made to work in Worker context - even if that means transferring data between Window and Worker. This would mean that a decision to defer at this time leaves us without good criteria to later decide to expose in Window. This makes me concerned that a decision to defer becomes a decision we cannot re-evaluate later. What new information would you be looking for?
As a consumer of WebCodecs, if it becomes worker only, I'll need to write a shim to expose it to window (or pass the data to worker), because WebUSB (the data source) is only available in window. |
+1 to @chrisn. I was just typing out the same point. @jan-ivar it's hard to reconcile:
Given the copious amount of feedback you've received from both inside and outside the working group indicating a worker restriction is problematic for performance and developer-experience reasons. Combined with statements like:
We are unsure what criteria would ever satisfy you. I.e., you seem hyper-focused on realtime use cases to a point that precludes discussion of other use cases and the performance costs of the shim (~2.68x memory usage for a toy example; 51 MB -> 135 MB!). Can you provide any criteria that would ever change your mind?
Why is this outcome any more likely than those same developers just using a shim and resulting in an even worse experience?
I believe the decision to reevaluate rests with the chairs. A deferral (unlike a "no" decision) might not technically even require new information to reopen (but check that). Would people feel better if we scheduled a revisit, say a year from now? A year from now, I'd expect there would be production sites to look at and measure, and even more widespread support for media sources and sinks in workers across browsers. If we find key use cases that are hurting, we can weigh the pros and cons of exposure then. We'll be in a better position to decide at that time than now. In contrast, if we expose to the main thread now and find a year from now that this was a mistake, we won't be able to change it.
Mozilla considers WebUSB harmful, so this use case is not compelling to us.
On the topic of deferring the decision: I can see that it's an attractive choice from a standards point of view, but — in my opinion — it comes at the cost of developers. Developers are already struggling with a lumpy web platform where some UAs support an API and others don't. The resulting feature detection techniques and progressive enhancements are rather complex. If we now have to extend feature detection from yes/no to yes/no/yes-but-not-on-this-thread, I worry about the impact.
Is there precedent that exposing an API in more places (in scope A in addition to scope B) has ever been considered a mistake? How did the mistake manifest? (Note that this is different from exposing an API in scope A instead of scope B.)
The WebCodecs argument seems to apply equally to WebSockets:
So was it a mistake to expose WebSockets on window? Surely not. Yes, some people will choose wrong, and do real-time-critical WebSockets on the main thread. That's a shame. But some people will jump through hoops to choose wrong, no matter what the API design is. To me, this applies equally to WebCodecs. It's not the place of API design to force decisions on developers who might have good reasons to choose differently.
I sense agreement against Balkanizing WebCodecs support, and I hope it doesn't come to that. I'm going to ignore the WebSockets strawman and other lateral comparisons of pattern, since they ignore context. The context is that:
In this context, the cautious and sensible approach to preserving the quality of end-user experiences across user agents and devices is to consider JS exposure in workers first, because this is closest to the environment we have today. Hopefully, no one is surprised to hear that all browsers fire up background threads to handle media today. There's precedent here with ScriptProcessorNode, and we'd never have AudioWorklet today if we'd applied some of the criteria cited here about things being "unprecedented". This API is unprecedented.
Sorry, but that's not so. There are many examples of this, like permissions. Never mind that malicious web developers may also think they have good reasons. End-user experiences trump developer convenience in the priority of constituencies. I'm also having trouble reconciling the opposition with the few use cases that would seem to be genuinely affected. If so many people can't get excited about this wonderful new API in workers, it makes me wonder how many were planning on using workers in the first place.
@jan-ivar you are conflating rendering and buffering. WebCodecs is not a rendering API; its decoder outputs may be used in rendering, but it is not a rendering API itself. As you know, the WebCodecs encoding and decoding pipelines are already detached from the main thread. What you're trying to control by forcing worker-only exposure for WebCodecs is where developers create their rendering pipelines. I.e., you're indirectly trying to use WebCodecs to force developers to operate their rendering pipelines in an OffscreenCanvas. It seems you would be better served arguing for a worker restriction on WebGPU and OffscreenCanvas.
I thought that was the case, too! If it is, what is the reason to force developers to create yet another thread? (Apologies if this has been answered before; this thread has become... long 😅)
For what it's worth, given the above discussion and points raised, I'd like to revise my original position of being anti-main-thread. I think they should be exposed on the main thread.
@dalecurtis this API isn't limited to Canvas and OffscreenCanvas.
It's not just rendering. Critical realtime use case pipelines are going to be capture + send, and receive + playback:
Chrome intends to ship APIs to do this, so this isn't theoretical. These nonstandard APIs are riding along in origin trials of WebCodecs. Touching the main thread from these pipelines would block on the main thread, a significant regression from the status quo. Just because these pipelines may work on high-end devices in user agents affording one process per tab doesn't mean our concerns over web compat and real-world jank aren't valid. Looking at WebCodecs in a vacuum doesn't get it off the hook, because in all use cases it will be part of some media pipeline, with inherent source/sink problems that are transitive, such as buffer exhaustion. Lower-end devices, with smaller CPU caches and fewer buffers they can keep around in GPU memory, may suffer if we expose to unsuitable environments.
You can s/rendering/processing/ in my comment and my point remains the same: you're indirectly using WebCodecs to force an outcome upon developers. You could just as well argue that WebTransport, WebML, WebGPU, OffscreenCanvas, etc. should be limited to worker-only; your arguments are interchangeable. WebCodecs is just a convenient scapegoat. E.g., in both of your examples, limiting the MST processor and generator to a worker would achieve the same results. Your last point also ignores CfC feedback and test data showing that low-end devices sometimes suffer more with workers due to their constrained core counts.
@dalecurtis It seems to me that restrictions (if they are to be imposed at all) should be imposed on the APIs where there is a demonstrated problem. Restricting WebCodecs or MST processor/generator to a worker in order to address a problem in WebML does not make sense to me. That's like looking for your keys under a lamp post because the light is better there.
@dalecurtis That's w3c/mediacapture-transform#23, but as you can see, we're seeing similar resistance there. But one can also compose near-realtime media pipelines around it. E.g.
- worker: WebTransport → WC decode → composite participants together in JS → WC encode → MSE
- worker: OffscreenCanvas → WC encode → WebTransport
Both WebCodecs and MST+ expose raw video frames for manipulation in JS, and should be limited to workers.
@aboba WebML?? Oh, because I said "ML in JS?" Sorry, I meant that as a stand-in for any kind of processing or analysis including plain JS/WASM bit manipulation, be it face-tracking, compositing participants together, or lip/mood reading for accessibility. But this confusion proves my point: APIs like WebTransport or WebML can be used on all sorts of data, and make no sense to restrict. It's specifically chewing on raw video data on the main-thread we want to restrict, so we restrict the access APIs (WebCodecs & MST+) that expose that data. |
You've included WebCodecs to make your point, but your point is the same even if we remove WebCodecs. By your arguments you wouldn't want "WebTransport -> Processing -> MSE" or "OffscreenCanvas -> WebTransport" to be done on window. I.e., you're against any high bandwidth / expensive processing on window that could lead to a poor user experience.
Specifically, (near) realtime raw video manipulation and framerate drop. E.g., I'm not opposed to WebTransport -> decrypt -> MSE, which operates on (much smaller) encoded data.
Complete nit: the first pipeline should be
- worker: WebTransport -> WC decode -> composite participants together in JS -> (transfer either before or after) -> MSTG -> video element
No need to involve either encode or MSE.
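A sketch of that corrected pipeline in a worker, under stated assumptions: a module worker, Chromium's (nonstandard at the time) MediaStreamTrackGenerator, an invented endpoint URL, and a hypothetical parseChunk() demuxer; the compositing step is elided:

```js
// worker.js (module worker). The generated track would then be transferred
// to the main thread and attached to a <video> element.
const generator = new MediaStreamTrackGenerator({ kind: 'video' });
const writer = generator.writable.getWriter();

const decoder = new VideoDecoder({
  output: (frame) => writer.write(frame), // the sink consumes (and closes) the frame
  error: console.error,
});
decoder.configure({ codec: 'vp8' });

const transport = new WebTransport('https://media.example/feed'); // assumed endpoint
await transport.ready;
const streams = transport.incomingUnidirectionalStreams.getReader();
const { value: stream } = await streams.read();
for await (const packet of stream) {
  // parseChunk() is hypothetical: turns wire bytes into EncodedVideoChunk init.
  decoder.decode(new EncodedVideoChunk(parseChunk(packet)));
}
```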
Yes, I mentioned that already. I was contriving a case without MSTP/MSTG to counter the claim that we could just limit MSTP/MSTG.
WebGL and canvas regularly operate at >= 60fps on the main thread without any queuing, so I assume they meet your definition for (near) realtime "video" manipulation. If your position is that those APIs shouldn't have shipped on window, I'll drop this line of reasoning. If it isn't, then I don't think your position on a worker-only restriction for WebCodecs is fully considered.

My point with the prior dialog is that WebCodecs isn't a source of main thread contention - it's fully detached from the main thread. Performance loss can only occur based on how the API is given input and where its outputs are processed. E.g., a user on a canvas drawing site with a busy main thread feeding a WebCodecs encoder in a worker will suffer in experience regardless of a worker restriction - possibly more.

I recognize that your goal with a worker restriction is to force developers to move their entire pipelines (potentially at the cost of web compatibility for the next N years) into a worker to avoid these issues, but let's be clear that it's an indirect mechanism, without precedent, for having the effect you're trying to achieve. It's certainly not clear to me why you think developers won't just work around this limitation instead of embracing it (indeed, developers in this very thread have said they will do just that).
Given the lack of support for even OffscreenCanvas, I think requiring Workers here is over-ambitious.
Group feedback: The precise recommendation is to optimize for developer experience - which in this case is to allow use on the main thread. We did discuss this in a call, and the overall consensus (albeit not unanimous) was to allow usage on Window. We took a similar position for the autoplay API, where our recommendation was to make it sync for developer convenience.

Personal feedback: I believe that for scenarios where main-thread jank is an issue, developers would use good judgement and move it to a worker. I don't think it is up to us to dictate their choice. Finally, poorly optimized apps that jank will jank regardless of this API being there or not, since poor development practices are rarely isolated to media-specific calls. If one wants to protect the main thread from compute-intensive APIs, maybe we should look into heavy compute as a thing we gate behind a permission?
If we step back a little, maybe we can get consensus on a few principles that might guide the design and shipping of the API. Do we agree on the following?
1. Applications processing realtime raw video (or realtime raw audio) should do processing (including WebCodecs) in a worker context
2. Applications handling non-realtime raw video (or audio) may use WebCodecs in a window context
I have just posted a response from the Media WG chairs to the mailing list.
Thanks @chrisn, @jernoble, and @cynthia (on behalf of TAG) for your consideration on this long discussion. From the response:
I'll begin working on a PR to add some non-normative text around this (in line with the suggestion from #211 (comment)) to the specification, and instruct the team to update the documentation and examples we've produced so far.
Regarding principle 1 (realtime processing in a worker context): No. The fact that the source is realtime tells you exactly nothing about whether the processing needs to be realtime or not. And we can't gate API access on the purpose that is in the programmer's mind when he writes his code.
Regarding principle 2 (non-realtime use in a window context): I'm OK with this.
As a preliminary to #199, let's discuss whether it is useful to expose WebCodecs in Window environment.
See the good discussions in #199 for context.