We shouldn't require track transferability #113

Open
guidou opened this issue Sep 25, 2024 · 33 comments

@guidou
Contributor

guidou commented Sep 25, 2024

The current version of the API requires track transferability, but this shouldn't be necessary.
Currently, tracks are useless on workers except for this API, so we shouldn't add that as a requirement.
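For context, the flow required by the current spec can be sketched roughly as follows. This is only an illustration, not code from the spec: the helper name `processTransferredTrack`, the `scope` parameter (standing in for the worker's `self`) and the message shape are assumptions made to keep the sketch self-contained.

```javascript
// Worker-side sketch of the transfer-based flow in the current spec.
// `scope` stands in for the worker global (`self`); it is passed in
// explicitly (an assumption of this sketch) so the logic is easy to exercise.
function processTransferredTrack(scope, track, transformer) {
  // In the current spec both interfaces are exposed in dedicated workers.
  const processor = new scope.MediaStreamTrackProcessor({ track });
  const generator = new scope.VideoTrackGenerator();

  // Frames flow: transferred track -> transformer -> generated track.
  const done = processor.readable
    .pipeThrough(transformer)
    .pipeTo(generator.writable);

  // The generated track is created in the worker, so it is transferred
  // back to Window to reach sinks that only exist there.
  scope.postMessage({ track: generator.track }, [generator.track]);
  return done;
}

// Window side (hypothetical message shape):
//   worker.postMessage({ track }, [track]);
// Worker side:
//   self.onmessage = ({ data }) =>
//     processTransferredTrack(self, data.track, new TransformStream());
```

Both directions rely on the track being transferable, which is the dependency this issue is about.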

One way to keep the API worker-first, which has several benefits, is to follow the postMessage-like approach of webrtc-encoded-transform.

Something (subject to discussion) like:

For MediaStreamTrackProcessor:

// main
navigator.mediaDevices.createTrackProcessor(myWorker, mytrack, myOptions, [myOptions]);

// worker
ontrackprocessor = event => {
    let processor = event.processor;
    // Process frames using processor. `event.options` has the data sent via myOptions for extra configuration.
}

For VideoTrackGenerator:

// main
let generatedTrack = navigator.mediaDevices.createVideoTrackGenerator(myWorker, myOptions, [myOptions]);

// worker
onvideotrackgenerator = event => {
    let generator = event.generator;
    // Generate frames for `generatedTrack`. `event.options` has the data sent via myOptions for extra configuration.
}
@guidou
Contributor Author

guidou commented Sep 25, 2024

cc @jan-ivar @youennf

@jan-ivar
Member

What problem is this solving? Transferring the track has other benefits like being able to apply constraints and read track stats and settings.

@guidou
Contributor Author

guidou commented Sep 26, 2024

It solves the problem that you don't need track transferability to implement this API, which we consider a blocker, at least for the medium term.
Also, the benefits of transferring a track to a worker are very limited. In fact, I would argue that the only benefit would be using this API.
You can call applyConstraints and read stats/settings on Window, where the main application is.

This proposal uses a pattern that we are already using in encoded transform and should easily allow us to have interoperable implementations.

@youennf
Contributor

youennf commented Sep 26, 2024

One use case for transferring media stream track is to create a track (via VideoTrackGenerator) and send it to sinks like RTCRtpSender or MediaRecorder.

@guidou
Contributor Author

guidou commented Sep 26, 2024

The proposal is that createVideoTrackGenerator() is called from Window and returns a promise with a MediaStreamTrack on Window, where the RTCRtpSender or MediaRecorder are. The generator (which no longer has a track field) is created in the worker (the application gets it via an event, just like an RTCRtpScriptTransformer).

This removes the need to transfer the track. You only needed to transfer the track from the worker to window because the current spec creates the track on the worker, where it is largely useless, as all the track APIs are on Window.

@guidou
Contributor Author

guidou commented Sep 26, 2024

Another benefit of this API surface is that it allows feature detection on main without creating a worker.

@jan-ivar
Member

This can be feature detected on main like this:

function isMstTransferable() {
  try {
    const [track] = document.createElement('canvas').captureStream().getVideoTracks();
    new MessageChannel().port1.postMessage(track, [track]);
    return true;
  } catch (e) {
    if (e.name != "DataCloneError") throw e;
    return false;
  }
}

@jan-ivar
Member

It solves the problem that you don't need track transferability to implement this API, which we consider a blocker, at least for the medium term.

@guidou why is transfer a blocker? What do you mean by medium term? Safari has already shipped this and it works.

If you explain the problem, perhaps their engineers can help?

@guidou
Contributor Author

guidou commented Sep 30, 2024

It's a blocker for Chromium to ship it in the short term since Chromium doesn't implement track transferability and will not have it for quite some time.

I don't expect Chromium to have track transferability in the short term, so I guess we won't have an interoperable API for a long time.

@guidou
Contributor Author

guidou commented Sep 30, 2024

This can be feature detected on main like this:

function isMstTransferable() {
  try {
    const [track] = document.createElement('canvas').captureStream().getVideoTracks();
    new MessageChannel().port1.postMessage(track, [track]);
    return true;
  } catch (e) {
    if (e.name != "DataCloneError") throw e;
    return false;
  }
}

This feature-detects track transferability, not mediacapture-transform.
With the current API you need to create a worker to feature-detect, which is costly and unergonomic.
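To illustrate the cost being described: a direct check for the interfaces has to run inside a worker and report the result back to Window. A minimal sketch, assuming a hypothetical helper `hasTrackTransformApi` and a hypothetical `{ supported }` message shape:

```javascript
// Returns true when the given global scope exposes the interfaces from the
// current mediacapture-transform spec. Meant to be called with `self` inside
// a dedicated worker, since that is where the spec exposes them.
function hasTrackTransformApi(scope) {
  return typeof scope.MediaStreamTrackProcessor === 'function' &&
         typeof scope.VideoTrackGenerator === 'function';
}

// Worker side (hypothetical message shape):
//   self.postMessage({ supported: hasTrackTransformApi(self) });
// Window side: create the worker, then wait for that message before
// deciding which code path to use.
```

The check itself is trivial; the overhead is in having to spin up a worker and wait for a message before Window knows the answer.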

@jan-ivar
Member

I doubt that's needed. As you said, "tracks are useless on workers except for this API". If MST transfer is detected, it seems reasonable to assume some purpose awaits these tracks in the worker.

This works in the only current implementation: "WebKit for Safari 18 beta adds support for MediaStreamTrack processing in a dedicated worker."

This seems like a property worth emulating. I've added a note to Firefox's implementation bug to do the same. Thanks for bringing attention to this!

@jan-ivar
Member

... Chromium doesn't implement track transferability and will not have it for quite some time.

If there's some difficulty or problem with the spec's transfer steps as specified, please bring it to our attention so we can address it.

@jan-ivar
Member

You can call applyConstraints and read stats/settings on Window, where the main application is.

Yes, but waiting on postMessage for these measurements hardly seems ideal. In the current spec, the worker transform can inspect real-time track stats counters like deliveredFrames, discardedFrames and totalFrames synchronously, and correlate them with the VideoFrame it is currently processing.
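A rough sketch of what that correlation could look like in the worker, assuming the `track.stats` attribute from mediacapture-extensions is available there. The helper `makeStatsTagger` and the `onSample` callback are hypothetical names used only for this illustration.

```javascript
// Builds a transform callback that pairs each frame with a snapshot of the
// track's frame counters, read synchronously in the same context.
// Assumes `track.stats` exposes deliveredFrames, discardedFrames and
// totalFrames, as described in mediacapture-extensions.
function makeStatsTagger(track, onSample) {
  return (frame, controller) => {
    const { deliveredFrames, discardedFrames, totalFrames } = track.stats;
    onSample({
      timestamp: frame.timestamp,
      deliveredFrames,
      discardedFrames,
      totalFrames,
    });
    controller.enqueue(frame); // pass the frame through unchanged
  };
}

// Possible use in the worker:
//   const tagger = makeStatsTagger(track, sample => console.log(sample));
//   processor.readable
//     .pipeThrough(new TransformStream({ transform: tagger }))
//     .pipeTo(generator.writable);
```

No message round trip is involved because the counters are read in the same context that handles the frame.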

@guidou
Contributor Author

guidou commented Oct 15, 2024

You can call applyConstraints and read stats/settings on Window, where the main application is.

Yes, but waiting on postMessage for these measurements hardly seems ideal.

That goes the other way too if you want access to the track on Window (which is the more common case today).

In the current spec, the worker transform can inspect real-time track stats counters like deliveredFrames, discardedFrames and totalFrames synchronously, and correlate them with the VideoFrame it is currently processing.

I'm not opposed to supporting transferability. I'm opposed to making it a requirement to use mediacapture-transform, as that will have the practical consequence of delaying interoperable implementations.

We already have a pattern for adding worker support without requiring transferability of tracks or streams. This doesn't mean applications are forbidden from transferring tracks on browsers that support it if they want to.
It just means that applications that don't need to transfer tracks to do processing (which are most if not all applications today) can more quickly have an interoperable API in practice.

@jan-ivar
Member

That goes the other way too if you want access to the track on Window ...

No, because tracks can be cloned. With transfer, stats are readily available in both places. So the problem of a transformer needing a roundtrip to main to read settings and applyConstraints, for lack of transfer, would be new with this proposal.

@jan-ivar
Member

I'm not opposed to supporting transferability.

Great! Since you said tracks are useless on workers except for the worker API, does this mean you support the worker API?

I'm opposed to making it a requirement to use mediacapture-transform, ...

It already is a requirement.

... as that will have the practical consequence of delaying interoperable implementations.

I doubt attempting to standardize a third new API and waiting for three implementations will get us to interop quicker.

Safari has shipped, and Firefox is working on it. 1½ < 3 + one WG. I've filed w3c/mediacapture-extensions#158 to help.

Creating a permanent web API to solve one implementer's short-term scheduling seems against § 1.7. Add new capabilities with care and § 1.9. Leave the web better than you found it.

This doesn't mean applications are forbidden from transferring tracks on browsers that support it if they want to.

Having web developers navigate between 3 instead of 2 different APIs to do the same thing sounds worse, not better.

@guidou
Contributor Author

guidou commented Oct 29, 2024

I'm not opposed to supporting transferability.

Great! Since you said tracks are useless on workers except for the worker API, does this mean you support the worker API?

What is the Worker API?
I said tracks are useless on workers except for this API (i.e., mediacapture-transform) which artificially requires them.
So, to clarify, there is nothing I support about mediacapture-transform in its current form.

I'm opposed to making it a requirement to use mediacapture-transform, ...

It already is a requirement.

An artificial requirement. It would be very easy to have a spec that does not require track transferability for worker support. That also applies to new implementations (or updating existing ones), since the proposed approach is based on pre-existing patterns already implemented by all major browser engines.

... as that will have the practical consequence of delaying interoperable implementations.

I doubt attempting to standardize a third new API and waiting for three implementations will get us to interop quicker.

I also doubt an API that ignores developer requirements and concerns by at least one implementor will get us to interop.

Safari has shipped, and Firefox is working on it. 1½ < 3 + one WG. I've filed w3c/mediacapture-extensions#158 to help.

w3c/mediacapture-extensions#158 does not address this issue.

Creating a permanent web API to solve one implementer's short-term scheduling seems against § 1.7. Add new capabilities with care and § 1.9. Leave the web better than you found it.

The specific change proposed in this issue is not about "short-term" scheduling. It is to make the API better.
If a use case can be solved appropriately without introducing a dependency on another feature, then it is better to solve it without introducing that dependency. The fact that it results in an API that is easier to implement is a consequence of that design being better. Another benefit is that it allows easier feature detection on Window, which is more in line with § 2.5. New features should be detectable than the current version of the API is.

Ignoring the needs of web page authors and at least one user agent implementor, which the current API does overall, is directly against 1.1. Put user needs first (Priority of Constituencies).
Ignoring concerns of user agent implementors also goes against 1.1. Put user needs first (Priority of Constituencies).

User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.

The track transferability requirement is IMO the opposite of § 1.7. Add new capabilities with care. That principle refers to adding "new capabilities to the web with consideration of existing functionality and content". Adding a feature that requires a dependency on another feature is not better than adding the feature following existing patterns that don't require such dependency.

This doesn't mean applications are forbidden from transferring tracks on browsers that support it if they want to.

Having web developers navigate between 3 instead of 2 different APIs to do the same thing sounds worse, not better.

What 3 different APIs?

Are you referring to the requirement of using AudioWorklet for audio processing, which is a different API that, in addition, is not suitable for all types of processing?

@youennf
Contributor

youennf commented Oct 29, 2024

I said tracks are useless on workers except for this API (i.e., mediacapture-transform) which artificially requires them.

This is not artificial, transferring a track to a worker has real benefits compared to the approach you mention.
Let's take the example of a web application wanting to do background blur on a camera feed via a MediaStreamTrackProcessor and a VideoTrackGenerator.

First, lifetime management is easier.
When the VideoTrackGenerator track gets stopped, its WritableStream will be closed. The web application can listen to this via its closed promise and call stop on the getUserMedia track.
Also, stopping the worker will kill both VideoTrackGenerator and getUserMedia track, housekeeping is simpler :)
This is less convenient when the WritableStream lives in a different context than the track, web developer will need to post message.

Second, configuration management.
If the getUserMedia track is muted, the web app will likely want to mute the VideoTrackGenerator.
Ditto when getUserMedia track is unmuted.
If the getUserMedia track is in the same context as VideoTrackGenerator, it is very easy to implement for the web developer.
Otherwise, web app has to postMessage.

This has real user consequences: a few frames will likely be missed by VideoTrackGenerator when the getUserMedia track gets unmuted if the web app has to postMessage. With the worker approach, missing frames would be a bug in the UA implementation.
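When both objects live in the same context, the mute handling described here could be sketched as below. `mirrorMuteState` is a hypothetical helper; it relies only on the `muted` attribute and `mute`/`unmute` events of MediaStreamTrack and the settable `muted` attribute of VideoTrackGenerator.

```javascript
// Keeps a VideoTrackGenerator's muted state in sync with a source track
// that lives in the same context (here, the worker).
function mirrorMuteState(sourceTrack, generator) {
  // Start from the source track's current state.
  generator.muted = sourceTrack.muted;
  // Follow later changes without any cross-context messaging.
  sourceTrack.addEventListener('mute', () => { generator.muted = true; });
  sourceTrack.addEventListener('unmute', () => { generator.muted = false; });
}

// Possible use in the worker, after the track has been transferred:
//   mirrorMuteState(track, generator);
```

If the track stays on Window instead, the same logic needs a message per state change, which is where the extra latency would come from.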

The same principle applies to configurationchange, getSettings, applyConstraints.
It is much easier for VideoTrackGenerator, MediaStreamTrackGenerator and getUserMedia track to be all in the same context to make use of these APIs.

Finally, we introduced MediaStreamTrack transferability as a way to cover some longer term use cases (grabbing camera in an iframe but do rendering/processing in another iframe). The current spec is more future-proof from that point of view as well.

User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.

Right, I think user needs will likely be better served with the current API, as described above.
I tend to agree that track transferability requires more work from UA implementors, but these costs are outweighed by user and web developer benefits.

I also doubt an API that ignores developer requirements and concerns by at least one implementor will get us to interop.

What are the developer requirements that have been ignored?
So far, the developer feedback we received is that MSTP and VTG are working fine in Safari.

@guidou
Contributor Author

guidou commented Oct 29, 2024

I said tracks are useless on workers except for this API (i.e., mediacapture-transform) which artificially requires them.

This is not artificial, transferring a track to a worker has real benefits compared to the approach you mention. Let's take the example of a web application wanting to do background blur on a camera feed via a MediaStreamTrackProcessor and a VideoTrackGenerator.

Yes, it is an artificial requirement.

If you have a use case where having the track in the worker is useful, then that can be very valid, but it doesn't justify making transferability a requirement for mediacapture-transform. I didn't say track transferability is an artificial feature. I'm just saying it is an artificial requirement for mediacapture-transform that, in addition, is often detrimental.

First, lifetime management is easier. When the VideoTrackGenerator track gets stopped, its WritableStream will be closed. The web application can listen to this via its closed promise and call stop on the getUserMedia track. Also, stopping the worker will kill both VideoTrackGenerator and getUserMedia track, housekeeping is simpler :) This is less convenient when the WritableStream lives in a different context than the track, web developer will need to post message.

You don't need transferability as a requirement to support this use case.
UAs that support track transferability can perfectly support this use case even if transferability is not a requirement for mediacapture-transform.
BTW, I have never heard of this use case from actual developers, but I am not opposed to it.

Second, configuration management. If the getUserMedia track is muted, the web app will likely want to mute the VideoTrackGenerator. Ditto when getUserMedia track is unmuted. If the getUserMedia track is in the same context as VideoTrackGenerator, it is very easy to implement for the web developer. Otherwise, web app has to postMessage.

This has real user consequences: a few frames will likely be missed by VideoTrackGenerator when the getUserMedia track gets unmuted if the web app has to postMessage. With the worker approach, missing frames would be a bug in the UA implementation.

I don't think this is an actual problem because if the getUserMedia track is muted, it will produce no frames and the VideoTrackGenerator will see no frames.
Still, if you want to support this use case in this manner, there is nothing in this proposal preventing it.
Like I said, you don't need transferability as a requirement to support this. You just need transferability, which no one is opposing.

The same principle applies to configurationchange, getSettings, applyConstraints. It is much easier for VideoTrackGenerator, MediaStreamTrackGenerator and getUserMedia track to be all in the same context to make use of these APIs.

The same applies. I haven't heard developers request this, but even if it's a useful use case, you don't need transferability as a requirement to support this. You just need transferability.

Finally, we introduced MediaStreamTrack transferability as a way to cover some longer term use cases (grabbing camera in an iframe but do rendering/processing in another iframe). The current spec is more future-proof from that point of view as well.

That is an actual use case for which I have seen developer demand and IMO is the main value track transferability can provide. This is completely independent of having track transferability as a requirement for mediacapture-transform.

User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.

Right, I think user needs will likely be better served with the current API, as described above.

User needs are not better served by having transferability as a requirement.
Eliminating the requirement of transferability does not prevent any of the use cases mentioned before.
For that you just need transferability, not transferability as a requirement for mediacapture-transform.

On the other hand, requiring transferability does make things difficult for some common use cases. The most obvious is playing both the gUM and VTG tracks on a video element in a before/after effects view. In this case you no longer have the gUM track on Window and therefore can't play it on an element. The same applies if you want to use any other track sink available only on Window.

I tend to agree that track transferability requires more work from UA implementors, but these costs are outweighed by user and web developer benefits.

All the arguments I've seen so far are benefits of transferability as a standalone feature. None of these benefits are derived from that transferability being a requirement for mediacapture-transform.
Moreover, I've presented use cases where transferability as a requirement makes it more difficult to support common use cases.

So, transferability as a standalone feature supports both the use cases you presented and the ones I presented, but transferability as a mediacapture-transform requirement only supports the use cases you presented and fails to properly support the ones I presented.
It is clear to me that removing transferability as a requirement for mediacapture-transform better serves the needs of developers.

I also doubt an API that ignores developer requirements and concerns by at least one implementor will get us to interop.

What are the developer requirements that have been ignored? So far, the developer feedback we received is that MSTP and VTG are working fine in Safari.

Here are some developer requirements that are well known to us and which are ignored by the current version of the spec (not all of these are related to the issue we're discussing which is track transferability as a mediacapture-transform requirement):

  • Audio support
  • Keeping the gUM track on Window while processing on Worker (before/after view and other use cases that require track sinks available only on Window)
  • Easy feature detection (without requiring the creation of a Worker)
  • Processing on Window.

@youennf
Contributor

youennf commented Oct 29, 2024

Here are some developer requirements that are well known to us and which are ignored by the current version of the spec (not all of these are related to the issue we're discussing which is track transferability as a mediacapture-transform requirement):

Let's only talk about the requirements that are relevant to this particular issue (audio support and processing on window are out of scope).

  • Keeping the gUM track on Window while processing on Worker (before/after view and other use cases that require track sinks available only on Window)

The before/after view can be implemented by transferring a clone of the track instead of transferring the track itself.
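A minimal sketch of that clone-based approach, assuming track transfer is supported. `sendCloneToWorker` and the message shape are hypothetical; `clone()` and `postMessage` with a transfer list are the only platform calls used.

```javascript
// Transfers a clone of the track to the worker for processing and returns
// the original, which stays on Window for the "before" view.
function sendCloneToWorker(track, worker) {
  const clone = track.clone();
  // The clone is listed in the transfer list, so it moves to the worker.
  worker.postMessage({ track: clone }, [clone]);
  return track;
}

// Possible use on Window:
//   const original = sendCloneToWorker(cameraTrack, worker);
//   beforeVideo.srcObject = new MediaStream([original]);
//   // the "after" view plays the generated track once the worker sends it back
```

This keeps a usable track on Window, at the cost of managing two tracks (the original and the clone) in two contexts.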

  • Easy feature detection (without requiring the creation of a Worker)

@jan-ivar provided a feature detection approach that works in Safari (and will likely work in Firefox).
This is good enough in practice; feature detection does not have to be technically pure.

I am sympathetic to the needs of browser implementors. So far though, I haven't seen new information that warrants revisiting the design of this API.
Also, this API shape has remained mostly untouched for several years, probably since the spec's First Public Working Draft, and it reached consensus within the WebRTC WG at the time. This API has shipped in one UA and is being implemented in another.

@jan-ivar
Member

The specific change proposed in this issue is not about "short-term" scheduling. It is to make the API better.
If a use case can be solved appropriately without introducing a dependency on another feature, then it is better to solve it without introducing that dependency. The fact that it results in an API that is easier to implement is a consequence of that design being better.

I think this API is shortsighted. It's tightly coupled, artificially tied to main thread, and it reinvents postMessage.

Our goal is to enable MediaStreamTrack processing in dedicated workers. This might include MediaStreamTracks originating in the worker someday, e.g. from an OffscreenCanvas.captureStream() or other sources already exposed in the worker. Or an RTCDataChannel in a worker feeding a VideoTrackGenerator created there.

Since we all agree MediaStreamTracks will exist in workers eventually, the simplest API is the one that accepts them there.

The idiomatic way to get data to workers is with postMessage, using transferable objects if needed.

So I disagree we shouldn't depend on other web platform features. It's doing it all ourselves that's the mistake. At least that's how I read § 1.7 Add new capabilities with care.

@guidou
Contributor Author

guidou commented Oct 29, 2024

I thought the plan was to summarize our positions in a separate issue and ask TAG for their opinion, but here's my reply.

The specific change proposed in this issue is not about "short-term" scheduling. It is to make the API better.
If a use case can be solved appropriately without introducing a dependency on another feature, then it is better to solve it without introducing that dependency. The fact that it results in an API that is easier to implement is a consequence of that design being better.

I think this API is shortsighted. It's tightly coupled, artificially tied to main thread,

This API is not tightly coupled with anything. If a developer wants to transfer a track to a Worker and manage all its state and lifetime there, there is nothing in the proposed API preventing it. Just like nothing prevents developers from managing the track on Window if that is what they prefer.

The one that forces developers to use track transferability even if they'd rather not use it is the one that tightly couples two features that should be independent of each other.

and it reinvents postMessage.

This API does not reinvent postMessage any more than encoded transform does. If this is such a bad thing, should I file an issue in encoded transform to eliminate the same pattern there and require that RTCRtpSender and RTCRtpReceiver (or some other object) be transferable too?

Our goal is to enable MediaStreamTrack processing in dedicated workers. This might include MediaStreamTracks originating in the worker someday, e.g. from an OffscreenCanvas.captureStream() or other sources already exposed in the worker. Or an RTCDataChannel in a worker feeding a VideoTrackGenerator created there.

All this can be supported without tightly coupling mediacapture-transform with track transferability.
I don't think it is possible to find a single use case where forcing the user to use track transferability is better than allowing it, but without forcing it.

Since we all agree MediaStreamTracks will exist in workers eventually, the simplest API is the one that accepts them there.

No, it's not the simplest API. It is a lot more complex to tightly couple two features that should be independent.
It is not only more complex for UA implementors, who cannot develop the features independently; more importantly, it is more complex for Web developers, who are forced to use an unnecessary feature and complex workarounds to solve otherwise nonexistent problems.
Now Web developers are forced to clone a track, transfer one of them, and introduce track management logic in two separate realms even if their preference would be to do all track management on Window.

Even more importantly, the proposed API does not even need to be a replacement for the existing one.
Since this API removes the tight coupling between both features, it isn't really much of a problem to provide the constructors in the existing API as a convenience for hypothetical applications that would prefer to do all track management in the worker.

The idiomatic way to get data to workers is with postMessage, using transferable objects if needed.

Again I ask, why is this a problem? Is encoded transform non-idiomatic? Should we eliminate the RTCRtpScriptTransform constructor and introduce a new transferable object there to be used with postMessage, or make senders and receivers transferable?
Or is it a problem here, but not there?

So I disagree we shouldn't depend on other web platform features. It's doing it all ourselves that's the mistake.

The mistake is to force a dependency on another feature that should be independent.
I'd like to see a single use case where this dependency provides a benefit for web developers compared to having the features be orthogonal.

At least that's how I read § 1.7 Add new capabilities with care.

We read it very differently. Adding dependencies between features that should be orthogonal and forcing developers to use complex workarounds to deal with those unnecessary dependencies is, in my view, the opposite of adding capabilities with care.

@jan-ivar
Member

@guidou I appreciate your efforts to simplify the API, but I believe your proposal introduces more complexity rather than reducing it. It seems unclear whether your proposed API is intended to replace the existing MediaCapture Transform API or to coexist alongside it.

If it's meant to coexist, then we're asking developers to navigate between multiple APIs that achieve similar goals, which can lead to confusion and fragmentation. This also increases the burden on browser implementers to support multiple APIs, delaying interoperability.

If it's meant to replace the existing API, it disregards the implementations already shipped in Safari and in progress in Firefox, which would fragment the ecosystem further and negate the developer feedback we've already received.

Moreover, your proposal doesn't seem to stand on its own because it doesn't cover all the use cases the current API does — particularly future scenarios where tracks originate in workers or need to be fully managed within a worker context.

Requiring track transferability isn't an unnecessary dependency; it's a design choice that provides significant benefits to developers, such as simplified lifetime and configuration management, as well as access to track stats and settings directly within the worker.

Adding another API also goes against the web platform design principles of keeping the platform consistent and avoiding unnecessary complexity.

I believe it's better for us to focus on implementing the existing API consistently across browsers and addressing any implementation challenges together, rather than introducing an alternative that could fragment the ecosystem.

@guidou
Contributor Author

guidou commented Oct 30, 2024

@guidou I appreciate your efforts to simplify the API, but I believe your proposal introduces more complexity rather than reducing it.

Can you elaborate on how it is more complex? Especially for Web developers.
One way would be to compare how intended use cases are implemented with each API.

It seems unclear whether your proposed API is intended to replace the existing MediaCapture Transform API or to coexist alongside it.

It can be either. I would prefer that it replace the existing API.

If it's meant to coexist, then we're asking developers to navigate between multiple APIs that achieve similar goals, which can lead to confusion and fragmentation.

It would be better to replace, but since there is one implementation, coexist seems acceptable.

This also increases the burden on browser implementers to support multiple APIs, delaying interoperability.

It's not uncommon for an API to have multiple constructors or factory methods to serve different use cases.
In this case, I'm proposing factory methods that provide the following benefits:

  • They properly support all use cases that have been presented so far. In particular, they are better at supporting the common use case of an application wanting to manage the track on Window and do media processing on Worker.
  • They remove the tight coupling between two separate APIs. This has the advantage that the two can be developed independently by UA implementers.

If it's meant to replace the existing API, it disregards the implementations already shipped in Safari and in progress in Firefox, which would fragment the ecosystem further and negate the developer feedback we've already received.

For this reason, coexist would be acceptable.

Moreover, your proposal doesn't seem to stand on its own because it doesn't cover all the use cases the current API does — particularly future scenarios where tracks originate in workers or need to be fully managed within a worker context.

I'm talking about real use cases deployed in production right now, not hypothetical ones that might never exist. I believe the former should have more weight than the latter in the design of the WG's APIs.

Requiring track transferability isn't an unnecessary dependency;

Requiring track transferability for use cases that don't need it is indeed an unnecessary dependency.

it's a design choice that provides significant benefits to developers, such as simplified lifetime and configuration management, as well as access to track stats and settings directly within the worker.

There is no simplified lifetime and configuration management. Any use case that prefers to manage track lifetime on Window (basically all use cases deployed today) requires much more complex lifetime management with the existing API.
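
To make the lifetime point concrete, here is a minimal sketch of what managing lifetime from Window could involve with the spec API, assuming the page transfers a clone and keeps the original. The helper names and the message shape (`type: "start"` / `type: "stop"`) are hypothetical; only `clone()`, `stop()`, and `postMessage()` are existing calls.

```javascript
// Window side: keep the original track and hand a clone to the worker.
function startProcessing(worker, track) {
  const clone = track.clone();
  worker.postMessage({ type: "start", track: clone }, [clone]);
}

// Window side: stopping the original does not stop the transferred clone,
// so the page also has to tell the worker to stop its copy.
function stopProcessing(worker, track) {
  track.stop();
  worker.postMessage({ type: "stop" });
}

// Worker side: remember the transferred clone and stop it on request.
function createWorkerHandler() {
  let current = null;
  return function onMessage(event) {
    const { type, track } = event.data;
    if (type === "start") {
      current = track;
    } else if (type === "stop" && current) {
      current.stop();
      current = null;
    }
  };
}
```

In a real worker the handler would be installed with something like `self.onmessage = createWorkerHandler();`. The extra "stop" round trip is the kind of bookkeeping being referred to here.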

Adding another API also goes against the web platform design principles of keeping the platform consistent and avoiding unnecessary complexity.

Requiring track transferability for use cases that don't need it is precisely adding unnecessary complexity.

I believe it's better for us to focus on implementing the existing API consistently across browsers and addressing any implementation challenges together, rather than introducing an alternative that could fragment the ecosystem.

The ecosystem is already fragmented. This proposal might have the side effect of making it easier to reduce that fragmentation, as it makes it possible for UA implementors to develop two features independently using patterns that are already implemented and tested. Forcing two features that can (and should) be orthogonal to have a dependency such that one has to be implemented before the other does nothing to help reduce the already existing fragmentation.

Finally, I think we have reached the point in which we are just repeating the same arguments without achieving consensus. Shouldn't we go ahead with the plan to file our positions in separate issues and get TAG's input?

@jan-ivar
Member

Again I ask, why is this a problem. Is encoded transform non-idiomatic?

Encoded transform is bespoke.

Should we eliminate the RTCRtpScriptTransform constructor and introduce a new transferable object there to be used with postMessage, or make senders and receivers transferable?

No, because unique tradeoffs were involved, and that FPWD has already shipped in two browsers.

Or is it a problem here, but not there?

I believe it's on the person filing the issue to produce a convincing problem that needs fixing. Otherwise I see no new information since FPWD that warrants revisiting the design of this API.

Usage of the spec API seems fine, as seen in this blog.

@guidou
Contributor Author

guidou commented Dec 11, 2024

Again I ask, why is this a problem. Is encoded transform non-idiomatic?

Encoded transform is bespoke.

In what way that doesn't apply here?

Should we eliminate the RTCRtpScriptTransform constructor and introduce a new transferable object there to be used with postMessage, or make senders and receivers transferable?

No, because unique tradeoffs were involved, and that FPWD has already shipped in two browsers.

There is a similar tradeoff here. There is nothing magical about FPWDs such that they cannot be improved.

Or is it a problem here, but not there?

I believe it's on the person filing the issue to produce a convincing problem that needs fixing. Otherwise I see no new information since FPWD that warrants revisiting the design of this API.

The use case where the application needs the track on Window is a convincing one.
Basically, that describes all applications using MediaStreamTrack on the Web today.

Usage of the spec API seems fine, as seen in this blog.

Doesn't seem particularly fine to me.
Properly supporting lots of real-world applications using MediaStreamTrack on Window would seem better to me.

@youennf
Contributor

youennf commented Dec 11, 2024

There is nothing magical about FPWDs such that they cannot be improved.

Well, we are very far from FPWD: the spec reached consensus within the WebRTC WG and has been very stable for a few years now.
This design cannot come as a surprise since both editors of this spec are from Google.
Also, there will soon be two implementations. This should prove that the current spec is implementable and precise enough to get interop. This would allow the spec to move to REC.

The use case where the application needs the track on Window is a convincing one.

So far, I have not seen a use case where the API you propose provides more than the existing API. Could you be more specific?

For instance, the API you are proposing is most probably shimmable using the current API.
The reverse is not true.

Taking a real example to compare the two APIs as an exercise, we could take the use case of doing real time encoding and sending of a video track to the network (using data channel, web transport, or the future encoded source proposal). This requires the web page to potentially adapt the frame rate and/or resolution to the network conditions.

With the current API, the adaptation logic is all happening solely in the worker via applyConstraints.
With the proposed API, the web application would have to postMessage to the window environment, which is doable but more complex and suboptimal.
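
As a rough illustration of the two adaptation paths just described, the sketch below caps the frame rate either directly in the worker or via a message to the page. The function names and the `type: "adapt"` message shape are assumptions made for this example; `applyConstraints()` with a `frameRate` constraint and `postMessage()` are the only existing calls relied on.

```javascript
// Shared helper: cap the frame rate of a track on whichever thread owns it.
function applyFrameRateCap(track, maxFrameRate) {
  return track.applyConstraints({ frameRate: { max: maxFrameRate } });
}

// Spec API: the track was transferred, so the worker adapts it directly.
function adaptInWorker(track, maxFrameRate) {
  return applyFrameRateCap(track, maxFrameRate);
}

// Proposed API, worker side: the track stayed on Window, so the worker
// can only ask the page to adapt it.
function requestAdaptation(workerScope, maxFrameRate) {
  workerScope.postMessage({ type: "adapt", maxFrameRate });
}

// Proposed API, window side: apply the cap on the worker's behalf.
// Returns null for messages that are not adaptation requests.
function handleAdaptationRequest(track, event) {
  if (event.data?.type !== "adapt") return null;
  return applyFrameRateCap(track, event.data.maxFrameRate);
}
```

In a worker, `workerScope` would be `self`; on the page, `handleAdaptationRequest` would be called from the worker's `message` listener.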

Doesn't seem particularly fine to me.
Properly supporting lots of real-world applications using MediaStreamTrack on Window would seem better to me.

Could you specify which use case and which advantages you see?
My understanding is that both APIs will roughly have a similar level of functionality.

@guidou
Contributor Author

guidou commented Dec 11, 2024

There is nothing magical about FPWDs such that they cannot be improved.

Well, we are very far from FPWD: the spec reached consensus within the WebRTC WG and has been very stable for a few years now. This design cannot come as a surprise since both editors of this spec are from Google. Also, there will soon be two implementations. This should prove that the current spec is implementable and precise enough to get interop. This would allow the spec to move to REC.

That there are two implementations doesn't prevent changing the spec to improve it. We do that all the time with specs that are much more mature than this one.

The use case where the application needs the track on Window is a convincing one.

So far, I have not seen a use case where the API you propose provides more than the existing API. Could you be more specific?

All existing applications today (including the ones that do video processing) currently do so with the tracks on Window, where they manage all the logic. Forcing them to migrate to manage the track in the worker is an unnecessary barrier.

For instance, the API you are proposing is most probably shimmable using the current API. The reverse is not true.

Taking a real example to compare the two APIs as an exercise, we could take the use case of doing real time encoding and sending of a video track to the network (using data channel, web transport, or the future encoded source proposal). This requires the web page to potentially adapt the frame rate and/or resolution to the network conditions.

With the current API, the adaptation logic is all happening solely in the worker via applyConstraints. With the proposed API, the web application would have to postMessage to the window environment, which is doable but more complex and suboptimal.

That sounds more like a hypothetical example than a real one. Either way, my proposal supports it fine. There is nothing that prevents transferring the track to the worker in my proposal, so saying that some use case works better when you transfer the track is not an argument against the proposal.

You would have to show evidence that forcing the transfer of the track is better than making it optional in all, or at least most use cases.

Doesn't seem particularly fine to me.
Properly supporting lots of real-world applications using MediaStreamTrack on Window would seem better to me.

Could you specify which use case and which advantages you see? My understanding is that both APIs will roughly have a similar level of functionality.

The spec version forces a pattern of transferring a track and managing its lifetime there, while historically all applications manage tracks on Window. My proposal makes it easy to support both. Forcing the transfer of the track to the worker is not necessarily a good pattern, especially for existing applications.

The use case for managing the track on Window is all existing applications.

@youennf
Contributor

youennf commented Dec 11, 2024

That there are two implementations doesn't prevent changing the spec to improve it

What is proposed is an entire rewrite of the API/WebIDL, which means rewriting a large part of the spec.

The use case for managing the track on Window is all existing applications.

It would help immensely if we had solid proof that the new API is hard to use.
So far, it does not seem to be the case.

AIUI, the current API and the proposed API have the same feature set, so there should be no impact on end users. All we are debating is ease of use of the two APIs.

We should compare this potential ease-of-use benefit with the known amount of work building the new proposal would require:

  1. WebRTC WG discussions to reach consensus on the new API design
  2. Rewriting of the spec
  3. Deprecation and removal of the current API (we surely do not want to have two APIs for the same job)
  4. Reimplementations in all browsers.

This is a big ask.

@guidou
Contributor Author

guidou commented Dec 11, 2024

That there are two implementations doesn't prevent changing the spec to improve it

What is proposed is an entire rewrite of the API/WebIDL, which means rewriting a large part of the spec.

Nothing of the sort. The proposal is basically adding a new factory method for MSTP/VTG. No need to remove anything.

The use case for managing the track on Window is all existing applications.

It would help immensely if we had solid proof that the new API is hard to use. So far, it does not seem to be the case.

Safari implements it. Are you aware of any major applications moving from canvas capture to the new API? Can you share any adoption numbers?
Maybe we can also conduct some sort of developer survey.

AIUI, the current API and the proposed API have the same feature set, so there should be no impact on end users. All we are debating is ease of use of the two APIs.

The proposed API is just an additional factory method so that you can keep the tracks on Window, which will make it easier to migrate existing applications and develop new ones following commonly used patterns with MediaStreamTracks.

We should compare this potential ease-of-use benefit with the known amount of work building the new proposal would require:

  1. WebRTC WG discussions to reach consensus on the new API design
  2. Rewriting of the spec
  3. Deprecation and removal of the current API (we surely do not want to have two APIs for the same job)
  4. Reimplementations in all browsers.

This is a big ask.

The needs of users and Web developers are above the needs of browser implementers and spec writers in the priority of constituencies, so I don't see this as a blocker.
Also, in terms of implementation, the proposal reuses a pattern already implemented in browsers for encoded transform.

@youennf
Contributor

youennf commented Dec 11, 2024

Nothing of the sort. The proposal is basically adding a new factory method for MSTP/VTG. No need to remove anything.

If we are not removing the existing API, why add an API that can be shimmed on the existing API?

Safari implements it. Are you aware of any major applications moving from canvas capture to the new API?

This is still early as Safari shipped it recently.
There is active interest and, so far, I haven't heard of any blocker or difficulty in using the API.

@guidou
Contributor Author

guidou commented Dec 11, 2024

Nothing of the sort. The proposal is basically adding a new factory method for MSTP/VTG. No need to remove anything.

If we are not removing the existing API, why add an API that can be shimmed on the existing API?

It removes the transferability requirement and makes it easy to do feature detection without creating a worker.
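
A minimal sketch of the feature-detection difference, assuming the spec's constructors are exposed only inside dedicated workers (as I understand the current draft) and using `createTrackProcessor`, the tentative factory name from the opening post of this issue. Both helper names are made up for this example.

```javascript
// Proposed API (tentative name from this issue): can be checked on Window
// without spinning up a worker.
function hasProposedFactory(nav) {
  return typeof nav?.mediaDevices?.createTrackProcessor === "function";
}

// Spec API: the constructors are only visible in a dedicated worker, so
// this check has to be run with the worker's global scope.
function hasSpecApi(workerScope) {
  return typeof workerScope?.MediaStreamTrackProcessor === "function" &&
         typeof workerScope?.VideoTrackGenerator === "function";
}
```

On a page, the first check would be `hasProposedFactory(navigator)`. The second would have to run as `hasSpecApi(self)` inside a worker, which would then report the result back with `postMessage`.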

Safari implements it. Are you aware of any major applications moving from canvas capture to the new API?

This is still early as Safari shipped it recently. There is active interest and, so far, I haven't heard of any blocker or difficulty in using the API.

It would be good to ask those potential users if they would prefer an API that forces them to transfer tracks to a worker or one that lets them choose between Worker and Window.

@jan-ivar
Member

jan-ivar commented Dec 12, 2024

Nothing of the sort. The proposal is basically adding a new factory method for MSTP/VTG. No need to remove anything.

If we are not removing the existing API, why add an API that can be shimmed on the existing API?

It's a bit more than a factory method. For comparison, here's a trivial factory function (works in Safari):

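async function createVideoTrackGeneratorAndProcessor(worker, track) {
  // Transfer a clone so the caller keeps the original track on this thread.
  const before = track.clone();
  worker.postMessage({before}, [before]);
  // Wait for the worker to send back the processed track.
  const {data} = await new Promise(r => worker.addEventListener("message", r, {once: true}));
  return data.after;
}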
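
For context, the worker-side counterpart that this function expects might look roughly like the sketch below. It is an assumption about how the worker could be written, not code from this thread: `handleBeforeTrack` and its parameters are hypothetical, and the worker scope is passed in explicitly so that the spec's `MediaStreamTrackProcessor` and `VideoTrackGenerator` constructors are looked up on it.

```javascript
// Worker side: receive the `before` track, process it, and send back `after`.
// `transform` is any TransformStream-like object applied to the video frames.
function handleBeforeTrack(workerScope, before, transform) {
  const processor = new workerScope.MediaStreamTrackProcessor({ track: before });
  const generator = new workerScope.VideoTrackGenerator();
  // Pipe frames through the transform into the generator.
  // Pipeline errors are only logged in this sketch.
  processor.readable
    .pipeThrough(transform)
    .pipeTo(generator.writable)
    .catch(e => console.error(e));
  // Transfer the generated track back to the page.
  const after = generator.track;
  workerScope.postMessage({ after }, [after]);
  return after;
}

// Possible wiring inside the worker (identity transform):
// self.onmessage = ({ data }) =>
//   handleBeforeTrack(self, data.before, new TransformStream());
```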

But this differs from what @guidou seems to want: the worker reacting to track.applyConstraints() on the main thread (or wherever it is transferred I suppose?) That doesn't exist today.

Having processing react to tracks on other threads seems more complex than requiring the track to be on the same thread (Safari stops processing if the before track is transferred away).

It feels like we haven't thought past main thread here. Asking websites to postMessage constraints seems simpler.
