Expose a MessagePort using Capture Handle #70
Comments
@jan-ivar, any thoughts here? |
A few thoughts/suggestions:
How about the following:
partial interface CaptureController { // Should it be named DisplayCaptureController?
undefined postMessage(...);
attribute EventHandler onmessage;
}
partial interface MediaDevices {
// This event has a source attribute of type DisplayCapturer
attribute EventHandler oncapturermessage;
}
interface DisplayCapturer {
undefined postMessage(...);
}
The assumption is that CaptureController would have access to the latest CaptureHandle information, which is not the case right now. |
Dialpad would benefit if a video-conferencing product were able to securely remote-control a presentation product, locally in the same browser, as well as remotely by another participant. I support this proposal. |
I like the minimal API in #70 (comment) on the controller, assuming
partial interface CaptureController {
undefined postMessage(any message, USVString targetOrigin, optional sequence<object> transfer = []);
undefined postMessage(any message, optional WindowPostMessageOptions options = {});
}
That way, apps have |
Adding There are a couple of questions in that area that would help driving the exact algorithms and API shapes:
I think we have ways to build whatever we want there. |
Code that cares about the target's origin would look like this:
if (track.getCaptureHandle().origin == myExpectedOrigin) {
// postMessage() and so on.
}
The comparison fails for any "real" value of
Maybe we could expose on the controller in addition to the track. But I think it's important to retain the API surface that's already on the track, because tracks are transferable, and CaptureControllers are not. A receiver of a transferred track might be interested in learning that the track represents a capture of a tab tuned to a specific origin. It would NOT be possible to learn that if exposure is only on the controller, because message passing is async and the information might be out of date by the time the controller's iframe responds (e.g. navigation).
I think it's undesirable to expose postMessage on the controller, again because the track is transferable and the controller is not. If an iframe ORIGINAL_CAPTURER initiates the capture and then transfers the track to iframe IFRAME_X, why should IFRAME_X need to keep on bothering ORIGINAL_CAPTURER with requests to relay messages to the capturee on its behalf? In fact, I now think I've not gone far enough to begin with. I think we should expose the port-getter on the track itself or on the capture handle. Even if we were to make the controller transferable, it would not be enough, because tracks are cloneable, and clones might be posted to different targets.
When the top-level is navigated, the new capturee needs to register a new listener, and it should only get messages sent expressly to it. My proposal ensures that, by killing off the old port and forcing the capturer to set up a new one.
No, it should error. Sending messages to someone that cannot receive them is an app-error, and the app should be made aware, so that its developers may fix the issue. I believe my proposal addresses that through
It should stop loudly. I believe my proposal addresses that through
That's a new issue. I think it's orthogonal to other design decisions facing us atm. If you agree (do you?), I propose tackling it after we settle other issues. |
One lens to look at things through is - if a track is cloned, and the clones are transferred to two different iframes IF_A and IF_B, then:
I believe my proposal addresses all of these, modulo that I need to change:
partial interface CaptureController {
MessagePort getMessagePort();
}
To:
partial interface MediaStreamTrack {
MessagePort getMessagePort();
}
(Or possibly make CaptureHandle an interface rather than a dictionary, and expose it there.) |
@youennf and @jan-ivar, thank you for providing verbal feedback; could you please provide written feedback here, lest we misremember our discussions? This proposal was briefly presented yesterday at the Screen Capture Community Group March 2023 meeting and there was Web developer interest. It would be good to settle on a shape soon; we intend to implement an origin trial of this API in Chrome soon. |
I was not at yesterday's meeting so I am not sure which proposal was presented. If the discussion is about postMessage vs. getMessagePort, my recollection of our past informal discussions is that there was agreement that the postMessage approach supports all use cases the getMessagePort approach would. |
Sorry for being ambiguous; I meant "this proposed shape which I have presented in this thread."
I think that it's a feature that
I think my proposed method is also on solid ground, as it uses MessagePort. |
I'm going to jot down a list of the benefits and drawbacks of the two approaches soon and solicit some more feedback. |
postMessage can allow this naturally, if we decide so. I haven't made up my mind on whether we should enforce this rule or not; it would be worth digging into this (feedback provided earlier in this thread #70 (comment)).
MessagePorts are transferable so there is no guarantee that the message will be processed by the capturing application. The postMessage approach gives us more flexibility here. If we want to, we can decide to enforce this rule without any race conditions.
Before diving into API shape, it would be good to nail down the exact behavior we want. |
That's the capturer->capturee direction. But we want bidirectional messaging, which requires a MessagePort be posted back. And since this will just be a normal run of the mill MessagePort - since we don't atm have any other one - then it won't exhibit this special behavior. But if we expose a new MessagePort through a getter, we can specify in the getter itself this new behavior. We don't need to modify MessagePort itself.
Transferring the port is delegating; I see it as equivalent to relaying the messages themselves. What my proposal guarantees is that the messages will only be transmitted as long as the capture session is active.
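To make that guarantee concrete, here is a minimal sketch in plain JavaScript of a port that refuses to transmit once its capture session has ended. `SessionBoundPort`, `endSession()` and the `deliver` callback are hypothetical names used only for illustration; they are not part of any spec, and in the real design this behaviour would live in the browser rather than in app code.

```javascript
// Hypothetical illustration only: none of these names come from a spec.
class SessionBoundPort {
  constructor(deliver) {
    this.deliver = deliver; // stand-in for the underlying transport
    this.active = true;     // true while the capture session lasts
  }

  // In the real design the browser would trigger this when capture stops.
  endSession() {
    this.active = false;
  }

  postMessage(message) {
    if (!this.active) {
      // Fail loudly so the app learns its message went nowhere.
      throw new Error('capture session has ended');
    }
    this.deliver(message);
  }
}

const received = [];
const sessionPort = new SessionBoundPort((m) => received.push(m));
sessionPort.postMessage('next-slide'); // delivered while the session is live
sessionPort.endSession();

let threwAfterEnd = false;
try {
  sessionPort.postMessage('previous-slide');
} catch (e) {
  threwAfterEnd = true; // rejected once the session has ended
}
```

Whoever holds the port, including a delegate it was transferred to, hits the same `active` check, which is the point being argued here.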
Could you help me understand why the approaches are different wrt races? Do you mean that if a task starts executing before the session-capture stopped, then
postMessage handles this with MessageEvent.source.
I am not clear about this. With regards to behavior, I think we agree on 1, 2, 3, 6, 7, 8. 6 is interesting in that the MessagePort approach would use the same object (MessagePort) for both transient channels and permanent channels. The postMessage approach would only use MessagePort for permanent channels. 9 is not about behavior but about ergonomics. |
Both solutions provide bidirectional messaging, so we seem to agree on this being a requirement. Great!
Please note that you have an unterminated thought there. I'd love to hear the rest of it. Here is how I generally envision it happening without new hooks in the MessagePort spec:
Where the
The capturer knows when it's capturing X. So to name just one use case to motivate 4 and 5 - once a channel is established, the capturee might expose user-facing controls to produce action in the capturer. ("Start recording; stop recording; save to disk; discard recording.") Such user-facing controls would have to be hidden away when they become inactionable, which is the case when the capture-session stops.
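As a rough sketch of that use case, the capturee could track which channels are live and show its controls only while at least one is. The event shape (a `type` of "started" or "stopped", plus a `port`) is assumed from the proposal in this thread; `CaptureControls` and its `visible` getter are made-up names for illustration.

```javascript
// Hypothetical capturee-side bookkeeping; the event shape ({type, port})
// is assumed from the proposal, not taken from a shipped API.
class CaptureControls {
  constructor() {
    this.livePorts = new Set();
  }

  // Feed every started/stopped event through here.
  handleCapturerEvent(event) {
    if (event.type === 'started') {
      this.livePorts.add(event.port);
    } else if (event.type === 'stopped') {
      this.livePorts.delete(event.port);
    }
  }

  // Controls are actionable only while some capturer is connected.
  get visible() {
    return this.livePorts.size > 0;
  }
}

const controls = new CaptureControls();
const fakePort = {}; // stands in for a MessagePort
controls.handleCapturerEvent({ type: 'started', port: fakePort });
const shownWhileCapturing = controls.visible;
controls.handleCapturerEvent({ type: 'stopped', port: fakePort });
const shownAfterStop = controls.visible;
```

Without a "stopped" signal the capturee has no reliable moment at which to flip `visible` back to false.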
Same class, not same object. I don't see it as an issue. Do you?
The level of complexity in the app code to handle navigation of the captured-tab would be staggering, and race-prone. This goes beyond mere ergonomics. |
P.S:
Won't we need to modify the MessagePort spec in some way to ensure that Do we really want |
No change to MessagePort spec needed.
This proposed algorithm is very imprecise; it would be hard to implement it in an interoperable manner.
If we were to do that at MessagePort level, we would need to update https://html.spec.whatwg.org/multipage/web-messaging.html#message-port-post-message-steps, ditto for implementations which would break isolation of MessagePort code from capture code.
This seems reasonable and would call for exposing display capturer as its own object.
It is not great to use the same class to represent two things that have different behaviours.
Our opinions differ here, but at this stage, this is nothing more than opinions. |
Problem Statement
When an application screen-captures another, it is often useful for the user if these two apps can then integrate more closely with each other. For example, a video conferencing application may allow the user to start remote-controlling a slides presentation.
Capture Handle introduced the ability for a capturee to declare its identity to a capturer. This identity can be used to kick off communication, either over shared cloud infrastructure, or locally, e.g. over a BroadcastChannel. Local communication is more efficient and robust, and is therefore much preferable. But what if the two apps are separated by Storage Partitioning? For that, it’s useful to set up a dedicated MessagePort between capturer and capturee.
Scoping
Note that a MessagePort cannot address all use cases we have in mind, and cannot replace Capture Handle, nor some of Capture Handle's future extensions.
The discussion is therefore scoped to the use case we can hope to address - improving things for tightly-coupled applications after capture has started and Conditional Focus decided, so as to allow a more ergonomic, efficient and robust communication.
Challenges
We note some challenges that a good solution must address:
Proposed Solution
Observe that Capture Handle already produces events that can be used on the capturing side to address the challenges specified above.
Extend CaptureHandleConfig with an event handler:
This allows the capturee to receive a dedicated event with a MessagePort whenever a capturer chooses to initiate contact.
A channel is established for the capturee when it gets a NewCapturerEvent with type set to "started". When the session ends, the capturee gets a new event with the very same port, but with type now set to "stopped".
To trigger the "started" event on the capturee, a capturer calls the following API:
To check if it makes sense to call getMessagePort(), the capturer must check CaptureHandle.supportsMessagePort.
The value of CaptureHandle.supportsMessagePort is determined by whether the capturee has set a handler or not.
The capturee may change the CaptureHandleConfig without breaking off existing channels.
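On the capturing side, the check-then-call order described above might look roughly like the sketch below. `supportsMessagePort` and `getMessagePort()` are the names proposed in this issue, not a shipped API; placing `getMessagePort()` on the track is an assumption, and `tryOpenChannel` and the fake tracks are hypothetical helpers so the snippet can run outside a browser.

```javascript
// Sketch only: supportsMessagePort and getMessagePort() are proposed names.
// Assumption: getMessagePort() is exposed on the track.
function tryOpenChannel(track) {
  const handle = track.getCaptureHandle ? track.getCaptureHandle() : null;
  if (!handle || !handle.supportsMessagePort) {
    return null; // capturee has set no handler, so do not call getMessagePort()
  }
  return track.getMessagePort();
}

// Fake tracks standing in for a captured-tab MediaStreamTrack.
const silentTrack = {
  getCaptureHandle: () => ({ origin: 'https://slides.example', supportsMessagePort: false }),
  getMessagePort: () => { throw new Error('should not be called'); },
};
const readyTrack = {
  getCaptureHandle: () => ({ origin: 'https://slides.example', supportsMessagePort: true }),
  getMessagePort: () => ({ postMessage() {} }),
};

const noPort = tryOpenChannel(silentTrack);  // null: capturee not listening
const somePort = tryOpenChannel(readyTrack); // a port-like object
```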
The channel is broken if:
We extend the capturehandlechange event to help the capturer distinguish non-channel-breaking events from channel-breaking events.
Fine Details
!getCaptureHandle().supportsMessagePort.
Security Considerations
Captured apps are encouraged to validate the origin of messages.
As MessagePorts are transferable, it is imperative to check each individual message's origin.
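A per-message check could be sketched as follows. This assumes that each delivered event carries a meaningful `origin` string, which the proposed channel would have to provide (as far as I know, messages on an ordinary MessagePort arrive with an empty `origin`). `TRUSTED_ORIGINS`, `makeGuardedHandler` and the example origins are placeholders.

```javascript
// Assumption: each delivered event carries an `origin` string.
const TRUSTED_ORIGINS = new Set(['https://meet.example']); // placeholder value

// Wraps a handler so that it only sees messages from trusted origins.
function makeGuardedHandler(onTrustedMessage) {
  return (event) => {
    if (!TRUSTED_ORIGINS.has(event.origin)) {
      return false; // drop messages from unexpected origins
    }
    onTrustedMessage(event.data);
    return true;
  };
}

const accepted = [];
const guarded = makeGuardedHandler((data) => accepted.push(data));
const okResult = guarded({ origin: 'https://meet.example', data: 'next-slide' });
const badResult = guarded({ origin: 'https://evil.example', data: 'next-slide' });
```

The check runs on every message rather than once per port, because the port may have changed hands between two messages.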
Open Questions
Sample Usage
On the captured side:
On the capturing side: