-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should the handle be an object? #68
Comments
I think this is a reasonable answer to the crop target on another tab use case. One thing to consider is that the current 1024 bytes size might be harder to define with objects. As a side note, and this also applies to structured exposure of CropTargets, if a web application starts to create CropTargets proactively, UAs should make the creation of such objects cheap and only start expensive operations when these objects are actually used to crop. This is another argument in favour of creating CropTarget synchronously. |
I intended that limitation to defang potential attacks where a captured site overwhelms a capturing site with extremely large messages. That's probably¹ not terribly concerning, though, as the captured site would have to pay as much for the attack as would the victim, and the cost would be paid without the attacker even knowing if there is a victim, or who it is. I think it's safe to drop this limitation².
There is value in the capturer being able to read something about the capturee, without letting the capturee know that there is a capturer.
Agreed. (But there are other arguments both ways.) For now, I am not concerned, because I expect a very limited number of CropTargets. But, yes, that is indeed a valid argument. If we later observe that sites often expose several CropTargets, this argument could increase in weight. -- |
To help drive collaboration between loosely-coupled apps, I have presented the following earlier today. Assume: dictionary ContentHints {
DOMString audio;
DOMString video;
};
dictionary CropTargetData {
DOMString name;
CropTarget cropTarget;
ContentHints suggestedContentHint;
};
dictionary CaptureHandleConfig {
... // Pre-existing.
// Suggested Content-Hint for uncropped video.
ContentHints suggestedContentHints;
// CropTargets of interest.
sequence<CropTargetData> cropTargets;
}; Then a VC app could do something like this: const stream = navigator.mediaDevices.getDisplayMedia(constraints);
const [track] = stream.getVideoTracks();
const captureHandle = track.getCaptureHandle();
if (Object.keys(captureHandle?.cropTargets).length > 1) {
// 1. Cycle through the crop-targets,
// grabbing one frame with each.
const thumbnails = GetThumbnails(track);
// 2. Display an in-content picker.
// The user chooses if they want to share
// the entire tab, or only a specific part of it.
const target = PromptUser(track, thumbnails);
// 3. Apply user’s choice.
track.cropTo(target); // No-op if `target` is `undefined`.
// 4. Transmit remotely.
TransmitRemotely(track);
} To put a cherry on top, in the This means that Teams, Zoom and Meet could all offer great functionality and performonace when capturing YouTube, Hulu, Slides, PowerPoint or anything else. |
TL;DR: I don't think captureHandle.handle was billed as an "identity" string for "Bootstrapping" "a connection" [1]:
Letting If storage partitioning has made out-of-band messaging nonviable, CropTargets aren't serializable, and we're trying to send objects through A way forward on that might be:
If the capturer never responds, then the capturee never learns it's being captured. |
I disagree with most parts of the previous message, but I think we can zero in on one point first - the one I quoted. We fundamentally disagree on how MessagePort would look like and how it could be used. I don't think the model you suggest is even possible, let alone desirable. First, feasibility. A tab can be captured by multiple capturers. A design centered around a MessagePort will require that the captured page register an event handler to receive a MessagePort after the capturer "dials in" by calling Second, even if this model were possible, it's evident that you noticed the following issue and tried to sidestep it - apps shouldn't have to waste CPU to repeatedly send the same message until a capturer materializes, which is why you introduced the caching mechanism. That is, you reinvented the handle. |
Let's take a step back then. My main point was:
Do we agree on this? This issue already tries to reinvent handle, opening up for alternative ideas. Neither a string identifier nor a traditional message port seem to satisfy requirements, so some innovation seems unavoidable. Since iteration has failed us, I think we need to reexamine the requirements, and figure out what shape makes most sense for them. With this issue and #11, The previous paragraph mentions every object that implements So for sake of illustration only, we might imagine a broadcast-channel-like API with targetOrigin, something like: // capturee
navigator.mediaDevices.capturer.postMessage({id, hint, cropTarget}, "*");
navigator.mediaDevices.capturer.onmessage = e => iRecognize(e.origin) && my1on1Port = e.port;
// capturer
captureController.onmessage = e => {
if (iRecognize(e.origin)) {
const {port1, port2} = new MessageChannel();
my1on1Port = port1;
captureController.postMessage(port2, {targetOrigin: e.origin, transfer: [port2]});
}
}; |
First, this illustration assumes that the capturee can only be captured by a single capturer. That is false, as I have pointed out in my last message. I have also explained that this should not be abstracted away for the sake of illustration, because it has corollaries. Namely, the only model that will work for producing a MessagePort, is one in which the capturee registers to be notified when the capturer tries to establish a MessagePort to it, and this notification has ramifications on issues which you have expressed concern for - self-censorship. Second, this does not work for loosely-coupled applications, as they would not be aligned on the protocol for the messages sent over the MessagePort. Again - a use-case you have previously cared much about. It is for those loosely-coupled apps that I have produced the proposal in this message. |
No, the illustration is of a new API where Self-censorhip is already possible in the current spec once the capturer establishes a "connection". I don't think we should solve more features for loosely-coupled applications atm. |
So to clarify the illustration in #68 (comment) further:
|
The current spec does NOT force an establishment of a connection. My proposal shows a simple mechanism for publishing actionable information without requiring a connection. That proposal was designed precisely to address the concerns you have previously raised, which were (1) self-censorship and (2) usefulness to loosely-coupled apps.
Could you please explain this 180-degrees change in your position? |
First, thank you for attempting to address concerns you thought I might have. I appreciate that, but it's not that simple. It goes back to use cases. I found users having nextslide and previousslide on a loosely-coupled captured page compelling. But I'm fairly confident those may also end up becoming keyboard hardware hotkeys someday, which is a high bar. But those also opened up a long tail of ideas that didn't make it ("firstslide", "lastslide", w3c/mediasession#282) and maybe never should, because there's a risk and a cost to baking into the web platform forever what we think apps want today. I don't find loosely-coupled pages advertising crop regions compelling enough to bake into the web platform now. It's not clear to me we're ready to say how this should work forever. App-controls also encroach on what's a fully user-controlled feature today, so there are risks they may sometimes undermine user choice if we're not careful. Since this spec already enables apps to experiment in this space, I'd rather wait an see how they solve this.
I remain curious about an answer to this question. |
I might have been premature in moving Capture Handle from the WICG to the WebRTC Working Group. It's unclear to me that we are aligned on the vision here. I've also not heard of any concrete plans from Firefox or Safari to implement the spec, so the timelines we have in mind are going to be different. Shall we rectify this mistake by moving the spec back to the WICG?
If you mean Capture Actions - I have been supportive multiple times. But this is an orthogonal use-case. |
I mean adding a MessagePort (or rather a messaging scheme #68 (comment)). It seems to me we can make progress there, which should give collaborating apps everything they need to experiment with handling these use cases. |
We are not aligned on any topic, and no browser other than Chrome has shared concrete plans to implement Capture Handle. I don't think anyone benefits of Capture Handle remaining in the WebRTC WG. The current setup is counter-productive. Let's move this back to the WICG. |
It occurs to me that #69 attempts to deprecate Capture Handle Identity altogether, but only works for tightly-coupled apps. (An unsuccessful attempt imho, e.g. because of Conditional Focus' window of opportunity.) |
That is a misunderstanding. |
I sense generally good agreement on the use case here, with the only differences being over API shape. This seems like something we should be able to make progress on. My concerns that I don't believe have been addressed here are:
While (1) might be addressed by adding the message to the event, this would seem to go against § 7.7. Use plain Events for state, perhaps suggesting that recreating (post) message-passing behavior using events is not a desirable pattern. |
I think there are multiple use cases here, and we only have agreement on some. It may be that the use-cases where we disagree are appear uninteresting to you, but they're important to me, so I disagree with the characterization of "general agreement". To clarify, we disagree on:
I intend to get back to this topic (future reshuffles of duties and priorities notwithstanding). When that happens, if you still find these uninteresting, to the point of objecting to tackling the problem, then I don't see a way forward for me other than to fork the document and proceed with other interested parties. If you can think of a nicer way forward, please do share it. |
How are the events insufficient for ensuring that nothing is missed?
As mentioned, I don't see any problems with the current approach; in fact, you have yourself convinced me to stop exposing the message on the event, because we both agreed it was unnecessary - see here. But assume for the sake of argument that adding the message back were necessary. The first words in § 7.7. Use plain Events are: "Where possible". |
At the moment, the handle field is a string. This limits it to stringable objects rather than serializable objects. If we change to an object, applications will be able to expose CropTargets as well.
@youennf, I think you were amenable to this during TPAC 2022...?
@jan-ivar, your opinion?
Note that I still intend to pursue orthogonal proposals for:
handle
is one way and immediate, which is important [a] for Conditional Focus without exceeding the task-end limit, and [b] for capturers that don't wish to communicate with the capturee and alert it to the existence of a capture session.)The text was updated successfully, but these errors were encountered: