Solve user agent camera/microphone double-mute #39
Repeating (and rephrasing) myself from the Chromium bug, since it'd be unreasonable to expect the audiences to be identical:

Media controls exposed by the browser, which allow an ongoing mic/camera/screen-capture to be muted by the user, communicate an implicit promise from the browser to the user. If the application is allowed to override that promise, it's allowed to break that promise.

I understand the double-mute problem and hope that there could be other ways to resolve it. For example, maybe if the application tries to unmute the track, the browser could show the user a prompt to approve that. This would mean that a user can still click the red mic button to unmute, but because additional approval is required, the application cannot unmute unilaterally in response to an arbitrary user gesture.
That's a good idea. Having the method be asynchronous would allow this.

    partial interface MediaStreamTrack {
      Promise<undefined> unmute();
    };

The goal of a spec here is to allow innovation in this space, without arriving at specific UX. It could be a prompt, or maybe a toast message is enough.

I think a lot of users would be surprised to learn that when they mute the microphone or camera on a web site today, they have zero assurances that it actually happens. Well-behaved websites have lulled most users into feeling in control when they're not. Most probably don't consider that the website may turn the camera or microphone back on at any time as long as the page is open.

The page doesn't even need to be open anymore: a different (origin) accomplice page can reopen/navigate to it without user interaction at a later point, since gUM doesn't require transient activation. We dropped the ball on transient activation in gUM. Having better UA-integrated muting with transient activation might give us a second chance to rectify some privacy concerns. For instance, a UA could choose to mute for the user if it detects a page accessing camera or microphone on pageload or without interaction.
Also, in Safari, the mute/unmute is page wide, which means that all microphone tracks are either muted or unmuted.
I fear this would create confusion and a false symmetry suggesting muting is under application control when it is not.
Applications have enabled for this. Here's a fiddle for toggling multiple tracks, demonstrating Firefox updating its camera URL bar indicator and OS/hardware light (modulo bug 1694304 on Windows for the hardware light) whenever the number of enabled tracks goes to zero. I suggest leaving mute to UAs and concentrating on how apps can signal interest to unmute, to solve the issue at hand.
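A minimal sketch of the pattern the fiddle demonstrates (not the fiddle itself), using only today's enabled API; whether the indicator actually turns off is up to the UA:

    // Disabling every track from a capture source lets a UA such as
    // Firefox turn off its URL bar indicator and the hardware light.
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    function setCameraEnabled(on) {
      for (const track of stream.getVideoTracks()) track.enabled = on;
    }
    setCameraEnabled(false); // app-level "mute": indicator may turn off
    setCameraEnabled(true);  // re-enable without a new permission prompt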
Do you mean page wide or document wide? What about iframes?
Maybe we could use constraints? 🙂
Maybe, except
There's also the issue of multiple cameras. If we end up with
A simplifying factor is that UA muting is 100% about privacy, and as soon as one track is unmuted on a page, there's no more privacy. So per-track mute would serve no purpose, and make for a terrible API:

    await Promise.all(applicationKeepsTrackOfAllTracksItIsUsing.map(track => track.unmute())); // ugh

But with that understanding (of UA mute as a privacy feature), it seems POLA for

So I think I agree mute is a property of the source by that definition. But there can be multiple sources in

Even if we don't care about that, we'd need
This issue had an associated resolution in WebRTC WG 2023-04-18 – (Issue 39: Solve user agent camera/microphone double-mute):
For capture tracks, page/document mute scope probably covers at least 90% of the cases and is simpler to implement.
This is a MAY though. Setting all tracks of the same source with

Looking at Safari, let's say that Safari would update its muted icon when all tracks are

Looking at web applications, they tend to clone tracks in window environments (local rendering and PC, potentially different sizes as well), in the future in workers as well (for encoding/networking) or other windows (via transfer). Having to set enabled to false on each one of these objects, including transferred tracks, is cumbersome and potentially error prone.

Looking at OS support, kAUVoiceIOProperty_MutedSpeechActivityEventListener and kAUVoiceIOProperty_MuteOutput are potentially useful to implement the "Are you talking?" UI in the screenshot you added above.

Overall, it seems cleaner to me to have two separate APIs:
Alternative to
It was a WG design choice to reuse MST for other sources to avoid inheritance. The cost of that is living with the fact that not all sources have all track abilities, NOT that tracks only have abilities shared by all sources. getUserMedia returns camera and microphone tracks, so adding attributes, methods and constraints specific to camera and microphone should be fine. If it's not, then it's time to split inheritance. E.g. track.muted only makes sense for camera and microphone, and

Other sources do not have the double-mute problem, so to not complicate discussion, let's not discuss them here.
Here's a SHOULD: "When a "live", unmuted, and enabled track sourced by a device exposed by getUserMedia() becomes either muted or disabled, and this brings all tracks connected to the device (across all navigables the user agent operates) to be either muted, disabled, or stopped, then the UA SHOULD relinquish the device within 3 seconds..."
Why would a UA mute an in-view application just because it disabled its tracks? That would be terrible for web compat. To clarify, Mozilla needs no spec changes to solve turning off privacy indicators or camera light. Our view is the path to interop there is changing the MAY and SHOULD to MUST. But please let's discuss that in a separate issue.
No, that is not our plan. As explained in the OP we have a

Sorry for any misunderstanding, but it's not my intent to standardize UA muting here, only application-induced unmuting. Muting remains up to user agents, and I think it is important for privacy that they be allowed to continue to own that problem. The scope of the proposal in the OP (and this issue) was to arm applications with tools to unmute themselves IF the user agent mutes them, not define when user agents mute them.
We have
FYI this was recently fixed upstream in https://webrtc-review.googlesource.com/c/src/+/302200
I'd like to revive this discussion, since these types of system-level controls (either at the UA or the OS) are becoming more common and we have observed that they create a lot of confusion for users.
However, I disagree with this statement:
The reason sites don't listen to the mute event is

This has become a major problem for VC applications and I think we need to solve it properly.
The problem as I see it is that users can mute through multiple sources - app, UA, OS and hardware. The propagation of state through these layers is presently incomplete - an opportunity for us to earn our keep. At a high level, I think we have to provide two mechanisms:

1. Listen

The principle here should be fairly uncontroversial.
I prefer no. 2. To start the ball rolling on a concrete proposal:

    enum MuteCause {
      "unspecified", // Catch-all default.
      "operating-system-choice",
      "user-agent-choice",
      // Extensible to hardware, and to -issue if it's not a -choice.
    };

    interface MuteEvent : Event {
      /* Exercise for the reader */
    };

    partial interface MediaStreamTrack {
      // Note that multiple causes might apply concurrently.
      readonly attribute sequence<MuteCause> causes;
    };
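If such an attribute existed, apps could tailor their UI to the cause. A minimal sketch against this hypothetical causes attribute and the enum values above:

    // Hypothetical: map proposed MuteCause values to user-facing copy.
    function muteLabel(track) {
      if (!track.muted) return "Live";
      if (track.causes.includes("operating-system-choice"))
        return "Muted in the OS settings";
      if (track.causes.includes("user-agent-choice"))
        return "Muted by the browser";
      return "Muted"; // "unspecified" or future causes
    }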
2. Control

There's some potential for controversy here, but I think we can resolve it. Jan-Ivar proposed:
While I'm sure VC apps would be delighted to have such control, I am afraid that no security/privacy department in any user agent would ever approve it (unless we add some UA prompt; foreshadowing). Jan-Ivar suggested transient activation and focus as necessary gating mechanisms. These are fine requirements, but they are not sufficient, as any transient activation would look identical to the user agent here, possibly subverting the user's actual intentions if they clicked on a mislabelled button. I'd suggest also requiring a PEPC-like prompt. Reasonable app code would then look something like this:

    unmuteButton.addEventListener('click', unmuteClicked);

    async function unmuteClicked() {
      // If necessary, prompt the user to unmute at UA-level etc.
      if (upstreamUnmute) {
        try {
          await track.unmute();
        } catch (error) {
          return;
        }
      }
      // Proceed with the "normal" unmuting in the app:
      // * Resume remote transmission of media.
      // * Change UX to reflect that clicking the button now means "mute".
      // * Update internal state.
    }
I am not sure how much we need a mute reason. Distinct requestUnmute failures might be sufficient.
The mute reasons may vary within the muted period, and firing mute events when only the reason changes is not really appealing.
I'm OK with the mute event only firing when the muted attribute changes (not the muted reason attribute). WDYT?
Does this mean you support the approach of having an attribute for the mute cause?
Why is it not appealing to fire a mute event when the set of reasons changes? (Note that we have a separate unmute event already, btw.)
I see this as a potential improvement, while I see an API to request unmute as a blocker. I also think that an API to request capture to be muted would be useful. The current approach (setting enabled=false on all tracks of a device) is a bit tedious and might face backward compatibility issues.
I agree. Let's focus on that first.
Also agree. This requires some more thinking because system-level muting is not necessarily equivalent to muting a source or a set of tracks.
I'm having some trouble parsing the last few messages on this thread. If we're all in agreement that we want to add an API exposing the OS-mute-state, then I'll gladly present something in the next available interim opportunity. Specifically, I'd like to present my proposal here. Before I do - @youennf, you have said that firing an event whenever the reason changes is unappealing. I'd still like to understand why; understanding would help drive a better presentation in said interim.
@guidou and I seem to agree on focusing on the following items (sorted by priority):
Getting back to
I would tend to unmute at the device level.
The OP assumes users can always unmute. If there's a second actor controlling mute that the user agent cannot affect, then double-mute likely remains the best way to handle that piece. Otherwise we get:
Today's apps show C as A.¹ To maintain this, they'd need to distinguish

I'd support those two

Regarding method shape, I think
1. A case could be made for showing C as B, but then nothing happens when the user clicks on it, which seems undesirable. This is an app decision of course.
We would have to ask Guido to see if you two agree, but I personally disagree with your prioritization, @youennf. It's necessary for Web applications like Meet to know when the input device is muted through the upstream (browser or OS), or else the app can't update its UX to show the input device is muted, which means the user won't understand what's wrong and won't press any app-based unmute button, which then means the app won't even call the unmute API.

The very first step is for the application to know the upstream is muted. That's the top priority.
I agree that the media session action handler should not need to go to MediaStreamTrack to do its processing.
In that case, it seems better to actually design the action handler to execute first, and the mute events to fire second. Also, maybe we should deprecate
I am not sure this is needed, but the
I think the following proposal is better:

    interface MuteReason {
      readonly attribute boolean upstream;
    };

    partial interface MediaStreamTrack {
      sequence<MuteReason> getMuteReasons();
    };

This is simple, it solves the problem, and it is immediately available when the

    enum MuteSource { "unspecified", "user-agent", "operating-system", "hardware" };

    interface MuteReason {
      readonly attribute boolean upstream;
      readonly attribute MuteSource source;
    };
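A minimal usage sketch against this hypothetical getMuteReasons(): the app shows its "unmute in the browser/OS" hint only when some reason is upstream.

    // Hypothetical: true if the mute originates above the app (UA or OS).
    function needsUpstreamUnmute(track) {
      return track.muted &&
             track.getMuteReasons().some(reason => reason.upstream);
    }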
It is essential to know which devices are muted and which ones aren't. Multiple cameras and/or microphones are a very common case. Media Session looks like it is currently a poor fit that needs a lot of changes to a different spec to support our use case. With Media Session:
During the editors' meeting, Youenn suggested extending
I think it would still be unhelpful to go down that rabbit hole. The show-stoppers are:
I've published a skeleton of a PR in w3c/mediacapture-main#979 - PTAL. If you think
When discussing muting, we should also reflect on the (long) discussion on VideoTrackGenerator.mute - w3c/mediacapture-transform#81
I see benefits in the MediaSession approach. It is an API used for updating UI/visible application states, which is exactly what we are talking about here. It also seems easier to do these updates in a single callback, compared to requiring the web app to potentially coalesce multiple mute events itself. There are things that should be improved in MediaSession, independently of whether we expose muted reason or not. With regards to the definition of mute reasons, it makes sense to me to piggyback on MediaSession.
To help the discussion culminate in a decision, comparing PRs would be helpful. I have produced a PR for the approach I suggested. @youennf, could you produce a PR for yours?
Here is a media session based proposal:
These seem like valid improvements to the existing MediaSession API, independently of whether we expose a boolean on MediaStreamTrack to help disambiguate muted. Or maybe we should think of removing togglemicrophone/togglecamera, if we think onmute/onunmute is superior. It would help to get the sense of the MediaSession people. @steimelchrome, @jan-ivar, thoughts? I think it is worth preparing slides for both versions; doing PRs now seems premature.
I like this proposal. I don't see a need to add more information, since this seems to be exactly what the MediaSession API was built for (whether the toggles are in a desktop browser UX or on a phone lock screen seems irrelevant). Initial state seems solved by firing the MediaSession events early, e.g. on pageload.

This issue is "Solve user agent camera/microphone double-mute", putting other sources out of scope. Multiple devices also seem out of scope, since none of the global UA toggles so far (Safari or Firefox) work per-device AFAIK. They're page or browser global toggles, extending controls present in video conference pages today into the browser, imbuing them with ease of access and some privacy assurance that the webpage cannot hear them, solving the simple use cases of users not being heard, or worrying they can be heard (by participants or webpage). I.e. they affect all devices that page has.

I think Chrome's mute behavior is a bug. I've filed w3c/mediacapture-main#982 to clarify the spec, so let's discuss that there.

I think we should standardize requesting unmute. Too much in this thread.
We need to solve all use cases that arise in practice, not just the simplest one.
We need to solve all use cases that arise in practice, not just the ones indicated in the first message of this thread.
Browser toggles are just one use case that needs to be handled. OS toggles (which can be per device, as in ChromeOS and maybe other OSes) need to be handled too. Hardware toggles need to be considered as well. Just because these were not mentioned in the original message doesn't really mean they're out of scope.
It's not a bug, based on the current language of the spec. If the problem is that the
I agree. Apps already implement a way to mute at the app level.
Slides that show how the proposal solves the problems should be enough. We have a slot in the December 12 meeting to continue discussing this. If you have some slides available, maybe we can look at them then.
Was this suggested at some point?
PRs reveal the complexity that otherwise hides behind such phrases as "we could just..." |
Yes in #39 (comment).
This issue has 70 comments. Triaging discussion out to other (new or existing) issues such as w3c/mediacapture-main#982 or w3c/mediasession#279 seems worthwhile to me, or I don't see how we're going to reach any kind of consensus on all these feature requests. "Mute reason" probably deserves its own issue as well (there were 14 comments here when it was introduced to this conversation in #39 (comment)). It seems largely orthogonal to the OP proposal of letting apps unmute.
These are all User Agent toggles IMHO, the details of which W3C specs tend to leave to the User Agent, focusing instead on the surface between web app and UA. I think that's the level of abstraction we need to be at.
Thanks for clarifying.
Not completely orthogonal, because
As a representative of one open source browser who has filed bugs and looked into the code of another open source browser, I hope you'll find this comment compelling. It discusses the value transparency brings to the entire ecosystem.
Instead of the OP proposal of a

    navigator.mediaSession.setMicrophoneActive(false);

E.g. an app calling this with user attention and transient activation may be enough of a signal to the UA to unmute tracks it has muted in this document, either raising a toast message about it after the fact, or a prompt ahead of it.

The remaining problem is how the app would learn whether unmuting was successful or not. E.g. might this suffice?

    navigator.mediaSession.setMicrophoneActive(false);
    const unmuted = await Promise.race([
      new Promise(r => (track.onunmute = () => r(true))), // the UA unmuted us
      new Promise(r => setTimeout(() => r(false), 0)),    // or it didn't
    ]);
setMicrophoneActive looks good to me if we can validate its actual meaning with the Media WG.
Hiding an unmute control seems a small dent in the disappointment and frustration of being unable to unmute. IOW a secondary problem to the first.
As I have mentioned multiple times before - the user agent has no idea what "shiny button text" means to the user, or what the user believed they were approving when they conferred transient activation on the page. Only the prompt-based approach is viable.
It does not look at all "small" to me. In fact, I am shocked that after months of debating whether an API should be sync or async, which would have no user-visible effect, you label this major user-visible issue as "small." What is the methodology you employ to classify the gravity of issues?
I repeat - there is nothing "small" about a user clicking a button and it disappearing without having an effect. It looks like a bug and it would nudge users towards abandoning the Web app in favor of a native-app competitor. Web developers care much more about their users' perception of the app's reliability, than they do about the inconvenience of adding "await" to a method invocation. Let's focus our attention where it matters!
Thank you for this engagement, Jan-Ivar. I am looking forward to hearing why you disagree. Orthogonally, I'll be proposing that the rules of conduct in the WG be amended to discourage the use of the thumbs-down emoji without elaboration. Noting disagreement without elaborating on the reasons serves no productive purpose.
Closing this as the double-mute problem was instead solved in w3c/mediasession#312. Here's an example of how a website can synchronize application mute state with that of the browser. |
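A minimal sketch of that kind of synchronization (not necessarily the linked example), using the Media Session togglemicrophone action and setMicrophoneActive(); micTracks and updateMuteButton are hypothetical app state:

    // Keep app-level mute, track state, and the UA's mute UI in sync.
    navigator.mediaSession.setActionHandler("togglemicrophone", () => {
      const active = !micTracks.every(track => track.enabled);
      for (const track of micTracks) track.enabled = active; // app (un)mute
      navigator.mediaSession.setMicrophoneActive(active);    // inform the UA
      updateMuteButton(active);                              // update app UI
    });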
User agent mute-toggles for camera & mic can be useful, yielding enhanced privacy (no need to trust site), and quick access (a sneeze coming on, or a family member walking into frame?)
It's behind the privacy.webrtc.globalMuteToggles pref in about:config in Firefox because:
[Screenshot titled: "Am I muted?"]
This issue is only about (1) the double-mute problem.
We determined we can only solve the double-mute problem by involving the site, which requires standardization.
The idea is:
The first point requires no spec change: sites can listen to the mute and unmute events on the track (but they don't); see the sketch after this list.
The second point is key: if the user sees the site's button turn to "muted", they'll expect to be able to click it to unmute.
This is where it gets tricky, because we don't want to allow sites to unmute themselves at will, as this defeats any privacy benefits.
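A minimal sketch of the first two points with today's APIs; muteButton is a hypothetical app element:

    // Reflect the UA's mute state in the app's own mute button.
    const updateButton = () => {
      muteButton.textContent = track.muted ? "Unmute" : "Mute";
    };
    track.onmute = updateButton;   // the UA muted the track
    track.onunmute = updateButton; // the UA unmuted it
    updateButton();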
The proposal here is:
    partial interface MediaStreamTrack { undefined unmute(); }

It would throw InvalidStateError unless it has transient activation, is fully active, and has focus. User agents may also throw NotAllowedError for any reason, but if they don't, then they must unmute the track (which will fire the unmute event). This should let user agents that wish to do so develop UX without the double-mute problem.
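A usage sketch assuming the proposed synchronous unmute() shipped as described above:

    // Called from a click handler, so transient activation is present.
    muteButton.onclick = () => {
      if (!track.muted) return; // nothing to undo upstream
      try {
        track.unmute(); // fires the unmute event on success
      } catch (e) {
        // InvalidStateError or NotAllowedError: keep showing the muted UI.
      }
    };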