Early design review for the FLoC API #601
One thing we are particularly concerned about is the topic of "sensitive categories." As we wrote in the Ethical Web Principles, the web should not cause harm to society. Members of marginalised groups can often be harmed simply by being identified as part of that group. So we need to be really careful about this. Can you provide some additional information about possible mitigations against this type of misuse? |
Sensitive categories
The documentation of "sensitive categories" visible so far is on Google ad policy pages. Categories that are considered "sensitive" are, as stated, not likely to be universal, and are also likely to change over time. I'd like to see:
Language about erring on the side of user privacy and safety when the "sensitivity" of a category is unknown might be appropriate.

Browser support
I imagine not all browsers will actually want to implement this API. Is the result of this, from an advertiser's point of view, that serving personalised ads is not possible in certain browsers? Does this create a risk of platform segmentation, in that some websites could detect non-implementation of the API and refuse to serve content altogether (which would severely limit user choice and increase concentration in a smaller set of browsers)? A mitigation for this could be to specify explicitly 'not-implemented' return values for the API calls that are indistinguishable from a full implementation.

The description of the experimentation phase mentions refreshing cohort data every 7 days; is timing something that will be specified, or is that left to implementations? Is there anything about cohort data "expiry" if a browser is not used (or only used to browse opted-out sites) for a certain period?

Opting out
I note that "Whether the browser sends a real FLoC or a random one is user controllable", which is good. I would hope to see some further work on guaranteeing that the "random" FLoCs sent in this situation do not become a de-facto "user who has disabled FLoC" cohort. It's worth further thought about how sending a random "real" FLoC affects the personalised advertising the user sees, when it is essentially personalised to someone who isn't them. It might be better for disabling FLoC to behave the same as incognito mode, where a "null" value is sent, indicating to the advertiser that personalised advertising is not possible in this case.

I note that sites can opt out of being included in the input set. Good! I would be more comfortable if sites had to explicitly opt in though.
Have you also thought about more granular controls for the end user which would allow them to see the list of sites included from their browsing history (and which features of the sites are used) and selectively exclude/include them?

If I am reading this correctly, sites that opt out of being included in the cohort input data cannot access the cohort information from the API themselves. Sites may have very legitimate reasons for opting out (e.g. they serve sensitive content and wish to protect their visitors from any kind of tracking) yet be supported by ad revenue themselves. It is important to better explore the implications of this.

Centralisation of ad targeting
Centralisation is a big concern here. This proposal makes it the responsibility of browser vendors (a small group) to determine what categories of user are of interest to advertisers for targeting. This may make it difficult for smaller organisations to compete or innovate in this space. What mitigations can we expect to see for this? How transparent / auditable are the algorithms used to generate the cohorts going to be? When some browser vendors are also advertising companies, how will concerns be separated so that the privacy needs of users are always put first?

Accessing cohort information
I can't see any information about how cohorts are described to advertisers, other than their "short cohort name". How does an advertiser know what ads to serve to a cohort given the value "43A7"? Are the cohort descriptions/metadata served out of band to advertisers? I would like an idea of what this looks like.

Security & privacy concerns
I would like to challenge the assertion that there are no security impacts.
To clarify - does this mean that sites calling the API would receive an invalid/null result? In what circumstances in regular browsing mode is this the case? When a user hasn't been assigned to a valid cohort yet? Is that a common enough case that the probability of a 'null' result being due to use of incognito mode is relatively low? (Sites should not be able to detect the use of incognito mode.) Q14 is missing a response about how the browser gathers inputs for cohort calculation in incognito mode. I assume it gathers no data at all, but it would be good to say that explicitly. Thanks! |
Thanks @xyaoinum. Just to follow up, you are probably also aware of the EFF article which makes many of the same points from Amy's feedback. Despite the incendiary headline, please have a look through this feedback and take this on board as EFF is an important and credible stakeholder organisation when it comes to security & privacy on the web. |
Just to add, I don't think this is an accurate description of the status quo, and any response should acknowledge that. Particularly in the last few years, efforts have been made to deny sites behaviour and interest data from sources like 3rd party cookies and browser history detection via Javascript. One of the major motivations behind this has been the ability to combine it with PII for purposes that users consider unacceptable. At the very least this description of the status quo needs to be justified before use. |
I don't think this can be relied upon. Any change in behaviour can be used for tracking, and the null result is itself a cohort. A randomly selected cohort would be better. In fact it would be better overall if the browser selected a number of possible cohorts that fit the user's profile and randomly selected one in normal operation. Otherwise cohort membership will change too slowly to prevent it being used for tracking. The real problem is sites that already hold PII. There is no way I can think of to detect that and frustrate it, and as it stands FLoC is simply giving such sites more information than they would otherwise be able to gather with current default tracking protections in major browsers. |
Thanks for this review. I'm happy that the TAG is continuing the tradition of broad security-privacy aspects :-) In the meantime, perhaps this answers the concerns regarding incognito. |
Hello again, Not sure if this belongs to this review, but I sure hope that the final FLoC will not have the potential of leaking web browsing history (which is not mentioned in the S&P questionnaire). |
Hi @lknik! The 50-bit SimHash values that you're calculating get masked down to many fewer bits before being used to pick your flock. It's designed for lots of collision — each cohort will cover thousands of people with hundreds of different browsing histories. |
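To make the collision argument above concrete, here is a minimal sketch of a generic SimHash followed by truncation. It is not the FLoC implementation: the feature hashing via SHA-256, the function names, the choice to keep the top bits, and the 8-bit cohort width are all assumptions for illustration, and the thread does not state the real masked width.

```python
import hashlib

HASH_BITS = 50    # full SimHash width mentioned in the thread
COHORT_BITS = 8   # hypothetical: the actual masked width is not given here


def feature_hash(feature: str, bits: int = HASH_BITS) -> int:
    """Map one feature (e.g. a visited domain) to a stable `bits`-wide integer."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(features, bits: int = HASH_BITS) -> int:
    """Generic SimHash: each output bit is a majority vote over the feature hashes."""
    counts = [0] * bits
    for feature in features:
        h = feature_hash(feature, bits)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def cohort_prefix(full_hash: int, keep_bits: int = COHORT_BITS,
                  bits: int = HASH_BITS) -> int:
    """Discard all but the top `keep_bits` bits (an assumed masking scheme)."""
    return full_hash >> (bits - keep_bits)


if __name__ == "__main__":
    history = ["news.example", "recipes.example", "cycling.example"]
    full = simhash(history)
    print(f"full hash: {full:050b}")
    print(f"cohort:    {cohort_prefix(full)}")
```

With 8 retained bits there are only 256 possible outputs, so every distinct 50-bit hash sharing the same top bits lands in the same bucket. That is the sense in which masking is "designed for lots of collision".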
@michaelkleber Can we then learn exactly what is the bit size and how it's defined? Would be great to have a full writeup to understand this proposal entirely. |
It seems like designing the SimHash to be resilient against all kinds of analysis, to prevent information about the user's browsing history being leaked, is likely to be extremely difficult. To prove it to be robust it would need to undergo extensive mathematical analysis, a very specialist subject that would probably require paying some academics to work on it. It should be externally validated. |
It's possible there's some confusion about the TAG's suggestion re: incognito mode. |
Hello, we looked at this again during our virtual face-to-face this week. I haven't seen a response to the points in my earlier feedback yet, and we also note that there has been a lot of community discussion about the potential negative implications of this work both for end-user privacy, and for the ad-supported sites which might depend on it. We're particularly concerned that FLoC is already being trialed, despite a lot of this feedback remaining unaddressed. We would be happy to arrange a call with you to discuss further, if that would help. |
Sorry for the very long delay in response. The delay was mostly due to the fact that your feedback, in concert with feedback from other parts of the community, convinced us that we should take another go at the design. When we post the updated design, I will address the remaining relevant questions and concerns here. Note that it might make sense to remove the "already shipped" tag, as it was in an Origin Trial only, which has since ended. |
Thanks @jkarlin! We'll close this issue for now then. Please either reopen this one with updates, or open a new design review when you have a new design. |
To close the loop, I've opened a review in #726 for the Topics API that replaces FLoC. In that issue, I responded to the questions that were asked here. |
HIQaH! QaH! TAG!
I'm requesting a TAG review of the FLoC API.
In today's web, people’s interests are typically inferred based on observing what sites or pages they visit, which relies on tracking techniques like third-party cookies or less-transparent mechanisms like device fingerprinting. User privacy could be better protected if interest-based advertising could be accomplished without needing to collect a particular individual’s exact browsing history.
The FLoC API would enable ad-targeting based on the user’s general browsing interest, without the websites knowing their exact browsing history.
Please read the Security and Privacy self-review for the privacy goals and concerns.
Further details:
We'd prefer the TAG provide feedback as (please delete all but the desired option):
🐛 open issues in our GitHub repo for each point of feedback
☂️ open a single issue in our GitHub repo for the entire review
💬 leave review feedback as a comment in this issue and @-notify @xyaoinum, @jkarlin, @michaelkleber