Early design review for the FLoC API #601

xyaoinum · 2021-01-25T20:02:13Z

HIQaH! QaH! TAG!

I'm requesting a TAG review of the FLoC API.

In today's web, people’s interests are typically inferred based on observing what sites or pages they visit, which relies on tracking techniques like third-party cookies or less-transparent mechanisms like device fingerprinting. User privacy could be better protected if interest-based advertising could be accomplished without needing to collect a particular individual’s exact browsing history.

The FLoC API would enable ad-targeting based on the user’s general browsing interest, without the websites knowing their exact browsing history.

Please read the Security and Privacy self-review for the privacy goals and concerns.

Explainer (minimally containing user needs and example code): https://github.com/WICG/floc
Security and Privacy self-review: https://github.com/WICG/floc/blob/master/security-and-privacy-self-review.md
GitHub repo (if you prefer feedback filed there): https://github.com/WICG/floc
Primary contacts (and their relationship to the specification):
- Yao Xiao (@xyaoinum), Google
- Josh Karlin (@jkarlin), Google
- Michael Kleber (@michaelkleber), Google
Organization/project driving the design: Google, Privacy Sandbox
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/features/5710139774468096

Further details:

I have reviewed the TAG's API Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
The group where standardization of this work is intended to be done ("unknown" if not known): Unknown
Existing major pieces of multi-stakeholder review or discussion of this design: Unknown
Major unresolved issues with or opposition to this design: None at the moment
This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

☂️ open a single issue in our GitHub repo for the entire review

💬 leave review feedback as a comment in this issue and @-notify @xyaoinum, @jkarlin, @michaelkleber

torgo · 2021-02-22T17:51:03Z

One thing we are particularly concerned about is the topic of "sensitive categories." As we wrote in the Ethical Web Principles, the web should not cause harm to society. Members of marginalised groups can often be harmed simply by being identified as part of that group. So we need to be really careful about this. Can you provide some additional information about possible mitigations against this type of misuse?

rhiaro · 2021-02-23T00:34:30Z

Sensitive categories

The documentation of "sensitive categories" visible so far is on Google ad policy pages. Categories that are considered "sensitive" are, as stated, not likely to be universal, and are also likely to change over time. I'd like to see:

- an in-depth treatment of how sensitive categories will be determined (by a diverse set of stakeholders, so that the definition of "sensitive" is not biased by the backgrounds of implementors alone);
- discussion of if it is possible - and desirable (it might not be) - for sensitive categories to differ based on external factors (eg. geographic region);
- a persistent and authoritative means of documenting what they are that is not tied to a single implementor or company;
- how such documentation can be updated and maintained in the long run;
- and what the spec can do to ensure implementers actually abide by restrictions around sensitive categories.

Language about erring on the side of user privacy and safety when the "sensitivity" of a category is unknown might be appropriate.

Browser support

I imagine not all browsers will actually want to implement this API. Is the result of this, from an advertiser's point of view, that serving personalised ads is not possible in certain browsers? Does this create a risk of platform segmentation, in that some websites could detect non-implementation of the API and refuse to serve content altogether (which would severely limit user choice and increase concentration in a smaller set of browsers)? A mitigation for this could be to specify explicitly 'not-implemented' return values for the API calls that are indistinguishable from a full implementation.

The description of the experimentation phase mentions refreshing cohort data every 7 days; is timing something that will be specified, or is that left to implementations? Is there anything about cohort data "expiry" if a browser is not used (or only used to browse opted-out sites) for a certain period?

Opting out

I note that "Whether the browser sends a real FLoC or a random one is user controllable", which is good. I would hope to see some further work on guaranteeing that the "random" FLoCs sent in this situation do not become a de-facto "user who has disabled FLoC" cohort.

It's worth further thought about how sending a random "real" FLoC affects personalised advertising the user sees - when it is essentially personalised to someone who isn't them. It might be better for disabling FLoC to behave the same as incognito mode, where a "null" value is sent, indicating to the advertiser that personalised advertising is not possible in this case.

I note that sites can opt out of being included in the input set. Good! I would be more comfortable if sites had to explicitly opt in though.

Have you also thought about more granular controls for the end user which would allow them to see the list of sites included from their browsing history (and which features of the sites are used) and selectively exclude/include them?

If I am reading this correctly, sites that opt out of being included in the cohort input data cannot access the cohort information from the API themselves. Sites may have very legitimate reasons for opting out (eg. they serve sensitive content and wish to protect their visitors from any kind of tracking) yet be supported by ad revenue themselves. It is important to better explore the implications of this.

Centralisation of ad targeting

Centralisation is a big concern here. This proposal makes it the responsibility of browser vendors (a small group) to determine what categories of user are of interest to advertisers for targeting. This may make it difficult for smaller organisations to compete or innovate in this space. What mitigations can we expect to see for this?

How transparent / auditable are the algorithms used to generate the cohorts going to be? When some browser vendors are also advertising companies, how will concerns be separated so that the privacy needs of users are always put first?

Accessing cohort information

I can't see any information about how cohorts are described to advertisers, other than their "short cohort name". How does an advertiser know what ads to serve to a cohort given the value "43A7"? Are the cohort descriptions/metadata served out of band to advertisers? I would like an idea of what this looks like.

Security & privacy concerns

I would like to challenge the assertion that there are no security impacts.

- A large set of potentially very sensitive personal data is being collected by the browser to enable cohort generation. The impact of a security vulnerability causing this data to be leaked could be great.
- The explainer acknowledges that sites that already know PII about the user can record their cohort - potentially gathering more data about the user than they could ever possibly have access to without explicit input from the user - but dismisses this risk by comparing it to the status quo, and does not mention this risk in the Security & Privacy self-check.
- Sites which log cohort data for their visitors (with or without supplementary PII) will be able to log changes in this data over time, which may turn into a fingerprinting vector or allow them to infer other information about the user.
- We have seen over past years the tendency for sites to gather and hoard data that they don't actually need for anything specific, just because they can. The temptation to track cohort data alongside any other user data they have with such a straightforward API may be great. This in turn increases the risk to users when data breaches inevitably occur, and correlations can be made between known PII and cohorts.
- How many cohorts can one user be in? When a user is in multiple cohorts, what are the correlation risks related to the intersection of multiple cohorts? "Thousands" of users per cohort is not really that many. Membership to a hundred cohorts could quickly become identifying.
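
The intersection concern can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that cohort memberships are statistically independent, which real cohorts would not be, and every population and cohort size in it is hypothetical. Under that assumption, the expected number of users sharing a given combination of cohorts shrinks multiplicatively with each extra cohort. The same arithmetic could be applied to a sequence of cohort values logged for one visitor over several weeks.

```python
from typing import Sequence


def expected_matching_users(population: int, cohort_sizes: Sequence[int]) -> float:
    """Expected number of users who belong to every listed cohort.

    Assumes (unrealistically) that cohort memberships are independent,
    so each cohort keeps a fraction size / population of the users.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    expected = float(population)
    for size in cohort_sizes:
        expected *= size / population
    return expected
```

For example, with a hypothetical population of 1,000,000,000 users and two cohorts of 5,000 users each, `expected_matching_users(1_000_000_000, [5_000, 5_000])` is about 0.025, so on average fewer than one user would match both cohorts.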

How do the features in this specification work in the context of a browser's Private Browsing or Incognito mode?

The behavior is the same as if the interest cohort is invalid/null in a regular browsing mode, i.e. an exception will be thrown.

To clarify - does this mean that sites calling the API would receive an invalid/null result? In what circumstances in regular browsing mode is this the case? When a user hasn't been assigned to a valid cohort yet? Is that a common enough case that the probability of a 'null' result being due to use of incognito mode is relatively low? (Sites should not be able to detect the use of incognito mode.)
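
The question above can be framed as a conditional probability. The sketch below is a minimal illustration, assuming that incognito mode always produces the error (as the quoted answer states) and that the two input rates are known. The rates in the usage note are made-up numbers, not measurements.

```python
def prob_incognito_given_error(p_incognito: float, p_error_regular: float) -> float:
    """P(incognito | the API call failed), by Bayes' rule.

    p_incognito: assumed share of page loads made in incognito mode.
    p_error_regular: assumed share of regular-mode loads with no valid cohort.
    Incognito mode is assumed to always produce the error.
    """
    # Total probability of seeing the error from either browsing mode.
    p_error = p_incognito + (1.0 - p_incognito) * p_error_regular
    if p_error == 0.0:
        raise ValueError("the error never occurs under these assumptions")
    return p_incognito / p_error
```

With a hypothetical 10% incognito share and a hypothetical 5% of regular sessions lacking a valid cohort, the function returns roughly 0.69, so an error would still point to incognito mode more often than not. The rarer the error is in regular browsing, the more strongly it signals incognito.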

Q14 is missing a response about how the browser gathers inputs for cohort calculation in incognito mode. I assume it gathers no data at all, but it would be good to say that explicitly.

Thanks!

torgo · 2021-03-08T17:10:43Z

Hi @xyaoinum - do you have anything you can share with us in response to the above points? It would be good to understand where we go from here. How would you like to proceed? At this point we are waiting for your feedback. /cc @chrishtr.

xyaoinum · 2021-03-08T17:23:27Z

Hi @torgo, @rhiaro: Thank you for your questions and comments. We're still thinking through them and we hope to respond to these points within a week or two.

torgo · 2021-03-09T08:47:57Z

Thanks @xyaoinum. Just to follow up, you are probably also aware of the EFF article, which makes many of the same points as Amy's feedback. Despite the incendiary headline, please have a look through this feedback and take it on board, as EFF is an important and credible stakeholder organisation when it comes to security & privacy on the web.

kuro68k · 2021-03-11T10:05:45Z

* The explainer acknowledges that sites that already know PII about the user can record their cohort - potentially gathering more data about the user than they could ever possibly have access to without explicit input from the user - but dismisses this risk by comparing it to the status quo, and does not mention this risk in the Security & Privacy self-check.

Just to add, I don't think this is an accurate description of the status quo, and any response should acknowledge that. Particularly in the last few years, efforts have been made to deny sites behaviour and interest data from sources like 3rd party cookies and browser history detection via JavaScript. One of the major motivations behind this has been the ability to combine it with PII for purposes that users consider unacceptable.

At the very least this description of the status quo needs to be justified before use.

kuro68k · 2021-03-12T08:20:53Z

To clarify - does this mean that sites calling the API would receive an invalid/null result? In what circumstances in regular browsing mode is this the case? When a user hasn't been assigned to a valid cohort yet? Is that a common enough case that the probability of a 'null' result being due to use of incognito mode is relatively low? (Sites should not be able to detect the use of incognito mode.)

I don't think this can be relied upon. Any change in behaviour can be used for tracking, and the null result is itself a cohort.

A randomly selected cohort would be better. In fact it would be overall better if the browser selected a number of possible cohorts that fit the user's profile and randomly selected one in normal operation. Otherwise cohort membership will change too slowly to prevent it being used for tracking.
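
The suggestion above could look roughly like the following. This is a sketch of the idea in this comment only, not of how FLoC works. The candidate cohort IDs and the function name are hypothetical, and how the browser would decide which cohorts "fit" a profile is left out.

```python
import random
from typing import Optional, Sequence


def pick_reported_cohort(candidates: Sequence[str],
                         rng: Optional[random.Random] = None) -> str:
    """Return one cohort ID chosen uniformly at random from the candidates.

    `candidates` stands for the cohorts that plausibly fit the user's
    profile; producing that list is outside the scope of this sketch.
    """
    if not candidates:
        raise ValueError("at least one candidate cohort is required")
    rng = rng or random.Random()
    return rng.choice(list(candidates))
```

Calling `pick_reported_cohort(["43A7", "51B2", "9C0E"])` on each refresh would make the reported value vary between calls even when the underlying profile does not change.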

The real problem is sites that already hold PII. There is no way I can think of to detect that and frustrate it, and as it stands FLoC is simply giving such sites more information than they would otherwise be able to gather with current default tracking protections in major browsers.

lknik · 2021-03-15T16:04:24Z

@rhiaro

To clarify - does this mean that sites calling the API would receive an invalid/null result?

Thanks for this review. I'm happy that the TAG is continuing the tradition of broad security-privacy aspects :-)

In the meantime, perhaps this answers the concerns regarding incognito.

lknik · 2021-03-25T16:59:47Z

Hello again,

Not sure if this belongs in this review, but I sure hope that the final FLoC will not have the potential of leaking web browsing history (which is not mentioned in the S&P questionnaire).

michaelkleber · 2021-03-25T17:02:25Z

Hi @lknik! The 50-bit SimHash values that you're calculating get masked down to many fewer bits before being used to pick your flock. It's designed for lots of collision — each cohort will cover thousands of people with hundreds of different browsing histories.
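
To make the masking step concrete, here is a minimal sketch of a textbook SimHash over visited domains, followed by truncation to a short prefix. It is not Chrome's implementation: the SHA-256 feature hash, the unweighted voting, the 8-bit prefix length (`PREFIX_BITS`), and the example domains are all assumptions. Only the 50-bit width comes from the comment above.

```python
import hashlib
from typing import Iterable

SIMHASH_BITS = 50  # bit width mentioned in the thread
PREFIX_BITS = 8    # assumption: the real number of kept bits is not stated here


def feature_hash(domain: str) -> int:
    """Hash one domain to a SIMHASH_BITS-bit integer (SHA-256 is an assumption)."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << SIMHASH_BITS) - 1)


def simhash(domains: Iterable[str]) -> int:
    """Textbook SimHash: each domain votes +1 or -1 on every bit position."""
    counts = [0] * SIMHASH_BITS
    for domain in domains:
        h = feature_hash(domain)
        for bit in range(SIMHASH_BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    value = 0
    for bit, count in enumerate(counts):
        if count > 0:  # a bit is set only when the +1 votes win
            value |= 1 << bit
    return value


def cohort_prefix(value: int, prefix_bits: int = PREFIX_BITS) -> int:
    """Keep only the leading prefix_bits bits of a SIMHASH_BITS-bit value."""
    return value >> (SIMHASH_BITS - prefix_bits)


# Hypothetical browsing history, used only for illustration.
history = ["news.example", "recipes.example", "cycling.example"]
prefix = cohort_prefix(simhash(history))
```

Because only the leading bits are kept, 2**42 distinct 50-bit values share each prefix in this sketch, which is the kind of deliberate collision the comment describes.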

lknik · 2021-03-26T13:01:06Z

@michaelkleber Can we then learn exactly what the bit size is and how it's defined? It would be great to have a full writeup to understand this proposal entirely.

kuro68k · 2021-03-26T13:18:08Z

It seems like designing the SimHash to be resilient against all kinds of analysis, to prevent information about the user's browsing history being leaked, is likely to be extremely difficult.

To prove it to be robust it would need to undergo extensive mathematical analysis, a very specialist subject that would probably require paying some academics to work on it. It should be externally validated.

samuelweiler · 2021-04-16T14:52:00Z

It's possible there's some confusion about the TAG's suggestion re: incognito mode.

rhiaro · 2021-05-13T16:14:43Z

Hello, we looked at this again during our virtual face-to-face this week. I haven't seen a response to the points in my earlier feedback yet, and we also note that there has been a lot of community discussion about the potential negative implications of this work both for end-user privacy, and for the ad-supported sites which might depend on it. We're particularly concerned that FLoC is already being trialed, despite a lot of this feedback remaining unaddressed. We would be happy to arrange a call with you to discuss further, if that would help.

jkarlin · 2021-08-10T16:29:16Z

Sorry for the very long delay in response. The delay was mostly due to the fact that your feedback, in concert with feedback from other parts of the community, convinced us that we should take another go at the design. When we post the updated design, I will address the remaining relevant questions and concerns here.

Note that it might make sense to remove the "already shipped" tag, as it was only in an Origin Trial, which has since ended.

rhiaro · 2021-08-11T15:58:52Z

Thanks @jkarlin! We'll close this issue for now then. Please either reopen this one with updates, or open a new design review when you have a new design.

jkarlin · 2022-03-25T19:51:54Z

To close the loop, I've opened a review in #726 for the Topics API that replaces FLoC. In that issue, I responded to the questions that were asked here.

xyaoinum added Progress: untriaged Review type: CG early review An early review of general direction from a Community Group labels Jan 25, 2021

torgo assigned torgo and rhiaro Feb 9, 2021

torgo added privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. Topic: privacy Venue: WICG and removed Progress: untriaged labels Feb 9, 2021

torgo added this to the 2021-02-15-week milestone Feb 9, 2021

w3cbot mentioned this issue Feb 10, 2021

Early design review for the FLoC API w3cping/tracking-issues#180

Open

plinss self-assigned this Feb 10, 2021

plinss modified the milestones: 2021-02-15-week, 2021-02-22-week Feb 15, 2021

atanassov self-assigned this Feb 15, 2021

torgo modified the milestones: 2021-02-22-week, 2021-03-08-week Feb 22, 2021

plinss added the Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review label Feb 24, 2021

dmarti mentioned this issue Mar 11, 2021

General concerns about FLoC-powered abuse WICG/floc#36

Closed

dmarti mentioned this issue Mar 25, 2021

Sites recording a user entering and leaving a sensitive category WICG/floc#77

Closed

samuelweiler mentioned this issue Apr 16, 2021

Randomly join cohorts to frustate tracking WICG/floc#59

Open

plinss modified the milestones: 2021-03-08-week, 2021-05-10-F2F-Arakeen Apr 26, 2021

dmarti mentioned this issue Apr 26, 2021

The security/privacy section should cover other uses of FLoC, such as dynamic pricing, demographic targeting of headlines, and targeted malvertising WICG/floc#105

Open

torgo added Review type: Already shipped Already shipped in at least one browser privacy-needs-resolution Issue the Privacy Group has raised and looks for a response on. and removed privacy-needs-resolution Issue the Privacy Group has raised and looks for a response on. labels May 11, 2021

rhiaro closed this as completed Aug 11, 2021

rhiaro removed the Review type: Already shipped Already shipped in at least one browser label Aug 11, 2021

jkarlin mentioned this issue Mar 25, 2022

Early design review for the Topics API #726

Closed
