Permissions to observe topics in page head and body #224

dmarti · 2023-07-24T21:59:15Z

Add two permissions, default none, for sites to allow training on the HTML head (including title) and body elements.

Permissions-Policy: browsing-topics-observe-head
Permissions-Policy: browsing-topics-observe-body

This would address the problem of sensitive titles and other page content covered in #118 while still allowing large, general-interest sites to contribute fairly to Topics API audience data collection.

Related: #92 #206

jkarlin · 2023-07-25T16:51:14Z

By default off, do you mean default self or default none? Default self would mean that ad-tech (in the top frame) could still enable it on the page. Default none would mean that the publisher page would have to opt in in its response header.

dmarti · 2023-07-25T17:00:05Z

Thank you @jkarlin, edited to default none. There are many sites that might be willing to have their domain used for training but whose page titles and content would be totally inappropriate for it (such as book titles on a bookstore site, or titles of health advice articles on a general-interest consumer advice site). Any site that intends to allow training on head and body should therefore have to review its pages first and affirmatively turn it on.
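
A minimal sketch of what that affirmative opt-in could look like from the publisher's side, assuming the two feature names proposed at the top of this issue were adopted with the standard Permissions-Policy allowlist syntax. Neither feature exists in any browser today, and `build_permissions_policy` and the `https://ads.example` origin are hypothetical names used only for illustration.

```python
def build_permissions_policy(observe_head, observe_body):
    """Build a Permissions-Policy header value for the two proposed features.

    Each argument is a list of origins allowed to observe that part of the
    page. An empty list keeps the feature disabled, matching "default none".
    The feature names come from this proposal and are not shipped anywhere.
    """
    def allowlist(origins):
        # Allowlists are parenthesized and space-separated, with each origin
        # double-quoted; "()" means no origin is allowed.
        return "(" + " ".join(f'"{o}"' for o in origins) + ")"

    directives = {
        "browsing-topics-observe-head": observe_head,
        "browsing-topics-observe-body": observe_body,
    }
    return ", ".join(
        f"{name}={allowlist(origins)}" for name, origins in directives.items()
    )


# A site that reviewed its titles but not its body content might opt in
# one (hypothetical) caller for the head only.
header_value = build_permissions_policy(
    observe_head=["https://ads.example"],
    observe_body=[],
)
print("Permissions-Policy:", header_value)
```

With default none, a page that sends no such header would allow neither kind of observation, so the review step described above happens before anything is turned on.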

jkarlin · 2023-07-25T17:06:08Z

Ack, thanks. So what is the incentive for a publisher to opt into this?

dmarti · 2023-07-25T17:17:59Z

Three possible reasons: (edited to include use case from @patmmccann)

1. An adtech intermediary compensates sites for providing it with additional data beyond what would be available from the domain alone.
2. A large video site that is owned by the same company as a browser vendor chooses to opt in some or all of its pages, in order to avoid competition issues resulting from different treatment by the browser of video site channels and independent sites.
3. An advertising service does a human review of sites before working with them and adding its third-party code. As part of the review, a human reviewer checks for sensitivity and privacy concerns in page head and/or body, and if none, adds permission for relevant topics observation by the service's own Topics API caller.

jkarlin · 2023-07-25T18:34:25Z

Let's see if some publishers ask for this feature.

dmarti · 2023-07-25T22:04:04Z

If YouTube requested opt-in HTML title or head training, are there any obstacles to giving it to them?

michaelkleber · 2023-07-25T23:10:51Z

I do think there are still some questions that would need consideration:

1. As discussed in "Use topics from a meta tag on Special Topics Provider Sites" (#206), there is still the risk of what you dubbed "the Reddit problem" of sites deliberately corrupting data.
2. It would be easier for a malicious party to circumvent the per-caller filtering logic (which only allows a party to observe a topic if they were previously on a page about that topic). We lose that protection if it's easy for one site to pretend to be about every topic in the world.
3. Of course we would need a new topic-assignment ML model that did a good job on this new input data.

None of these seems insurmountable, but each one of them would require new work.
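
To make the filtering concern concrete, here is a toy model of a per-caller filter. It is a sketch of the idea only, not Chrome's implementation: the `observed` map, the function names, the caller origins, and the topic labels are all hypothetical.

```python
from collections import defaultdict

# caller -> set of topics that caller has observed on pages it was present on
observed = defaultdict(set)


def record_page_visit(caller, page_topics):
    """Record that `caller` was present on a page classified with `page_topics`."""
    observed[caller].update(page_topics)


def topics_for_caller(caller, user_topics):
    """Return only those of the user's topics that this caller has observed before."""
    return [t for t in user_topics if t in observed[caller]]


user_topics = ["Fitness", "Travel", "Autos"]

# A caller that was only ever present on a fitness page passes the filter
# for that single topic.
record_page_visit("honest.example", {"Fitness"})
print(topics_for_caller("honest.example", user_topics))

# If a single page can claim every topic, a caller present on that page
# passes the filter for all of them.
record_page_visit("greedy.example", {"Fitness", "Travel", "Autos"})
print(topics_for_caller("greedy.example", user_topics))
```

In this model the filter is only as strong as the page classification feeding `record_page_visit`, which is why page content that a site controls directly makes it easier to defeat.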

patmmccann · 2023-08-08T22:23:49Z

> Let's see if some publishers ask for this feature.

We represent 4000 publishers, we're asking for this.

michaelkleber · 2023-08-09T01:32:48Z

@patmmccann Just in the interest of clarity, are you asking for this API change because you plan to use it on some of your 4000 sites? For example, what "large, general-interest sites" do you run that you want to make "contribute fairly"?

If I understand correctly, you and Don are both from Raptive, and Don has been asking for this so that someone else might be compelled to use it.

patmmccann · 2023-08-09T14:41:50Z

I realize just now I am mixing up threads, and I moved my previous comment to the correct thread.

Our goal is absolutely to opt our publishers into using page context to better populate topics, as we have already deployed a topics network operating within all 4000 of them. Mediavine has done something similar I understand.

For example you can see https://ads.adthrive.com/builds/core/94b7c03/html/topics.html called from https://firstquarterfinance.com/

Page title or other meta data can be very helpful for a more compact network of sites to generate useful topics. For example, suppose there are five large content aggregator sites owned by newscorp; they would be much more able to have a useful network if they could give their own network permission to share headers with their own tech.

@gwhigs at Gannett is working on this in his network.

patmmccann · 2023-08-15T16:03:47Z

> For example, what "large, general-interest sites" do you run that you want to make "contribute fairly"?

Encyclopedia Britannica, thoughtcatalog.com, mediaite.com to name a few

patmmccann · 2023-08-15T17:11:50Z

There's another application here: the Topics classifier fails to generate any topic at all on many of our sites. This would allow those sites to "contribute fairly" as well, not just general-interest sites.

michaelkleber · 2023-08-16T00:59:07Z

Thanks very much! Learning that you "have already deployed a topics network operating within all 4000 of" your sites makes this a compelling feature request.

The concerns that I mentioned above are all things we will need to figure out how to handle, so this is certainly still going to take work. But it's great to have a concrete demonstration that this would indeed be a way to add value to Topics data.

AramZS · 2023-08-30T17:47:54Z

> As discussed in "Use topics from a meta tag on Special Topics Provider Sites" (#206), there is still the risk of what you dubbed "the Reddit problem" of sites deliberately corrupting data.

To be clear, I don't think this is a solvable issue. Tech companies are daily (literally) making it easier for people to generate whole sites with unique domains that focus on all sorts of topics or on specific topics. I think there is a broader "trust" issue about whether Topics should work on specific sites without some signal of trustworthiness, but that is a general problem, not one that is particularly relevant to whether head and body are observed.

If Topics is successful, there will be significant monetary incentive to game the model. I don't think that the changes suggested here will make a meaningful difference to the effectiveness of bad actors in doing so. They may make it harder or easier, but not meaningfully enough to dissuade anyone.

michaelkleber · 2023-08-30T18:27:59Z

@AramZS I think the Topics answer to that concern has to be "curation" on the part of the API caller — that is, their deciding whether or not to observe topics on a particular page or site.

My instincts are that this probably becomes harder if the calculation expands beyond domain name. But I fully agree that this is an issue that API callers ought to think about either way.

patmmccann · 2023-12-01T19:09:57Z

Interesting update here: instead of getting no topic, each of those sites now gets the wrong topic from the latest classifier. Not sure which is the better outcome.

cc @leeronisrael

patmmccann · 2024-02-20T14:33:59Z

@leeronisrael @michaelkleber icymi

patmmccann · 2024-02-24T04:29:58Z

@michaelkleber @AramZS it occurs to me that the problem of sites deliberately corrupting the data through their choice of site name already occurs today. See for example https://www.workandmoney.com/s/actor-most-oscar-nominations-no-wins-b89d656968274d51

dmarti mentioned this issue Jul 24, 2023

Use topics from a meta tag on Special Topics Provider Sites #206

Open

jkarlin mentioned this issue Jul 26, 2023

Include URL and page content in the Topics classifier features #118

Closed

michaelkleber mentioned this issue Aug 10, 2023

Update permissions policy to support separate permissions for retrieve and observe #92

Closed

patmmccann mentioned this issue Oct 11, 2023

Some domain name have no associated topic #264

Open
