Permissions to observe topics in page head and body #224
By default off, do you mean default |
Thank you @jkarlin, edited to default |
Ack, thanks. So what is the incentive for a publisher to opt into this? |
Three possible reasons: (edited to include use case from @patmmccann)
|
Let's see if some publishers ask for this feature. |
If YouTube requested opt-in HTML title or head training, are there any obstacles to giving it to them? |
I do think there are still some questions that would need consideration:
None of these seems insurmountable, but each one of them would require new work. |
We represent 4000 publishers, we're asking for this. |
@patmmccann Just in the interest of clarity, are you asking for this API change because you plan to use it on some of your 4000 sites? For example, what "large, general-interest sites" do you run that you want to make "contribute fairly"? If I understand correctly, you and Don are both from Raptive, and Don has been asking for this so that someone else might be compelled to use it. |
I realize just now I am mixing up threads, and I moved my previous comment to the correct thread. Our goal is absolutely to opt our publishers into using page context to better populate topics, as we have already deployed a topics network operating within all 4000 of them. Mediavine has done something similar, I understand. For example, you can see https://ads.adthrive.com/builds/core/94b7c03/html/topics.html called from https://firstquarterfinance.com/

Page title or other metadata can be very helpful for a more compact network of sites to generate useful topics. For example, suppose there are five large content aggregator sites owned by newscorp; they would be far better able to build a useful network if they could give their own network permission to share headers with their own tech. @gwhigs at Gannett is working on this in his network. |
Encyclopedia Britannica, thoughtcatalog.com, mediaite.com, to name a few |
Thanks very much! Learning that you "have already deployed a topics network operating within all 4000 of" your sites makes this a compelling feature request. The concerns that I mentioned above are all things we will need to figure out how to handle, so this is certainly still going to take work. But it's great to have a concrete demonstration that this would indeed be a way to add value to Topics data. |
To be clear, I don't think this is a solvable issue. Tech companies are daily (literally) making it easier for people to generate whole sites with unique domains to focus on all sorts of topics or on specific topics. I think there is a broader 'trust' issue about whether Topics should work on specific sites without some signal of trustworthiness, but I think that is a general problem, not one that is particularly relevant to whether head and body are observed. If Topics is successful, there will be significant monetary incentive to game the model. I don't think that the changes suggested here will make a meaningful difference to the effectiveness of bad actors in doing so. It may make it harder or easier, but not meaningfully enough to dissuade anyone. |
@AramZS I think the Topics answer to that concern has to be "curation" on the part of the API caller — that is, their deciding whether or not to observe topics on a particular page or site. My instincts are that this probably becomes harder if the calculation expands beyond domain name. But I fully agree that this is an issue that API callers ought to think about either way. |
Interesting update here: instead of getting no topic, the latest classifier gives the wrong topic to each of those sites. Not sure which is the better outcome. |
@leeronisrael @michaelkleber icymi |
@michaelkleber @AramZS it occurs to me that the problem of sites deliberately corrupting the data through their choice of site name already occurs today. See for example https://www.workandmoney.com/s/actor-most-oscar-nominations-no-wins-b89d656968274d51 |
Add two permissions, default `none`, for sites to allow training on the HTML `head` (including title) and `body` elements.

This would address the problem of sensitive titles and other page content covered in #118 while still allowing large, general-interest sites to contribute fairly to Topics API audience data collection.
Related: #92 #206
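
To make the proposal concrete, here is a minimal sketch of the opt-in behavior described above, assuming the classifier keeps using the hostname as it does today and only reads `head` or `body` content when a site has explicitly granted the matching permission. Every name in it (`ObservationPermissions`, `PageContent`, `classifierInputs`) is hypothetical and chosen for illustration; none of them is part of the Topics API or of any browser implementation.

```typescript
// Hypothetical sketch: these types and this function are NOT part of the
// Topics API. They only illustrate "two permissions, default none".

// A site's opt-in choices. A missing field means "not granted".
interface ObservationPermissions {
  head?: boolean; // allow use of <head> content, including <title>
  body?: boolean; // allow use of <body> content
}

// The page content a classifier could in principle look at.
interface PageContent {
  title: string;
  bodyText: string;
}

// Returns the strings a classifier would be allowed to use for one page.
// The hostname is always included; title and body text are added only
// when the site has explicitly opted in.
function classifierInputs(
  hostname: string,
  page: PageContent,
  perms: ObservationPermissions = {},
): string[] {
  const inputs: string[] = [hostname];
  if (perms.head === true) {
    inputs.push(page.title);
  }
  if (perms.body === true) {
    inputs.push(page.bodyText);
  }
  return inputs;
}

// Example usage with made-up values.
const examplePage: PageContent = {
  title: "Example article title",
  bodyText: "Example article text",
};
const defaultInputs = classifierInputs("example.com", examplePage);
const headOnlyInputs = classifierInputs("example.com", examplePage, { head: true });
```

With no permissions set, a page with a sensitive title contributes only its hostname, which is the case #118 is concerned with; a site that opts in to `head` additionally exposes its title.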