Prevent or limit selective training by callers #225
Comments
Could you explain why this behavior would be a problem? Suppose an ad tech is anti-poetry, and feels that Taxonomy v1 topic 102, "/Books & Literature/Poetry", is useless to them. They decide to pass Further suppose the user really is interested in poetry, so that it is indeed one of their top-5 interests for the week. Then in subsequent weeks, the anti-poetry ad tech might call the Topics API and receive no topic at all, while another ad tech on the same page would receive the Poetry topic. This seems pretty much the same as if both ad techs received the Poetry topic, and one ad tech decided to ignore it and not use Poetry in picking which ad to show. I hope we can agree that it's fine for an ad tech to ignore unhelpful information at targeting time. So I don't see why ignoring the information earlier in the process, as you describe, is something we should worry about.
Is there any data that shows that every topic is revenue positive or neutral? In early testing we spotted some topics or combinations of topics that are correlated with lower than average ad revenue (roleplaying game topics without parenting topics was really low). A caller would presumably only want to avoid training on just enough sites to avoid creating any revenue-negative combinations. Is "622 /Travel & Transportation/Hotels & Accommodations/Vacation Rentals & Short-Term Stays" + "102 /Books & Literature/Poetry" always worth the same as or more than "622 /Travel & Transportation/Hotels & Accommodations/Vacation Rentals & Short-Term Stays" alone?
I have no data of the sort you ask for. But if an ad tech wants to bid less when they see topic X compared to no topic at all (if they find that bidding strategy beneficial for whatever reason), then surely they would want to see topic X precisely so that they could get whatever benefit it confers. So this doesn't seem like a use case for "ban selective calling" at all.
Can't the party collecting the topics and the party bidding for the impression be different, though? Seems impractical for every possible bidder to check what selective training is being done by every possible caller. Unless every bidder is also a caller?
Since Topics is relatively new, I would not even expect that the folks planning to experiment are sure of the long-term answer to which parties will end up being callers. But let's guess that SSPs get topics and pass them to DSPs (which seems very reasonable). Every SSP is already on a different set of websites, and so every SSP's topics will already be affected by what they happen to observe. How is an SSP's selective use of
That's a good point. I think you're right on this one -- the SSP should be able to call the Topics API in an optimized way, in order to collect and present the topics data for a particular user that they believe to be most likely to attract a high bid. It looks like we can close this issue and focus on #92 -- since SSPs can retrieve without observing in order to optimize their results, making the same option available to publishers seems like a better direction. If nobody has any objections I'll close this.
Closing, will comment at #92
The Topics API originally required an exchange of user topics data for user topics data: if a caller wanted to see Topics API data pertaining to a user, the caller had to allow Topics API data to be collected on the current site. However, this is no longer in effect for all parties. Since #80, it has been possible for callers to retrieve topics data for a user without allowing the browser to observe topics.
A third-party caller can selectively pass {observe:false} in order to optimize the Topics API data collected for a given user. Because the caller knows in advance which topics the browser would assign to a given site, the caller can choose to pass {observe:false} on sites that would yield lower-value topics. The result would be that the selective caller's audience would have disproportionately high-value sets of topics, resulting in higher revenue for that caller and pressure on other third-party callers to also observe more selectively.

Meanwhile, a publisher site does not have the same level of control: an intermediary can selectively train on only the highest-value publisher data, but a publisher does not have the ability to optimize how its own audience data is collected and trained on.
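
The selective-observation strategy described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the browser's actual algorithm: the site names, the topic values, and the `Browser` class are all hypothetical, and epochs, top-5 selection, and noise are left out. The one behavior modeled is that a caller can only receive topics from sites where it let the browser observe.

```python
from collections import defaultdict

# Hypothetical site -> topic mapping and per-topic values (illustrative only).
SITE_TOPICS = {
    "rentals.example": "Vacation Rentals & Short-Term Stays",
    "poetry.example": "Poetry",
}
TOPIC_VALUE = {
    "Vacation Rentals & Short-Term Stays": 1.0,
    "Poetry": -0.2,  # assumed to be revenue-negative for this caller
}


class Browser:
    """Toy model: tracks which topics each caller has been allowed to observe."""

    def __init__(self):
        self.observed = defaultdict(set)  # caller -> set of observed topics

    def browsing_topics(self, caller, site, observe=True):
        # Return what the caller has observed so far, then record this
        # visit only if the caller did not opt out of observation.
        result = sorted(self.observed[caller])
        if observe:
            self.observed[caller].add(SITE_TOPICS[site])
        return result


def visit_all(browser, caller, selective):
    for site, topic in SITE_TOPICS.items():
        # A selective caller opts out of observation on low-value sites.
        observe = (not selective) or TOPIC_VALUE[topic] > 0
        browser.browsing_topics(caller, site, observe=observe)


browser = Browser()
visit_all(browser, "selective-caller", selective=True)
visit_all(browser, "plain-caller", selective=False)

selective_topics = sorted(browser.observed["selective-caller"])
plain_topics = sorted(browser.observed["plain-caller"])
print(selective_topics)  # only the topic the caller chose to train on
print(plain_topics)      # both topics
```

In this model the selective caller ends up with a topic set that excludes Poetry, while a caller on the same sites that always observes gets both topics.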
There are probably several ways to address the imbalance. Some options:

- Allow train/no-train control to first-party sites too, using Permissions-Policy (see #92, "Update permissions policy to support separate permissions for retrieve and observe")
- Remove observe:false entirely (not feasible because of performance concerns, see meeting notes)
- After a caller uses observe:false, have the browser treat subsequent calls by the same caller as observe:false for a certain period of time (2 days?), to limit the rewards for selective observation by making the caller risk keeping some higher-value topics out of training as well

(This issue is based on a discussion at the 31 Jul 2023 Topics API call. Notes at: https://github.com/patcg-individual-drafts/topics/tree/main/meetings )
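
The cooldown option in the list above could work roughly as follows. This is a hypothetical sketch of the proposal, not anything the browser implements: the `CooldownBrowser` class, its method names, and the use of plain timestamps are assumptions, and the two-day window is just the tentative figure floated in the proposal.

```python
DAY = 24 * 60 * 60
COOLDOWN_SECONDS = 2 * DAY  # the "2 days?" figure from the proposal; an assumption


class CooldownBrowser:
    """Toy model of the proposed penalty for using observe:false."""

    def __init__(self):
        self.observed = {}          # caller -> set of topics used for training
        self.no_observe_until = {}  # caller -> time when the cooldown ends

    def call(self, caller, topic, now, observe=True):
        """Return True if this visit was actually recorded for training."""
        if not observe:
            # Opting out starts (or restarts) the caller's cooldown window.
            self.no_observe_until[caller] = now + COOLDOWN_SECONDS
            return False
        if now < self.no_observe_until.get(caller, 0):
            # Inside the window, the call is treated as observe:false.
            return False
        self.observed.setdefault(caller, set()).add(topic)
        return True


b = CooldownBrowser()
skipped = b.call("caller", "Poetry", now=0, observe=False)
during = b.call("caller", "Vacation Rentals & Short-Term Stays", now=1 * DAY)
after = b.call("caller", "Vacation Rentals & Short-Term Stays", now=3 * DAY)
print(skipped, during, after)
```

Here the caller that skipped observation on a Poetry site also loses the higher-value visit one day later, and only regains observation once the window has passed.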