Include URL and page content in the Topics classifier features #118

Closed
lbdvt opened this issue Nov 17, 2022 · 4 comments

Comments

@lbdvt
Contributor

lbdvt commented Nov 17, 2022

The Topics classifier currently uses only the page hostname to define the corresponding topics.

As a result, large sites with diverse content get a very generic topic with low advertising value.

Using the page URL and the page content as classifier features would allow much more accurate classification of individual pages and should improve the signal quality within Topics.
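
To make the difference concrete, here is a minimal sketch in TypeScript. It is not the browser's actual classifier input pipeline; the function names `hostnameFeature` and `urlFeatures`, the `PageFeatures` shape, and the `news.example` URLs are all assumptions made up for illustration. It only shows that two very different pages on one site collapse to the same input when the hostname is the sole feature, while path tokens keep them apart.

```typescript
// Hypothetical illustration only: names and shapes here are assumptions,
// not the real Topics classifier interface.
interface PageFeatures {
  hostname: string;
  pathTokens: string[];
}

// Hostname-only input, as the issue describes the current behaviour.
function hostnameFeature(pageUrl: string): string {
  return new URL(pageUrl).hostname;
}

// Proposed richer input: the hostname plus lowercase tokens from the URL path.
function urlFeatures(pageUrl: string): PageFeatures {
  const url = new URL(pageUrl);
  const pathTokens = url.pathname
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
  return { hostname: url.hostname, pathTokens };
}

// Two pages with different subject matter on the same (made-up) site.
const sportsPage = "https://news.example/sports/football-results";
const cookingPage = "https://news.example/food/pasta-recipes";

// Both pages yield the same hostname feature...
console.log(hostnameFeature(sportsPage) === hostnameFeature(cookingPage));
// ...but different path tokens.
console.log(urlFeatures(sportsPage).pathTokens, urlFeatures(cookingPage).pathTokens);
```

Page content (title, body text) could be tokenized along the same lines, but that is left out here to keep the sketch small.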

@dmarti
Contributor

dmarti commented Dec 13, 2022

This probably needs a separate permission. In many cases the page content or path can be sensitive when the domain is not. (It's possible that a user would be perfectly fine with having the drugstore.example domain used for training, but not the content of drugstore.example/my-orders)

@jkarlin
Collaborator

jkarlin commented Jun 22, 2023

Exposing content beyond the origin to the API is tricky from a privacy perspective. Cross-origin iframes today do not have access to the page title, URL, or page contents, just the origin. Giving Topics access to that data would make it possible for these third parties to learn topics inferred from content they couldn't otherwise access. It's quite limited data, but enough that it gives pause and would likely need some mitigations to protect against abuse. As such, closing for now as we don't intend to do this in the near future.

@patmmccann

patmmccann commented Jul 25, 2023

@jkarlin it seems the concern you express applies only in an on-by-default state. If access to the extra information were opt-in via permissions, the cross-origin iframe would not gain access to anything it couldn't already gather today through, for example, an opt-in postMessage orchestrated by the publisher.
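
For readers unfamiliar with the pattern mentioned above, here is a rough TypeScript sketch of a publisher-orchestrated, opt-in handoff of page context to a cross-origin iframe. The message shape `PageContextMessage`, the helper names `buildPageContext` and `readPageContext`, and the `publisher.example` origin are assumptions for illustration; nothing here is part of the Topics API. The helpers are written as pure functions so the logic can be followed without a browser, and the comments show where the standard `postMessage` calls would go.

```typescript
// Hypothetical message shape; the field names are assumptions for illustration.
interface PageContextMessage {
  type: "page-context";
  url: string;
  title: string;
}

// Publisher side: build the message only when the publisher has opted in.
// In a page this would be followed by something like:
//   iframe.contentWindow?.postMessage(message, "https://adtech.example");
function buildPageContext(
  optedIn: boolean,
  url: string,
  title: string
): PageContextMessage | null {
  return optedIn ? { type: "page-context", url, title } : null;
}

// Iframe side: accept the message only from the expected publisher origin.
// In the iframe this would be called from:
//   window.addEventListener("message", (event) => readPageContext(event, origin));
function readPageContext(
  event: { origin: string; data: unknown },
  publisherOrigin: string
): PageContextMessage | null {
  if (event.origin !== publisherOrigin) return null;
  const data = event.data as Partial<PageContextMessage> | null;
  if (data === null || typeof data !== "object") return null;
  if (
    data.type !== "page-context" ||
    typeof data.url !== "string" ||
    typeof data.title !== "string"
  ) {
    return null;
  }
  return { type: "page-context", url: data.url, title: data.title };
}
```

Without the publisher's opt-in no message is built, so the iframe still sees only the origin, which matches the default behaviour described earlier in the thread.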

@jkarlin
Copy link
Collaborator

jkarlin commented Jul 26, 2023

#224 has continued discussion on a possible opt-in mechanism.


4 participants