Inquiry Regarding Use of Topics API Model for HTTP Archive #305
Comments
Hi Nurullah - I'm looking into this and will get back to you soon.
As you know, the Topics API classification model is shipped alongside the Chrome browser to facilitate the on-device generation of topics. All the code that uses the model is within the Chromium source tree, which is subject to the Chromium open source license. There is no technical barrier to any party using the model for purposes beyond the Topics API. In production, Chrome uses an override list to improve performance - this list does not exist in the Chromium source tree.
Thank you, Leeron! That sounds cool. I will share our data with you as well once we are done.
Thank you once again, @leeronisrael. We processed all the URLs in our dataset and made it open source. See the documentation: https://har.fyi/reference/functions/get_host_categories/ There have been some discussions about the accuracy of the model, but I couldn't find any related stats. Do you have any statistics on this that you can share?
Hello,
I am Nurullah from HTTP Archive, and we are planning to use the Topics API model to categorize webpages for the 2024 Web Almanac project.
Our goal is to use the Topics API model to determine the categories of the CrUX origins in HTTP Archive. We intend to classify the origins in a manner similar to the one discussed here. The results of this classification will be stored and made publicly available in BigQuery, primarily for use by the Web Almanac analysts.
Before proceeding, we want to ensure that this use case does not violate any terms of use or raise other concerns regarding the Topics API. Could you provide guidance on, or confirm, whether there are any potential issues with using the Topics API model in this manner?
Appreciate your support on this matter.