Category descriptions #89

Open
wants to merge 4 commits into
base: main
Conversation

max-ostapenko

  • Generated category descriptions and updated the sync to the BQ table

@max-ostapenko max-ostapenko marked this pull request as ready for review December 20, 2024 19:03
@max-ostapenko max-ostapenko changed the title from "generated descriptions" to "Category descriptions" Dec 20, 2024
@rviscomi
Member

So these are all AI-generated?

For some reason most of them try to connect the category back to web performance, which seems strange. For example:

> Domain parking solutions redirect domains to a different location or page. These should be lightweight and avoid performance issues.

@max-ostapenko
Author

max-ostapenko commented Dec 20, 2024

> So these are all AI-generated?

Yes.

> For some reason most of them try to connect the category back to web performance, which seems strange.

I have mentioned:

> The HTTP Archive Tracks How the Web is Built.
> We periodically crawl ...

to keep these aligned with the use cases, but yeah, it leans too much on the performance topic.
Let's just drop the second part.
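
Dropping the second part could be done mechanically instead of regenerating everything. A minimal sketch, assuming each generated description is plain text whose first sentence ends with `.`, `!` or `?` followed by whitespace; `first_sentence` is a hypothetical helper and not part of this PR:

```python
import re


def first_sentence(description: str) -> str:
    """Return only the first sentence of a category description.

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    Abbreviations such as 'e.g. ' would be cut too early, so the results
    still need a manual review.
    """
    parts = re.split(r"(?<=[.!?])\s+", description.strip(), maxsplit=1)
    return parts[0]


example = (
    "Domain parking solutions redirect domains to a different location or page. "
    "These should be lightweight and avoid performance issues."
)
print(first_sentence(example))
```

For the example quoted above, this keeps the definition and drops the performance advice.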


WPT test run for https://almanac.httparchive.org/en/2022/

WPT test run results: http://webpagetest.httparchive.org/results.php?test=241220_8Y_E
Detected technologies:

{
    "detected": {
        "IaaS": "Google Cloud",
        "JavaScript libraries": "web-vitals",
        "RUM": "web-vitals",
        "Performance": "Priority Hints,Google Cloud Trace",
        "Security": "HSTS",
        "Webmail": "Google Workspace",
        "Email": "Google Workspace",
        "Analytics": "Google Analytics",
        "CDN": "Cloudflare",
        "Miscellaneous": "RSS,Open Graph"
    },
    "detected_apps": {
        "Google Cloud": "",
        "web-vitals": "",
        "Priority Hints": "",
        "HSTS": "",
        "Google Workspace": "",
        "Google Cloud Trace": "",
        "Google Analytics": "",
        "Cloudflare": "",
        "RSS": "",
        "Open Graph": ""
    },
    "detected_technologies": {
        "Google Cloud": {
            "name": "Google Cloud",
            "description": "Google Cloud is a suite of cloud computing services.",
            "slug": "google-cloud",
            "categories": [
                {
                    "id": 63,
                    "slug": "iaas",
                    "groups": [
                        7
                    ],
                    "name": "IaaS",
                    "priority": 8,
                    "description": "Infrastructure as a Service (IaaS) provides computing resources."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google Cloud.svg",
            "website": "https://cloud.google.com",
            "pricing": [],
            "cpe": "cpe:2.3:a:google:cloud_platform:*:*:*:*:*:*:*:*"
        },
        "web-vitals": {
            "name": "web-vitals",
            "description": "The web-vitals JavaScript is a tiny, modular library for measuring all the web vitals metrics on real users.",
            "slug": "web-vitals",
            "categories": [
                {
                    "id": 59,
                    "slug": "javascript-libraries",
                    "groups": [
                        9
                    ],
                    "name": "JavaScript libraries",
                    "priority": 9,
                    "description": "JavaScript libraries provide pre-written code."
                },
                {
                    "id": 78,
                    "slug": "rum",
                    "groups": [
                        2
                    ],
                    "name": "RUM",
                    "priority": 9,
                    "description": "Real User Monitoring (RUM) tools track performance as experienced by users."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "web-vitals.svg",
            "website": "https://github.com/GoogleChrome/web-vitals",
            "pricing": [],
            "cpe": null
        },
        "Priority Hints": {
            "name": "Priority Hints",
            "description": "Priority Hints exposes a mechanism for developers to signal a relative priority for browsers to consider when fetching resources.",
            "slug": "priority-hints",
            "categories": [
                {
                    "id": 92,
                    "slug": "performance",
                    "groups": [
                        7
                    ],
                    "name": "Performance",
                    "priority": 9,
                    "description": "Performance tools measure and optimize site speed."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Priority Hints.svg",
            "website": "https://wicg.github.io/priority-hints/",
            "pricing": [],
            "cpe": null
        },
        "HSTS": {
            "name": "HSTS",
            "description": "HTTP Strict Transport Security (HSTS) informs browsers that the site should only be accessed using HTTPS.",
            "slug": "hsts",
            "categories": [
                {
                    "id": 16,
                    "slug": "security",
                    "groups": [
                        11
                    ],
                    "name": "Security",
                    "priority": 9,
                    "description": "Security technologies protect websites from vulnerabilities and attacks."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "default.svg",
            "website": "https://www.rfc-editor.org/rfc/rfc6797#section-6.1",
            "pricing": [],
            "cpe": null
        },
        "Google Workspace": {
            "name": "Google Workspace",
            "description": "Google Workspace, formerly G Suite, is a collection of cloud computing, productivity and collaboration tools.",
            "slug": "google-workspace",
            "categories": [
                {
                    "id": 30,
                    "slug": "webmail",
                    "groups": [
                        4
                    ],
                    "name": "Webmail",
                    "priority": 2,
                    "description": "Webmail systems allow users to send and receive emails through a browser."
                },
                {
                    "id": 75,
                    "slug": "email",
                    "groups": [
                        4,
                        2
                    ],
                    "name": "Email",
                    "priority": 9,
                    "description": "Email integration technologies affect user communication."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google.svg",
            "website": "https://workspace.google.com/",
            "pricing": [],
            "cpe": null
        },
        "Google Cloud Trace": {
            "name": "Google Cloud Trace",
            "description": "Google Cloud Trace is a distributed tracing system that collects latency data from applications and displays it in the Google Cloud Console.",
            "slug": "google-cloud-trace",
            "categories": [
                {
                    "id": 92,
                    "slug": "performance",
                    "groups": [
                        7
                    ],
                    "name": "Performance",
                    "priority": 9,
                    "description": "Performance tools measure and optimize site speed."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "google-cloud-trace.svg",
            "website": "https://cloud.google.com/trace",
            "pricing": [],
            "cpe": null
        },
        "Google Analytics": {
            "name": "Google Analytics",
            "description": "Google Analytics is a free web analytics service that tracks and reports website traffic.",
            "slug": "google-analytics",
            "categories": [
                {
                    "id": 10,
                    "slug": "analytics",
                    "groups": [
                        8
                    ],
                    "name": "Analytics",
                    "priority": 9,
                    "description": "Analytics tools track user behavior and provide insights into website performance."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Google Analytics.svg",
            "website": "https://google.com/analytics",
            "pricing": [],
            "cpe": null
        },
        "Cloudflare": {
            "name": "Cloudflare",
            "description": "Cloudflare is a web-infrastructure and website-security company, providing content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.",
            "slug": "cloudflare",
            "categories": [
                {
                    "id": 31,
                    "slug": "cdn",
                    "groups": [
                        7
                    ],
                    "name": "CDN",
                    "priority": 9,
                    "description": "Content Delivery Networks (CDNs) distribute website content globally to improve load times for users."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "CloudFlare.svg",
            "website": "https://www.cloudflare.com",
            "pricing": [],
            "cpe": null
        },
        "RSS": {
            "name": "RSS",
            "description": "RSS is a family of web feed formats used to publish frequently updated works—such as blog entries, news headlines, audio, and video—in a standardized format.",
            "slug": "rss",
            "categories": [
                {
                    "id": 19,
                    "slug": "miscellaneous",
                    "groups": [
                        6
                    ],
                    "name": "Miscellaneous",
                    "priority": 10,
                    "description": "Miscellaneous tools and technologies encompass those that don't fit into other categories."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "RSS.svg",
            "website": "https://www.rssboard.org/rss-specification",
            "pricing": [],
            "cpe": null
        },
        "Open Graph": {
            "name": "Open Graph",
            "description": "Open Graph is a protocol that is used to integrate any web page into the social graph.",
            "slug": "open-graph",
            "categories": [
                {
                    "id": 19,
                    "slug": "miscellaneous",
                    "groups": [
                        6
                    ],
                    "name": "Miscellaneous",
                    "priority": 10,
                    "description": "Miscellaneous tools and technologies encompass those that don't fit into other categories."
                }
            ],
            "confidence": 100,
            "version": "",
            "icon": "Open Graph.png",
            "website": "https://ogp.me",
            "pricing": [],
            "cpe": null
        }
    }
}
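
In the output above, each category description sits under `detected_technologies` → technology name → `categories[]` → `description`. As a rough illustration of how a consumer might collect them, here is a sketch that works on a trimmed-down sample in the same shape; `category_descriptions` is a hypothetical helper, and the second sample entry deliberately omits `description` to mimic a payload produced before this change:

```python
import json


def category_descriptions(payload: dict) -> dict:
    """Map category name -> description across all detected technologies."""
    result = {}
    for tech in payload.get("detected_technologies", {}).values():
        for category in tech.get("categories", []):
            # .get() keeps this working for payloads without a description.
            description = category.get("description")
            if description is not None:
                # The same category can appear under several technologies;
                # keep the first description seen.
                result.setdefault(category["name"], description)
    return result


sample = json.loads("""
{
  "detected_technologies": {
    "Cloudflare": {
      "categories": [
        {"id": 31, "slug": "cdn", "name": "CDN",
         "description": "Content Delivery Networks (CDNs) distribute website content globally to improve load times for users."}
      ]
    },
    "RSS": {
      "categories": [
        {"id": 19, "slug": "miscellaneous", "name": "Miscellaneous"}
      ]
    }
  }
}
""")
print(category_descriptions(sample))
```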

@max-ostapenko
Author

@pmeenan I see category descriptions are being pulled into detected_technologies, which is a nice result of our test run.
But can it break anything on WPT side?

@pmeenan
Member

pmeenan commented Dec 20, 2024

Shouldn't. Not sure what the technologies page in WPT uses but new fields are usually not a problem.
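
To illustrate why an added field is usually harmless: a consumer that reads only the keys it knows about ignores `description`, while one that unpacks the whole object strictly would fail. The sketch below is not how WPT parses this data (that is not shown in this thread); the `Category` class is a hypothetical stand-in for any consumer of the category objects:

```python
from dataclasses import dataclass, fields


@dataclass
class Category:
    """Hypothetical consumer-side model that predates the description field."""
    id: int
    slug: str
    name: str
    priority: int

    @classmethod
    def from_dict(cls, raw: dict) -> "Category":
        # Tolerant reader: keep only the keys this model declares.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


raw = {
    "id": 31,
    "slug": "cdn",
    "groups": [7],
    "name": "CDN",
    "priority": 9,
    "description": "Content Delivery Networks (CDNs) distribute website content globally to improve load times for users.",
}

category = Category.from_dict(raw)  # extra keys are ignored
print(category)
```

By contrast, `Category(**raw)` would raise a `TypeError` because of the unexpected `groups` and `description` keys, which is the kind of strict parsing worth checking for on the consuming side.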

3 participants