Invalid URLs in the catalog #4404

Closed
Vampire opened this issue Jan 29, 2025 · 4 comments

Comments

@Vampire
Contributor

Vampire commented Jan 29, 2025

Area with issue?

JSON Schema

✔️ Expected Behavior

No invalid URLs in the catalog

❌ Actual Behavior

Invalid URLs in the catalog.

I did a quick CLI validation using

cat <(jq -r '.schemas[] | select(.versions) | .versions[]' src/api/json/catalog.json) <(jq -r '.schemas[].url' src/api/json/catalog.json) | xargs -ri sh -c 'echo {}; http --body --stream -F GET "{}" | jq >/dev/null 2>&1 && echo OK || echo FAILED'
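
For readers without `jq`, the URL-gathering half of that pipeline can be sketched in Python. This is only an illustration: the function name `catalog_urls` and the sample entries are hypothetical, and it assumes `versions` holds either an object mapping version labels to URLs or a plain list of URLs.

```python
import json


def catalog_urls(catalog: dict) -> list[str]:
    """Collect every URL under `versions`, then every top-level `url`."""
    urls: list[str] = []
    schemas = catalog.get("schemas", [])
    for schema in schemas:
        versions = schema.get("versions")
        if isinstance(versions, dict):
            # Assumed shape: {"1.0": "https://...", ...}
            urls.extend(versions.values())
        elif isinstance(versions, list):
            urls.extend(versions)
    for schema in schemas:
        if schema.get("url"):
            urls.append(schema["url"])
    return urls


# Hypothetical stand-in for src/api/json/catalog.json.
sample_catalog = json.loads("""
{
  "schemas": [
    {
      "name": "a",
      "url": "https://example.com/a.json",
      "versions": {"1.0": "https://example.com/a-1.0.json"}
    },
    {"name": "b", "url": "https://example.com/b.json"}
  ]
}
""")

for url in catalog_urls(sample_catalog):
    print(url)
```

Each printed URL would then be fetched and parsed as JSON, which is the part the `http ... | jq` half of the command performs.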

YAML or JSON file that does not work.

https://raw.githubusercontent.com/eclipse-che/che-server/master/wsmaster/che-core-api-workspace/src/main/resources/schema/1.0.0/devfile.json
https://github.com/bitol-io/open-data-contract-standard/blob/main/schema/odcs-json-schema-v3.0.0.json
https://github.com/bitol-io/open-data-contract-standard/blob/main/schema/odcs-json-schema-v2.2.2.json
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.72.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.73.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.74.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.75.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.76.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.77.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.78.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.79.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.80.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.81.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.82.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.83.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.84.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.85.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.86.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.87.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.88.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.89.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.90.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.91.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.92.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.93.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.94.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.95.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.96.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.97.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.98.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.99.0/rule_schema_v1.yaml
https://raw.githubusercontent.com/returntocorp/semgrep-interfaces/v1.100.0/rule_schema_v1.yaml
https://docs.gradle.com/build-cache-node/schema/build-cache-node-config-schema-1.json
https://docs.gradle.com/build-cache-node/schema/build-cache-node-config-schema-2.json
https://docs.gradle.com/build-cache-node/schema/build-cache-node-config-schema-3.json
https://coderabbit.ai/integrations/schema.v2.json
https://schemas.wp.org/trunk/block.json
https://deta.space/assets/spacefile.schema.json
https://openapi.vercel.sh/vercel.json
https://unpkg.com/@changesets/config/schema.json
https://raw.githubusercontent.com/crowdsecurity/crowdsec-yaml-schemas/main/collection_schema.yaml
https://raw.githubusercontent.com/crowdsecurity/crowdsec-yaml-schemas/main/parser_schema.yaml
https://raw.githubusercontent.com/crowdsecurity/crowdsec-yaml-schemas/main/scenario_schema.yaml
https://on.cypress.io/cypress.schema.json
https://raw.githubusercontent.com/dolittle/DotNET.Fundamentals/master/Schemas/Tenancy.Configuration/tenant-map.json
https://raw.githubusercontent.com/dolittle/DotNET.SDK/master/Schemas/Applications.Configuration/topology.json
https://github.com/devantler/ksail/blob/main/schemas/ksail-cluster-schema.json
https://raw.githubusercontent.com/tree-sitter/tree-sitter/master/docs/assets/schemas/config.schema.json
https://unpkg.com/@graphql-mesh/types/esm/config-schema.json
https://unpkg.com/graphql-config/config-schema.json
https://www.graphql-code-generator.com/config.schema.json
https://jsonapi.org/schema
https://w3id.org/linkml/meta.schema.json
https://raw.githubusercontent.com/BookkeepersMC/notebook-schemas/master/notebook.mod.json/schemas/main.json
https://noxorg.dev/schemas/NoxConfiguration.json
https://raw.githubusercontent.com/Songmu/podbard/main/schema.yaml
https://schemas.wp.org/trunk/theme.json
https://turborepo.org/schema.json
https://www.unpkg.com/wrangler/config-schema.json
https://uniswap.org/tokenlist.schema.json
https://docs.gradle.com/enterprise/admin/schema/gradle-enterprise-config-schema-11.json
https://raw.githubusercontent.com/serverlessworkflow/specification/main/schema/workflow.yaml
https://github.com/DannyBen/completely/blob/master/schemas/completely.json
https://rivet.gg/rivet.schema.json
https://www.cardgamesimulator.com/schema/cgs.json
https://deployments.allegrogroup.com/tycho/schema
https://raw.githubusercontent.com/cinnamon-spice-settings.json
https://raw.githubusercontent.com/cinnamon-spice-metadata.json
https://json.schemastore.org/winutil-preset.json

IDE or code editor.

None

Are you making a PR for this?

No, someone else must create the PR.

@hyperupcall
Member

hyperupcall commented Jan 29, 2025

Yeah, unfortunately there are quite a few schemas like this. I think I partially wrote a script in the maintenance task that checked for 404s (some responses expectedly return 301 unauthorized), but I haven't replaced most of them. Ideally, I think the line with that URL should be blamed, and a comment should be added to the original PR (possibly automatically) when the route 404s, asking to update it. And if the URL points to a schema in a repository with very few stars, maybe even a PR that automatically removes it.
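
The "blame the line" idea could start from something like the sketch below, which only builds the `git blame` invocation for each catalog line containing a given URL. The function name `blame_commands` is hypothetical, the default path is taken from the command earlier in this issue, and finding the originating PR from the blamed commit is left out.

```python
def blame_commands(
    catalog_text: str,
    url: str,
    path: str = "src/api/json/catalog.json",
) -> list[list[str]]:
    """Return one `git blame -L n,n` argv per line that contains the quoted URL."""
    commands = []
    for number, line in enumerate(catalog_text.splitlines(), start=1):
        if f'"{url}"' in line:
            # -L n,n restricts blame to that single line.
            commands.append(["git", "blame", "-L", f"{number},{number}", "--", path])
    return commands


# Hypothetical catalog excerpt; the URL sits on line 4.
sample_text = (
    "{\n"
    '  "schemas": [\n'
    "    {\n"
    '      "url": "https://example.com/a.json"\n'
    "    }\n"
    "  ]\n"
    "}\n"
)

for argv in blame_commands(sample_text, "https://example.com/a.json"):
    print(" ".join(argv))
```

Each argv list could be handed to `subprocess.run` inside a checkout of the repository.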

Dupe of #2247

@Vampire
Contributor Author

Vampire commented Jan 30, 2025

Only checking for 404 is not enough though, imho.
As I said, some URLs are valid but provide nonsense, like an HTML page that displays the schema instead of the schema itself.
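
For the GitHub case specifically, a `github.com/<owner>/<repo>/blob/<ref>/<path>` link serves the HTML viewer, while `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` serves the file itself. A minimal sketch of that rewrite follows; the function name `github_raw_url` is hypothetical, and it assumes the ref contains no slash, since a branch name like `feature/x` cannot be told apart from a directory by looking at the URL alone.

```python
from urllib.parse import urlsplit


def github_raw_url(url: str) -> str:
    """Rewrite a github.com "blob" link to its raw form; return other URLs unchanged."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc == "github.com" and len(segments) >= 5 and segments[2] == "blob":
        owner, repo, _blob, ref, *path = segments
        return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])
    return url


print(github_raw_url(
    "https://github.com/bitol-io/open-data-contract-standard/blob/main/schema/odcs-json-schema-v3.0.0.json"
))
```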

301 are redirects, so if you really meant 301, not 401, those should be followed during a check instead of ignored.

If it indeed were 401, I'm wondering whether non-public schemas should be part of the schemastore catalog. 🤷‍♂️
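
The distinction between those status codes could be captured in a small triage step like the one below. This is a sketch, not the maintenance script mentioned above: the function name and the category labels are hypothetical, and grouping 403 with 401 and 410 with 404 is an assumption.

```python
def triage_status(status: int) -> str:
    """Map an HTTP status code to the action a catalog check might take."""
    if 200 <= status < 300:
        return "check-body"       # still need to confirm the body is a schema
    if status in (301, 302, 303, 307, 308):
        return "follow-redirect"  # redirects should be followed, not ignored
    if status in (401, 403):
        return "not-public"       # needs credentials; assumption: treat 403 alike
    if status in (404, 410):
        return "missing"
    return "other-error"


for code in (200, 301, 401, 404, 500):
    print(code, triage_status(code))
```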

@hyperupcall
Member

> 301 are redirects, so if you really meant 301, not 401, those should be followed during a check instead of ignored.

Yeah, I meant 401

> If it indeed were 401, I'm wondering whether non-public schemas should be part of the schemastore catalog. 🤷‍♂️

Currently, the Tycho schema is. Such schemas are documented in the "maintenance" task (PR imminent).

@Vampire
Contributor Author

Vampire commented Feb 24, 2025

Ah, OK. For me the Tycho schema does not give a 401; instead, the whole hostname is not resolvable.
So, as written in #2247, besides the Tycho schema my check finds:

  • 13 schemas with 404
  • 2 schemas where an invalid GitHub URL is configured
  • 39 schemas where a 200 answer is delivered but it is not valid JSON
    • 35 of those are YAML files
    • 3 of those are GitHub links to the HTML-rendered schema instead of the raw link
    • 1 of those redirects to an HTML page
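
The "200 but not valid JSON" bucket and its sub-cases could be told apart roughly as follows. This is a sketch under stated assumptions: the function name and labels are hypothetical, HTML is detected only by a leading doctype or `<html` tag, and YAML is guessed from the URL extension rather than by parsing the body.

```python
import json


def classify_body(url: str, body: str) -> str:
    """Label a 200 response body as "json", "html", "yaml", or "not-json"."""
    try:
        json.loads(body)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    head = body.lstrip().lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    # Assumption: trust the extension instead of pulling in a YAML parser.
    if url.lower().endswith((".yaml", ".yml")):
        return "yaml"
    return "not-json"


print(classify_body("https://example.com/schema.json", '{"type": "object"}'))
print(classify_body("https://example.com/rule_schema_v1.yaml", "type: object\n"))
print(classify_body("https://example.com/schema.json", "<!DOCTYPE html><html></html>"))
```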
