Overloading of ecosystem #83

Closed
kurtseifried opened this issue Sep 4, 2022 · 7 comments

@kurtseifried
Contributor

Where do we put the vendor name? There are lots of packages, sometimes in the same ecosystem, or not really in any ecosystem at all, for which the vendor name is really helpful.

The osv-data seems to do things like:

"ecosystem": "Debian:5.0",
"ecosystem": "Debian:10",

Is this the official way to do it? If so, can we update the documentation? https://ossf.github.io/osv-schema/#affected-fields doesn't mention this explicitly, but it does show examples, and:

Your ecosystem here. | Send us a PR.

If so, for the data I'm currently working with there are about 50 vendors with 100+ items, 250 with 10-99, and 300 with 5-9. What's the bar for entry to get listed? Do they need to be listed? (E.g. you have Debian, so can we add all the major Linux vendors? The BSDs?)

@oliverchang
Contributor

> What's the bar for entry here to get listed? Do they need to be listed?

We just need clearly defined rules for each ecosystem. There must be no ambiguity as to what a "name" means in an ecosystem. This is not always obvious: e.g. for Debian, this must be source packages, not binary packages. For Python, the package name must be normalized. We can't just have e.g. ecosystem: "", name: "human readable text" as these are not very actionable.

Re Debian, the definition states:

> The Debian package ecosystem; the name is the name of the source package. The ecosystem string might optionally have a :<RELEASE> suffix to scope the package to a particular Debian release. <RELEASE> is a numeric version specified in the [Debian distro-info-data](https://debian.pages.debian.net/distro-info-data/debian.csv). For example, the ecosystem string “Debian:7” refers to the Debian 7 (wheezy) release.
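
To make the suffix convention in that definition concrete, here is a minimal sketch of how a consumer might split an ecosystem string into its base name and optional release. This is an illustration only, not code from the OSV tooling; `Ecosystem` and `parse_ecosystem` are hypothetical names, and the sketch does not check the release against distro-info-data.

```python
from typing import NamedTuple, Optional


class Ecosystem(NamedTuple):
    """Hypothetical container for a parsed ecosystem string."""
    name: str
    release: Optional[str]


def parse_ecosystem(value: str) -> Ecosystem:
    """Split 'Debian:10' into name 'Debian' and release '10'.

    A bare 'Debian' has no release. The release is kept as a string
    because values such as '5.0' appear in the data.
    """
    name, sep, release = value.partition(":")
    if not name:
        raise ValueError("ecosystem name is empty")
    if sep and not release:
        raise ValueError(f"empty release suffix in {value!r}")
    return Ecosystem(name, release or None)
```

With this, "Debian:5.0" and "Debian:10" both resolve to the same base ecosystem, so the same source-package naming rule applies to each.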

@kurtseifried
Contributor Author

OK, so where do we put software in general? E.g. where does open source software that isn't part of an existing ecosystem go? Do we create something like "opensource" or "software"? What about closed source or vendor-specific software?

There is already a catch-all for things found by OSS-Fuzz:

> OSS-Fuzz: For reports from the OSS-Fuzz project that have no more appropriate ecosystem; the name field is the name assigned by the OSS-Fuzz project, as recorded in the submitted fuzzing configuration.

Do we do something similar for data from other sources, e.g. "Other"?

@kurtseifried
Contributor Author

CVE JSON 5.0 is doing a similar thing to ecosystem with collectionURL (https://github.com/CVEProject/cve-schema/blob/master/schema/v5.0/CVE_JSON_5.0_schema.json#L123):

    "collectionURL": {
        "description": "URL identifying a package collection (determines the meaning of packageName).",
        "$ref": "#/definitions/uriType",
        "examples": [
            "https://access.redhat.com/downloads/content/package-browser",
            "https://addons.mozilla.org",
            "https://addons.thunderbird.net",

One major advantage of using a URL is that people immediately know where to go look, and there's no potential for overlap.

@kurtseifried
Contributor Author

I think we should consider using URLs pointing to the package ecosystem space as the ecosystem value. It ensures no duplicates, gives people a hint about where to go, and makes adding new ones trivial: just use the best official URL you can find. There's less need to curate them manually.
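
A rough sketch of what accepting a URL as the ecosystem value could look like on the consumer side, assuming the only requirement is an absolute http(s) URL with a host. `collection_host` is a hypothetical helper, not part of OSV or the CVE schema, and it says nothing about what a package name means within that collection.

```python
from urllib.parse import urlsplit


def collection_host(value: str) -> str:
    """Return the host of a collection URL, or raise ValueError.

    Only checks the shape of the value (absolute http/https URL with a
    host); it does not verify that the URL exists or is "official".
    """
    parts = urlsplit(value)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an absolute http(s) URL: {value!r}")
    return parts.hostname
```

For example, `collection_host("https://addons.mozilla.org")` gives the host a reader would visit, while a string such as "Debian:10" is rejected because it is not a URL.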

@oliverchang
Contributor

The issue with that is that we still need clearly defined rules for what a package name means as part of this ecosystem. There are subtleties in a lot of ecosystems. Some examples:

  • For Debian, our package name refers to the "source" package, not binary packages.
  • For Go, we refer to Go modules, not Go packages.
  • For Python, the name must be normalized according to Python's rules.
  • And many more...

Having these clear sets of rules allows us to perform validation, so that consumers can be confident about their ingestion.
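
As an illustration of the kind of validation such rules enable, here is a minimal sketch for the Python case only. The normalization follows Python's published name normalization rule (lowercase, runs of `-`, `_`, `.` collapsed to a single `-`); `NAME_RULES` and `check_name` are hypothetical names, and the Debian and Go rules are deliberately left out because they need external data (source package lists, module paths) to check.

```python
import re


def normalize_pypi_name(name: str) -> str:
    """Normalize a Python package name: lowercase, separator runs to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical registry: ecosystem name -> predicate on the package name.
NAME_RULES = {
    "PyPI": lambda name: name == normalize_pypi_name(name),
}


def check_name(ecosystem: str, name: str) -> bool:
    """Return True if `name` satisfies the rule for `ecosystem`.

    Raises ValueError when no rule is defined, since an ecosystem
    without rules cannot be validated at all.
    """
    base = ecosystem.partition(":")[0]  # drop an optional ':<RELEASE>' suffix
    rule = NAME_RULES.get(base)
    if rule is None:
        raise ValueError(f"no name rules defined for ecosystem {base!r}")
    return rule(name)
```

Here `check_name("PyPI", "Django")` fails because the normalized form is `django`, and an ecosystem with no entry in the registry raises rather than silently passing.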

@kurtseifried
Contributor Author

What happens if there is a flaw in the binary package and not the source? (This has happened a handful of times, if my memory serves.)

Also, on Debian vs https://packages.debian.org/: regardless of what it's called, it would be nice to be able to refer to both source and binary packages. Is there some reason for not supporting references to binary packages? I assume the intent is to refer to the smallest component of the composition (e.g. Go modules instead of packages), but why not support both?

@andrewpollock
Collaborator

> What happens if there is a flaw in the binary package and not the source? (This has happened a handful of times, if my memory serves.)

#202 touches on this need...

@oliverchang closed this as not planned on Sep 13, 2024