Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CVE entry management in OSV #2465

Open
hogo6002 opened this issue Aug 9, 2024 · 28 comments
Open

Improve CVE entry management in OSV #2465

hogo6002 opened this issue Aug 9, 2024 · 28 comments
Labels
enhancement New feature or request

Comments

@hogo6002
Copy link
Contributor

hogo6002 commented Aug 9, 2024

The current OSV structure combines vulnerability data from different resources (e.g., NVD, Alpine, Debian) into a single CVE entry based on shared CVE IDs. This approach leads to overly large and difficult to maintain CVE entries. With Ubuntu also publishing its security tracker data to OSV.dev using CVE IDs, this issue will likely worsen.

We probably need a better solution for managing CVE entries. One idea is to add a source-specific prefix or suffix to the CVE ID, creating separate entries for each data source. For example, Alpine-CVE-2024-0001 and Debian-CVE-2024-0001 would be displayed as two distinct records on OSV.dev.

@hogo6002 hogo6002 added the enhancement New feature or request label Aug 9, 2024
@dodys
Copy link

dodys commented Aug 9, 2024

As someone that does not have a view of the whole osv.dev machinery, I think the main question is, why would you want to combine CVE entries?

@hogo6002
Copy link
Contributor Author

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs.

However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

@dodys
Copy link

dodys commented Aug 12, 2024

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs.

However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

From your first message, the thing that is not clear to me is, if you are not going to combine CVEs anymore, then why would you need to still add the prefixes (e.g. Debian-CVE-2024-0001) ? Aren't those completely separate buckets?

@oliverchang
Copy link
Collaborator

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs.
However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

From your first message, the thing that is not clear to me is, if you are not going to combine CVEs anymore, then why would you need to still add the prefixes (e.g. Debian-CVE-2024-0001) ? Aren't those completely separate buckets?

For OSV, ID prefixes must be unique, and be associated with a home database. Debian security tracker, Alpine etc didn't have a unique ID before, so we were combining them into a single CVE record. If we don't combine them, then they would need their own prefix.

@dodys
Copy link

dodys commented Aug 12, 2024

why would you want to combine CVE entries?

The initial reason was that Alpine SecDB uses CVE IDs to publish their vulnerabilities. So, we created a tool to combine their data into our existing CVEs, as long as they use the same vulnerability ID. Then, we wanted to add Debian security tracker data into our OSV to better support container scanning, so we continued to combine everything into our CVE entries to keep it consistent, as the Debian security tracker also uses CVE IDs.
However, the problem now is that we continue to combine more and more data into a single record, increasing the difficulty of our maintenance. So, we are considering some better changes to this approach.

From your first message, the thing that is not clear to me is, if you are not going to combine CVEs anymore, then why would you need to still add the prefixes (e.g. Debian-CVE-2024-0001) ? Aren't those completely separate buckets?

For OSV, ID prefixes must be unique, and be associated with a home database. Debian security tracker, Alpine etc didn't have a unique ID before, so we were combining them into a single CVE record. If we don't combine them, then they would need their own prefix.

Thanks for clarifying it, and would the prefix also be necessary for the filename itself? or could we have a filename CVE-2024-0001.json and inside of it id = "UBTU-CVE-2024-0001?

@hogo6002
Copy link
Contributor Author

Thanks for clarifying it, and would the prefix also be necessary for the filename itself? or could we have a filename CVE-2024-0001.json and inside of it id = "UBTU-CVE-2024-0001?

We do prefer the filename to match the ID name to keep it consistent for other ecosystems. Are there any concerns for Ubuntu in changing the filename?

@dodys
Copy link

dodys commented Aug 13, 2024

Thanks for clarifying it, and would the prefix also be necessary for the filename itself? or could we have a filename CVE-2024-0001.json and inside of it id = "UBTU-CVE-2024-0001?

We do prefer the filename to match the ID name to keep it consistent for other ecosystems. Are there any concerns for Ubuntu in changing the filename?

No concerns, just confirming what needs to be done as it will be a mass operation.

The only thing to confirm is which should be the prefix, Ubuntu-, UBUNTU-, UBTU-?

@hogo6002
Copy link
Contributor Author

hogo6002 commented Aug 14, 2024

The only thing to confirm is which should be the prefix, Ubuntu-, UBUNTU-, UBTU-?

Thanks for confirming! We prefer UBUNTU- to align with our current naming schema. Same as https://osv.dev/vulnerability/CURL-CVE-2024-7264

@hogo6002
Copy link
Contributor Author

Hey @dodys, I have two questions about Ubuntu CVE OSV records (ubuntu-security-notices/osv).
First, I'm curious why some affected packages lack the ubuntu_priority field. This field is from the vulnerability level and should be consistent across all affected packages. But I've noticed that some packages are missing this field while others within the same vulnerability have it. Example: https://osv.dev/vulnerability/CVE-2024-21131

Second question is about the status field. I can't find it in the Ubuntu OSV records, not sure if it's an important one. But I'm wondering if there's a filtering mechanism that determines which records Ubuntu sends to OSV (like not sending all records with a status of DNE or not-affected ?).

@dodys
Copy link

dodys commented Aug 15, 2024

Hey @dodys, I have two questions about Ubuntu CVE OSV records (ubuntu-security-notices/osv). First, I'm curious why some affected packages lack the ubuntu_priority field. This field is from the vulnerability level and should be consistent across all affected packages. But I've noticed that some packages are missing this field while others within the same vulnerability have it. Example: https://osv.dev/vulnerability/CVE-2024-21131

Indeed, for now we only added the ecosystem_specific part (including ubuntu_priority) when the CVE was fixed. In the next batch of update to our scripts, we plan to add the ecosystem_specific to all.

Second question is about the status field. I can't find it in the Ubuntu OSV records, not sure if it's an important one. But I'm wondering if there's a filtering mechanism that determines which records Ubuntu sends to OSV (like not sending all records with a status of DNE or not-affected ?).

Since OSV has the affected field, which is for only packages that are vulnerable to such "vulnerability" (CVE) or "set of vulnerabilities" (USN), for CVEs, we only list the ones that are affected or are still pending investigation. That means that the following status are always included:
needed -> means vulnerable/affected
needs-triage -> means that it needs investigation, therefore we consider it vulnerable until investigation is done
deferred -> means that we tried to investigate but we still don't have enough information to do so, therefore it is vulnerable until confirmed
pending -> means that the package is vulnerable and a fix is on its way
released -> means that we patched it in such release

Some status are more complex and will depend on other factors:
not-affected (<version>) -> if the <version> exists in that specific Ubuntu release, we consider it similar to released, if the <version> is an old one, e.g. the first version to contain the fix and not vulnerable anymore, then we don't add it since that Ubuntu release is not vulnerable at all.
ignored -> it will only be included for releases that are still "alive", as ignored can be used for different reasons.

And some status we just don't include:
DNE -> package doesn't exist in such Ubuntu release
active -> even though it is in the docs we still don't use that status.

Does that help clarify?

@hogo6002
Copy link
Contributor Author

Does that help clarify?

Thanks Eduardo! It's very detailed and helpful!

Having all the ubuntu_priority info should be sufficient for us, especially considering you've already helped us filter out DNE and some not-affected ones.

@dodys
Copy link

dodys commented Aug 20, 2024

I forgot to ask one thing Holly
For the USN data, should we change the related field to refer to the UBUNTU-CVE-... or should we keep it as CVE-...?

@hogo6002
Copy link
Contributor Author

hogo6002 commented Aug 21, 2024

I forgot to ask one thing Holly For the USN data, should we change the related field to refer to the UBUNTU-CVE-... or should we keep it as CVE-...?

Are you referring to data from USN notes such as USN-6969-1? I think for this type of data, we can just keep its original USN- prefix, no change needed. Just for Ubuntu CVEs, we can add UBUNTU-CVE- as its prefix.

Misread the question (didn't see the related word 😅), please refer to Oliver's answer

@oliverchang
Copy link
Collaborator

Hmm, I think @dodys is asking about what to put inside related for USN- entries? I think under our schema definitions, it would make the most sense to:

  1. Put UBUNTU-CVE- in aliases, if it exists, given that they're both Ubuntu-specific advisories talking about the same vulnerability.
  2. Put CVE- in related.

@dodys
Copy link

dodys commented Aug 21, 2024

Thanks both!

@dodys
Copy link

dodys commented Aug 28, 2024

Holly and Oliver, I've just pushed the CVE changes to our github repo, let me know if now looks as expected and the sync to osv.dev can be re-enabled

@hogo6002
Copy link
Contributor Author

Hey @dodys, thanks for the Ubuntu CVE updates! It looks good overall. One thing to mention is the aliases/related part.

I noticed that CVEs- are added as aliases to UBUNTU-CVE-, but Oliver previously mentioned we wanted to put UBUNTU-CVE- in USN- aliases. Doing both would make OSV treat the CVE- and USN- as aliases of each other because aliases are considered symmetric and transitive. This is not exactly what we want (see #2374 (comment) for explanation). We're trying to avoid making Linux distro vulnerabilities aliases of CVEs.

In the current OSV USN- data, the aliases field is still empty, so it will not cause the problem. But putting CVE- in UBUNTU-CVE- aliases is not ideal because we think CVE- and UBUNTU-CVE- are not always the same thing and shouldn't be aliases to each other. ("Aliases should not be used to refer to vulnerabilities in packages upstream or downstream in a software supply chain from the given OSV record’s affected package(s).").

I talked with Oliver, and we think there is no perfect solution at this point. But to keep things consistent with other Linux distros, we think it's best to put all vulnerabilities in related instead of aliases. We might find a better way to show the relationship between Linux distro CVEs and CVEs later on.

With all that said, we don't need to change the current USN data as it doesn't contain any aliases. The only change we probably want is to move CVE- to the related field of UBUNTU-CVE-.

@andrewpollock
Copy link
Contributor

The new UBUNTU-CVE prefix will also need to be added here and here in the OSV Schema.

@dodys
Copy link

dodys commented Aug 29, 2024

ok, CVE files fixed and PR open: ossf/osv-schema#265

Let me know if anything needed.

@hogo6002
Copy link
Contributor Author

hogo6002 commented Sep 3, 2024

ok, CVE files fixed and PR open: ossf/osv-schema#265

CVE files look good to me. Thanks for fixing them!

@dodys
Copy link

dodys commented Sep 3, 2024

Thanks for verifying Holly!
Let me know if you need anything else from my side.
I believe for the sync to be re-enable is just a matter of addressing the ignore pattern:
https://github.com/google/osv.dev/blob/master/source.yaml#L303C20-L303C36
But I will leave it to you if you want to do some testing before re-enabling it, so we don't create another issue for you.

@hogo6002
Copy link
Contributor Author

But I will leave it to you if you want to do some testing before re-enabling it, so we don't create another issue for you.

Hey Eduardo, we have re-enabled UBUNTU-CVE- back to OSV.dev, and the linux query timeout issue has been fixed. But I am also wondering why some CVEs don't have purl field while some have it?

@dodys
Copy link

dodys commented Sep 24, 2024

But I will leave it to you if you want to do some testing before re-enabling it, so we don't create another issue for you.

Hey Eduardo, we have re-enabled UBUNTU-CVE- back to OSV.dev, and the linux query timeout issue has been fixed. But I am also wondering why some CVEs don't have purl field while some have it?

Hi Holly,
sorry, but I am confused, both examples you shared have purl. Was it a wrong paste?

@hogo6002
Copy link
Contributor Author

Hi Holly, sorry, but I am confused, both examples you shared have purl. Was it a wrong paste?

Oh yeah, sorry. That was a wrong paste. Example without purl: https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-0120.json, https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-1296.json, https://osv.dev/vulnerability/UBUNTU-CVE-2023-46726, https://osv.dev/vulnerability/UBUNTU-CVE-2024-8354 (it seems all come from Ubuntu:Pro?)

@dodys
Copy link

dodys commented Sep 24, 2024

Hi Holly, sorry, but I am confused, both examples you shared have purl. Was it a wrong paste?

Oh yeah, sorry. That was a wrong paste. Example without purl: https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-0120.json, https://github.com/canonical/ubuntu-security-notices/blob/main/osv/cve/2023/UBUNTU-CVE-2023-1296.json, https://osv.dev/vulnerability/UBUNTU-CVE-2023-46726, https://osv.dev/vulnerability/UBUNTU-CVE-2024-8354 (it seems all come from Ubuntu:Pro?)

We are only adding purl for vulnerabilities that were fixed. We could change that, if that's what is preferred :)

@hogo6002
Copy link
Contributor Author

hogo6002 commented Sep 25, 2024

We are only adding purl for vulnerabilities that were fixed. We could change that, if that's what is preferred :)

Yes, please. We also prefer the purl to not contain affected range information, which means it shouldn't include a version. Our purpose in using the purl field is to find the package itself. For example, in other ecosystems, you might see a purl like pkg:deb/debian/chromium?arch=source.

So for Ubuntu, regardless of whether a fix is available, the package purl should only contain the package information (distribution information is fine to keep).

Original one: pkg:deb/ubuntu/[email protected]+dfsg-4ubuntu0.1?arch=src?distro=jammy
Expected one: pkg:deb/ubuntu/py7zr?arch=src or pkg:deb/ubuntu/py7zr?arch=src?distro=jammy

@michaelkedar do you have anything to add?

@dodys
Copy link

dodys commented Sep 25, 2024

We are only adding purl for vulnerabilities that were fixed. We could change that, if that's what is preferred :)

Yes, please. We also prefer the purl to not contain affected range information, which means it shouldn't include a version. Our purpose in using the purl field is to find the package itself. For example, in other ecosystems, you might see a purl like pkg:deb/debian/chromium?arch=source.

So for Ubuntu, regardless of whether a fix is available, the package purl should only contain the package information (distribution information is fine to keep).

Original one: pkg:deb/ubuntu/[email protected]+dfsg-4ubuntu0.1?arch=src?distro=jammy Expected one: pkg:deb/ubuntu/py7zr?arch=src or pkg:deb/ubuntu/py7zr?arch=src?distro=jammy

@michaelkedar do you have anything to add?

Thanks Holly,
I've made the changes and refreshed the data.

@hogo6002
Copy link
Contributor Author

Thanks Holly, I've made the changes and refreshed the data.

Thanks Eduardo! It all looks good now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants