OCI SBOM inheritance #3202

Closed
p5 opened this issue Sep 6, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

p5 commented Sep 6, 2024

What would you like to be added:

I would like to be able to supply Syft with the SBOM of a base image (e.g. ubuntu:22.04) to reuse when generating the SBOM for an image built on it (e.g. my-app:ubuntu-22.04).

Why is this needed:

Some SBOMs of larger OCI images take a long time to generate (sometimes 10-15 minutes). Given an SBOM of the image used as the base, Syft would only need to check the layers added on top of that base image, which should considerably speed up generation.

Additional context:

I have no idea what technical limitations implementing this might run into.

p5 added the enhancement label Sep 6, 2024

kzantow (Contributor) commented Sep 9, 2024

@p5 are you using a --scope all-layers scan? If not, Syft scans the resulting filesystem after all layers are applied rather than scanning each individual layer, so I don't really know how providing an SBOM would help.

We have a number of requests related to partial image scanning, though, so to clarify: I think what you're asking is to speed up scanning by only scanning the changes to the image?

Unfortunately, many things wouldn't quite work if we tried to get only the new packages, even if we were able to detect which things changed. It's important to understand how layers work: each layer contains the full contents of any file that has changed. For example, an APK DB file will change if anything is installed, so the layer doesn't contain just the change but rather the list of all installed software (any change results in a completely new file with everything installed, not just the changes). Other things at new paths might be easier to determine.

Many more questions would arise if there were a way to skip earlier layers and only surface a filesystem with files from layer X to layer Y. What about symlinks from earlier layers? This is a very real thing to consider: if a symlink were missing from the filesystem because it was created in an earlier layer, we might not even be able to scan Fedora images (I think it was), which put packages in a nonstandard location with the standard location symlinked to it.

This is all to say, the "technical limitations" are real, but we'd love to be able to speed up scanning if we could figure out how not to miss things and not get unexpected results.
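
To make the layer behaviour described above concrete, here is a minimal toy model in Python. It is not how Syft or any container runtime is implemented: layers are plain dicts mapping a path to the full file contents, `None` stands in for a deletion, and the `/std/pkgdb` symlink is a hypothetical placeholder for the kind of symlink mentioned above. Only `/lib/apk/db/installed` is a real path.

```python
# Toy model: each layer maps path -> FULL file contents (never a diff).
# A value of None stands in for a deletion marker in that layer.

def squash(layers):
    """Apply layers in order; a later layer replaces whole files."""
    fs = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                fs.pop(path, None)
            else:
                fs[path] = content
    return fs

base_layer = {
    "/lib/apk/db/installed": "musl\nbusybox\n",
    # Hypothetical symlink created in the base image.
    "/std/pkgdb": "symlink -> /nonstd/pkgdb",
}
app_layer = {
    # Installing one package rewrites the entire database file.
    "/lib/apk/db/installed": "musl\nbusybox\ncurl\n",
    "/nonstd/pkgdb/extra": "some-package\n",
}

full_view = squash([base_layer, app_layer])
new_layers_only = squash([app_layer])

# The "new" layer still lists every installed package, not just curl.
print(new_layers_only["/lib/apk/db/installed"].split())
# The symlink from the base layer is invisible when earlier layers are skipped.
print("/std/pkgdb" in new_layers_only, "/std/pkgdb" in full_view)
```

In this sketch, scanning only the upper layer would report all three packages from the database file while losing the symlink, which is the combination of problems the comment describes.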

That said, if you could supply an SBOM for a specific layer (or Syft could figure it out), we could subtract that SBOM from the full result to get only a list of changes. But this wouldn't speed things up, and I don't think that's what you are asking for at all: it would make things slower because it needs 2 distinct scans!
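
As a rough illustration of what "subtracting" one SBOM from another could look like, here is a small Python sketch. It assumes each SBOM is a dict with an `artifacts` list whose entries carry `name`, `version` and `type`, loosely modelled on Syft's JSON output; the exact schema should be checked before relying on it. The function names and sample data are made up for this example, and nothing here is an existing Syft feature.

```python
def package_keys(sbom):
    """Return the set of (name, version, type) tuples found in an SBOM dict."""
    return {(a["name"], a["version"], a["type"]) for a in sbom.get("artifacts", [])}

def subtract(full_sbom, base_sbom):
    """Packages present in full_sbom but not in base_sbom, sorted for stable output."""
    return sorted(package_keys(full_sbom) - package_keys(base_sbom))

# Made-up sample data standing in for two separate scan results.
base_sbom = {"artifacts": [
    {"name": "bash", "version": "5.1", "type": "deb"},
    {"name": "libc6", "version": "2.35", "type": "deb"},
]}
full_sbom = {"artifacts": [
    {"name": "bash", "version": "5.1", "type": "deb"},
    {"name": "libc6", "version": "2.35", "type": "deb"},
    {"name": "curl", "version": "7.81", "type": "deb"},
]}

print(subtract(full_sbom, base_sbom))
```

Both inputs still have to come from complete scans, so this yields the list of changes but none of the time savings.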

Could you expand on anything I've missed with the request?

p5 (Author) commented Sep 9, 2024

> I think what you're asking is to speed up scanning by only scanning the changes to the image?

Yeah, this sounds about right. Essentially cache the SBOM scan result of a base image to speed up scans of downstream images. In the end, these two results would be combined into a single one.

What I would like to do is:

  • Scan an ubuntu:22.04@sha256-abcxyz image to generate an SBOM (as a first stage in an image pipeline)
  • Supply this SBOM to subsequent scans of custom images which use this exact Ubuntu image as a base, or contain the same layers
  • The scan then ignores everything in the layers provided by ubuntu:22.04@sha256-abcxyz, only scanning the layers added since the SBOM was generated
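
The precondition in the list above (the custom image uses this exact base, or contains the same layers) could in principle be checked by comparing ordered layer digests. The Python sketch below is only an illustration of that check: the digests are made up, the function names are hypothetical, and how the digests are obtained from an image is left out. It identifies which layers are new; it does not address the problems with scanning those layers in isolation.

```python
def builds_on(base_layers, image_layers):
    """True if image_layers starts with exactly the base image's layer digests."""
    n = len(base_layers)
    return n > 0 and image_layers[:n] == base_layers

def added_layers(base_layers, image_layers):
    """Return the digests of layers added on top of the base image."""
    if not builds_on(base_layers, image_layers):
        raise ValueError("image does not start with the base image's layers")
    return image_layers[len(base_layers):]

# Made-up digests for illustration only.
base_layers = ["sha256:aaa", "sha256:bbb"]
app_layers = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
other_layers = ["sha256:aaa", "sha256:zzz", "sha256:ccc"]

print(added_layers(base_layers, app_layers))
print(builds_on(base_layers, other_layers))
```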

But this then runs into the issue you mentioned: there's no easy way of skipping earlier layers. The main benefit we would be looking for is a speed increase, which you mentioned this wouldn't deliver.

I am a member of the Universal Blue project, where we fetch the latest Fedora container image daily, and apply various updates on top of these to install things like VSCode, Docker and anything else you would want on a development-focused or gaming-focused Linux desktop.
Since these desktop images provided by Fedora contain thousands of RPM packages, binaries and regular files, an image scan takes 20+ minutes. Multiply this by the 100 or so variants we build, occasionally multiple times a day, and our pipelines would get saturated quickly with these SBOM scans.
The idea was to scan the image at each stage in the pipeline and supply that SBOM to later stages so they can skip anything already picked up.
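
For a sense of scale, the figures above work out as follows. This is only back-of-the-envelope arithmetic in Python, treating "20+ minutes" as a lower bound and "100 or so" as exactly 100, and assuming every variant is scanned once per build round.

```python
minutes_per_scan = 20   # "20+ minutes" per image, taken as a lower bound
variants = 100          # "100 or so variants", taken as exactly 100

total_minutes = minutes_per_scan * variants
total_hours = round(total_minutes / 60, 1)

# Prints: 2000 minutes, about 33.3 hours of scanning per build round
print(f"{total_minutes} minutes, about {total_hours} hours of scanning per build round")
```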

I appreciate you taking the time to explain the limitations to me. That helps a lot in understanding where the issues would lie - even though I don't believe I would even come close to a solution.

p5 closed this as not planned Sep 9, 2024
github-project-automation bot moved this to Done in OSS Sep 9, 2024