Metadata / labels document and cleanup? #3
Comments
|
An update:
The format is as such:
I don't anticipate this format changing until we implement this RFC. The RFC is focused on getting a standard format available for the bill of materials. I believe final comments are due tomorrow, and once it's finalized we can look at implementing it. The format will be a standard one, like SPDX or CycloneDX, so after this is implemented the format won't change & will be documented.
If there are any labels you're still concerned about let me know which ones and I can get some more details. Hope that helps! |
Yeah, so that's what this issue is about. For the format to be documented.
This is really the most interesting and crucial part. E.g. for my use case it was about discovering dependencies in metadata from the java buildpack.
It really isn't obvious, and even if it was, the fact that these things are wholly undocumented essentially raises the questions:
Without documentation spelling out what the format is, it is hard / impossible to answer such questions with any confidence. Also, even if you think you have it 'figured out', you are still running the risk that it may change tomorrow, because there really isn't any kind of contract a consumer of this data can rely upon. |
I get what you're saying, it should be documented and explained. That makes the format an API and something that won't change out from under you. I understand that's important for anyone that wants to integrate with the tools.

Being completely transparent and as direct as possible, the present format isn't going to be documented or be any sort of official or guaranteed format. It'll continue to exist in its present format for the time being. We have no plans to change it until, as I mentioned before, the RFC that's coming that will prescribe a BOM format based on industry-wide standards like SPDX and CycloneDX. We'll be implementing that RFC once it is formalized.

This RFC is also important because it will unify the format across all of the buildpacks. Right now, we can only control what the Java buildpacks are contributing to the BOM & its format. Other buildpacks and the stack make contributions as well. Those formats are out of our control, at least until this RFC is implemented.

I believe that will address your concerns, but if not, please let me know. I can also leave this issue open if you like until that RFC has been implemented & we roll it out. Just let me know. |
Whatever you want to do is fine, re keeping the ticket open or not. This ticket is old and I sort of forgot about it. At the moment I have no real issue. That may change if something breaks in our tools and I have to look at fixing it :-). In that case I may be back here with some questions. |
I'll keep it open & I will update it when we start to and when we implement the transition for buildpacks/rfcs#166. If anyone is interested in this feature set, feel free to watch this issue. It also seems reasonable that this RFC implementation/change will trigger a major version bump, since it's changing existing established behaviors. So you can watch for that as well. |
The Paketo buildpacks have implemented the new Buildpacks RFC for SBOM. Here are some notes on the transition:
Here's an example:
I'm going to keep this issue open for a little while, so feel free to post questions/feedback here, or reach out in our Slack channel. |
Okay so I tried the command but all I get is a bom which is 'null' for both 'local' and 'remote'.
It looks like some stuff did break in our tools. It is related to determining whether a given image contains a specific Java dependency. The code we had for this used to look at the labels directly, and that code broke. Also, looking at all the labels on the image, I do not see any label there that still has this information. So I wonder how we are supposed to access that information now. If you can provide some pointers, that would be appreciated. |
Note: the image was built using |
So here is where our code is looking for the bom:
But as you can see from the output below, the 'bom' there is
|
The bad news first:
At the moment,
We cannot store BOM information on labels going forward. Labels have a hard size limit in Kubernetes, and BOM information can easily grow large enough to exceed it. As such, BOM information going forward is stored in the image, in its own layer. You can use this tool to extract the new BOM information: https://github.com/sclevine/cnb-sbom/. When you run it, the tool should write the BOM files to the current working directory (or you can look at the code for the tool as an example of how you could write a custom solution to pull out that info). You can also use

The good news:
After further discussions with the Buildpacks team, we were able to get the lifecycle updated to have backward compatibility. In short, starting with lifecycle 0.13.3 we're now able to support both the older style label-based BOM information and the new layer-based BOM information at the same time.

We do need to make some updates to the Paketo Java buildpacks before this will work. I'm hoping to have that out in next week's release cycle (Fri 2/4). I will post back here when we've made the change.

This doesn't mean there will be continued long-term support for the old-style label-based BOM format. We're still considering the older label-based BOM formats to be deprecated and they will be removed at some point. I'm just glad we'll be able to offer some overlap between the two so that users have a chance to move at their own pace.

I hope that's helpful for folks. As always, please reach out and let us know if you have questions/comments. Thanks |
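For readers who would rather not shell out to a separate tool, here is a minimal sketch of pulling SBOM files out of a layer once its bytes are in hand. It assumes the layer is a (possibly gzipped) tar archive and that SBOM files end in `.cdx.json`, `.spdx.json`, or `.syft.json`; those suffixes, the function name `extract_sbom_files`, and the example path in the usage note are assumptions for illustration, not something confirmed in this thread.

```python
import io
import tarfile

# Assumed SBOM file suffixes; check these against a real image before relying on them.
SBOM_SUFFIXES = (".cdx.json", ".spdx.json", ".syft.json")


def extract_sbom_files(layer_bytes: bytes) -> dict:
    """Map each SBOM file path found in the layer archive to its raw bytes."""
    found = {}
    # Mode "r:*" lets tarfile detect the compression (gzip or none) by itself.
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(SBOM_SUFFIXES):
                found[member.name] = tar.extractfile(member).read()
    return found
```

Calling `extract_sbom_files(layer_bytes)` returns a dictionary keyed by archive path (for instance a hypothetical `layers/sbom/launch/<buildpack-id>/sbom.cdx.json`), so the caller can pick whichever SBOM format it understands. The same loop should be straightforward to port to a Java tar library.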
Actually we don't really care about the pack cli. Using pack cli is kind of off the table for our use case anyway; we have to access the information from inside of a Java process using a Java library / docker client. I only tried pack cli to see if the information is there in the image at all, using the 'officially documented way' to access the info.
Hmmm... that is really rather impractical; as mentioned above, our code is written in Java. I suppose we could package up the binaries for that tool and launch it from code, but it isn't a great solution and requires complex packaging to accommodate different OSes, or else we have to ask users to install that tool themselves, complicating the installation process.
Okay... hmmm, that brings up a whole lot of questions. Does this mean you have to pull/download the entire image to access that info then? That would be impractical because the image can be large. We are using this library: https://github.com/docker-java/docker-java/blob/master/docs/README.md and it isn't clear to me whether we can use it to access information from layers (somehow I doubt it). If you have any advice on how we might read the info (hopefully without downloading the whole image) please share. |
For reference for those reading along and using pack, this will be supported in the 0.24.0 release. As I write this, there's an RC available and if testing goes well, a release should be official in a few days.
It's my understanding that because it's a layer you do have to fetch the image. I don't know enough about interacting with an OCI registry to know if you can fetch only a particular layer or if you're stuck fetching them all.

I have heard similar complaints from others about this change. I am just the messenger here though. As a buildpack author, we don't deal with the layers directly. The buildpacks tooling does all that. The way this is stored was a decision made by the Buildpacks project. I would suggest reaching out either on their GitHub or on their Slack. Given they adopted this design, they might have tips on how to efficiently extract the BOM. That would also allow you to give feedback about this approach directly to them & hear any future plans they have on the topic directly.

Also, updates on:
This slipped and will be out next week, Fri 2/11. Sorry for the inconvenience.
I talked with some more folks about timelines and it looks like we'll be supporting the label-based BOM format, as well as the new layer-based BOM format through the end of 2022. |
I asked around about this and you do not need to download the whole image. I am not sure about the particular library you're using, but the way it works is this:
4a. The labels
4b. If you have an image where those labels are not present (most at the time of me writing this), you can look at the
The logic above is from the tool I'd previously mentioned: https://github.com/sclevine/cnb-sbom/blob/main/main.go#L148-L190

The logic it's using to extract the layer is here: https://github.com/sclevine/cnb-sbom/blob/main/main.go#L192-L207.

Here's a gist of a bash script I wrote that uses curl to download the layer. I didn't test it extensively, but the couple of images I tried did work OK. I think this breaks down the process a bit more than the Go code does, which should help if you want to implement it in Java, since the Go code uses a library that hides some of this away.

Last thing. This only works against a registry. It's using the Registry API. Getting this information from a Docker daemon would be different. That said, if your Docker daemon has the image already then this matters a lot less. You've already spent the time to pull the image and can copy out the SBOM in a number of ways (like

There are other tools you can use to interact with the registry API too, like crane. That can more easily fetch the manifest and config. I don't think it has a command for fetching the specific layer though.

Not sure if you're still looking for this info, but I wanted to understand it myself so I figured I'd write it up and post it for reference. Hope that helps! |
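To make the registry-side lookup a little more concrete, here is a hedged sketch of the step that picks out which layer blob to download, given an image manifest and image config that have already been fetched and parsed as JSON. It assumes the SBOM layer's diff ID is recorded under `sbom.sha` inside an `io.buildpacks.lifecycle.metadata` label; that label name, the key path, and the function name are my assumptions and should be checked against the linked Go code. What is standard is that `rootfs.diff_ids` in the config lines up index-for-index with `layers` in the manifest, which is how a diff ID is translated into a downloadable blob digest.

```python
import json
from typing import Optional

# Assumed label name; not confirmed anywhere in this thread.
LIFECYCLE_LABEL = "io.buildpacks.lifecycle.metadata"


def find_sbom_layer_digest(manifest: dict, config: dict) -> Optional[str]:
    """Return the blob digest of the SBOM layer, or None if it cannot be located."""
    labels = (config.get("config") or {}).get("Labels") or {}
    raw = labels.get(LIFECYCLE_LABEL)
    if raw is None:
        return None
    # Label values are strings, so the metadata is JSON embedded in JSON.
    metadata = json.loads(raw)
    diff_id = (metadata.get("sbom") or {}).get("sha")  # assumed key path
    diff_ids = (config.get("rootfs") or {}).get("diff_ids") or []
    if not diff_id or diff_id not in diff_ids:
        return None
    # diff_ids[i] is the uncompressed digest of manifest layers[i].
    index = diff_ids.index(diff_id)
    layers = manifest.get("layers") or []
    if index >= len(layers):
        return None
    return layers[index]["digest"]
```

The returned digest is what gets requested from the registry's blob endpoint (`/v2/<name>/blobs/<digest>`), so only that one layer is transferred instead of the whole image. Nothing here is Python-specific; the same lookup can be done in Java with any JSON library.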
I'm trying to make use of the metadata produced by the buildpacks in the labels of the image that it produces.
I presume that it is possible to obtain information such as whether a given dependency/jar was included in the image and if so what version. This is definitely useful information and I want to know it!
However...
the format of the metadata needs to be clearly documented so that potential consumers of the data can
a) understand how to parse the data
b) rely on this parsing / format / structure to remain stable in the future (i.e. the documentation of the metadata format is to represent a contract of sorts that consumers of the data can rely on).
I think there may be a bit too much metadata being attached. I think this because when I use 'docker inspect' on a buildpacked container the result is a file large enough to break some editors. The file I have is 500k in size. Granted, this is 'manageable' if handled with care, but parsing that data is still costly (memory and CPU). And some tools cannot handle it at all (for example, the gedit Linux text editor freezes up as soon as I try to search for text in this file). So I question whether all this data is really needed/useful. (Hard to say now as I don't fully understand yet what is actually there. Some of it, though, seems to be the complete textual documentation for Spring Boot metadata properties; I think we probably do not really need all that documentation embedded in the metadata.)
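One way to put numbers on the size concern is to measure each label separately. The sketch below is only an illustration: it assumes the input is the JSON array printed by `docker inspect <image>`, with labels under `Config.Labels` of the first entry, and the function name `label_sizes` is made up for this example.

```python
import json


def label_sizes(inspect_json: str) -> list:
    """Return (label name, size in bytes) pairs from `docker inspect` output, largest first."""
    entries = json.loads(inspect_json)
    # `docker inspect` prints a JSON array with one object per inspected image.
    labels = (entries[0].get("Config") or {}).get("Labels") or {}
    sizes = [(name, len(value.encode("utf-8"))) for name, value in labels.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

Feeding it the saved inspect output shows which labels account for most of the 500k, which makes it easier to say which parts of the metadata are worth keeping.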