-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: make it clear we're working with PURLs #468
Conversation
random aside - in corgi land we sidestepped the overloading of the term 'package' by just using the equally imprecise term of 'component' which always has a 1:1 relationship with a single PURL. That gave us some 'wiggle' room if ever we had to assign a purl to something like a single script or something less then a package. |
I think in my mind, a package is something that can be referenced by a pURL. A pURL is just a name for a pURL-addressible thing. The ambiguity I think I was talking about in the glossary is more around... a pURL doesn't describe the binary bytes produced. Those would be eg, a specific build of "Bob McWhirter" is a name, somewhat ambiguous (there's at least one Canadian and one Arizonan sharing my name). This particular Bob who is typing now is the the artifact with my particular All that being said, I'm not opposed to I think similarly, while other systems may store "URLs", they are ultimately speaking about the thing behind the URL, e.g., the endpoint, the webpage, the podcast you can download. Naming is hard. Naming things that name things is even harder. Do we work with names? Do we work with things, and just use names to refer to them? I dunno. |
I guess I may fall into the camp of "a package is anything that can be described by a pURL" which might as well just be a pURL. Maybe. |
Right now, all that data comes from PURLs. We also do have packages, but in the context of an SBOM. Going in the direction of what @JimFuller-RedHat mentioned before I guess. We also have CPEs. And we also have file-hashes in the context of an SBOM. All of those reference "packages of an SBOM". I think we are on the same page, PURLs are names for packages. But one package can have multiple PURLs assigned, and additionally other names of other types (like CPEs, digests, and whatever the future will bring). If we'd add functions/endpoints to search for CPEs or hashes. Where would they end up in the HTTP endpoint namespace? |
when you say 'But one package can have multiple PURLs assigned,' Do you mean a package can contain other packages which have purls or do you mean a single package can have multiple purls ? If the later then eg. a single unique id assigned to a single package creates a contract ... put another way if we do have other purls that need to map to a package I would suggest pointing to canonical purl (ultimately this is 'our purl ') rather then directly map to package. This is a subtle difference but we have already seen differences in who mints a purl ... sometimes we are the first person to do so, then sometime later upstream mints a purl. Also we should expect wholesale refactoring of purls a real possibility as that proto spec matures and upstream work more with it ... it becomes burdensome to have to update all associations in database, etc. when this happens. From a REST API (conceptual model) pov we can make the association transparent ... eg the above applies more to the logical internal model to handle unknown change later on . Lastly, I could be easily convinced I am overthinking all this.... |
I guess I was thinking CPEs are one possible name/identifier for products (the stuff Dejan is working on). |
In SPDX-land, a single package can have a list of external references, which can by of type And I do recall a conversation where that might actually make sense. Publishing the same artifact to multiple maven repositories. |
I know we're going into the weeds here, but I also think So the same jar published elsewhere would have to include |
+1 to external reference ... and maybe all this kind of stuff be represented as labels/tags ? maybe that is overloading generic thing too much and worth representing in core model as 'external-reference' ... that makes it clearly different then the canonical purl. |
I think the model that we have is actually quite good. We have base PURLs, versioned PURLs, qualified PURLs. Referencing those from SBOM packages/components. Also towards CPEs and hashes. It's only the names that feel confusing. |
just to be sure. The current way the UI is fetching packages is:
Are those step still going to be valid? I would expect "yes" right? |
Can you describe the confusion that happens quite often? What problems are we solving?
I don't see how this clarifies things. To me, I also don't understand the terms "base", "qualified" and "versioned". Will the consumers of our API understand them? |
Also, a general thought:
|
I also did stumble over this myself when navigating the code. You see something that's named "package", but indeed it's a PURL.
How would you ask then for a package by CPE?
I don't know, but they already existed. And, they only make sense on the context of a PURL. Not in the context of a "package". So it might be that you are too confused by the naming. |
That depends on what you mean by "package". Package of an SBOM no. Package referenced by a CPE, no. Package referenced by a PURLs, yes. |
Correct. But indeed what you get back is information coming from the PURL. Not from the package itself. Basically it splits the PURL into a lot of database entities and returns that.
That's why one idea was to just go for |
What is the use case for that? To me, a purl is not a resource -- it's just the name of a resource, in this case a Do we have a need to return a paginated list of names? |
Could some of the confusion be coming from the existence of both a |
Looks like that. If you take a look at e.g.
Right, the That's why I think it makes sense to rename services and entities too. But before doing all the work, I think we should agree on that. |
Before what? If you mean before we implemented this, then I might question why the existing
This sentence is confusing. A PURL is just a name. It's a logical concatenation of attributes of a package (the concept, not the table). A package will have those attributes whether a particular SBOM says it has a PURL/CPE/identifier or not, right? From the above doc, "SBOM packages are an entity of their own, but may have zero or more PURLs or CPEs (or other identifiers)". That sentence betrays a lack of shared understanding, I think. In an abstract sense, a package will always have a PURL, because it can be constructed from the package's attributes. From a literal reading of the spec, a package may not have an identifier, but is that what we should be modeling in the database? My intuition is that our database designers -- Jens and Bob -- need to agree on whether we're modeling entities in the abstract or according to a spec. |
Because it didn't fit.
I don't think it's the sentence that is confusing, but the structure in the database. The Actually the package does not. It comes from the context of the SBOM. The same package can be named by different SBOM by different names.
Again, in the database we model PURLs (with tables mentioned above). Not packages. I think the database structure works just fine. We have (actual) SBOM packages, named by PURLs and CPEs. (any maybe others in the future). It simply is the naming. |
While I think a package is "anything that can be address by a Package URL (pURL)" I don't hold it strongly enough, nor do I have enough about SBOM packages in my head to disagree with @ctron right now. My vote would be to proceed (after we winnow down other outstanding package-related PRs.......) and see how we feel with the purl-centric naming. |
), | ||
)] | ||
#[get("/v1/package/base/{uuid}")] | ||
#[get("/v1/package/by-purl/base/{id}")] | ||
pub async fn get( | ||
service: web::Data<PackageService>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to rename the service to PurlService?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do, I'd argue the api should become #[get("/v1/purl/base/{id}")]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not the service, but the functions and types that deal with PURLs. The main namespace/module still is package
, which is ok.
I updated the PR, renaming the package tables to "purl". Renaming the fields as well. |
10d7537
to
4183a2f
Compare
4183a2f
to
3cd6e71
Compare
As it feels the confusion happens quite often, even to me, I think we should rename things.
We have "packages", which have some meaning according to our glossary: https://github.com/trustification/trustify/blob/main/docs/glossary.md#package
However, in the next section that definition actually calls out PURLs:
So there are "packages", and "PURLs", which can be used to give "packages" a name.
In our current API and database structure, we have (because history) several entries, endpoints, and services which are named "packages", but actually deal with PURLs only.
This PR tries to clean up the user facing API (REST endpoints) by changing the base URL from
/api/v1/package
to/api/v1/package/by-url
, when it comes to PURL based calls.Next steps
It would also be reasonable to rename functions, services, and database structures towards PURLs. But naming is hard, so let's start small.
Alternatives
One alternative could be to use
/api/v1/purl
. However, those API still deal with packages, just by using PURLs as name. So keeping "packages" make sense to me.