Allow optional commit and tag metadata in Manifests and registries #3718
Comments
Another use case would be rewriting file paths in CI stack traces so that we can provide an HTML link to the GitHub URL |
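As a rough illustration of this use case, the sketch below turns a file path and line number from a stack frame into a GitHub permalink, assuming the commit hash for the installed version is known (for example from the proposed git-commit-sha1 field). The helper name `github_permalink` and the example values are hypothetical; only the `/blob/<commit>/<path>#L<line>` URL layout comes from GitHub.

```python
def github_permalink(repo_url: str, commit: str, path: str, line: int) -> str:
    """Build a GitHub permalink to one line of a file at a fixed commit."""
    base = repo_url.rstrip("/")
    # Registry entries often store the clone URL, which ends in ".git".
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return f"{base}/blob/{commit}/{path.lstrip('/')}#L{line}"


if __name__ == "__main__":
    # Placeholder commit hash, used only for illustration.
    print(github_permalink(
        "https://github.com/JuliaLang/Example.jl.git",
        "a" * 40,
        "src/Example.jl",
        12,
    ))
```

A tag name could be substituted for the commit hash to get a shorter link, at the cost of the link breaking if the tag is later deleted.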
Can we split this into two separate issues, one for the General registry ( |
I think you would need the registry one first, no? |
What would actually need to be done for the registry changes? From what I can tell, it seems that: |
Oh, I was thinking that the manifest would be getting its info from a local Git clone. But yeah, if the plan is for the manifest to get the info from the registry, then we first need to implement this in the registry. |
Makes sense to me to have optional metadata tied to versions that can be used to improve various tooling. You would have to verify that the commit metadata resolves to the correct tree, right? |
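A minimal sketch of such a check, assuming a local clone of the package repository is available: ask git which tree the commit (optionally at a subdirectory) points to and compare it with the registered git-tree-sha1. The function names are hypothetical; `git rev-parse --verify` with the `<commit>^{tree}` and `<commit>:<path>` revision syntax is standard git.

```python
import subprocess


def tree_rev_spec(commit: str, tree_path: str = "") -> str:
    """Return the git revision expression naming the tree inside a commit."""
    path = tree_path.strip("/")
    # "<commit>^{tree}" is the root tree; "<commit>:<path>" is a subdirectory.
    return f"{commit}^{{tree}}" if not path else f"{commit}:{path}"


def commit_matches_tree(repo_dir: str, commit: str, tree_sha1: str,
                        tree_path: str = "") -> bool:
    """Check that the commit resolves to the registered tree hash."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify",
         tree_rev_spec(commit, tree_path)],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means the commit or path does not exist in the clone.
    return result.returncode == 0 and result.stdout.strip() == tree_sha1
```

Because the tree hash stays authoritative, a failed check only means the optional metadata is stale or wrong, not that the version itself is invalid.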
The commit info we can probably get from Registrator. But I was thinking we could have a cron job that periodically queries the repos and updates the registry as required. |
We do have |
Perhaps I should rename it for consistency? Should I just call it |
Yeah, maybe |
We probably need to keep a global one for non-released versions (e.g. a specific git commit) |
I'd like to better understand the use cases for the different pieces of information.
|
Trick question: What is the tooling supposed to do if the same package is found in multiple registries, with diverging values for the optional fields? |
Yes, exactly.
Two reasons
Pick the first one? In general, it shouldn't matter, only the tree hashes should, the optional fields are just there to help find the trees. |
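The "pick the first one" rule could look roughly like the sketch below. It assumes each registry's entry for a version is already loaded as a dictionary and that the list is ordered by registry priority; the function name and the sample values are hypothetical, while the field names are the ones proposed in this issue.

```python
OPTIONAL_FIELDS = ("git-commit-sha1", "git-tag-sha1", "git-tag-name", "git-tree-path")


def first_metadata(entries: list[dict], tree_sha1: str) -> dict:
    """Return the optional fields from the first entry with a matching tree hash."""
    for entry in entries:
        # Only the tree hash identifies the version; everything else is a hint.
        if entry.get("git-tree-sha1") == tree_sha1:
            return {k: entry[k] for k in OPTIONAL_FIELDS if k in entry}
    return {}


if __name__ == "__main__":
    registries = [
        {"git-tree-sha1": "t1", "git-tag-name": "v1.0.0"},
        {"git-tree-sha1": "t1", "git-tag-name": "release-1.0.0"},
    ]
    print(first_metadata(registries, "t1"))
```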
What is the intended workflow to get the tag names into the General registry? I can see two possibilities:
Neither of those options seems great, so I hope I've missed some better approach.
A shorter URL is certainly nicer than a longer one, but it seems like a marginal win compared to the increased size of the registry, the logistics around syncing registry tag information and package repository tags, and the possibility that the nicer link suddenly breaks if someone mistakenly deletes a non-annotated tag.
This sounds contrary to the use case of investigating annotated tags. |
I don't have too many thoughts on what sort of tooling this could be useful for, but some other reasons it is useful to have tags:
This I don't have a good answer to yet. One other option would be to have a semi-regular job (say weekly), which goes through and verifies:
and if any updates are required, open a PR against the registry.
I don't think the size will increase too much: it's 1 extra field per version, this is dwarfed by the compat information per version. As for breaking things: my suspicion is that tags are likely to be more stable than commit hashes (e.g. if you rewrite history to remove an intermediate commit, you can still keep the same tag names, but commit hashes will change). It is up to users what they want to use it for, but they shouldn't expect either commit or tags to be completely immutable over time. |
Fair enough, those sound like decent arguments.
That depends on the number of dependencies and changes in dependencies, but a stronger argument is that the tag name info can be expected to compress really well. Luckily this is a testable hypothesis. Starting from General registry tree hash 793278ad7a09a821cfac38e86fc150f6c9a00f7f (current about an hour ago), this has a size of 7063293 bytes from the package servers. Packing it up and repacking it with Now adding random commit hashes to all Versions.toml files increases the compressed tarball size to 9798614 bytes. Additionally adding tag names (constructed as In summary adding commit hashes to all packages increases the registry size by 38% and also adding tag names by another 3%. |
Wait, the gzip is larger? I know that hashes should not be compressible, but we store them in hex digits which should leave plenty of redundancy,
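The redundancy argument can be checked with a small experiment. This sketch is not the measurement from the thread: it gzips 10,000 random 160-bit values once as hex text and once as raw bytes. Hex spends 8 bits per character on 4 bits of information, so gzip can remove up to about half of the hex text, but it cannot go below the 20 random bytes each hash actually contains.

```python
import gzip
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
hashes = [f"{rng.getrandbits(160):040x}" for _ in range(10_000)]

hex_text = "\n".join(hashes).encode("ascii")            # one 40-character hash per line
raw_bytes = b"".join(bytes.fromhex(h) for h in hashes)  # 20 bytes per hash

hex_gz = len(gzip.compress(hex_text))
raw_gz = len(gzip.compress(raw_bytes))

print(f"hex text:  {len(hex_text)} bytes, gzipped: {hex_gz} bytes")
print(f"raw bytes: {len(raw_bytes)} bytes, gzipped: {raw_gz} bytes")
```

So each added commit hash costs at least roughly 20 bytes of compressed size, however the registry is packed.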
Thanks for trying this out: I guess ~40% increase in size is a reason to be hesitant. Personally, I feel it's worth it, but would understand if others feel otherwise. |
Actually that doesn't seem right:
|
Honestly, we may want to consider some sort of lightweight database to store this information: the disk usage of all these small files is getting pretty big. |
Or switch to xz: it gives a 4.5 MB file. |
No, what I'm saying is that |
I don't have a strong opinion whether this information is worth the size increase. Or rather, I do have concerns about the size, and I have in the past had timeout issues with the General registry on a company internal package server. But I also see a value in the added information. |
But we only decompress it in memory so? |
You probably mean decompress. The compressed tarball size is what matters for disk storage per installation, registry download size, and the registry part of the package server load. The decompressed tar file size matters for the in memory handling of the registry. The unpacked disk size only matters for those of us who like to look manually at the registry files or grep through them, or do other non-standard operations. |
Bump? |
An observation is that the commit hash is useful for registry maintenance and possibly specialized tooling but doesn't add any value to the primary Pkg functionality. I.e. it's hard to justify why it should add to the download size etc. when most of the time and for most of the users the information just isn't considered at all. This could possibly be solved by not distributing the full head of the registry repository or placing the commit hashes in a separate branch or in a separate repository. The latter options seem far from ideal and the first option requires some redesign and new tooling for the registry distribution. |
Currently we only identify versions by their git-tree-sha1. However, this is sub-optimal when looking up git histories: GitHub doesn't provide a convenient way to find commits of a given tree, which means that e.g. TagBot has to jump through all sorts of funny hoops to try to link the tag back to a given commit.
I propose the following:
- Versions.toml and Manifest.toml allow optional git-commit-sha1, git-tag-sha1 (for Annotated Tags only) and git-tag-name (for the tag reference) that link to the corresponding objects
- git-tree-path giving the path in the commit/tag to the corresponding tree.
- Unlike git-tree-sha1, these should be considered mutable (i.e. registries may add/update/remove these as required)
cc: @IanButterworth
Current PRs: