-
-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persistent dynamic core metadata breaks user assumptions #1334
Comments
This is a standard now: https://peps.python.org/pep-0643/ Are you not encountering this with new releases of for example poetry-core? |
For context, the main reason for that PEP is so resolvers can use metadata from source distributions directly without having to go through the build process. |
Have you perhaps linked a wrong PEP? https://peps.python.org/pep-0643/ does not mention PKG-INFO at all. |
Not that I am aware of. Neither I can see it in their changelog https://github.com/python-poetry/poetry-core/blob/main/CHANGELOG.md or in the code when I search for PKG-INFO https://github.com/search?q=repo%3Apython-poetry%2Fpoetry-core%20PKG-INFO&type=code |
That file is documented in the text on top here: https://packaging.python.org/en/latest/specifications/source-distribution-format/ I could be wrong, but I think other backends are simply not following the spec. Flit I think (if it supports 2.2) would be fine because nothing there is dynamic but Poetry and others allow for certain levels of non-standard dynamism similar to Hatchling build hooks. |
I'm asking for more feedback! https://discuss.python.org/t/respecting-core-metadata-2-2-when-building-from-source-distributions/48886 |
My understanding is that the file exist so sdist can be queried for metadata statically. Not that build backends should use it when generating wheels. |
A simple fix is surely just to delete the The larger issue is that by doing this you're essentially breaking the contract of static metadata, which is that any metadata declared in the sdist as static MUST be identical in any wheel built from that sdist - item 1 in the specification here:
You have a loophole in that you're not technically building a wheel from the sdist if you're modifying the code in the sdist, but unless you also change the package name or version, you're violating the spirit of the specification (IMO as author of that spec). |
If you are hand-modifying the extracted source code originally pulled out of an sdist, I think it is exceedingly fair to say that your hand modifications preclude you being the target userbase of:
I also think that a PEP 517 build backend, itself, doesn't have a good argument to make that they are unable to rely on the data available from source distributions "and need to use the costly PEP 517 build mechanism to extract it". That being said it also seems to me the spec mandates that hatchling doesn't have a permitted option other than to parse both the dynamically generated metadata as well as the static PKG-INFO metadata, compare them and require them to match, then fatally error out if they don't match. So people patching this out would still need to delete PKG-INFO. However, at least they would know they have to do so since the build would error out rather than silently generating the wrong contents. Emitting a fatal error on mismatched contents but using the pyproject.toml as the canonical version would have also caught #1325. |
The spec doesn't mandate that backends recalculate the values and compare, although I can understand the argument that it's a reasonable check to make to catch corrupt sdists. Having said that, though, I can also understand the argument that manually editing a sdist is unsupported (just like manually editing a wheel would be - both are built distributions) and therefore there's no need to make a potentially costly and error-prone1 check. Footnotes
|
Calculating the version metadata from VCS details should, in any well-designed plugin -- I'll confess I haven't looked around at most of them but I do know how setuptools-scm works and that's basically the industry standard anyway (hatch-vcs is a wrapper on top of it, for example) -- anyways be using PKG-INFO as the primary source of truth before running git. And have done so since before metadata 2.2, since that's fundamentally "the" way to know that information if there is no git repo, as there will not be in an sdist. |
Yes. It's just not what people expect to need to do. Either way, I think the real confusion here is based on the fact that the source distributions are now treated as built distributions and if we reuse them downstream as sources we are probably misusing them. While I see where this is coming from, it is going to be particularly hard to explain to people. |
Just to be clear it's perfectly acceptable (necessary in many cases) for the wheel to have dynamic contents; it's only the metadata that is static. |
There is a lot of talk about dynamic vs. static metadata. I know that metadata in the I see I used the term "dynamic core metadata" in the issue title and perhaps I was misled to do that by the commit message of #1309. In fact, the issue we have is not relevant to whether the data is statically listed in the Perhaps I should have said Persistent core metadata breaks user assumptions instead. Should I edit the issue? |
I don't think that is a point of confusion. It is standard practice in the world of software vendors to apply patches to the software they distribute as required in order to ensure the software is correctly integrated for distribution. There is no arbitrary restriction on a software vendor that they can apply patches to a .py file but cannot apply patches to the metadata. Fedora uses the wheel dist-info metadata to generate rpm dependencies. They need the dist-info metadata to be maximally correct in order for the built rpm to be maximally correct (another option would be to give up on generating rpm dependencies from the dist-info). That means that if it is necessary to prevent users from updating their rpms of one package to something too new for a second package, for example: because the second package was published without max version pinning but after it was released its dependency (the first package, which we shall say doesn't follow semver) released a new version with a breaking change... ... the canonical rpm way to do this is to patch out pyproject.toml, and let that propagate into the rpm. This is a serious problem for many PyPI packages, and a point that conda advocates use in favor of conda: it is possible to patch the metadata of a conda package without re-releasing a new package. Many people have wished for the ability to edit old PyPI releases to add upper pins for known after-the-fact discovered incompatibilities. I believe the point @hroncok is trying to make is that this effectively means that Fedora has to stop advising people to use pypi.org sdists and start using github autogenerated archives instead, since "sdists are built distributions and cannot be modified, but only sometimes, it depends on what you want to modify" is difficult to effectively communicate to people whereas "do not use pypi.org" is easy. ... If there's an argument to be made that:
holds true when downloading an sdist, but not when git cloning the repo or downloading a github autogenerated archive, then doing the latter is the only spec-valid option. And if the promise of static metadata is bound to the package name + version, then git cloning the repo or downloading a github autogenerated archive should also be a spec violation, which would leave us at "it is a violation of the Metadata 2.2 spec to distribute third-party packages that have metadata which is in disagreement with the metadata that PyPI registers for the same name+version", in other words, it's a Metadata 2.2 spec violation to exercise one's Open Source Software right to modify and redistribute software. So I dunno what the answer here is. :) Other than that I wish tools wouldn't allow errors to happen silently. |
I'm sorry I don't have a lot of time today to dig deeper on the issue but would it be possible to modify the PKG-INFO file directly? |
As long as the wheels being made here are never for publication, but only as a step in the RPM build process, I don’t think there is a practical issue here. The static metadata guarantee is only to ensure that wheel-based installers see a consistent view. Also, we discourage using pip to install into your system site-packages, so if anything does break, I’m happy with “we warned you” as a response. |
Yes. But the person doing it would need to know that. The problem isn't that it's not possible, the problem it that it breaks expectations. |
That's why we don't want our users to do that. We use RPM packages, not pip. I don't see how is this relevant to the problem here. |
As a practical non-beautiful immediate solution, we can remove any existing We could then implement a mtime-based validity check. But that is no longer simple but rather complex :( |
By the way, that’s something of an overstatement. Of course you have a license-granted right to modify and redistribute. But you aren’t entitled to support if you do so, and you are responsible for the consequences of your changes. The spec, like every other interoperability spec, is simply a formalisation of the community’s expectations. One of which is that “if there is a Footnotes
|
I'm not sure how that's different from what I said. I didn't say that metadata 2.2 forbids users from modifying and redistributing software. I said that "if the promise of static metadata is bound to the package name + version", then metadata 2.2 tells users that if they modify and redistribute software they are "people that break the spirit of the standards and should not do so because it's being a bad citizen of the python ecosystem". My issue with this is that I find the concept of, effectively, "modifying and redistributing software is violating the spirit of python standards" to be... ... a violation of the spirit of open source. And that makes me uncomfortable since that's never previously been the impression I got from the Python leadership pronouncing their official stance as adjudicators of PEPs about normative good behavior in a python ecosystem. I'm not worried about whether I'm legally culpable if I modify and redistribute open source python software -- just about whether I should feel like a welcome member of the community if I do. |
I don't think this is ambiguous at all. You tell that a source tree came from a sdist if, and only if, you are the tool that extracted it. This coincidentally has good overlap with tools that need to process PKG-INFO for their intended use, which is "software that would like to introspect metadata without expending the time and security risk of executing a build backend and maybe compiling the entire cuda ecosystem along the way, and would like to have a guaranteed assurance that it is accurate". I have not yet grasped why a build backend should have a vested stake in not executing itself to derive metadata once it's already being executed to build a wheel. As I pointed out, the version info isn't a good counterexample (and broadly speaking, if a build backend dynamically calculates any given metadata value then it should be marked as dynamic -- version info really is a special case because one doesn't ship an sdist with the git repository metadata included). People who patch the extracted source tree which once came from an sdist and then run tools on the results are in a somewhat exclusive camp, I think, of "people who patch the contents and then immediately run a build backend in order to build a wheel". So I think "build backends shouldn't use PKG-INFO to override/ignore the values of pyproject.toml, and neither should a build frontend when running on a directory instead of a (possibly downloaded) archive file" is a pretty practical way to frame the distinction. EDIT: in both cases you need to run the backend anyway, which is the specific reason why it's also IMO perfectly reasonable to run it to get metadata too. |
I will reiterate this again: It's perfectly acceptable for the wheel to have dynamic contents; it's only the metadata that is static. The PKG-INFO file is not yours nor is it the user's, it's part of the tooling of the Python packaging ecosystem. This in no way pertains to the spirit of open source because no one gave us that file to ship, it was generated by... tooling of the Python packaging ecosystem. I'm going to close this so that we can focus on the broader discussion: https://discuss.python.org/t/respecting-core-metadata-2-2-when-building-from-source-distributions/48886 |
Would it be possible to revert this for now until the ecosystem-wide discussion progresses or at least provide an escape hatch (e.g., allow us to patch a single constant in the hatchling source to restore the old behavior)? This is a breaking change that was pushed in a minor release without notice. |
I'm not going to revert this as I don't consider it a breaking change and this is necessary for distributions on PyPI to indicate static metadata to aid resolvers and therefore users. The current workarounds are to either delete the PKG-INFO file or modify the contents. |
I can relate to that. However, this does not mean this is not a breaking change. |
How about an option you may define during patching that instructs Hatchling to ignore the PKG-INFO file? That's the best I'd be willing to do. |
What do you mean by option? Something to put into pyproject.toml? |
I belive @gotmax23 already propsoed:
So if we can easily flip that flag globally for us, then I guess yes. |
I will do this over the weekend but just FYI I'm not going to document the option. |
I still don't understand how it is necessary in any way shape or form.
A build backend that produces a PKG-INFO provides what is necessary for distributions on PyPI to indicate static metadata to aid resolvers and therefore users. It does so regardless of whether it then reads the file back when building a wheel. The spirit of Metadata 2.2 is fulfilled as long as it is created for the purpose of pip installing from PyPI in the non-patched case. The actual wording imposes some additional requirements. The only legal move for a build backend is to use pyproject.toml as the authoritative source and either:
The Metadata 2.2 standard strongly implies you should be doing the latter, and it's a good idea anyways as it will catch conformance bugs. And it avoids the complexity of having multiple codepaths to produce a wheel, which has already been the source of entrypoint bugs that produced broken wheels when pip installing an sdist (or using |
Please post in the discussion, not here. |
I am a bit worried that having an undocumented option like this would only cause more confusion. |
Then I will document it if it's truly what you prefer. Give me two days to do this and release. |
My apologies -- I didn't realize that discussion was unwanted here. I don't believe I have a Discourse account and based on previous experience with the discourse software I'm reluctant to start, so I think I'll have to exclude myself from the topic. If anyone would like clarifications or has questions for me -- feel free to reach out to me via email instead of replying here. Bye. |
To be honest, I don't actually think this is what is needed to resolve this situation. Maybe we should wait for the discussion to move somewhere? |
Can you please explain how the option will not work for you both? |
First of all, it means we would need to either:
or
|
Okay say a project's source distribution derived the version from Git and now the PKG-INFO file has that saved. Then you try to build a wheel from that and attempt to recompute the version (the maintainers definitely don't want this to happen) with heuristics and ultimately come up with something slightly different or a default value and then Hatchling throws an error because there is a mismatch between the two files. How does that help anyone at all? |
Well, the error will give me hint. Either it will help me know the version I should have used, or help me realize I want to remove the PKG-INFO file. |
I thought the goal was to prevent such friction and have a way to make everything work without your input? |
Yes. And a config option is an input. |
That's not very fair 🙂 what you're describing is something that might happen once somewhere in your infrastructure whereas the mismatch of metadata has the potential to happen frequently. I'm trying to help here. |
My opinion is that mismatch of metadata should always fail, regargless of something that happens once in our infrastructure. And I am not entirely sure that you understand why this worries me. From your comments it seems like you think I can just go and make the problem disappear on our end by doing a once time action, which is not correct. I can try explain what's happening better, but I think other have already done that in https://discuss.python.org/t/respecting-core-metadata-2-2-when-building-from-source-distributions/48886 -- where you asked to move this discussion. I only commented here to try not to waste your energy on a workaround that won't solve the problem. |
To clarify. It's late, I don't have energy to write down a complete and elaborate comment. I want to. I am glad you are trying to help. But I won't be able to do that today. And I was afraid that if I don't comment at all, you might do something that will not end up as a solution, and that might discourage you later. |
Hello,
in Fedora, @frenzymadness wanted to update hatchling from 1.21 to 1.22.2. However, we encountered several Fedora packages using hatchling to regress.
For many of the packages, we essentially do:
However, the downstream changes to metadata in pyproject.toml are now silently ignored because all the metadata is read from PKG-INFO.
I have looked up the changelog which says:
It seems like this has happened in #1309
I am aware of no prior discussion and there is no rationale provided in the PR -- it looks like the change appeared out of the blue. What problem does this solve? What is the purpose of this behavior?
I assume that when I change the metadata in pyproject.toml it will affect the wheel. And many other folks in Fedora assume the same. This change breaks that assumption.
Essentially, this seems like a cache that fails to be invalidated. Would you please consider our use case and try to mitigate the impact of this change by doing for example one of the following?
Thanks.
The text was updated successfully, but these errors were encountered: