pip installs a wheel even if --no-binary is given #12954
Comments
If we built the wheel previously, and the source is identical, we'll re-use the cached wheel rather than spending the time doing a build which will give the same result. Even with --no-binary, the source has not changed, so rebuilding would produce the same wheel.
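To illustrate the behaviour described here, a minimal shell session (mpi4py as the example; the exact messages vary between pip versions):

```sh
# First install builds from the sdist and stores the resulting wheel in pip's wheel cache.
pip install --no-binary mpi4py mpi4py
#   ... Building wheel for mpi4py (pyproject.toml) ...

# Uninstall and install again with the same options: pip reuses the cached,
# locally built wheel instead of rebuilding, even though --no-binary is given.
pip uninstall -y mpi4py
pip install --no-binary mpi4py mpi4py
#   ... Using cached mpi4py-4.0.0-cp311-cp311-linux_x86_64.whl ...
```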
Yes, that is exactly the case and the issue. IMHO, for most packages with extensions this is the wrong behavior. There are different reasons why the same source can give different wheels, for example different compilers, different library implementations (like for MPI) or different build options. If the user specifically states --no-binary, the package should be rebuilt from source.
I personally would find it fairly surprising to pass an option that says "do not install binary packages" and have it be interpreted as "do not use manylinux wheels from PyPI, but do use binary packages from elsewhere". That being said, there are overall two reasons for no-binary: people who need the build to happen with their own local toolchain and configuration (specific compilers, libraries, or build options), and people who simply do not want to install binaries built by someone else but are fine with any wheel they built locally.
My gut feeling is that the latter group doesn't mind ignoring caches. They've already committed to the possibility of needing to perform potentially lengthy builds by passing no-binary in the first place, so sparing them the extra work of recompiling a wheel they previously successfully created isn't something they need -- and if it was successfully built once, it will most likely successfully build again. :) On the other hand, the former group needs to have some way of achieving their goals, which appear to currently be entirely impossible? It seems like the safe bet is to make both cases work, even if that means making one case work less optimally. (Or I suppose add a new command-line option, but this doesn't seem appetizing.)
IMO this is a bug: the sdist should be rebuilt unless there is an explicit option pip could provide to build or not rebuild sdists. Do maintainers disagree? Would a PR be accepted from someone willing to submit a sufficiently high quality PR (e.g. including explicit tests)? I'm not volunteering (at least yet), just trying to triage this issue.
I'm uncertain. I don't buy the argument that it's wrong to use a cached build of a sdist. Maybe we should just improve the message - rather than saying "Using cached mpi4py-4.0.0-cp311-cp311-linux_x86_64.whl" we could say "Using mpi4py-4.0.0.tar.gz (skipping rebuild as we can use the previously cached result saved as mpi4py-4.0.0-cp311-cp311-linux_x86_64.whl)". If the only use case is cache busting, maybe what we need is finer grained cache options than the all-or-nothing --no-cache-dir.

But having said this, I'm not going to block a sufficiently high quality PR. I just don't think it's necessarily the right direction to take.
Or maybe something along those lines. Anyway, it seems like this can be solved in multiple ways; I've updated this to "awaiting PR".
I'm -1 on changing the behaviour of --no-binary here. We can refine the caching keys, though (see for instance #11164). I'm also open to some sort of per-distribution-name cache control. If this becomes the tracking issue for that, and that is what is expected in the awaited PR, we should consider changing the issue title.

As a workaround in the current state of affairs, would removing the locally built wheel from the cache (with pip cache remove) before installing fit the bill?
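Concretely, such a workaround would look something like this (mpi4py used as the example):

```sh
# Remove any locally built mpi4py wheels from pip's wheel cache ...
pip cache remove mpi4py
# ... then force a fresh build from the sdist.
pip install --no-binary mpi4py --force-reinstall mpi4py
```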
I'm removing the bug label, as things are working as designed.
Note that the name --no-binary can be read as "do not use pre-built binary packages", i.e. always build from source. If we cannot change the current behavior, there should be an easy way to "(re)install a package by recompiling it from source". Note that the proposed workaround also seems to imply downloading the source again.
That is indeed what it means. I'd be happy to review a PR improving the documentation in that area.
Agreed, that is why I consider it a workaround until a better solution is proposed and merged.
Actually, it should not need to download again because, if I remember correctly, the downloaded sdist is cached separately from locally built wheels.
Like I said above as well.
As far as "improving the documentation" goes, that documentation should also be updated in the output of
I still don't really understand this. Do you use --no-binary in a scenario where the current behaviour actually helps?

The behavior that you claim is useful and a good default doesn't seem very useful to me! If I were okay with locally built wheels I would also be okay with PyPI wheels...
We could also alias the option, indeed, to give it a more precise name, I agree.
I see at least a few scenarios where this is a good default, for instance repeatedly recreating the same environment, where rebuilding an identical wheel on every install would waste a lot of time.
Caching by default benefits these kinds of scenarios and it is my impression that these are the more common ones. Hard to prove, of course, and as I said, I absolutely agree this does not suit all use cases. We also changed how --no-binary interacts with the wheel cache at some point in the past.
Then why would you be passing --no-binary for this use case?
This isn't in and of itself a reason to pass --no-binary. If you passed --no-binary because you simultaneously wanted to "don't trust someone else's binaries", but also repeatedly build with the same config settings, then that scenario makes sense. I explicitly mentioned this case above -- and I made the claim that I believe it is a generally unlikely case, and that this group of people also don't mind rebuilding each time, so even if it is useful to reuse built binaries, I don't think it's sufficiently useful to justify the current behavior.
Hm, right, I might be confusing things right now. I'll try to set aside some time to recall the reasons we did it this way. In any case, won't we still end up with subjective views about the prevalence of one group of users or scenarios over another? So adding a new, more explicitly named option is probably easier compared to making a breaking change, with all the deprecation cycle etc.
One scenario where the current behavior is preferable is where you are trying to only "allow list" sdists from an index, by combining --only-binary :all: with --no-binary for the allowed packages. In general I would recommend an allow-list approach like this over a blanket --no-binary.
In general, my impression is that people using --no-binary simply want the things they install to have been built from source, and a previously built (and cached) local wheel satisfies that.

From the discussion here, it's clear that this isn't true for all users of --no-binary.

But in the absence of compelling evidence that the users relying on the existing behaviour no longer want things to work the way they currently do, changing how --no-binary behaves is not something I think we should do.
I don't really understand what you're trying to describe:
If your goal is to only allow wheels in general but permit sdists for selected packages where wheels don't currently exist, then I would humbly posit that the correct solution to this is --prefer-binary. It would also provide forward-compatibility if a project starts to upload wheels, because you'd like to start using them.

All that being said, I don't actually understand the goal of this in the first place.
Installing sdists only where no wheels exist would work by default, so the "allowlist" is presumably a security mechanism that allows you to vet that decision in advance... which implies you trust a project to provide executable code that you're going to execute without verifying, but do not trust the project to provide an executable build system in an sdist. The notion of "supply chain security by not trusting build systems" is not really a workable end result IMO. What you actually want is pinned hashes so that you can, well, install the things you trust.
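A sketch of what that hash-pinning approach looks like in practice (the digests below are placeholders, not real values):

```sh
# requirements.txt pins an exact version together with the expected artifact hashes:
#
#   mpi4py==4.0.0 \
#       --hash=sha256:<sdist sha256> \
#       --hash=sha256:<wheel sha256>
#
# In hash-checking mode pip refuses to install anything whose downloaded file
# does not match one of the pinned digests.
pip install --require-hashes -r requirements.txt
```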
... but my understanding was that this is the literal only case where --prefer-binary makes a semantic difference to the resolver, so it seems strange to call this out as an exceptional case where --prefer-binary doesn't work?

Overall I still just do not understand what people mean when they say that the currently implemented pip functionality has an advantage over the proposed change. The only case where it makes a difference, as far as I can tell, is when the same sdist has already been built once on the machine and the resulting wheel is still sitting in the local cache.
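For reference, the case being talked about is roughly this ("somepkg" is a hypothetical project whose latest release has no wheels):

```sh
# somepkg 2.0 is published only as an sdist; somepkg 1.9 also has wheels.
pip install somepkg                  # default: picks 2.0 and builds it from the sdist
pip install --prefer-binary somepkg  # prefers the older 1.9 because a wheel exists for it
```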
I don't get how not trusting sdists fits in here at all.
Sdists are not trustworthy either if you don't pin the hashes, in which case --no-binary is a moot point. But the people using hashes will also tend to be hashing sdists so they can guarantee the results are trustworthy. But the other reasons for no-binary are also significant, and matter even without hashing: wanting everything built from source as a matter of policy, without paying for a full rebuild on every install, where reusing cached wheels brings significant performance improvements.
People concerned about "significant performance improvements" would be using a wheelhouse, surely. Like the PyPI one. I don't believe that there's significant overlap between people concerned about building everything themselves with different options but not hashes, and people who are concerned about the amount of time it takes to perform a build.

And it is not safe to reuse a cached artifact when the cache key doesn't correctly reflect the inputs, such as --config-settings at minimum but also $CC, $CXX, and arguably "any environment variable that setup.py looks up via os.environ.get()". It's a correctness bug, not a performance loss. Slow but correct is better than getting a wrong answer fast.
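A concrete sketch of that correctness problem (compilers and package are illustrative only):

```sh
# Build once against one toolchain; the resulting wheel goes into pip's wheel cache.
CC=gcc pip install --no-binary mpi4py mpi4py

# Later, rebuild with a different compiler (or a different MPI, or different
# --config-settings). The inputs changed, but the cache key did not, so pip
# silently reuses the wheel that was built with gcc.
pip uninstall -y mpi4py
CC=clang pip install --no-binary mpi4py mpi4py
```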
As mentioned above, renaming the option (or adding an alias with a more precise name) would make the intent much clearer.
I don't think this discussion is particularly productive. Backward compatibility means that we won't change the existing behaviour without a clear transition plan for how people who rely on the existing behaviour will be warned of the upcoming change, and either given a way to retain the current behaviour, or given a viable transition to the new behaviour. Discussing changes without covering those points is speculation at best, pointless at worst. And claiming that "nobody could want the current behaviour" without evidence is both unhelpful and naive. From experience I can state with some certainty that any behaviour, no matter how weird we might think it is, is almost certainly being relied on by someone who has no budget to change anything, and a business that relies on upgrading pip not breaking their workflow 🙁
Doing exactly this but not renaming the existing option, i.e. keeping --no-binary as it is and adding a new, clearly named option alongside it, would also work for me.
Sorry, I put the options the wrong way around; --only-binary :all: should come first, followed by --no-binary for the allowed packages.
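Spelled out, the allow-list pattern under discussion would be something like this (package names are placeholders):

```sh
# Wheels only for everything, except the explicitly listed project(s),
# which are required to be built from their sdists.
pip install --only-binary :all: --no-binary some-sdist-only-pkg -r requirements.txt
```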
Agreed, but that doesn't exist, so it's a real use case of pip right now.
For the record, this works: pip cache remove "mpi4py*" followed by pip install --no-binary mpi4py mpi4py. Note the wildcard in the pip cache remove pattern.
Thanks for the feedback @paugier! This would warrant a dedicated issue, which might be easy to solve (although not entirely trivially, due to the wildcard).
Note that there are examples on the web recommending something like SKIP_CYTHON=1 pip install --no-binary pydantic "pydantic<2", which might work, but only if the user has no locally built wheel in the cache. They should instead use pip cache remove pydantic; SKIP_CYTHON=1 pip install --no-binary pydantic "pydantic<2". Similar examples can be found elsewhere.
I didn't even try to find examples in supercomputer documentation, but I'm sure bad usage of --no-binary can be found. I just got an email from a supercomputer support team with an example of unsafe/wrong usage.

mpi4py does not get into this problem because they do not upload wheels on PyPI. Therefore, they just use a plain pip install mpi4py in their documentation.

The option name --no-binary does not help, since it suggests that no binary package (wheel) will be installed. So there is a problem with an option used by people for two different purposes, and which can lead to wrong behavior. Moreover, detecting that the behavior is wrong is not trivial at all, because there is no error and no warning, just not the right wheel installed.

I tend to think that the most reasonable thing to do in this situation is to deprecate --no-binary in favour of options with clearer names.

Besides, even now (pip 24.3.1), the documentation of --no-binary is just wrong on this point.
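One way to at least detect the situation today is to inspect pip's wheel cache before installing:

```sh
pip cache dir                  # where the cache lives
pip cache list "mpi4py*"       # lists locally built mpi4py wheels that would be reused
pip cache remove "mpi4py*"     # drop them to force a rebuild on the next install
```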
FWIW, this was called out in the Deprecations and Removals section of the 23.1 release notes. For context, that was part of the work to remove calls to setup.py install.
Are you willing to do a PR to address that? This is low hanging fruit. Regarding the broader suggestion to deprecate --no-binary, that would need a much wider discussion.
Another problem related to the pip cache remove workaround: pip cache remove pyfftw does not work (i.e. a subsequent pip install --no-binary pyfftw pyfftw still installs a wheel previously built locally) because the cached wheel name is pyFFTW-0.15.0-cp313-cp313-linux_x86_64.whl. One then has to use pip cache remove pyFFTW, matching the capitalization of the wheel file name.
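When in doubt, listing the cache first shows the exact (case-sensitive) name to pass to pip cache remove:

```sh
pip cache list | grep -i fftw   # find the cached wheel's exact file name
pip cache remove pyFFTW         # the pattern must match that name on a case-sensitive filesystem
pip install --no-binary pyfftw --force-reinstall pyfftw
```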
I created four distinct and tractable issues about this problem.
Thanks @paugier! I have removed the "awaiting PR" label. Shall we close this issue?
Description
I would like to install mpi4py with a new build (without using a wheel already built locally). I try to use
--no-binary mpi4py
but surprisingly a wheel is used. Note that there is a pip config file.
Expected behavior
mpi4py should be compiled from source when
--no-binary mpi4py
is given. See the help message:
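For reference, the help text for this option reads roughly as follows (wording may differ slightly between pip versions):

```
--no-binary <format_control>
    Do not use binary packages. Can be supplied multiple times, and each time
    adds to the existing value. Accepts either ":all:" to disable all binary
    packages, ":none:" to empty the set (notice the colons), or one or more
    package names with commas between them (no colons). Note that some
    packages are tricky to compile and may fail to install when this option
    is used on them.
```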
pip version
pip 24.2
Python version
3.11
OS
Linux
How to Reproduce
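A minimal sequence that triggers the reported behaviour (mpi4py as in the report; any package installed from an sdist behaves the same way):

```sh
python -m venv venv && . venv/bin/activate
pip install --no-binary mpi4py mpi4py   # builds from source; the wheel is cached
pip uninstall -y mpi4py
pip install --no-binary mpi4py mpi4py   # installs the previously built, cached wheel
                                        # instead of recompiling from source
```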
Output