Add support for listing HTTP cached packages in pip cache list #13587
Hi @a-sajjad72, thanks for submitting a PR to pip. Please be aware that all pip maintainers are currently supporting pip on a volunteer basis, so it may be some time before someone can review. That said, I have an early comment:
Pip will not accept a PyPI-specific implementation. Because it is not a Python packaging standard, it won't work on arbitrary indexes, and there is no guarantee PyPI will continue to support it in the future.
Hi @notatallshaw, thanks for the early feedback. I understand your concern about the current approach being considered “PyPI-specific” because it relies on the
I do have an alternative, more index-agnostic idea in mind that would not depend on those headers as the primary source. I suggest we let an initial review happen first (so I know if there are any broader objections), and then we can discuss whether shifting to that alternative approach is the right next step. If you prefer, I can outline that alternative sooner.
Thanks again for the clarification and your time. Let me know how you would like to proceed.
For myself, I won't be reviewing this PR while it is tied to PyPI-specific features, as I would not accept it, and I don't know how much of a change is required to make it index-agnostic. Though I won't speak for other maintainers.
Thanks again. I’ll convert this PR to Draft and refactor it to be index-agnostic before asking for further review. Planned minimal first step:
If any maintainer would prefer an even smaller scope (e.g. wheels only, no placeholders), please let me know; otherwise I’ll proceed on this basis and update the PR description with a concise design note.
I would advise keeping the scope as small as possible while still providing a helpful user experience, so that it is more likely to be accepted. For example, I do not think there should be any use of PyPI-only features, even as optional enrichment. I'm sorry I can't contribute more to a design discussion right now; I don't have much experience with the design of the cache, which is part of why a smaller scope will make it easier for a maintainer to start a review.
I agree with everything @notatallshaw said. Furthermore, I’d like some discussion of the correctness of the whole approach. The HTTP cache is just that - a cache of HTTP requests, not a cache of downloaded files. The cache includes simple index responses and possibly other information pip has requested - presenting it as just holding wheels is misleading.
Also, an index has no obligation to provide any information that a downloaded file comes from a wheel - so we know that accurate data is impossible to achieve, the best we can do is provide a guess. That guess will be accurate in many cases, but we should present it clearly as a guess, and not tempt people to rely on it.
Finally, I’m concerned about the cost of this. Wheels can be big. Have you done any testing of performance, on a large HTTP cache, with some big wheels (multiple copies of PyTorch would be a good start!) in it?
Thanks @pfmoore for providing your insights on this.
Yeah, I totally agree with you that HTTP caches are just saved HTTP responses and also our required files
When I started working on it, I came to know that some of the cached directories contain responses that are
Yes, I tested it, and it (the PyPI-specific implementation) takes approximately the same time as
What will the revised approach be?
The core of the revised approach is to identify packages from the
This approach offers a practical and significantly more reliable way to list cached packages without making incorrect assumptions about the cache's contents. Please let me know what you think; I will then start working on it and update the PR's description.
@pfmoore The PR description has been updated to reflect the revised HTTP cache listing implementation. Please take a look when you have time, and let me know if anything needs to be changed.
This PR adds support for listing HTTP-cached packages in pip cache list and introduces new flags to control output, addressing issue #10460.
Problem
Currently, pip cache list only shows locally built wheels stored in the wheels/ cache directory, but ignores HTTP cached packages stored in the http-v2/ or http directory. This leads to confusing behavior where users see "No locally built wheels cached" even when pip has cached wheel files from PyPI downloads.

    $ pip cache info
    Package index page cache size: 89 MB
    Number of HTTP files: 815
    Locally built wheels size: 8.9 MB
    Number of locally built wheels: 35
    $ pip cache list
    No locally built wheels cached.  # Misleading - there are cached packages!

Solution
This PR extends pip cache list to extract package information from cached file content by inspecting the package structure offline, and adds flags to control what is listed:
- cachecontrol.Serializer to locate cached response details.
- .dist-info/WHEEL metadata from the ZIP; when needed, construct a filename using the first Tag: entry; fall back to {name}-{version}.whl.
- {name}-{version}.tar.gz from the root directory name.
- --http: list only HTTP cache files.
- --all: list both locally built wheels and HTTP cache files in a unified list, suffixing HTTP entries with [HTTP cached].
CLI changes
Users can control which cache types to list:
Examples
Human-readable output shows filenames and sizes. When --http is used alone, entries are listed under an HTTP section; with --all, entries are unified and HTTP items are suffixed.
Implementation Details
The implementation extracts filenames by reading cached package structures:
Wheel Files
- .dist-info directory to get package name and version
- WHEEL metadata file and uses the first Tag: value when constructing a filename if needed
- {name}-{version}-{tag}.whl; falls back to {name}-{version}.whl
Tarball Files
{name}-{version}/
{name}-{version}.tar.gz
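The tarball case can be sketched in the same spirit. The helper below is hypothetical (guess_sdist_filename is not a name from this PR) and assumes the conventional sdist layout with a single {name}-{version}/ root directory; it reads only the first member header to limit the cost on large archives.

```python
import io
import tarfile
from typing import Optional


def guess_sdist_filename(body: bytes) -> Optional[str]:
    """Best-effort guess of an sdist filename from cached body bytes.

    Hypothetical helper for illustration only; assumes the archive has a
    "{name}-{version}/" root directory, as conventional sdists do.
    """
    try:
        archive = tarfile.open(fileobj=io.BytesIO(body), mode="r:*")
    except tarfile.TarError:
        return None  # not a readable tar archive

    with archive:
        first = archive.next()  # only the first member header is needed

    if first is None:
        return None  # empty archive

    root = first.name.split("/", 1)[0]
    if "-" not in root:
        return None  # root directory does not look like "{name}-{version}"
    return f"{root}.tar.gz"
```

As with wheels, the result is only a guess: nothing in the cache guarantees that the original download was named this way.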
File Sizes
- .body file size (actual package) instead of metadata file size
Exclusions
Additional notes:
Testing
Comprehensive test coverage includes:
All pip cache list tests pass and verify the offline body-content inspection approach.
Feedback & Discussion
All suggestions, reviews, and discussions are welcome. If there are any concerns about naming, option design, or consistency with the existing pip CLI and API, I am happy to refactor or adjust the implementation. The goal is to make this feature both intuitive for end users and maintainable for contributors going forward.
Related Issues
Closes #10460.
This PR directly addresses the problem reported in #10460. If there are other related issues that overlap with this functionality, please feel free to reference them here so they can be resolved by this change as well.