
Remove the experimental fast-deps feature #11478

Open · wants to merge 1 commit into base: main

Conversation

pradyunsg (Member)

This feature did not yield significant speed-ups, possibly due to insufficient optimisation of the calls being performed.

Adding a cross reference to #7049 (comment), which links to other issues that are relevant for making a decision on what to do with this feature.

FWIW, we did reuse nearly all of the code for PEP 658 though, so the effort put toward this feature was certainly not in vain. It might just be that our implementation of lazy_wheel could've been optimised further, but given that PEP 658 is on the horizon, I reckon it's OK to trim this functionality since we're going to get similar semantics with dist-info-metadata.

pradyunsg added the "state: needs discussion" and "C: build logic" labels on Sep 30, 2022
pradyunsg (Member, Author)

I'm not sure yet if this is the route we should go down, FWIW, but I'm filing a PR to facilitate the discussion around this and simplify it to a go/no-go decision for us. :)

dholth (Member) commented Oct 3, 2022

Something to time: a lazier lazy wheel #11481
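For anyone timing this, here's a heavily simplified sketch of the lazy-wheel idea (what pip's LazyZipOverHTTP implements much more carefully): expose the remote wheel as a seekable file so zipfile fetches only the byte ranges it touches via HTTP Range requests. It assumes the server honours Range, does no caching or error handling, and `wheel_url` is a placeholder.

```python
# Minimal lazy-wheel sketch: a seekable file-like object over HTTP Range
# requests, so zipfile reads only the end-of-central-directory, the
# central directory, and the METADATA entry instead of the whole wheel.
import io
import urllib.request
import zipfile


class LazyHTTPFile(io.RawIOBase):
    def __init__(self, url):
        self._url = url
        head = urllib.request.urlopen(urllib.request.Request(url, method="HEAD"))
        self._length = int(head.headers["Content-Length"])
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        else:  # io.SEEK_END
            self._pos = self._length + offset
        return self._pos

    def read(self, size=-1):
        if size < 0:
            size = self._length - self._pos
        if size <= 0 or self._pos >= self._length:
            return b""
        end = min(self._pos + size, self._length) - 1  # Range is inclusive
        req = urllib.request.Request(
            self._url, headers={"Range": f"bytes={self._pos}-{end}"}
        )
        data = urllib.request.urlopen(req).read()
        self._pos += len(data)
        return data


# Usage (wheel_url is a placeholder): read METADATA without the full wheel.
# with zipfile.ZipFile(LazyHTTPFile(wheel_url)) as zf:
#     name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
#     metadata = zf.read(name)
```

The interesting thing to time is how many Range round-trips zipfile ends up issuing; a lazier variant would batch or over-fetch those reads.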

casassg commented Oct 5, 2022

Given #10748 (comment), it may be good to keep this. If we can make --dry-run --report avoid downloading wheels for report generation, then we would be able to generate a download-less dependency closure via pip install --use-feature=fast-deps --dry-run --ignore-installed --report -. If we remove this feature before PEP 658 is merged, then we will be back to having to download wheels until the feature is implemented on PyPI (or, for that matter, in internal index services like JFrog Artifactory, which some companies use to serve internal wheel repositories).

jbylund (Contributor) commented Oct 5, 2022

Agree with @casassg

Warehouse doesn't yet support PEP 658, and many people (myself included) use different package repositories which may not support PEP 658 even if Warehouse does. So I think it's at least worth profiling/looking over the existing implementation to see how it can be improved before removing it entirely.

Speaking of... anyone have a good test case?

github-actions bot added the "needs rebase or merge" label on Oct 15, 2022
ddelange (Contributor)

Hi 👋

xref #11512 (comment): the Warehouse PR is now merged.

Is the plan still to remove fast-deps? It is still an opt-in feature, right? Do I need to enable it, given that #11111 mentions it 'builds on top of it'?

cosmicexplorer (Contributor)

@ddelange what I meant by that was that in #11111, pip extends the code paths and classes we initially developed for fast-deps to represent "virtual" dists, which do not correspond to any concrete place on the filesystem and contain only the metadata necessary for the resolver to make progress. I meant that in the sense @pradyunsg describes in the OP of this PR: to underline that the research effort served a valuable purpose, even if it didn't produce the eye-popping numbers we wanted without some more effort. I did not mean that the fast-deps feature itself was load-bearing in any way, so thanks for asking so I could clarify that.

Analogy: optimizing --layout zipapp PEX files

In pex-tool/pex#2175 my goal for pex was to reduce the i/o and computation spent compressing very large amounts of third-party code we do not control. I figured out a hack to reduce that effort, but as @jsirois points out, it's often more robust and maintainable in the long term to develop standard protocols designed for the task at hand than to rely on clever hacks to optimize existing ones. In that case, while I found it was possible to cache traditional --layout zipapp pex files with a clever hack, I learned that the --layout packed format had been designed from the ground up to enable that cacheability. Analogously, @uranusjr spent the time to develop PEP 658 to avoid the need for clever hacks like fast-deps to get the information pip requires. In both cases, we were able to leverage features of the zip file format to put off standardization for a while, but we're in a better place now that more thoughtful infrastructure has been laid down.

Notes on Cacheability in Package Resolution

Regarding cacheability: there remains one further avenue to investigate from #7819: amortizing/memoizing certain subproblems across separate invocations of the resolver. While I was able to identify one compartmentalized improvement in #7729, in general I still think pip (like all other package managers) currently spends most of its time recomputing a lot of the same constraints against each other over and over again for most hot-path use cases. For example, after identifying the subset of wheels that are compatible with a given interpreter, that information can be recorded and cached indefinitely by the local pip client, letting future resolves jump immediately to the compatible subset.
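To illustrate the kind of memoization I mean, here is a minimal sketch (not pip's actual code) that caches the interpreter-compatible subset of wheel filenames, keyed by the interpreter's supported tags. The cache path is hypothetical, and a real cache would also need refreshing when the index page gains new files:

```python
# Cache which wheel filenames are compatible with this interpreter, so a
# later resolve can skip re-checking every file on the index page.
import hashlib
import json
import pathlib

from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

CACHE = pathlib.Path("compatible-wheels.json")  # hypothetical location


def compatible_wheels(project, wheel_filenames):
    supported = {str(tag) for tag in sys_tags()}
    # Key on the project plus a stable digest of the interpreter's tag set.
    tag_key = hashlib.sha256(",".join(sorted(supported)).encode()).hexdigest()[:16]
    key = f"{project}::{tag_key}"
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    if key not in cache:
        cache[key] = [
            fname
            for fname in wheel_filenames
            # parse_wheel_filename returns (name, version, build, tags).
            if any(str(tag) in supported for tag in parse_wheel_filename(fname)[3])
        ]
        CACHE.write_text(json.dumps(cache))
    return cache[key]
```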

spack currently performs an analogous process, deserializing and processing all of the dependency constraints annotated in spack package recipes upon each spack invocation (cc @tgamblin @alalazo). By contrast, spack's main package repository is maintained as part of the spack codebase itself, so it can always expect the luxury of loading everything from the local filesystem. Still, it demonstrates that amortizing dependency graph traversal while incorporating updates from the source of truth is an issue intrinsic to package resolution, as opposed to the problem solved by fast-deps, which was a temporary workaround for slowly-evolving packaging standards.

Local indices to amortize resolves: learning from Cargo

My vision is for pip to eventually perform most resolves entirely locally, only calling out to pypi when it's unable to find a satisfying set and it determines that a newer release of some package might be able to do so. I think this should be feasible without needing to store anything close to the magnitude of all the package contents on pypi, because cargo does that right now, and has some useful comments (https://github.com/rust-lang/cargo/blob/263d24baca913d7ace29e2f59a3254e47272fd6b/src/cargo/sources/registry/index.rs#L25-L30):

//! One important aspect of the index is that we want to optimize the "happy
//! path" as much as possible. Whenever you type `cargo build` Cargo will
//! *always* reparse the registry and learn about dependency information. This
//! is done because Cargo needs to learn about the upstream crates.io crates
//! that you're using and ensure that the preexisting `Cargo.lock` still matches
//! the current state of the world.

As detailed in that file, cargo essentially just serializes a list of the dependency relationships to JSON in order to publish its index, which is exactly how I implemented the persistent dependency resolution cache in #7819 that multiplied the effect of the fast-deps technique alone (https://github.com/cosmicexplorer/pip/blob/40cec94281aa9165c689dd4bb69f953e2fd5fb06/src/pip/_internal/resolution/legacy/resolver.py#L171-L304).

fast-deps (and now PEP 658) gives you the dependency relationships needed to iterate the resolver over a single candidate without actually downloading the many-megabyte zip file, but pip still currently has to pay the cost of network calls to move the resolver forward as it evaluates each candidate; if a candidate is discarded, it has to do that again until the resolver finds a satisfying dependency set. In that file, cargo describes generating an index over another index to amortize the graph traversal performed during a cargo resolve, while still being able to incorporate any updates to the index that have occurred since the last resolve. There's nothing special cargo is doing here with rust that we can't do in python; as they note:

//! Consequently, Cargo "null builds" (the index that Cargo adds to each build
//! itself) need to be fast when accessing the index. The primary performance
//! optimization here is to avoid parsing JSON blobs from the registry if we
//! don't need them. Most secondary optimizations are centered around removing
//! allocations and such, but avoiding parsing JSON is the #1 optimization.
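To make the client side of that fast path concrete: under PEP 658 the index serves each wheel's core metadata at the wheel's own URL with .metadata appended, so a resolver can read Requires-Dist without fetching the wheel. A bare sketch (hash verification against data-dist-info-metadata and error handling omitted; wheel_url is a placeholder):

```python
# Fetch only the PEP 658 metadata file for a wheel and pull out its
# dependency requirements. The ".metadata" URL suffix is specified by
# PEP 658; everything else here is a simplification.
import urllib.request
from email.parser import BytesHeaderParser


def requires_dist(wheel_url):
    with urllib.request.urlopen(wheel_url + ".metadata") as resp:
        headers = BytesHeaderParser().parse(resp)
    # Core metadata is RFC 822-style; one Requires-Dist header per requirement.
    return headers.get_all("Requires-Dist") or []
```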

Assumptions and Requirements

Pip shares Cargo's assumption that each individual release of any package is immutable, so the dependency relationships that fast-deps/PEP 658 obtains from each wheel can therefore be cached and reused indefinitely after they are first computed. My hypothesis is that we can improve resolve performance by:

  1. (short term) mirroring every lookup of package==version#checksum => [dependency requirements] that pip extracts from PyPI during a normal resolve into a local database of some form, to be reused whenever pip evaluates that installation candidate in future resolves (a sketch follows after this list).
    • As noted in pex-tool/pex#2175 (consume packed wheel cache in zipapp creation), every new cache implies a speedup/disk-space tradeoff. I suspect this will be a small enough dataset to avoid that concern, though, especially if we restrict ourselves to just indexing the packages pip has seen before instead of trying to contain all of PyPI at once.
    • As with cargo, there are some dependency relationships that are not immutable, like editable installs or the most recent release of a package, and we would need to be able to incorporate that kind of information too.
  2. (longer term) optimizing the local dependency index and the new resolver in tandem.
    • Maybe this is the hubris of a beginner, but I am under the impression pip would be able to perform most resolutions in the blink of an eye if it didn't need to intersperse filesystem and/or network i/o between every step of the resolver's graph walk.
    • The reason PEP 658 works is that it cuts out the ceremony required to access the dependency relationships for a given installation candidate. I think if we continue to pare down what we cache and retain, from entire wheels to just the information we need for a resolve (similar to discussions of what to include in pip install --report), we can turn this into more and more of a compute-bound rather than IO-bound problem.
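A loose sketch of what step 1 could look like, resting on the immutability assumption above; the database location, schema, and names are illustrative rather than pip internals, and compute_requires stands in for a PEP 658 fetch or a lazy-wheel read on a cache miss:

```python
# Memoize name==version#sha256 -> [dependency requirements] in a local
# SQLite database the first time it is computed, and reuse it in every
# later resolve.
import json
import sqlite3

db = sqlite3.connect("dep-cache.sqlite3")  # hypothetical location
db.execute(
    "CREATE TABLE IF NOT EXISTS deps ("
    " name TEXT, version TEXT, sha256 TEXT, requires TEXT,"
    " PRIMARY KEY (name, version, sha256))"
)


def cached_requires(name, version, sha256, compute_requires):
    row = db.execute(
        "SELECT requires FROM deps WHERE name=? AND version=? AND sha256=?",
        (name, version, sha256),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    requires = compute_requires()  # e.g. a PEP 658 fetch or lazy-wheel read
    db.execute(
        "INSERT INTO deps VALUES (?, ?, ?, ?)",
        (name, version, sha256, json.dumps(requires)),
    )
    db.commit()
    return requires
```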

Conclusion

I guess what I'm saying is: I'm really hopeful that pip resolves can be further improved, and I think that will directly leverage the fast-deps work, but you're right that PEP 658 should obviate the feature itself. However, as per the vague sketch above, I think that we might be introducing multiple other ways to virtualize the information we get from pypi, and fast-deps and LazyZipOverHTTP might not look that out of place in the end result.

That said, the claimed performance improvement leveraging fast-deps remains to be demonstrated, and if it doesn't end up going anywhere then it seems like pip would best serve its users by removing the experiment.

cosmicexplorer (Contributor)

Just created #12184 to move forward on all the stuff in that comment instead of taking up space here.

cosmicexplorer (Contributor)

Also take a look at #12208, which I hope makes a convincing enough case for instead turning on --use-feature=fast-deps by default.

sbidoul (Member) commented Oct 1, 2023

I'm +1 for removing fast-deps. Once PyPI has backfilled PEP 658 metadata (tracked in pypi/warehouse#8254), it will no longer be useful, and removing it from the codebase will simplify reasoning about further optimizations.

uranusjr (Member) commented Oct 3, 2023

Personally I feel like the flag would be useful as a transition to the "new" range fetch approach (mentioned above by @cosmicexplorer). After we merge the entire addition, we can use the usual use-feature/use-deprecated mechanism to enable it by default.

sbidoul (Member) commented Oct 3, 2023

> the "new" range fetch approach

But do we need that in a world where PyPI has backfilled PEP 658 metadata?

dholth (Member) commented Oct 3, 2023

This feature has new energy, and we don't live in a world where PyPI has backfilled PEP 658 metadata. Also, it can work with a simpler non-PyPI index.

pfmoore (Member) commented Oct 3, 2023

Do we have any indications from PyPI of the timescales for backfilling? Also, as @dholth mentions, fast-deps would be useful for indexes that don't support PEP 658. The pytorch binaries, for example, are served from their own index - does that support PEP 658? I just checked https://download.pytorch.org/whl/cu117, and it doesn't seem to...

dstufft (Member) commented Oct 3, 2023

We don't have a set schedule for backfilling, but we were talking about it last Friday and I believe we hope to do it soon. I think we may be waiting until a new version of packaging is released and we can pull in the packaging metadata API to validate the metadata at the same time (obviously invalid metadata we can't do anything about, but we can at least record which metadata is valid or invalid).

pradyunsg (Member, Author)

> I think we may be waiting until a new version of packaging is released and we can pull in the packaging metadata API to validate the metadata at the same time (obviously invalid metadata we can't do anything about, but we can at least record which metadata is valid or invalid).

That came out last weekend, so this should be unblocked from that perspective I think. :)
