Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend pubgrub to match cargo's dependency resolution #110

Open
2 tasks
nikomatsakis opened this issue Jul 22, 2024 · 6 comments
Open
2 tasks

Extend pubgrub to match cargo's dependency resolution #110

nikomatsakis opened this issue Jul 22, 2024 · 6 comments
Assignees
Milestone

Comments

@nikomatsakis
Copy link
Contributor

Metadata
Owner(s) @Eh2406
Team(s) cargo
Goal document 2024h2/pubgrub-in-cargo

Summary

Implement a standalone library based on pubgrub that model's cargo dependency resolution and validate its accurate with testing against crates found on crates.io. This lays the groundwork for improved cargo error messages, extensions for hotly requested features (e.g., better MSRV support, CVE-aware resolution, etc), and support for a richer ecosystem of cargo extensions.

Tasks and status

  • Implementation work on pubgrub library ()
  • Discussion and moral support (cargo Team)
@nikomatsakis nikomatsakis added this to the 2024h2 milestone Jul 22, 2024
@rust-lang rust-lang locked and limited conversation to collaborators Jul 25, 2024
@nikomatsakis
Copy link
Contributor Author

This issue is intended for status updates only.

For general questions or comments, please contact the owner(s) directly.

@Eh2406
Copy link
Contributor

Eh2406 commented Jul 25, 2024

Almost daily updates are being posted on https://rust-lang.zulipchat.com/#narrow/stream/260232-t-cargo.2FPubGrub/topic/Progress.20report questions or feedback are absolutely welcome on that zulip stream.

@Eh2406
Copy link
Contributor

Eh2406 commented Jul 29, 2024

Since the writing of this project goal we now have achieved for all crate versions on crates.io the two resolvers accept the other one's solution. If cargo thinks there is a valid lock file for a crate version then we check that:

  1. The new PubGrub based resolver also thinks there is a valid solution for that crate version.
  2. If cargo resolver is loaded with only the packages PubGrub selected, cargo still finds a solution.
  3. If the PubGrub resolver is loaded with only the packages cargo resolver selected, PubGrub still finds a solution.
    Otherwise, when cargo thinks there is no valid solution, then the new PubGrub based resolver concurs. This effort involved fixing many bugs related to whether optional dependencies have been activated. This milestone was achieved much faster than expected!

In addition, significant progress has been made speeding up running these tests. Over 30% improvements to the average performance to the new PubGrub based resolver, and important changes to cargo resolver to better allow it to be run in parallel. Big thank you to the Cargo Team for reviewing and merging these changes to the existing resolver. Running resolution in parallel is important to working on this project goal, but not useful to cargo.

@Eh2406
Copy link
Contributor

Eh2406 commented Aug 22, 2024

There were some footnotes to last updates headline achievement "for all crate versions on crates.io the two resolvers accept the other one's solution", which all have now been addressed. There were some versions filtered out because the existing cargo resolver did not accept the record from crates.io. @tareknaser look into these cases and determined that filtering them out was the correct behavior, they also do not work when using cargo to import them as a dependency. Big thanks to him for the contribution! After resolution is complete cargo checks for cyclical dependencies. The new resolver was augmented to also have this check. Tests ensure that this behavior exactly matches between resolvers.

Performance has made another massive improvement. The new resolver is on average faster when processing all of crates.io. This is also true when including the handful of pathological crates. These pathological crates used to hit hardcoded timeouts and are now each under two minutes. All of crates.io (with a 160s timeout) took 4.30hr and now takes 2.98hr, ~0.30% improvement.

Most runs of cargo happen with lock file. Lock file performance needed a lot of improvement, and got it. Verifying the lock file of nonpathological cases on crates.io went from 61.44min to 44.90min, a ~27% improvement. (The lock files of pathological cases verify very quickly, I just happened not to record those numbers.) There are several approaches for closing the remaining gap with the existing resolver.

@Eh2406
Copy link
Contributor

Eh2406 commented Sep 4, 2024

In the past two weeks there have been some small performance improvements. If I accomplished other things I forgot to take notes on them.

@Eh2406
Copy link
Contributor

Eh2406 commented Sep 20, 2024

Work on this project is pivoted to trying to make improvements to the underlying pubgrub-rs library in light of the work on this goal. We found a 10% performance win for checking lock files. By betting that things are going to work smoothly until the first time we have to backtrack. When checking the lock file, or if we are getting lucky, complications will not occur so it is a waste of effort to avoid them.

A long-standing issue with pubgrub-rs is an inefficiency when handling situations where several versions of a package are unavailable. The inefficiency leaks out as a error message with an explanation of each version that's unavailable. Instead we would prefer an error message with one line explaining that all the versions are unavailable for the same reason. Several attempts been made to fix this issue, all unsuccessful. One possible reason may have to do with a poorly understood change in behavior introduced early in the project. I read through all history of the project discussions to better understand what was known about the change. I created branches from old versions of the code to exactly identify changes in behavior. Now that we know how to bring back the original behavior we have to determine if it is better. By investigating the unsuccessful fix on top of the old behavior I was able to figure out why the fix did not work.

Meanwhile the tool has had some important quality of life improvements. Clap is now used to control what parts of the experiment actually need to be run. Useful when a change can only affect some of the behavior. Also it now works on Windows, and thanks to mimalloc with reasonable multithreaded performance. Performance improvements were made to reading the index and preparing crates as a resolution problem, time I spend waiting that is not part of the reported benchmark numbers.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants