use intermediate satisfier causes in priority statistics #291

Eh2406 · 2024-12-05T23:56:54Z

This provides more information to the DP::prioritize call. Inspired by @x-hgg-x's investigation in #274 (comment), it provides the number of times the package has been involved in any backtracking conflict. Inspired by uv's prioritization, it provides the order in which the package was discovered. When using Priority = (conflict_count, Reverse<matched versions>, Reverse<discovery_order>), resolving all crates (including Solana) on 10 threads went from 162.45min to 88.96min! The slowest individual crate went from [email protected] 95s to [email protected] 21s! (Note that the runtime for any individual crate may vary run to run due to system load and scheduling.)
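To make the tuple ordering concrete, here is a minimal sketch of how such a priority compares. `PackageStats`, `priority` and `pick_next` are hypothetical names for illustration, not PubGrub APIs, and the sketch assumes the solver picks the package with the highest priority first.

```rust
use std::cmp::Reverse;

/// Hypothetical per-package inputs; in the PR these come from the solver
/// and from the dependency provider.
#[derive(Debug, Clone, Copy)]
pub struct PackageStats {
    pub conflict_count: u32,
    pub matched_versions: usize,
    pub discovery_order: u32,
}

/// Priority tuple in the shape described above. Tuples compare
/// lexicographically, so later fields only break ties in earlier ones.
pub type Priority = (u32, Reverse<usize>, Reverse<u32>);

pub fn priority(stats: PackageStats) -> Priority {
    (
        // More conflicts => higher priority.
        stats.conflict_count,
        // Fewer matching versions => higher priority.
        Reverse(stats.matched_versions),
        // Discovered earlier => higher priority.
        Reverse(stats.discovery_order),
    )
}

/// Index of the candidate with the highest priority, if any.
pub fn pick_next(candidates: &[PackageStats]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .max_by_key(|(_, stats)| priority(**stats))
        .map(|(index, _)| index)
}
```

With equal conflict counts, the package with fewer matching versions wins, and discovery order only matters when both earlier fields tie.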

The new fields are hidden behind a PackageResolutionStatistics with getters, to make it easier to add more in the future without additional breaking changes. Getters are used in case some value we want to return is expensive to calculate, so that we only calculate it if someone actually reads it. I attempted to use that pattern to avoid the hash map lookup for conflict_count, but then PackageResolutionStatistics ends up generic over P, which seemed unpleasant. If people have other suggestions for how to future-proof this API I'm all ears!
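The trade-off described above can be sketched as follows. Both types are hypothetical stand-ins (the names `EagerStats` and `LazyStats` and their constructors are assumptions, not the PR's code): the eager variant pays for the lookup up front but stays non-generic, while the lazy variant defers the lookup to the getter and becomes generic over the package type.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Eager variant: values are copied in up front, so the type has no
/// generic parameters. Fields are private; callers only see getters.
#[derive(Debug, Clone)]
pub struct EagerStats {
    discovery_order: u32,
    conflict_count: u32,
}

impl EagerStats {
    pub(crate) fn new<P: Eq + Hash>(
        package: &P,
        discovery_order: u32,
        conflicts: &HashMap<P, u32>,
    ) -> Self {
        Self {
            discovery_order,
            // The hash map lookup happens here, whether or not the
            // caller ever reads the value.
            conflict_count: conflicts.get(package).copied().unwrap_or(0),
        }
    }

    pub fn discovery_order(&self) -> u32 {
        self.discovery_order
    }

    pub fn conflict_count(&self) -> u32 {
        self.conflict_count
    }
}

/// Lazy variant: the lookup is deferred to the getter, but the type
/// now has to be generic over the package type `P`.
pub struct LazyStats<'a, P> {
    package: &'a P,
    discovery_order: u32,
    conflicts: &'a HashMap<P, u32>,
}

impl<'a, P: Eq + Hash> LazyStats<'a, P> {
    pub(crate) fn new(
        package: &'a P,
        discovery_order: u32,
        conflicts: &'a HashMap<P, u32>,
    ) -> Self {
        Self { package, discovery_order, conflicts }
    }

    pub fn discovery_order(&self) -> u32 {
        self.discovery_order
    }

    pub fn conflict_count(&self) -> u32 {
        // Only pay for the lookup when someone asks.
        self.conflicts.get(self.package).copied().unwrap_or(0)
    }
}
```

Because callers only ever go through getters, either variant could gain new statistics later without changing existing call sites.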

Theoretically the package name and range could also become getters on PackageResolutionStatistics. But I consider those two exponentially more important and more useful than the ones we're adding now: they deserve to always be computed and to get dedicated API surface. But opinions may vary.

codspeed-hq · 2024-12-06T00:04:08Z

CodSpeed Performance Report

Merging #291 will not alter performance

Comparing Eh2406:conflict_count (5ff396e) with dev (a8e2ba6)

Summary

✅ 6 untouched benchmarks

src/provider.rs

examples/caching_dependency_provider.rs

src/solver.rs

Eh2406 · 2024-12-10T16:35:17Z

@konstin what do you think? Do you want me to reduce a benchmark where this makes a big difference?

konstin

No idea how they'll benchmark on uv (as in, will we see a perf hit or not?), but i like giving users the information and the better default prioritize.

konstin · 2024-12-10T22:08:31Z

src/solver.rs

+    /// it is having the most problems with.
+    ///
+    /// Note: The exact values depend on implementation details of PubGrub. So should not be relied on and may change.
+    pub fn conflict_count(&self) -> u32 {


This may be very useful for something i'm writing right now 👀

I'm tracking two different properties atm: which previously chosen package A is conflicting with the currently chosen package B, because i want to deprioritize A (it's always making the later choices fail) while prioritizing B (if i pick B first, I have a good chance at getting the right version of A, without walking through all versions of B first), which requires some mechanism of knowing which is which.

konstin · 2024-12-10T22:11:38Z

src/solver.rs

+    /// Prioritizing based on this value directly allows the resolver to focus on the packages
+    /// it is having the most problems with.
+    ///
+    /// Note: The exact values depend on implementation details of PubGrub. So should not be relied on and may change.


We should be clearer about what this value means. Something like:

Every time an incompatibility is fulfilled, the packages involved are marked as conflicting. PubGrub builds incompatibilities lazily, and their construction may change over time, so the value may also change across PubGrub releases.

I don't know that a user would know what an incompatibility is. I'm not finding a good balance between too much detail and explaining why things could change.

I think a consumer of the library probably needs to know what an incompatibility is, right?

Not every user. But one that starts playing with conflict counts should definitely IMO.

We can use a two-tiered explanation: For basic users, this number is a measure of conflicting-ness; the higher it is, the more you want to prioritize the package. For advanced users, it's helpful to know that this is the number of satisfied incompatibilities that this package was involved in, so a user can implement heuristics on top of it (as we want in uv), e.g. weighing of bounds-strictness vs. conflicting-ness.

konstin · 2024-12-10T22:21:57Z

src/solver.rs

+impl PackageResolutionStatistics {
+    fn new<P: Package>(pid: Id<P>, conflict_count: &Map<Id<P>, u32>) -> Self {
+        Self {
+            discovery_order: pid.into_raw() as u32,


If you manually call alloc in some order, e.g. for dependencies before your actual package for a pre-fetching related reason, would that break this order?

Technically it's the order alloc is called. So the actual values would certainly be affected by prefetching. However, prioritize is only called on packages that are available for decision-making, so the impact on prioritization should not be dramatic.
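As a rough illustration of why prefetching changes the raw values, here is a toy interner that hands out ids in first-`alloc` order. `Interner` is a hypothetical stand-in for the arena behind `Id<P>`, not PubGrub's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical interner: each distinct name gets the next free id the
/// first time `alloc` sees it, so ids record allocation order.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, u32>,
}

impl Interner {
    /// Returns the existing id for `name`, or assigns the next one.
    pub fn alloc(&mut self, name: &str) -> u32 {
        let next = self.ids.len() as u32;
        *self.ids.entry(name.to_string()).or_insert(next)
    }
}
```

If a caller allocates a dependency before the package that requires it, the dependency gets the smaller id, which is exactly the reordering the question above is asking about.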

x-hgg-x · 2024-12-13T10:26:50Z

Counting the root cause instead of the first incompatibility is around 5% faster in CPU time for my implementation (https://github.com/x-hgg-x/pubgrub-bench) to resolve all crates (2789s -> 2663s), and 12% faster to resolve solana crates (385s -> 341s).

arielb1 · 2024-12-15T19:46:56Z

The latest solver change doesn't 100% fix the problem:

       Bucket:[email protected]/dep:ed25519-dalek SemverPubgrub { norml: 1.18.23 | 1.18.25 | 1.18.26, pre: ∅ }  depends on Bucket:[email protected]/default=true SemverPubgrub { norml: >=1.0.1, <1.0.2, pre: ∅ }
[2024-12-15T19:42:11Z INFO  pubgrub::internal::core] backtrack to DecisionLevel(2)
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(46) = 'Bucket:[email protected]/default=true' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] add_decision: Id::<Names>(46) @ 1.18.22
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(46) = 'Bucket:[email protected]/default=true'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(279) = 'Bucket:[email protected]/default' @ Some(Version { major: 1, minor: 0, patch: 0, pre: Prerelease("pre.3") })
[2024-12-15T19:42:11Z INFO  pubgrub::solver] add_decision (not first time): Id::<Names>(279) = 'Bucket:[email protected]/default' @ 1.0.0-pre.3
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(279) = 'Bucket:[email protected]/default'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(280) = 'Bucket:[email protected]/u64_backend' @ Some(Version { major: 1, minor: 0, patch: 0, pre: Prerelease("pre.3") })
[2024-12-15T19:42:11Z INFO  pubgrub::solver] add_decision (not first time): Id::<Names>(280) = 'Bucket:[email protected]/u64_backend' @ 1.0.0-pre.3
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(280) = 'Bucket:[email protected]/u64_backend'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(281) = 'Bucket:[email protected]/std' @ Some(Version { major: 1, minor: 0, patch: 0, pre: Prerelease("pre.3") })
[2024-12-15T19:42:11Z INFO  pubgrub::solver] add_decision (not first time): Id::<Names>(281) = 'Bucket:[email protected]/std' @ 1.0.0-pre.3
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(281) = 'Bucket:[email protected]/std'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(284) = 'Bucket:[email protected]/rand' @ Some(Version { major: 1, minor: 0, patch: 0, pre: Prerelease("pre.3") })
[2024-12-15T19:42:11Z INFO  pubgrub::solver] add_decision (not first time): Id::<Names>(284) = 'Bucket:[email protected]/rand' @ 1.0.0-pre.3
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(284) = 'Bucket:[email protected]/rand'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(286) = 'Bucket:[email protected]/dep:rand' @ Some(Version { major: 1, minor: 0, patch: 0, pre: Prerelease("pre.3") })
[2024-12-15T19:42:11Z INFO  pubgrub::solver] add_decision (not first time): Id::<Names>(286) = 'Bucket:[email protected]/dep:rand' @ 1.0.0-pre.3
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(286) = 'Bucket:[email protected]/dep:rand'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(355) = 'Bucket:[email protected]/default' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] add_decision: Id::<Names>(355) @ 1.18.22
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(355) = 'Bucket:[email protected]/default'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(356) = 'Bucket:[email protected]/full' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] add_decision: Id::<Names>(356) @ 1.18.22
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(356) = 'Bucket:[email protected]/full'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(357) = 'Bucket:[email protected]/byteorder' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] add_decision: Id::<Names>(357) @ 1.18.22
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(357) = 'Bucket:[email protected]/byteorder'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(358) = 'Bucket:[email protected]/ed25519-dalek-bip32' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] add_decision: Id::<Names>(358) @ 1.18.22
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(358) = 'Bucket:[email protected]/ed25519-dalek-bip32'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(359) = 'Bucket:[email protected]/ed25519-dalek' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] add_decision: Id::<Names>(359) @ 1.18.22
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(359) = 'Bucket:[email protected]/ed25519-dalek'
[2024-12-15T19:42:11Z INFO  pubgrub::solver] DP chose: Id::<Names>(373) = 'Bucket:[email protected]/dep:ed25519-dalek' @ Some(Version { major: 1, minor: 18, patch: 22 })
[2024-12-15T19:42:11Z INFO  pubgrub::internal::partial_solution] not adding Id::<Names>(373) @ 1.18.22 because of its dependencies
[2024-12-15T19:42:11Z INFO  pubgrub::solver] unit_propagation: Id::<Names>(373) = 'Bucket:[email protected]/dep:ed25519-dalek'
[2024-12-15T19:42:11Z INFO  pubgrub::internal::core] Start conflict resolution because incompat satisfied:
       Bucket:[email protected]/dep:ed25519-dalek SemverPubgrub { norml: 1.18.22 | 1.18.23 | 1.18.25 | 1.18.26, pre: ∅ }  depends on Bucket:[email protected]/default=true SemverPubgrub { norml: >=1.0.1, <1.0.2, pre: ∅ }

I would expect it to be able to prioritize this chain without "wandering":

[email protected]/default=true -> Bucket:[email protected]/default -> [email protected]/full -> [email protected]/ed25519-dalek -> [email protected]/dep:ed25519-dalek

The "root cause" it finds is

[2024-12-15T19:44:45Z INFO  pubgrub::solver] DP chose: Id::<Names>(39) = 'Bucket:[email protected]/default=true'
[2024-12-15T19:44:45Z INFO  pubgrub::solver] DP chose: Id::<Names>(46) = 'Bucket:[email protected]/default=true'

Which is insufficient since it could find the rest of the packages in the chain to load them.

Whenever we either discard a version due to its dependencies or perform conflict resolution, we return the last conflict that led to discarding them.

In cargo, we use this information for prioritization, which speeds up resolution (`cargo run -r -- -m pub --with-solana --filter solana-archiver-lib -t 16` goes from 90s to 20s on my machine).

Configurations that are noticeably slower for the solana test case:

* All incompatibilities unit propagation
* Only the last root cause in unit propagation
* No incompatibilities from unit propagation
* No incompatibilities from `add_version`
* Only affect counts (without culprit counts)
* Backtracking with the same heuristic as astral-sh/uv#9843 (backtracking once after affected hits 5)

In uv, we use this to re-prioritize and backtrack when a package decision accumulated too many conflicts. Since we have our own solver loop, we add the incompatibility to our own tracking instead.

Built on #291

## Benchmarks

Main:

```
index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
index commit time: 2024-11-30T18:18:14Z
index size: 32
solana in index: 32
Pub CPU time: 1215.49s == 20.26min
Cargo CPU time: skipped
Cargo check lock CPU time: skipped
Pub check lock CPU time: skipped
Wall time: 80.58s == 1.34min
```

With #291:

```
index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
index commit time: 2024-11-30T18:18:14Z
index size: 32
solana in index: 32
Pub CPU time: 467.73s == 7.80min
Cargo CPU time: skipped
Cargo check lock CPU time: skipped
Pub check lock CPU time: skipped
Wall time: 34.76s == 0.58min
```

This PR:

```
index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
index commit time: 2024-11-30T18:18:14Z
index size: 32
solana in index: 32
Pub CPU time: 271.79s == 4.53min
Cargo CPU time: skipped
Cargo check lock CPU time: skipped
Pub check lock CPU time: skipped
Wall time: 20.17s == 0.34min
```

This PR is the child of #36 and pubgrub-rs#291, providing an implementation that works for both cargo and uv. Upstream PR: pubgrub-rs#298. Specifically, we use the returned incompatibility in astral-sh/uv#9843, but not `PackageResolutionStatistics`.

Eh2406 · 2024-12-16T20:22:09Z

Having spent the morning staring at the logs @arielb1 found, it's (now) clear that one of the problems here is a difference in perspective between before and after things are "loaded". Once a conflict is found, we are in the "after loading" perspective, where "Bucket:[email protected]/default=true is incompatible with the prior decision for Bucket:[email protected]/default=true" is an obvious implication of "Bucket:[email protected]/default=true depends on ... depends on Bucket:[email protected]/default=true". Because the entire chain is built out of singleton requirements (that can only match one thing), unit_propagation would be able to see that they are the same. Therefore, root_cause is generated as the shorter and clearer version. Similarly, because we are wearing the "after loading" perspective, we can see that we need to do one back jump all the way to Bucket:[email protected]/default=true, instead of back jumping one (inevitable) decision at a time. But after that back jump, we are now looking at a new version of Bucket:[email protected]/default=true and so are in the "before loading" situation. So the simplified version does not have enough information for us to tell which of these not yet evaluated packages actually reduce down to the problem.

That would suggest that counting the packages that were involved in each step of the loop in conflict_resolution would be more effective. It didn't work yesterday, but none of my performance numbers are looking the same as yesterday. So I should try again.
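A minimal sketch of that idea, assuming the conflict resolution loop hands back a list of (package, incompatibility) pairs for its intermediate steps. `PackageId`, `IncompatId` and `record_conflicts` are hypothetical stand-ins, not the names used in the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Id<P>`.
pub type PackageId = u32;
/// Hypothetical stand-in for `IncompDpId<DP>`.
pub type IncompatId = u32;

/// Bump the conflict count of every package that showed up in an
/// intermediate step, not only the ones named in the final root cause.
pub fn record_conflicts(
    conflict_count: &mut HashMap<PackageId, u32>,
    satisfier_causes: &[(PackageId, IncompatId)],
) {
    for (package, _incompat) in satisfier_causes {
        *conflict_count.entry(*package).or_default() += 1;
    }
}
```

Packages that recur across several steps accumulate a higher count, which is what lets a later `prioritize` call pull the whole chain forward.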

src/internal/core.rs

* Return and track affected and culprit on conflicts
* Use smallvec for root causes
* Add more docs
* Review

Eh2406 · 2024-12-19T21:08:53Z

Rebased, which removed most of the code. Added an extensively wordy comment. I'm not so sure about the quality of this comment. It may have too many words and say too little. Edits are welcome.

Overall performance, measured with cargo r -r -- --with-solana -m pub -t 36 --filter solana went from Pub CPU time: 29.64min to 21.98min. --filter solana-archiver-lib went from 2.78min to 2.46min.

konstin · 2024-12-19T22:04:49Z

I'm seeing a massive speedup with solana-archiver-lib. I'm testing with 8 threads since 16 overflow with dev.

Before (dev, 3bef331)

$ cargo run -r -- -m pub --with-solana --filter solana-archiver-lib -t 8
Running in mode Pub on 8 rayon threads.
        index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
        index commit time: 2024-11-30T18:18:14Z
               index size: 32
          solana in index: 32
             Pub CPU time:   257.76s ==   4.30min
           Cargo CPU time: skipped
Cargo check lock CPU time: skipped
  Pub check lock CPU time: skipped
                Wall time:    37.05s ==   0.62min

After (PR, e08db34)

$ cargo run -r -- -m pub --with-solana --filter solana-archiver-lib -t 8
Running in mode Pub on 8 rayon threads.
        index commit hash: 82086e46740d7a9303216bfac093e7268a95121f
        index commit time: 2024-11-30T18:18:14Z
               index size: 32
          solana in index: 32
             Pub CPU time:   143.82s ==   2.40min
           Cargo CPU time: skipped
Cargo check lock CPU time: skipped
  Pub check lock CPU time: skipped
                Wall time:    20.17s ==   0.34min

It's an improvement in both measurements, though I'm getting a very different speedup.

Counterintuitively, the should cancel count is way worse while the runtime performance improves. Is there a good non-walltime metric?

Patch:

    type Priority = (u32, Reverse<usize>);

    fn prioritize(
        &self,
        package: &Names<'c>,
        range: &RcSemverPubgrub,
        package_conflicts_counts: &PackageResolutionStatistics,
    ) -> Self::Priority {
        let matches_count = match package {
            Names::Links(_name) => {
                // PubGrub automatically handles when any requirement has no overlap. So this is only deciding the importance of picking the version:
                //
                // - If it only matches one thing, then adding the decision with no additional dependencies makes no difference.
                // - If it can match more than one thing, it is entirely equivalent to picking the packages directly, which would make more sense to the users.
                //
                // So only rubberstamp links attributes when all other decisions are made, by setting the priority as low as it will go.
                usize::MAX
            }

            Names::Wide(_, req, _, _) => self.count_wide_matches(range, &package.crate_(), req),
            Names::WideFeatures(_, req, _, _, _) | Names::WideDefaultFeatures(_, req, _, _) => self
                .count_wide_matches(range, &package.crate_(), req)
                .saturating_add(1),

            Names::Bucket(_, _, _) => self.count_matches(range, &package.crate_()),
            Names::BucketFeatures(_, _, _) | Names::BucketDefaultFeatures(_, _) => self
                .count_matches(range, &package.crate_())
                .saturating_add(1),
        };
        (
            package_conflicts_counts.conflict_count(),
            Reverse(matches_count),
        )
    }

(I'll still have to give the updated PR a proper read)

examples/caching_dependency_provider.rs

Eh2406 · 2024-12-20T00:33:31Z

I'm seeing a massive speedup with solana-archiver-lib. I'm testing with 8 threads since 16 overflow with dev.

Has memory consumption gotten worse? 16 was working for you before, right? I will need to come back to the oddly high memory usage.

It's an improvement in both measurements, though I'm getting a very different speedup.

I mostly watch the CPU time. But it is odd how different % improved is between my computer and yours.

Counterintuitively, the should cancel count is way worse while the runtime performance improves. Is there a good non-walltime metric?

I have not found one yet. Sorry.

Patch

And I forgot to push the code I was running before I logged off the work computer. Logged back on to push this for you. I was using Eh2406/pubgrub-crates-benchmark@80048df

Eh2406 · 2024-12-20T01:13:46Z

cargo run -r -- -m pub --with-solana --filter solana-archiver-lib -t 8

with 3bef331 with Eh2406/pubgrub-crates-benchmark@80048df

             Pub CPU time:   143.91s ==   2.40min
                Wall time:    18.97s ==   0.32min

with e08db34 with Eh2406/pubgrub-crates-benchmark@80048df

             Pub CPU time:   126.50s ==   2.11min
                Wall time:    16.67s ==   0.28min

with 3bef331 and your patch

             Pub CPU time:   437.86s ==   7.30min
                Wall time:    63.07s ==   1.05min

with e08db34 with your patch

             Pub CPU time:   251.98s ==   4.20min
                Wall time:    35.26s ==   0.59min
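For comparing runs like the ones above, the relative change can be computed directly from the reported CPU times. This is just arithmetic on the numbers already quoted; the helper names are made up for the example.

```rust
/// Speedup factor: how many times faster `after` is than `before`.
pub fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// Percentage of the original time that was saved.
pub fn percent_saved(before_secs: f64, after_secs: f64) -> f64 {
    (before_secs - after_secs) / before_secs * 100.0
}
```

Applied to the Pub CPU times above, `speedup(143.91, 126.50)` is about 1.14x for the benchmark at 80048df, while `speedup(437.86, 251.98)` is about 1.74x with the patched prioritization.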

konstin · 2024-12-20T08:27:56Z

With Eh2406/pubgrub-crates-benchmark@80048df i'm getting the same numbers too now :)

Eh2406 · 2024-12-20T16:04:17Z

Feel free to merge when you have had a chance to review the code.

konstin · 2024-12-20T09:04:46Z

src/internal/core.rs

+    /// This will prevent us going down this path again. However when we start looking at version 2 of A,
+    /// and discover that it depends on version 2 of B, we will want to prioritize the chain of intermediate steps
+    /// to confirm if it has a problem with the same shape.
+    /// The `satisfier_causes` argument keeps track of these intermediate steps so that the caller can use.


Tried some editing:

Return the root cause or the terminal incompatibility. CF <https://github.com/dart-lang/pub/blob/master/doc/solver.md#unit-propagation>

When we found a conflict, we want to learn as much as possible from it, to avoid making (or keeping) decisions that will be rejected. Say we found that the decision for X and the decision for Y are incompatible. We may find that the decisions on earlier packages B and C require us to make incompatible decisions on X and Y, so we backtrack until either B or C can be revisited. To make it practical, we really only need one of the terms to be a decision. We may as well leave the other terms general. Something like "the dependency on the package X is incompatible with the decision on C" tends to work out pretty well. Then if A turns out to also have a dependency on X the resulting root cause is still useful. Of course, this is more heuristics than science. If the output is too general, then `unit_propagation` will handle the confusion by calling us again with the next most specific conflict it comes across. If the output is too specific, then the outer `solver` loop will eventually end up calling us again until all possibilities are enumerated.

To end up with a more useful incompatibility, this function combines incompatibilities into derivations. Fulfilling this derivation implies the later conflict. By banning it, we prevent the intermediate steps from occurring again, at least in the exact same way. However, the statistics collected for `prioritize` may want to analyze those intermediate steps. For example we might start with "there is no version 1 of Z", and `conflict_resolution` may be able to determine that "that was inevitable when we picked version 1 of X" which was inevitable when we picked W and so on, until version 1 of B, which was depended on by version 1 of A. Therefore the root cause may simplify all the way down to "we cannot pick version 1 of A". This will prevent us going down this path again. However when we start looking at version 2 of A, and discover that it depends on version 2 of B, we will want to prioritize the chain of intermediate steps to check if it has a problem with the same shape. The `satisfier_causes` argument keeps track of these intermediate steps so that the caller can use them for prioritization.

Then if A turns out to also have a dependency on X the resulting root cause is still
useful.

How will we apply the root cause?

If the output is too general, then unit_propagation will handle the confusion by calling
us again with the next most specific conflict it comes across. If the output is too
specific, then the outer solver loop will eventually end up calling us again until all
possibilities are enumerated.

In which direction are "too general" and "too specific" to be understood here?

Overall I like the edits. I have a nit about

Say we found that the decision for X and the decision for Y are incompatible.

I'm going to quibble about the term decision in this sentence. In my brain, decision implies something that became constrained because of a call to choose_version/add_decision. Because we check for conflicts in add_version, simple cases end up handled there. We generally do not end up here with a conflict that is already in terms of decisions. Instead, we start with something abstract like X (in some range) depends on Y (in some range). It's the job of this function to peel back the layers, figure out why the partial solution has an assignment (which is not necessarily a decision) that is disjoint with the mentioned range for Y, and why the partial solution has an assignment that is a subset of the mentioned range for X.

How will we apply the root cause?

When we discover that A has a dependency on X unit_propagation will rescan the root cause
"the dependency on the package X is incompatible with the decision on C", and remove that version of C from the available range stored in partial solution.

In which direction are "too general" and "too specific" to be understood here?

I intended "too specific" to mean referring to exact decided versions. "B == 123 is incompatible with C == 456" would be very specific. Extremely actionable: unit_propagation needs to do very little work to figure out if it's relevant. But also not likely to be reusable. We are likely to end up with O(# versions of B * # versions of C) of these incompatibilities lying around. "Any version of X is incompatible with any version of Y" would be very general. There will only ever need to be one, and it will be relevant and useful any time X or Y come up again. But it takes a lot of work to figure out how it relates to decisions or where we need to backtrack to.

Obviously the wording of the paragraphs needs work, thank you for helping me with it!

konstin · 2024-12-20T09:05:51Z

src/internal/core.rs

    #[allow(clippy::type_complexity)]
    #[cold]
    fn conflict_resolution(
        &mut self,
        incompatibility: IncompDpId<DP>,
+        satisfier_causes: &mut SmallVec<(Id<DP::P>, IncompDpId<DP>)>,


If this turns out to show up in anyone's profile while they don't use the output, we can replace it by a callback, but for now

konstin · 2024-12-20T16:21:21Z

Code looks good, we can iterate over the comment now or later, it's not blocking merging

Eh2406 · 2024-12-26T22:25:42Z

I accidentally did some benchmarking without this commit because I forgot we were still discussing the comment. I took your language and changed a few little things. I'm totally open to a follow-up PR to improve the comment, but for now I'm going to merge so I don't mess up my numbers again.

konstin · 2025-01-02T10:09:12Z

src/provider.rs

+            .get(package)
+            .map(|versions| versions.keys().filter(|v| range.contains(v)).count())
+            .unwrap_or(0);
+        if version_count == 0 {


Does 0 mean that there is a conflict?

Yes, if we have 0 version_count then get_dependencies will be Unavailable.
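For readers following along, a self-contained sketch of this counting step. The `Index` type, the integer versions, and the `prioritize` signature are simplifications invented for illustration; returning the highest priority when nothing matches is an assumption about how one might surface the conflict early, not a statement of what the PR does.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical index: package name -> (version -> dependency names).
type Index = BTreeMap<String, BTreeMap<u32, Vec<String>>>;

/// Higher values are assumed to be resolved first.
type Priority = (u32, Reverse<usize>);

/// Count the known versions of `package` that fall inside `range`.
/// An unknown package counts as zero matching versions.
fn version_count(index: &Index, package: &str, range: &RangeInclusive<u32>) -> usize {
    index
        .get(package)
        .map(|versions| versions.keys().filter(|v| range.contains(*v)).count())
        .unwrap_or(0)
}

/// Toy prioritization: more past conflicts first, then fewer matching versions.
fn prioritize(
    index: &Index,
    package: &str,
    range: &RangeInclusive<u32>,
    conflict_count: u32,
) -> Priority {
    let count = version_count(index, package, range);
    if count == 0 {
        // Assumption: no version can satisfy the range, so a conflict is
        // unavoidable; pick this package as early as possible.
        return (u32::MAX, Reverse(0));
    }
    (conflict_count, Reverse(count))
}
```

Because tuples compare lexicographically, a package with zero matching versions outranks every package whose `conflict_count` is below `u32::MAX`.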

Eh2406 force-pushed the conflict_count branch from 8255e83 to b9b369d Compare December 6, 2024 03:44

x-hgg-x reviewed Dec 6, 2024

src/provider.rs (outdated)

x-hgg-x reviewed Dec 6, 2024

examples/caching_dependency_provider.rs (outdated)

x-hgg-x reviewed Dec 6, 2024

src/solver.rs (outdated)

konstin approved these changes Dec 10, 2024


Eh2406 force-pushed the conflict_count branch from 23f7e30 to dd3889d Compare December 12, 2024 20:35


konstin mentioned this pull request Dec 16, 2024

Return and track affected and culprit on conflicts #298

Merged

konstin mentioned this pull request Dec 16, 2024

Return and track affected and culprit on conflicts astral-sh/pubgrub#39

Merged

Eh2406 force-pushed the conflict_count branch 2 times, most recently from 61c8aee to 6502125 Compare December 18, 2024 17:30

Eh2406 commented Dec 18, 2024

src/internal/core.rs (outdated)

Eh2406 mentioned this pull request Dec 18, 2024

when priorities are equal do breadth first search #299

Merged

Eh2406 force-pushed the conflict_count branch from 6502125 to e08db34 Compare December 19, 2024 21:00

Eh2406 changed the title ~~provide more data for prioritization~~ use intermediate satisfier causes in priority statistics Dec 19, 2024

Eh2406 commented Dec 20, 2024

examples/caching_dependency_provider.rs (outdated)

Eh2406 force-pushed the conflict_count branch from e08db34 to 49f3f9d Compare December 20, 2024 01:16

konstin pushed a commit to astral-sh/pubgrub that referenced this pull request Dec 20, 2024

use intermediate satisfier causes in priority statistics (pubgrub-rs#291) 6a77c92

konstin approved these changes Dec 20, 2024


Eh2406 added 2 commits December 26, 2024 17:25

use intermediate satisfier causes in priority statistics

db77f9b

better wording for the comment

5ff396e

Eh2406 force-pushed the conflict_count branch from f6bebb6 to 5ff396e Compare December 26, 2024 22:25

Eh2406 enabled auto-merge December 26, 2024 22:26

Eh2406 added this pull request to the merge queue Dec 26, 2024

Merged via the queue into pubgrub-rs:dev with commit 7a59690 Dec 26, 2024
5 checks passed

Eh2406 deleted the conflict_count branch December 26, 2024 22:28

konstin reviewed Jan 2, 2025


