Skip to content

Commit

Permalink
Change backtracking when packages conflict too much (#9843)
Browse files Browse the repository at this point in the history
Background reading: #8157
Companion PR: astral-sh/pubgrub#36
Requires for test coverage: astral-sh/packse#230

When two packages A and B conflict, we have the option to choose a lower
version of A, or a lower version of B. Currently, we determine this by
the order we saw a package (assuming equal specificity of the
requirement): If we saw A before B, we pin A until all versions of B are
exhausted. This can lead to undesirable outcomes, from cases where it's
just slow (sentry) to others cases without lower bounds where be
backtrack to a very old version of B. This old version may fail to build
(terminating the resolution), or it's a version so old that it doesn't
depend on A (or the shared conflicting package) anymore - but also is
too old for the user's application (fastapi). #8157 collects such cases,
and the `wrong-backtracking` packse scenario contains a minimized
example.

We try to solve this by tracking which packages are "A"s, culprits, and
"B"s, affected, and manually interfering with project selection and
backtracking. Whenever a version we just chose is rejected, we give the
current package a counter for being affected, and the package it
conflicted with a counter for being a culprit. If a package accumulates
more counts than a threshold, we reprioritize: Undecided after the
culprits, after the affected, after packages that only have a single
version (URLs, `==<version>`). We then ask pubgrub to backtrack just
before the culprit. Due to the changed priorities, we now select package
B, the affected, instead of package A, the culprit.

To do this efficiently, we ask pubgrub for the incompatibility that
caused backtracking, or just the last version to be discarded (due to
its dependencies). For backtracking, we use the last incompatibility
from unit propagation as a heuristic. When a version is discarded
because one of its dependencies conflicts with the partial solution, the
incompatibility tells us the package in the partial solution that
conflicted.

We only backtrack once per package, on the first time it passes the
threshold. This prevents backtracking loops in which we make the same
decisions over and over again. But we also changed the priority, so that
we shouldn't take the same path even after the one time we backtrack (it
would defeat the purpose of this change).

There are some parameters that can be tweaked: Currently, the threshold
is set to 5, which feels not too eager with so me of the conflicts that
we want to tolerate but also changes strategies quickly. The relative
order of the new priorities can also be changed, as for each (A, B) pair
the priority of B is afterwards lower than that for A. Currently,
culprits capture conflict for the whole package, but we could limit that
to a specific version. We could discard conflict counters after
backtracking instead of keeping them eternally as we do now. Note that
we're always taking about pairs (A, B), but in practice we track
individual packages, not pairs.

A case that we wouldn't capture is when B is only introduced to the
dependency graph after A, but I think that would require cyclical
dependency for A and B to conflict? There may also be cases where
looking at the last incompatibility is insufficient.

Another example that we can't repair with prioritization is
urllib3/boto3/botocore: We actually have to check all the newer versions
of boto3 and botocore to identify the version that allows with the older
urllib3, no shortcuts allowed.

```
urllib3<1.25.4
boto3
```

All examples I tested were cases with two packages where we only had to
switch the order, so I've abstracted them into a single packse case.

This PR changes the resolution for certain paths, and there is the risk
for regressions.

Fixes #8157

---

All tested examples improved.

Input fastapi:
```text
starlette<=0.36.0
fastapi<=0.115.2
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.11 --exclude-newer 2024-10-01 --no-annotate debug/fastapi.txt
annotated-types==0.7.0
anyio==4.6.0
fastapi==0.1.17
idna==3.10
pydantic==2.9.2
pydantic-core==2.23.4
sniffio==1.3.1
starlette==0.36.0
typing-extensions==4.12.2

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.11 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/fastapi.txt 
annotated-types==0.7.0
anyio==4.6.0
fastapi==0.109.1
idna==3.10
pydantic==2.9.2
pydantic-core==2.23.4
sniffio==1.3.1
starlette==0.35.1
typing-extensions==4.12.2
```


Input xarray:
```text
xarray[accel]
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.11 --exclude-newer 2024-10-01 --no-annotate debug/xarray-accel.txt
bottleneck==1.4.0
flox==0.9.13
llvmlite==0.36.0
numba==0.53.1
numbagg==0.8.2
numpy==2.1.1
numpy-groupies==0.11.2
opt-einsum==3.4.0
packaging==24.1
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
scipy==1.14.1
setuptools==75.1.0
six==1.16.0
toolz==0.12.1
tzdata==2024.2
xarray==2024.9.0

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.11 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/xarray-accel.txt
bottleneck==1.4.0
flox==0.9.13
llvmlite==0.43.0
numba==0.60.0
numbagg==0.8.2
numpy==2.0.2
numpy-groupies==0.11.2
opt-einsum==3.4.0
packaging==24.1
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
scipy==1.14.1
six==1.16.0
toolz==0.12.1
tzdata==2024.2
xarray==2024.9.0
```


Input sentry: The resolution is identical, but arrived at much faster:
main tries 69 versions (sentry-kafka-schemas: 63), PR tries 12 versions
(sentry-kafka-schemas: 6; 5 times conflicting, then once the right
version).

```text
python-rapidjson<=1.20,>=1.4
sentry-kafka-schemas<=0.1.113,>=0.1.50
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.11 --exclude-newer 2024-10-01 --no-annotate debug/sentry.txt
fastjsonschema==2.20.0
msgpack==1.1.0
python-rapidjson==1.8
pyyaml==6.0.2
sentry-kafka-schemas==0.1.111
typing-extensions==4.12.2

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.11 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/sentry.txt
fastjsonschema==2.20.0
msgpack==1.1.0
python-rapidjson==1.8
pyyaml==6.0.2
sentry-kafka-schemas==0.1.111
typing-extensions==4.12.2
```


Input apache-beam
```text
# Run on Python 3.10
dill<0.3.9,>=0.2.2
apache-beam<=2.49.0
```

```
# BEFORE
$ uv pip --no-progress compile -p 3.10 --exclude-newer 2024-10-01 --no-annotate debug/apache-beam.txt
  × Failed to download and build `apache-beam==2.0.0`
  ╰─▶ Build backend failed to determine requirements with `build_wheel()` (exit status: 1)

# AFTER
$ cargo run --profile fast-build --no-default-features pip compile -p 3.10 --no-progress --exclude-newer 2024-10-01 --no-annotate debug/apache-beam.txt
apache-beam==2.49.0
certifi==2024.8.30
charset-normalizer==3.3.2
cloudpickle==2.2.1
crcmod==1.7
dill==0.3.1.1
dnspython==2.6.1
docopt==0.6.2
fastavro==1.9.7
fasteners==0.19
grpcio==1.66.2
hdfs==2.7.3
httplib2==0.22.0
idna==3.10
numpy==1.24.4
objsize==0.6.1
orjson==3.10.7
proto-plus==1.24.0
protobuf==4.23.4
pyarrow==11.0.0
pydot==1.4.2
pymongo==4.10.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
pytz==2024.2
regex==2024.9.11
requests==2.32.3
six==1.16.0
typing-extensions==4.12.2
urllib3==2.2.3
zstandard==0.23.0
```
  • Loading branch information
konstin authored Dec 16, 2024
1 parent 2b61a67 commit 431ddc1
Show file tree
Hide file tree
Showing 9 changed files with 332 additions and 54 deletions.
4 changes: 2 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ petgraph = { version = "0.6.5" }
platform-info = { version = "2.0.3" }
proc-macro2 = { version = "1.0.86" }
procfs = { version = "0.17.0", default-features = false, features = ["flate2"] }
pubgrub = { git = "https://github.com/astral-sh/pubgrub", rev = "57832d0588fbb7aab824813481104761dc1c7740" }
pubgrub = { git = "https://github.com/astral-sh/pubgrub", rev = "05e8d12cea8d72c6d2d017900e478d0abd28fef4" }
quote = { version = "1.0.37" }
rayon = { version = "1.10.0" }
reflink-copy = { version = "0.1.19" }
Expand Down Expand Up @@ -175,7 +175,7 @@ unicode-width = { version = "0.1.13" }
unscanny = { version = "0.1.0" }
url = { version = "2.5.2", features = ["serde"] }
urlencoding = { version = "2.1.3" }
version-ranges = { git = "https://github.com/astral-sh/pubgrub", rev = "57832d0588fbb7aab824813481104761dc1c7740" }
version-ranges = { git = "https://github.com/astral-sh/pubgrub", rev = "05e8d12cea8d72c6d2d017900e478d0abd28fef4" }
walkdir = { version = "2.5.0" }
which = { version = "7.0.0", features = ["regex"] }
windows-registry = { version = "0.3.0" }
Expand Down
14 changes: 9 additions & 5 deletions crates/uv-resolver/src/dependency_provider.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
use std::convert::Infallible;

use pubgrub::{Dependencies, DependencyProvider, Range};
use pubgrub::{Dependencies, DependencyProvider, PackageResolutionStatistics, Range};

use uv_pep440::Version;

Expand All @@ -17,13 +17,17 @@ impl DependencyProvider for UvDependencyProvider {
type V = Version;
type VS = Range<Version>;
type M = UnavailableReason;
type Priority = Option<PubGrubPriority>;
type Err = Infallible;

fn prioritize(&self, _package: &Self::P, _range: &Self::VS) -> Self::Priority {
fn prioritize(
&self,
_package: &Self::P,
_range: &Self::VS,
_stats: &PackageResolutionStatistics,
) -> Self::Priority {
unimplemented!()
}
type Priority = Option<PubGrubPriority>;

type Err = Infallible;

fn choose_version(
&self,
Expand Down
112 changes: 104 additions & 8 deletions crates/uv-resolver/src/pubgrub/priority.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use std::cmp::Reverse;

use pubgrub::Range;
use rustc_hash::FxHashMap;
use std::cmp::Reverse;
use std::collections::hash_map::OccupiedEntry;

use crate::fork_urls::ForkUrls;
use uv_normalize::PackageName;
Expand Down Expand Up @@ -40,19 +40,22 @@ impl PubGrubPriorities {
match self.0.entry(name.clone()) {
std::collections::hash_map::Entry::Occupied(mut entry) => {
// Preserve the original index.
let index = match entry.get() {
PubGrubPriority::Unspecified(Reverse(index)) => *index,
PubGrubPriority::Singleton(Reverse(index)) => *index,
PubGrubPriority::DirectUrl(Reverse(index)) => *index,
PubGrubPriority::Root => next,
};
let index = Self::get_index(&entry).unwrap_or(next);

// Compute the priority.
let priority = if urls.get(name).is_some() {
PubGrubPriority::DirectUrl(Reverse(index))
} else if version.as_singleton().is_some() {
PubGrubPriority::Singleton(Reverse(index))
} else {
// Keep the conflict-causing packages to avoid loops where we seesaw between
// `Unspecified` and `Conflict*`.
if matches!(
entry.get(),
PubGrubPriority::ConflictEarly(_) | PubGrubPriority::ConflictLate(_)
) {
return;
}
PubGrubPriority::Unspecified(Reverse(index))
};

Expand All @@ -77,6 +80,17 @@ impl PubGrubPriorities {
}
}

fn get_index(entry: &OccupiedEntry<PackageName, PubGrubPriority>) -> Option<usize> {
match entry.get() {
PubGrubPriority::ConflictLate(Reverse(index))
| PubGrubPriority::Unspecified(Reverse(index))
| PubGrubPriority::ConflictEarly(Reverse(index))
| PubGrubPriority::Singleton(Reverse(index))
| PubGrubPriority::DirectUrl(Reverse(index)) => Some(*index),
PubGrubPriority::Root => None,
}
}

/// Return the [`PubGrubPriority`] of the given package, if it exists.
pub(crate) fn get(&self, package: &PubGrubPackage) -> Option<PubGrubPriority> {
match &**package {
Expand All @@ -88,6 +102,79 @@ impl PubGrubPriorities {
PubGrubPackageInner::Package { name, .. } => self.0.get(name).copied(),
}
}

/// Mark a package as prioritized by setting it to [`PubGrubPriority::ConflictEarly`], if it
/// doesn't have a higher priority already.
///
/// Returns whether the priority was changed, i.e., it's the first time we hit this condition
/// for the package.
pub(crate) fn mark_conflict_early(&mut self, package: &PubGrubPackage) -> bool {
let next = self.0.len();
let Some(name) = package.name_no_root() else {
// Not a correctness bug
if cfg!(debug_assertions) {
panic!("URL packages must not be involved in conflict handling")
} else {
return false;
}
};
match self.0.entry(name.clone()) {
std::collections::hash_map::Entry::Occupied(mut entry) => {
if matches!(
entry.get(),
PubGrubPriority::ConflictEarly(_) | PubGrubPriority::Singleton(_)
) {
// Already in the right category
return false;
};
let index = Self::get_index(&entry).unwrap_or(next);
entry.insert(PubGrubPriority::ConflictEarly(Reverse(index)));
true
}
std::collections::hash_map::Entry::Vacant(entry) => {
entry.insert(PubGrubPriority::ConflictEarly(Reverse(next)));
true
}
}
}

/// Mark a package as prioritized by setting it to [`PubGrubPriority::ConflictLate`], if it
/// doesn't have a higher priority already.
///
/// Returns whether the priority was changed, i.e., it's the first time this package was
/// marked as conflicting above the threshold.
pub(crate) fn mark_conflict_late(&mut self, package: &PubGrubPackage) -> bool {
let next = self.0.len();
let Some(name) = package.name_no_root() else {
// Not a correctness bug
if cfg!(debug_assertions) {
panic!("URL packages must not be involved in conflict handling")
} else {
return false;
}
};
match self.0.entry(name.clone()) {
std::collections::hash_map::Entry::Occupied(mut entry) => {
// The ConflictEarly` match avoids infinite loops.
if matches!(
entry.get(),
PubGrubPriority::ConflictLate(_)
| PubGrubPriority::ConflictEarly(_)
| PubGrubPriority::Singleton(_)
) {
// Already in the right category
return false;
};
let index = Self::get_index(&entry).unwrap_or(next);
entry.insert(PubGrubPriority::ConflictLate(Reverse(index)));
true
}
std::collections::hash_map::Entry::Vacant(entry) => {
entry.insert(PubGrubPriority::ConflictLate(Reverse(next)));
true
}
}
}
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
Expand All @@ -101,6 +188,15 @@ pub(crate) enum PubGrubPriority {
/// in the dependency graph.
Unspecified(Reverse<usize>),

/// Selected version of this package were often the culprit of rejecting another package, so
/// it's deprioritized behind `ConflictEarly`. It's still the higher than `Unspecified` to
/// conflict before selecting unrelated packages.
ConflictLate(Reverse<usize>),

/// Selected version of this package were often rejected, so it's prioritized over
/// `ConflictLate`.
ConflictEarly(Reverse<usize>),

/// The version range is constrained to a single version (e.g., with the `==` operator).
Singleton(Reverse<usize>),

Expand Down
Loading

0 comments on commit 431ddc1

Please sign in to comment.