Skip to content

Implement --exclude-newer-than #13520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Implement --exclude-newer-than #13520

wants to merge 13 commits into from

Conversation

notatallshaw
Copy link
Member

@notatallshaw notatallshaw commented Aug 2, 2025

Closes #6257
Supplants #12717

Changes from the previous PR:

  • upload-before -> exclude-newer-than
  • Uses UTC by default so that is reproducible across environments
  • Pass to isolated build install
  • Adds tests

@notatallshaw
Copy link
Member Author

I've made some changes:

  • I've reverted to using local timezone based on @BurntSushi feedback
  • I've added help for users on how to explicitly specify a timezone
  • I am now passing --exclude-newer-than to the isolated build install command

Copy link
Member

@ichard26 ichard26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first review pass. I'll need to think about the larger design more, but I like where this is going. Thank you for working on this!

Comment on lines +307 to +311
upload_time: datetime.datetime | None
if upload_time_data := file_data.get("upload-time"):
upload_time = parse_iso_datetime(upload_time_data)
else:
upload_time = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the JSON version of the Simple API is used in the vast majority of installs, but we should probably support this feature with the HTML Simple API if possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature doesn't exist in the HTML version of the API, it only exists as addition JSON fields in the spec: https://peps.python.org/pep-0700/#specification

For the HTML version of the API, there is no change from version 1.0. For the JSON version of the API, the following changes are made:

IMO this was a short sighted choice by the spec authors, as it has prevented certain simple index libraries from supporting this feature. But we can't change the past.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for the information. /me wonders if it'd be useful to propose a v1.1 of the HTML spec to bring it to feature parity with the JSON API, but I don't have the time for that, haha.

Comment on lines +146 to +148
self,
options: Values,
session: PipSession,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems superfluous.

@@ -328,6 +329,7 @@ def _build_package_finder(
session: PipSession,
target_python: TargetPython | None = None,
ignore_requires_python: bool | None = None,
exclude_newer_than: datetime.datetime | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the history of this method, but it seems weird to accept options and then also pass specific options with their parameters. It would reduce the code churn if this method read the exclude_newer_than value from options.

I'm fine with either approach. I just find this inconsistent.

Comment on lines +62 to +82
@pytest.mark.parametrize(
"value, expected_check",
[
# Test with timezone info (should be preserved exactly)
(
"2023-01-01T00:00:00+00:00",
lambda dt: dt
== datetime.datetime(2023, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc),
),
(
"2023-01-01T12:00:00-05:00",
lambda dt: (
dt
== datetime.datetime(
*(2023, 1, 1, 12, 0, 0),
tzinfo=datetime.timezone(datetime.timedelta(hours=-5)),
)
),
),
],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I am not a fan of how Black explodes large collection literals. Could we instead only pass the expected datetime object via the decorator and perform the equality check in the test?

Comment on lines +124 to +131
(
expected_year,
expected_month,
expected_day,
expected_hour,
expected_minute,
expected_second,
) = expected_date_time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, I'd suggest something like this:

@pytest.mark.parametrize(
    "value, expected_date_time",
    [
        # Test basic ISO 8601 formats (timezone-naive, will get local timezone)
        ("2023-01-01T00:00:00", datetime(2023, 1, 1, 0, 0, 0)),
        ("2023-12-31T23:59:59", datetime(2023, 12, 31, 23, 59, 59)),
        # Test date only (will be extended to midnight)
        ("2023-01-01", datetime(2023, 1, 1, 0, 0, 0)),
    ],
)
def test_handle_exclude_newer_than_naive_dates(
    value: str, expected_dt: datetime
) -> None:
    assert result.year == expected_dt.year
    assert result.month == expected_dt.month
    assert result.day == expected_dt.day
    assert result.hour == expected_dt.hour
    assert result.minute == expected_dt.minute
    assert result.second == expected_dt.second

or

@pytest.mark.parametrize(
    "value, expected_date_time",
    [
        # Test basic ISO 8601 formats (timezone-naive, will get local timezone)
        ("2023-01-01T00:00:00", (2023, 1, 1, 0, 0, 0)),
        ("2023-12-31T23:59:59", (2023, 12, 31, 23, 59, 59)),
        # Test date only (will be extended to midnight)
        ("2023-01-01", (2023, 1, 1, 0, 0, 0)),
    ],
)
def test_handle_exclude_newer_than_naive_dates(
    value: str, expected_date_time: tuple[int, int, int, int, int, int]
) -> None:
    assert result.timetuple()[:6] == expected_date_time

("2023-01-01T00:00:00", (2023, 1, 1, 0, 0, 0)),
("2023-12-31T23:59:59", (2023, 12, 31, 23, 59, 59)),
# Test date only (will be extended to midnight)
("2023-01-01", (2023, 1, 1, 0, 0, 0)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but this means if I specify --exclude-newer-than 2025-08-14 (aka today), then pip will exclude candidates published on August 14, right? That behaves opposite to what I'd expect by the name --exclude-newer-than <day> -> exclude anything published after this day.

Should this not be extended to (2023, 1, 2, 0, 0, 0) or (2023, 1, 1, 23, 59, 59) ...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datetime 2025-08-14 is equivalent so 2025-08-14 00:00:00 in the same way the float 1 is equivalent to the float 1.000.

To me your suggestion is the same as saying "not larger than 1" and allowing 1.1 because it is less than 2.

I agree though on the boundary "exclude newer than 2025-08-14" should allow 2025-08-14 00:00:00, I'll take a look at that.

Happy to accept a better phrase than "exclude newer than", but no one seems to have been confused by uv's "exclude newer"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me your suggestion is the same as saying "not larger than 1" and allowing 1.1 because it is less than 2.

This is personal, but I don't read --exclude-newer-than like that. I read it like "ignore any packages published after <date/time>". exclude means ignore and than is exclusive (as is, it won't ignore... double negatives are hard).

The difference with uv's --exclude-newer is that the word "than" is missing, thus it's ambiguous whether it's inclusive or exclusive. I do agree that uv's option name looks too much like a boolean flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my explanation doesn't make sense. I agree with @ofek's opinion on the inclusive/exclusive question: #12717 (comment)

For better or worse, I do think it's important that we match uv in semantics. Otherwise, the two tools will function subtly differently which is bound to become a footgun. I guess that means I am most in favour of --uploaded-before. Fun 🙂

Copy link
Member Author

@notatallshaw notatallshaw Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good suggestion or strong option on the naming or semantics of the opinion, and am open to alternatives.

But I am strongly against mutating the user input or changing the logic to anything other than package datetime <= user datetime or exlusive <. It might not be semantically clear at first but it is logically simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. That leaves me at renaming the option as --uploaded-before and leaving the functionality unchanged. It's semantically clear and maintains the logical simplicity IMO (famous last words). cc @pfmoore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I agree with @ichard26 on the semantics of a user-supplied option. To me, --exclude-newer-than 2025-08-15 means "newer than today", i.e., the 16th or later. The problem here is that conceptually, "today" (or any date without a time component) represents a timespan, from midnight to midnight. That's not easy to model with Python's datetime objects, and it creates a weird distinction between 2025-08-15 and 2025-08-15T00:00:00, but no-one ever claimed UI was easy... 🙁

The right way, in my view, is to explicitly try to parse as a date object first, before parsing as a datetime. If the user input is a date, treat it as the end of the day, i.e., YYYY-MM-DD 23:59:59. If it's a datetime, assume the user was deliberately being precise, and do what they ask.

The alternative of naming the option --uploaded-before works for me, as well (as the ambiguity for date-only arguments doesn't exist there), although I can't say I like the option name as much.

I have no interest in conforming to uv's usage here, as IMO it's just wrong. I'd rather we implement correct behaviour, and then (if anyone cares enough) raise an issue request on uv pip install asking them to follow pip's semantics (on the basis that uv pip is supposed to be a pip compatibility layer).

Copy link
Member Author

@notatallshaw notatallshaw Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right way, in my view, is to explicitly try to parse as a date object first, before parsing as a datetime. If the user input is a date, treat it as the end of the day, i.e., YYYY-MM-DD 23:59:59. If it's a datetime, assume the user was deliberately being precise, and do what they ask.

In Python if I say my_datetime < datetime(2025, 8, 14) then datetime(2025, 8, 14, 6, 0, 0) is not a valid datetime, the same is true for SQL my_datetime < '2025-08-14.

And the underlying comparison is a datetime, not a date, so doing an implicit type conversion is going to create logical inconsistencies.

I'm happy to change the name, or I'm willing to disallow passing in a datetime without the time specified. I am strongly -1 on mutating the input or doing implicit type conversion.

Comment on lines +590 to +596
session = PipSession()
search_scope = SearchScope([], [], no_index=False)
link_collector = LinkCollector(session, search_scope)
selection_prefs = SelectionPreferences(
allow_yanked=False,
allow_all_prereleases=False,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we extended make_test_finder() to support exclude_newer_than than create a finder manually. These finder unit tests don't seem too useful—although I do appreciate the effort you're putting in to make sure this feature well tested—so let's keep this as simple as it needs to be (as a smoke test).

Comment on lines +30 to +42
def test_exclude_newer_than_help_text(self, script: PipTestEnvironment) -> None:
"""Test that --exclude-newer-than appears in help text."""
result = script.pip("install", "--help")
assert "--exclude-newer-than" in result.stdout
assert "datetime" in result.stdout

@pytest.mark.parametrize("command", ["install", "download", "wheel"])
def test_exclude_newer_than_available_in_commands(
self, script: PipTestEnvironment, command: str
) -> None:
"""Test that --exclude-newer-than is available in relevant commands."""
result = script.pip(command, "--help")
assert "--exclude-newer-than" in result.stdout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I appreciate the extensive testing, but I will take issue with these tests. I see the value, but I do not think it is worth the test execution time (remember that functional tests are slow since they need to invoke new pip subprocesses) to double check our CLI help or we've added the option to the right commands. Especially since it's not going to help if we add a new command down the line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I purposefully added too many tests and then whittled them down, I missed this one.

) -> None:
"""Test exclude-newer functionality against real PyPI with upload times."""
# Use a small package with known old versions for testing
# requests 2.0.0 was released in 2013
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pass --no-deps or use a package w/o dependencies? (e.g., six) Again, this is nit-picky, but I'd like to keep the tests as quick as reasonably possible. I know our test suite is by design not fast, but every little bit helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Install packages up to a certain date
6 participants