Introduce resumable downloads with --resume-retries #12991

Open
wants to merge 16 commits into
base: main

@gmargaritis gmargaritis commented Oct 4, 2024

Resolves #4796

Introduced the --resume-retries option to allow resuming incomplete downloads in case of dropped or timed-out connections.

This option additionally uses the values specified for --retries and --timeout for each resume attempt, since they are passed in the session.

Used 0 as the default in order to keep backwards compatibility.
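
As a rough illustration of these semantics (a sketch, not the PR's actual code), the loop below allows up to `resume_retries` resume attempts after an incomplete transfer. The `fetch` callable, the `flaky` helper, and the `expected_size` parameter are hypothetical stand-ins for pip's session and response handling. With the default of 0, an incomplete download fails immediately, which matches the previous behavior.

```python
from typing import Callable, Iterable


def download_with_resume(
    fetch: Callable[[int], Iterable[bytes]],
    expected_size: int,
    resume_retries: int = 0,
) -> bytes:
    """Collect a download, resuming up to `resume_retries` times."""
    received = bytearray()
    attempts_left = resume_retries
    while True:
        try:
            # Hypothetical transport: yields chunks starting at the given offset.
            for chunk in fetch(len(received)):
                received.extend(chunk)
        except ConnectionError:
            pass  # Fall through to the completeness check below.
        if len(received) >= expected_size:
            return bytes(received)
        if attempts_left <= 0:
            raise RuntimeError(
                f"incomplete download: {len(received)}/{expected_size} bytes"
            )
        attempts_left -= 1


DATA = b"0123456789"


def flaky(offset: int) -> Iterable[bytes]:
    """Hypothetical server: sends at most 4 bytes, then drops the connection."""
    end = min(offset + 4, len(DATA))
    yield DATA[offset:end]
    if end < len(DATA):
        raise ConnectionError("connection dropped")
```

Calling `download_with_resume(flaky, len(DATA), resume_retries=2)` completes the transfer in three requests, while `resume_retries=0` raises after the first dropped connection.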

This PR is based on #11180

The downloader will make new requests and attempt to resume downloading using a Range header. If the initial response includes an ETag (preferred) or Date header, the downloader will ask the server to resume downloading only when it is safe (i.e., the file hasn't changed since the initial request) using an If-Range header.
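
A minimal sketch of how such resume headers could be assembled, assuming the ETag and Date values were saved from the initial response. The function name `build_resume_headers` and its parameters are hypothetical and not taken from the PR. Skipping weak ETags is an assumption based on the HTTP specification, which does not allow weak entity tags in If-Range.

```python
from typing import Dict, Optional


def build_resume_headers(
    bytes_received: int,
    etag: Optional[str] = None,
    date: Optional[str] = None,
) -> Dict[str, str]:
    """Build headers asking the server to continue from `bytes_received`."""
    # Request everything from the first missing byte to the end of the file.
    headers = {"Range": f"bytes={bytes_received}-"}
    # Assumption: weak ETags ("W/...") are not valid in If-Range, so ignore them.
    strong_etag = etag if etag and not etag.startswith("W/") else None
    # Prefer the ETag; fall back to the Date header of the initial response.
    validator = strong_etag or date
    if validator:
        headers["If-Range"] = validator
    return headers
```

If neither validator is available, only the Range header is sent, so the server cannot confirm that the file is unchanged.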

If the server responds with a 200 (e.g. if the server doesn't support partial content or can't check if the file has changed), the downloader will restart the download (i.e. start from the very first byte); if the server responds with a 206 Partial Content, the downloader will resume the download from the partially downloaded file.
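
The status handling described above can be sketched as follows. This is an illustration under simplifying assumptions (whole bodies held in memory, a hypothetical `merge_resume_response` helper), not the PR's implementation, which works on a partially downloaded file.

```python
def merge_resume_response(status_code: int, partial: bytes, body: bytes) -> bytes:
    """Combine previously downloaded bytes with the body of a resume response."""
    if status_code == 206:
        # Partial Content: the server honored the Range request, so append.
        return partial + body
    if status_code == 200:
        # Full response: discard the partial data and start from the first byte.
        return body
    # Any other status is treated as an error in this sketch.
    raise ValueError(f"unexpected status for resume request: {status_code}")
```

For example, a 206 response carrying `b"world"` after `b"hello "` was already downloaded yields `b"hello world"`, whereas a 200 response replaces the partial data entirely.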

yichi-yang and others added 3 commits September 26, 2024 21:26
- Added --resume-retries option to allow resuming incomplete downloads
- Setting --resume-retries=N allows pip to make N attempts to resume downloading, in case of dropped or timed out connections
- Each resume attempt uses the values specified for --retries and --timeout internally

Signed-off-by: gmargaritis <[email protected]>
@gmargaritis
Author

I'm guessing the CI fails because of the new linter rules introduced in 102d818

@thk686

thk686 commented Oct 4, 2024

Does this do rsync-style checksums? That would increase reliability.

@notatallshaw
Member

> I'm guessing the CI fails because of the new linter rules introduced in 102d818

This is the CI fix; CI will keep failing until it's merged: #12964

@gmargaritis
Author

Hey @notatallshaw 👋

Is there anything that I can do to move this one forward?

@notatallshaw
Member

notatallshaw commented Dec 11, 2024

> Is there anything that I can do to move this one forward?

A pip maintainer needs to take up the task of reviewing it. As we're all volunteers, it's a matter of finding time.

I think my main concern would be the behavior when interacting with index servers that behave badly, e.g. give the wrong content length (usually 0). Your description looks good to me, but I haven't had time to look over the code yet.
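
One possible way to guard against an unreliable content length is sketched below. The policy shown (treating a missing, malformed, or zero `Content-Length` as "unknown" rather than as a sign of truncation) is an assumption for illustration, and the helper names are hypothetical; the thread does not say how the PR handles this case.

```python
from typing import Optional


def parse_content_length(value: Optional[str]) -> Optional[int]:
    """Return a usable size, or None when the header cannot be trusted."""
    if value is None:
        return None
    try:
        size = int(value.strip())
    except ValueError:
        return None
    # Assumed policy: a zero or negative length is treated as unknown.
    return size if size > 0 else None


def looks_incomplete(content_length: Optional[str], bytes_received: int) -> bool:
    """Report a truncated download only when the expected size is known."""
    expected = parse_content_length(content_length)
    if expected is None:
        return False  # Nothing to compare against, so do not attempt a resume.
    return bytes_received < expected
```

Under this policy, a server that reports a length of 0 never triggers a resume attempt, at the cost of not detecting truncation for that server.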

@gmargaritis
Author

> A pip maintainer needs to take up the task of reviewing it. As we're all volunteers, it's a matter of finding time.

Yeah, I know how it goes, so no worries!

If you need any clarifications or would like me to make changes, I'd be happy to help!

@art-ignatev

Any chance that it'll be merged soon?

@notatallshaw notatallshaw added this to the 25.1 milestone Feb 1, 2025
@notatallshaw
Member

I've had an initial cursory glance at this PR and it appears to be sufficiently high quality.

I've also run the functionality locally (selected a large wheel to download and then disconnected my WiFi midway through the download) and it has a good UX.

My main concern, although this is a ship that has probably sailed, is it would be nice for pip not to have to directly handle HTTP intricacies and leave that to a separate library.

I can’t promise a full review or other maintainers will agree, but I am adding it to the 25.1 milestone for it to be tracked.

@pfmoore
Member

pfmoore commented Feb 1, 2025

The PR looks good, although I’m not an HTTP expert so I can’t comment on details like status and header handling. Like @notatallshaw I wish we could leave this sort of detail to a third-party library, but that would be a major refactoring. Add this PR (along with cert handling, parallel downloads, etc.) to the list of reasons we should consider such a refactoring, but in the meantime I’m in favour of adding this.

@pfmoore
Member

pfmoore commented Feb 1, 2025

There isn’t an “approve with conditions” button, but I approve this change on the basis that someone who understands HTTP should check the header and status handling.

@ichard26
Member

ichard26 commented Feb 1, 2025

I'll tack this onto my to-do list. Not sure if I can call myself an HTTP expert, but I've done a fair bit of webdev as a hobby so I'm decently familiar with HTTP statuses and header handling.

Sorry for taking so long to review. Large PRs like these are appreciated since they do often implement major improvements, but they're also tedious to review and pretty daunting. Not really a good excuse, but that's how it feels. Thanks @notatallshaw for the initial pass and confirming this is worth the look.

@ichard26 ichard26 self-requested a review February 1, 2025 19:23
@gmargaritis
Author

Awesome! Thank you for all your efforts!

Don’t worry about it, I know how it feels! Let me know if you need anything ✌️

Development

Successfully merging this pull request may close these issues.

[Improvement] Pip could resume download package at halfway the connection is poor
7 participants