-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linkcheck builder: close streamed HTTP response objects when no content reads are required #11318
linkcheck builder: close streamed HTTP response objects when no content reads are required #11318
Conversation
sphinx/builders/linkcheck.py
Outdated
@@ -318,6 +318,7 @@ def check_uri() -> tuple[str, str, int]: | |||
# Read the whole document and see if #anchor exists | |||
response = requests.get(req_url, stream=True, config=self.config, | |||
auth=auth_info, **kwargs) | |||
response.close() # no HTTP body reads required; close the response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the content is required here, as part of check_anchor
.
(so a context-manager approach might be more appropriate here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better explanation, because I forgot what was going on here:
In the general case, we want to ensure that response.raise_for_status
and response.close
are both called after receiving a response from the server. They provide error-checking and resource cleanup, respectively.
Most of the time, we can call close
as soon as the HTTP status line (200 OK, etc) is received.
The anchor-checking case (looking for a specific HTML ID within a documentation webpage) is different though: it doesn't seem to make sense to anchor-check on error responses (HTTP 4xx), even though theoretically the server could include HTML and the anchors we're looking for in the response. Documentation references shouldn't be served from pages that produce 4xx responses.
So in that case we want to call response.raise_for_status
first.
It might be possible to move the response.close
call into the finally
clause of the try...except
handler.
The context manager approach provided by the requests
library calls response.close
when the response
context-manager exits, though: that's convenient and could be easier than writing a finally
clause that has to handle multiple different arrival paths.
3115b20
to
9ec505f
Compare
Converting to draft status while learning more about the possible effects on local socket re-use (the linkchecker is likely to contact a host multiple times while checking URL/anchor validity, and so it'd be worthwhile to be cautious about changes here). |
So my understanding of this at the moment is:
Based on those, I do think that this is sensible, and I don't think that it should cause any regressions. Implementing #11324 at a later date could improve network-traffic utilisation; this pull request should instead only affect local client socket resource usage (and cleanup). Update: draft implementation of #11324 is available in #11340 (and builds upon this pull request). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
Side note, if we want to speedup linkcheck tests, we may replace links going to servers on the internet with going to the local HTTP server, its latency should be lower and very likely faster.
@AA-Turner this branch should resolve the linkcheck test flakiness short-term, until I can figure out and clear up the mess I'm making of #11340. |
…nse-content reads are required
…RNINGS configuration
d74c6d3
to
77da299
Compare
Thanks! |
Thank you, @AA-Turner! |
Feature or Bugfix
Purpose
ResourceWarning
output) during linkcheck build testsDetail
requests
doesn't know whether the caller may intend to read content from a streamed HTTP response object, it holds the socket openexplicitly closeinvoke each request using a context manager,since we know for certain that we are not going to read data from it in these casesto allow the corresponding__exit__
method to perform cleanupRelates